HTTP Protocol
Learning objectives
- You know the key principles of the HTTP protocol.
- You can identify key parts of HTTP protocol requests and responses.
- You know about the existence of HTTP/2, HTTP/3, and HTTPS.
Next, we take a closer look at the HTTP protocol, which is at the core of the communication between browsers and web servers.
Structure of an HTTP request
In HTTP/1.0 and HTTP/1.1, messages are text-based. Each message consists of rows that form the request header, followed by optional rows that form the request body. The end of the header is indicated by an empty line, that is, two consecutive line breaks.
When sending a request, the first row of the request contains the request method (discussed later), the path of the resource, and the version of the protocol that is used. The subsequent rows contain headers, where each row corresponds to one header, represented as a key-value pair separated by a colon. The headers are followed by an empty line, which in turn is followed by the request body.
request-method /path/to/resource HTTP/version
header1: value1
header2: value2
header3: value3

optional body with content
The request method indicates the HTTP method that is used (e.g. GET or POST). The path identifies the resource on the server and consists of the URI path and any query parameters (e.g. /news/index.html?limit=10). An anchor at the end of an address (e.g. #newest) is used only by the browser and is not sent to the server. The HTTP version states the version of the HTTP protocol that is used (e.g. HTTP/1.1).
The example below outlines a simple request with no headers.
GET / HTTP/1.0
The example contains a GET request that asks for the data at the root path of the server (/). The protocol version in the example is HTTP/1.0.
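The request layout described above can be sketched in a few lines of Python. This is only an illustration: the function name build_request and its parameters are made up for this example. On the wire, each line break is a carriage return followed by a line feed (\r\n), which the sketch uses as the row separator.

```python
def build_request(method, path, version="HTTP/1.1", headers=None, body=""):
    """Assemble a raw HTTP/1.x request message as text (illustrative helper)."""
    # First row: request method, path of the resource, protocol version.
    lines = [f"{method} {path} {version}"]
    # One row per header, written as "key: value".
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line ends the header; the optional body follows it.
    return "\r\n".join(lines) + "\r\n\r\n" + body


# The simple request from the text: no headers, no body.
print(repr(build_request("GET", "/", version="HTTP/1.0")))
```

Printing with repr makes the line breaks visible, including the empty line that ends the header.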
Note that forming a connection with the server is not a responsibility of the HTTP protocol. As such, the request above does not contain the address of the server.
The widely used HTTP/1.1 protocol makes it possible to serve multiple sites from the same address using virtual hosts. In practice, a server responding to requests made to an address can itself host multiple sites, or it can work as a router, forwarding requests to, e.g., other servers within its own network.
Because a single address can host multiple servers in HTTP/1.1, the path to the requested resource alone (as shown in the simple HTTP/1.0 example above) does not suffice: the requested resource could belong to any of the servers at that address. For this reason, the HTTP/1.1 protocol requires that each request also contains the name of the server as a header (Host).
As an example, the following request asks for the resource at the root of the server myserver.net, using the HTTP/1.1 protocol.
GET / HTTP/1.1
Host: myserver.net
In practice, this means that a single IP address can serve multiple sites, each with its own host name.
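How a server might use the Host header to choose between sites can be sketched as follows. The site table, the second host name otherserver.net, and the function names are hypothetical and exist only for this example. The sketch also ignores details such as a port number after the host name.

```python
def host_from_request(raw_request):
    """Return the value of the Host header, or None if the header is missing."""
    header_section = raw_request.split("\r\n\r\n", 1)[0]
    for line in header_section.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":  # header names are case-insensitive
            return value.strip()
    return None


# Hypothetical sites that share a single IP address.
SITES = {
    "myserver.net": "front page of myserver.net",
    "otherserver.net": "front page of otherserver.net",
}


def choose_site(raw_request, sites=SITES):
    """Pick the site that should answer the request, based on the Host header."""
    return sites.get(host_from_request(raw_request), "unknown host")


request = "GET / HTTP/1.1\r\nHost: myserver.net\r\n\r\n"
print(choose_site(request))  # front page of myserver.net
```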
Structure of an HTTP response
For every request made to the server, the server returns a response. The response is structured as follows. The first row contains the HTTP protocol version, an HTTP status code corresponding to the response, and a clarification of the status code. This is followed by a set of headers, each on its own line, similar to the HTTP request. The headers are followed by an empty line and the body of the response, which, as in the HTTP request, is not mandatory.
HTTP/version status-code status-code-clarification
header1: value
header2: value

optional body with content
A simple response could be as follows, which essentially states that the request was received and all is well.
HTTP/1.1 200 OK
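As a rough sketch, a response in this format can be split back into its parts with plain string operations. The function name parse_response and the sample response are made up for this example, and the sketch assumes a well-formed HTTP/1.x message with \r\n line breaks.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into its main parts (illustrative helper)."""
    # The first empty line separates the header from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # First row: version, status code, clarification of the status code.
    parts = status_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {
        "version": parts[0],
        "status": int(parts[1]),
        "reason": reason,
        "headers": headers,
        "body": body,
    }


sample = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello"
parsed = parse_response(sample)
print(parsed["status"], parsed["reason"], parsed["headers"], parsed["body"])
```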
HTTP status codes
HTTP status codes are numeric codes through which the server indicates whether there were issues when the request was processed and whether there are any (pre-specified) actions that the requester (browser) should take. The most common status code is 200, which indicates that everything went well.
The status codes can be divided into five high-level categories which are as follows.
- 1**: Information messages (e.g. 100 "Continue")
- 2**: Successful events (e.g. 200 "OK")
- 3**: Additional actions required from the client (e.g. 301 "Moved Permanently", which is often accompanied by a Location header that tells the new location, which the client can then retrieve)
- 4**: Error in the request or other issues (e.g. 401 "Unauthorized" and 404 "Not Found")
- 5**: Error on the server (e.g. 500 "Internal Server Error")
A list of all HTTP-status codes is found on Wikipedia at https://en.wikipedia.org/wiki/List_of_HTTP_status_codes.
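Because the category is determined by the first digit of the code, it can be looked up with integer division. The sketch below uses the HTTPStatus enumeration from Python's standard library for the textual clarifications. The category labels and the function name status_category are chosen for this example only.

```python
from http import HTTPStatus

# Labels for the five high-level categories (wording chosen for this example).
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_category(code):
    """Return the high-level category of a status code based on its first digit."""
    return CATEGORIES.get(code // 100, "unknown")


for code in (100, 200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_category(code))
```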
HTTP request methods
The HTTP/1.1 protocol originally defined eight request methods (a ninth, PATCH, was added later), of which the most commonly used are GET and POST. The request methods set restrictions and recommendations on the structure of the messages and on how they are processed on the server.
Retrieving information with GET
The GET method is used for retrieving content. Whenever one types an address into the browser's address bar and presses enter, or clicks a link on a page, the browser makes a GET request to the address. A GET request does not require any information in addition to the Host header required by the HTTP/1.1 protocol. Possible query parameters are sent to the server as a part of the path, after a question mark.
GET /index.html?parameter=value HTTP/1.1
Host: myserver.net
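The query part of the path can be built and read with Python's urllib.parse module, which also takes care of escaping special characters. The helper names below are made up for this sketch.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_target(path, params):
    """Append query parameters to a path, after a question mark."""
    query = urlencode(params)
    return f"{path}?{query}" if query else path


def read_query(target):
    """Return the query parameters of a request path as a dictionary of lists."""
    return parse_qs(urlsplit(target).query)


target = build_get_target("/index.html", {"parameter": "value"})
print(target)              # /index.html?parameter=value
print(read_query(target))  # {'parameter': ['value']}
```

The values are lists because the same parameter name may appear more than once in a query.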
Sending information with POST
The POST method is used for sending content. The practical difference between the GET and POST methods is that POST requests typically contain content in the body, and the headers of a POST request describe the format of that content (Content-Type) as well as its length in bytes (Content-Length).
Parameters can also be sent as a part of the body, using the same format as query parameters. As an example, the following snippet outlines an HTTP request that has form data within the body of the request:
POST /index.html HTTP/1.1
Host: myserver.net
Content-Type: application/x-www-form-urlencoded
Content-Length: 15

parameter=value
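The POST request above can be reproduced with a short sketch that encodes the form fields and derives Content-Length from the encoded body. The function name build_form_post is hypothetical; the host and field names are taken from the example above.

```python
from urllib.parse import urlencode


def build_form_post(path, host, fields):
    """Build a raw HTTP/1.1 POST request carrying form data in the body."""
    body = urlencode(fields)
    # Content-Length counts bytes, so measure the encoded body.
    content_length = len(body.encode("utf-8"))
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {content_length}",
    ]
    # An empty line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


print(build_form_post("/index.html", "myserver.net", {"parameter": "value"}))
```

The body parameter=value is 15 characters long, which matches the Content-Length header in the example.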
Other request types
GET and POST requests are most commonly used in the communication between the browsers and servers. A web page, or any associated content of a page such as an image, is practically always retrieved using a GET request. Similarly, information is practically always sent using the POST request.
The HTTP protocol also defines other request methods, including the following:
- DELETE: asks the server to remove a resource
- HEAD: asks for the header information related to a resource, but not the resource itself (can be used e.g. for checking whether an already downloaded resource has changed)
- OPTIONS: asks which options are available for a resource (e.g. whether the resource could be removed with a request)
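The differences between these methods can be illustrated with a toy request handler that works on an in-memory dictionary instead of a real server. Everything in the sketch (the handle function, the resource store, and the choice of status codes such as 204 "No Content" and 405 "Method Not Allowed") is an assumption made for illustration, not a description of how any particular server behaves.

```python
ALLOWED = "GET, HEAD, DELETE, OPTIONS"


def handle(method, path, resources):
    """Return (status code, headers, body) for a request against an in-memory store."""
    if method == "OPTIONS":
        # Tell the client which methods can be used.
        return 204, {"Allow": ALLOWED}, ""
    if path not in resources:
        return 404, {}, ""
    content = resources[path]
    headers = {"Content-Length": str(len(content.encode("utf-8")))}
    if method == "GET":
        return 200, headers, content
    if method == "HEAD":
        # Same headers as for GET, but the body is left out.
        return 200, headers, ""
    if method == "DELETE":
        del resources[path]
        return 204, {}, ""
    return 405, {"Allow": ALLOWED}, ""


store = {"/news/index.html": "latest news"}
print(handle("HEAD", "/news/index.html", store))    # (200, {'Content-Length': '11'}, '')
print(handle("DELETE", "/news/index.html", store))  # (204, {}, '')
print(handle("GET", "/news/index.html", store))     # (404, {}, '')
```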
Brief notes about HTTP/2 and HTTP/3
Web pages (and web applications) are typically composed of multiple parts. These parts include the main content (e.g. an HTML document), images, style files, script files, music, video, and so on.
When a web page is retrieved, each part is retrieved separately. In practice, the browser first retrieves the HTML document, which links to the other parts, which are then requested in turn. Retrieving a single web page may consist of tens or even hundreds of requests. In HTTP/1.0, a new connection was formed for each request. HTTP/1.1 allows a connection to be kept open and reused, but the requests on one connection are still answered one at a time, so browsers typically open several connections in parallel.
HTTP/2, which was originally specified in RFC 7540 (since replaced by RFC 9113), attempts to address this issue. Its central improvement is multiplexing: many requests and responses can be in progress at the same time over a single connection. It also gives the server the opportunity to send multiple resources in response to a single request (server push), although this feature has seen limited adoption. These changes reduce the time spent opening and closing connections and can reduce the time that loading a site takes. HTTP/2 also includes other improvements. Most of them are implemented in the servers that run web applications, not in the web applications themselves.
The third version of the HTTP protocol, HTTP/3 (RFC 9114), aims to further reduce delays, especially on unreliable networks. While the previous HTTP versions use the TCP protocol, which ensures that messages are received, HTTP/3 runs over QUIC, a transport protocol built on top of UDP. The UDP protocol, which is often used e.g. in video games, does not by itself try to ensure that messages will be received; in HTTP/3, QUIC takes care of resending lost data.
HTTPS
HTTPS (HyperText Transfer Protocol Secure) is an extension of HTTP that aims to make the communication secure. When a request is made using plain HTTP, the messages are transmitted in clear text, so anyone who has access to a router that transmits the messages can read and record their content. When using HTTPS, a secure connection between the client and the server is established (using the TLS protocol) before any content is sent, so the messages are transmitted in an encrypted format. Even if someone were able to access a router and record the transmitted messages, they would only have access to encrypted content.
In practice, addresses that have https:// as the prefix use HTTPS, while addresses that have http:// as the prefix use HTTP.
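A program can make the same distinction by inspecting the prefix (the scheme) of an address. The sketch below uses urllib.parse from Python's standard library. The function name describe_address is made up for this example, and the port numbers are the well-known defaults for the two schemes (80 for HTTP, 443 for HTTPS), which are not discussed in the text above.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_address(url):
    """Report the host, the port, and whether the address uses HTTPS."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Use the port given in the address, or fall back to the scheme's default.
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return {"host": parts.hostname, "port": port, "encrypted": scheme == "https"}


print(describe_address("https://myserver.net/"))
print(describe_address("http://myserver.net/index.html"))
```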
For a server to support HTTPS, it must have a public key certificate. Current browsers support checking the status of the certificate. For example, browsers typically show an icon (traditionally a lock) on the left-hand side of the address in the address bar, and clicking the icon shows details about the connection and the certificate.