Introduction to the HTTP and HTTPS Protocol

HTTP is the foundation of the Web, and HTTPS is its secure version. HTTP is an application-layer protocol that runs on top of TCP/IP. It is not concerned with how packets are transmitted; it mainly specifies the message format used between the client and the server. The default port is 80 (443 for HTTPS).

HTTP/0.9

HTTP/0.9 was released in 1991 and has only one method: GET. Once the TCP connection is established, the client sends a GET request, the server returns an HTML resource and nothing else, and then the server closes the connection.

HTTP/1.0

HTTP/1.0 was released in 1996. Besides the data itself, every request and response now carries HTTP headers. It added the following features:

  • Multiple methods: GET, POST, HEAD
  • Status codes, grouped into classes: 1xx, 2xx, 3xx, 4xx, 5xx
  • Headers: Content-Type (text/plain, text/html, image/jpeg, video/mp4, etc.; you can also define your own types), Content-Encoding (paired with Accept-Encoding, which lets the client state which encodings it accepts, such as gzip or deflate), Date, Server, Last-Modified, and Expires. ETag and Cache-Control followed in HTTP/1.1.

However, each TCP connection can carry only one request. The connection is closed once the response has been delivered, so fetching several resources means opening several TCP connections.
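As a rough illustration of the "headers plus data" message format, the sketch below serializes an HTTP/1.0 request and splits a raw response into its parts. The function names `build_request` and `parse_response` and the sample path are made up for this example, and the parser is deliberately minimal (no error handling).

```python
def build_request(method, path, headers, body=b""):
    """Serialize a request: request line, header lines, blank line, body."""
    lines = [f"{method} {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_response(raw):
    """Split a raw response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *_reason = status_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


request = build_request("GET", "/index.html", {"Accept-Encoding": "gzip"})
response = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\nhello"
status, headers, body = parse_response(response)
```

In HTTP/1.0 the end of the body is signalled by the server closing the connection, which is why the parser above simply treats everything after the blank line as the body.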

HTTP/1.1

persistent connection

HTTP/1.1 was released in 1997. It introduced persistent connections: the TCP connection is not closed after a request completes, so several resources can be requested over the same connection. Browsers typically open at most about 6 connections to the same domain. When the client and server have been idle for a while, either side can actively close the connection (for example, the client sends Connection: close).

But only one request can be in flight on a TCP connection at a time, so requests are handled serially.

pipelining

HTTP/1.1 also added pipelining: the client can send several requests without waiting for the response to the previous one. Message boundaries are marked by Content-Length, the length of the message body. The server must still respond in the order in which the requests arrived. In addition, to send Content-Length the server has to know the full size of the response, so it must finish all of its work before it starts sending. Chunked transfer encoding (Transfer-Encoding: chunked) was introduced for this reason: the server sends the response in chunks as they become available, much like streaming.
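To make the chunked format concrete, here is a minimal sketch of an encoder and decoder. Each chunk is its size in hexadecimal, a CRLF, the data, and another CRLF; a zero-size chunk ends the body. The function names are invented for this example, and chunk extensions and trailers are ignored.

```python
def encode_chunked(parts):
    """Encode an iterable of byte strings as a chunked message body."""
    out = b""
    for part in parts:
        if part:  # a zero-length chunk would terminate the body early
            out += f"{len(part):X}".encode("ascii") + b"\r\n" + part + b"\r\n"
    return out + b"0\r\n\r\n"  # final zero-length chunk, no trailers


def decode_chunked(data):
    """Decode a chunked body (chunk extensions and trailers are ignored)."""
    body, pos = b"", 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";")[0]
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


wire = encode_chunked([b"Hello, ", b"world!"])
decoded = decode_chunked(wire)
```

Because every chunk announces its own size, the sender never needs to know the total length in advance.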

Host header and methods

HTTP/1.1 also added the Host header (for example Host: example.com), which lets a server that hosts several domains tell them apart, along with further methods such as PUT, PATCH, OPTIONS, and DELETE.

Cons

The downside is head-of-line blocking: because responses must be returned in request order, one slow response delays every response queued behind it.

HTTP/2

frame

HTTP/2 was released in 2015. It is a binary protocol: data is transferred in binary units called frames. It is derived from SPDY, a protocol designed to improve web performance, and it implements multiplexing, meaning the client can have several requests in flight on the same TCP connection at the same time. Because messages are split into frames, the server can interleave responses and send them in any order.

stream

Each request/response exchange is carried on a stream. Every frame carries a stream identifier that says which stream it belongs to. Streams initiated by the client have odd IDs, and streams initiated by the server have even IDs. Either side can cancel a stream by sending an RST_STREAM frame, and the client can specify a stream's priority.
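The sketch below shows how a frame ties back to a stream. It packs and parses the fixed 9-byte HTTP/2 frame header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream identifier). The helper names are made up for this example, only a handful of frame types are listed, and no payload validation is attempted.

```python
import struct

# A few frame type codes from the HTTP/2 specification.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x3: "RST_STREAM", 0x5: "PUSH_PROMISE"}


def pack_frame(frame_type, flags, stream_id, payload):
    """Prefix a payload with the 9-byte frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,  # 24-bit payload length
        frame_type,                     # 8-bit type
        flags,                          # 8-bit flags
        stream_id & 0x7FFFFFFF,         # reserved bit cleared, 31-bit stream ID
    )
    return header + payload


def parse_frame_header(data):
    """Decode the first 9 bytes of a frame into a small dict."""
    length_hi, length_lo, frame_type, flags, raw_id = struct.unpack(">BHBBI", data[:9])
    stream_id = raw_id & 0x7FFFFFFF
    if stream_id == 0:
        initiator = "connection"  # stream 0 carries connection-level frames
    elif stream_id % 2 == 1:
        initiator = "client"      # odd IDs: client-initiated streams
    else:
        initiator = "server"      # even IDs: server-initiated streams
    return {
        "length": (length_hi << 16) | length_lo,
        "type": FRAME_TYPES.get(frame_type, hex(frame_type)),
        "flags": flags,
        "stream_id": stream_id,
        "initiator": initiator,
    }


# Cancel stream 5 with RST_STREAM; the payload is a 32-bit error code (0x8 = CANCEL).
frame = pack_frame(0x3, 0, 5, struct.pack(">I", 0x8))
info = parse_frame_header(frame)
```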

header compression

To further improve performance, HTTP/2 added the HPACK compression algorithm for HTTP headers. The client and server each maintain a header table that indexes header fields they have already exchanged, so later requests can send just the index instead of the full header.
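The indexing idea can be illustrated with a toy table. This is not real HPACK (which also has a predefined static table, size-bounded eviction, and Huffman coding); the `HeaderTable` class is a hypothetical simplification that only shows how both sides stay in sync and replace repeated headers with indexes.

```python
class HeaderTable:
    """Toy header table: repeated (name, value) pairs are sent as an index."""

    def __init__(self):
        self.entries = []  # list of (name, value) tuples, in insertion order

    def encode(self, headers):
        out = []
        for pair in headers:
            if pair in self.entries:
                out.append(self.entries.index(pair))  # send only the index
            else:
                self.entries.append(pair)             # remember for next time
                out.append(pair)                      # send the literal pair
        return out

    def decode(self, items):
        headers = []
        for item in items:
            if isinstance(item, int):
                headers.append(self.entries[item])    # look up by index
            else:
                self.entries.append(item)             # mirror the sender's table
                headers.append(item)
        return headers


sender, receiver = HeaderTable(), HeaderTable()
request_headers = [(":method", "GET"), ("user-agent", "demo-client/1.0")]

first = sender.encode(request_headers)   # literals on first use
second = sender.encode(request_headers)  # indexes afterwards
decoded_first = receiver.decode(first)
decoded_second = receiver.decode(second)
```

The second request shrinks to two small integers because both tables already contain the headers.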

server push

The server can also use server push: it sends resources to the client before the client asks for them.

For example, in the browser's Network tab, some requests are marked "Provisional headers are shown", which may indicate a pushed resource. More commonly the Size column reads "from disk cache" or "from memory cache", which means the resource came from the browser cache. You can also check the Protocol column: h2 means the protocol is HTTP/2.

SSE

Server-Sent Events (SSE) is a technology in which a browser receives automatic updates from a server over an HTTP connection that the browser itself opened. It works as a push mechanism: the server can send messages to the client without a new request for each one.

Cons: SSE is one-way communication. The server can push messages to the client web app, but the client cannot send messages back over the same channel.
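For a feel of what travels over an SSE connection, the sketch below parses a small text/event-stream payload: events are blocks of "field: value" lines separated by a blank line, and several data lines in one block are joined with newlines. The function name `parse_event_stream` is made up, and the parser handles only the data and event fields (id and retry are ignored).

```python
def parse_event_stream(text):
    """Parse text/event-stream content into a list of event dicts."""
    events, data_lines, event_type = [], [], "message"
    for line in text.split("\n"):
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            data_lines, event_type = [], "message"
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is stripped
            if field == "data":
                data_lines.append(value)
            elif field == "event":
                event_type = value
    return events


stream = ": keep-alive\n\ndata: hello\n\nevent: update\ndata: line 1\ndata: line 2\n\n"
events = parse_event_stream(stream)
```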

Cons

  • H2 server push cons: HTTP/2 server push can only place resources in the browser's cache; it cannot deliver a message directly to the client web app, and the web platform has no API for observing push events. To push messages to the client, you can combine HTTP/2 with SSE.
  • TCP Head-of-Line (HOL) Blocking: HTTP/2 experiences delays known as TCP head-of-line blocking, which occur within the TCP layer. If a single packet in the TCP stream is lost, all streams using that connection have to wait for it to be retransmitted.
  • Packet Sequence Requirement: Each TCP packet must be received in a specific sequence due to assigned sequence numbers. If any packet is lost, subsequent packets are stalled.
  • TCP Buffer Holding: Lost packets cause subsequent packets to wait in the TCP buffer until the missing packet is retransmitted and received.
  • Impact on HTTP Layer: The HTTP layer, built on top of TCP, does not handle these TCP retransmissions. It only notices a delay when attempting to retrieve data from the socket.
  • Inability to Process Received Data: Even if received packets contain a complete HTTP request or response, they cannot be processed until the lost packet is retrieved.
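The TCP buffering behaviour described above can be simulated in a few lines. This is a toy model, not real TCP: segments are numbered from 0, `deliver_in_order` is a made-up helper, and the labels on the payloads are only there to show which HTTP/2 stream each segment would belong to.

```python
def deliver_in_order(arrivals):
    """For each arriving segment, list what is released to the application."""
    buffered, next_seq, released = {}, 0, []
    for seq, payload in arrivals:
        buffered[seq] = payload
        batch = []
        # Only a contiguous run starting at next_seq may leave the buffer.
        while next_seq in buffered:
            batch.append(buffered.pop(next_seq))
            next_seq += 1
        released.append(batch)
    return released


# Segment 1 is lost and arrives last, after its retransmission.
arrivals = [(0, "A1"), (2, "B (complete)"), (3, "C (complete)"), (1, "A2")]
released = deliver_in_order(arrivals)
```

Streams B and C are fully received after the second and third arrivals, yet nothing reaches the HTTP layer until the missing segment of stream A shows up.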

HTTP/3

UDP

HTTP/3 was published as a standard in 2022 (RFC 9114); QUIC, the transport it runs on, was standardized in 2021. It is based on UDP: data is carried in UDP packets, and when one stream loses a packet the other streams are not affected, which avoids the TCP head-of-line blocking problem.

QUIC

Because UDP is an unreliable transport, HTTP/3 relies on the QUIC protocol (originally short for Quick UDP Internet Connections) to provide reliability. QUIC also needs a handshake to establish a connection, and one of its purposes is to agree on the connection ID. QUIC implements multiplexing itself, and because data is split into frames that belong to independent streams, the server can send responses in any order.

Pros

  • HTTP over TCP identifies a connection by the four-tuple of source IP, source port, destination IP, and destination port. If the IP or port changes, the connection breaks, and a new one has to be established through the TCP three-way handshake and the TLS handshake.
  • QUIC identifies a connection by its connection ID instead. Even if the IP or port changes, packets that carry the same connection ID are recognized as belonging to the same connection.
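The difference between the two lookup keys can be shown with a toy server-side table. Everything here is hypothetical (the packet dicts, the addresses from documentation ranges, the connection ID value); it only models which key is used to find the existing connection state.

```python
def tcp_lookup(table, packet):
    """TCP-style lookup: the connection is keyed by the address four-tuple."""
    key = (packet["src_ip"], packet["src_port"], packet["dst_ip"], packet["dst_port"])
    return table.get(key)


def quic_lookup(table, packet):
    """QUIC-style lookup: the connection is keyed by the connection ID."""
    return table.get(packet["connection_id"])


wifi_packet = {
    "src_ip": "192.0.2.10", "src_port": 50000,
    "dst_ip": "203.0.113.5", "dst_port": 443,
    "connection_id": "c0ffee01",
}
# The client moves from Wi-Fi to a cellular network: its address changes.
cellular_packet = dict(wifi_packet, src_ip="198.51.100.77", src_port=41234)

tcp_table = {("192.0.2.10", 50000, "203.0.113.5", 443): "session state"}
quic_table = {"c0ffee01": "session state"}

tcp_before = tcp_lookup(tcp_table, wifi_packet)
tcp_after = tcp_lookup(tcp_table, cellular_packet)     # no match: must reconnect
quic_after = quic_lookup(quic_table, cellular_packet)  # same connection found
```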

HTTPS

Most browsers now recommend HTTPS over HTTP.

HTTPS is not a new protocol; it is HTTP with a security layer added. Plain HTTP is exposed to eavesdropping, tampering, and impersonation. The SSL/TLS protocol addresses these with encryption, integrity verification, and certificates.

The process of https is as follows:

  • The client requests the public key from the server and uses it to encrypt the data.
  • The server uses its private key to decrypt the data after receiving it.

How to ensure the public key has not been tampered with?

Have a CA (certificate authority) sign the public key in a certificate. If the CA is trustworthy, the public key can be trusted.

How to optimize the performance?

Use a symmetric key (the session key) to encrypt the data, and use the public key only to encrypt the session key.
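The division of labour between the two kinds of key can be sketched with toy numbers. This is for illustration only and is not secure: it uses textbook RSA with a tiny classroom modulus (n = 61 × 53) and a made-up XOR keystream in place of a real cipher such as AES. All names are invented for the example.

```python
import hashlib
import secrets

# Toy RSA key pair: n = 61 * 53, public exponent E, private exponent D.
N, E, D = 3233, 17, 2753


def keystream_xor(key, data):
    """Toy symmetric cipher: XOR data with a SHA-256 based keystream."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Client: pick a session key and wrap it with the server's public key.
session_key = secrets.randbelow(N - 2) + 2   # integer in [2, N - 1]
wrapped_key = pow(session_key, E, N)         # the asymmetric step, done once
ciphertext = keystream_xor(session_key.to_bytes(2, "big"), b"GET / HTTP/1.1")

# Server: unwrap the session key with the private key, then decrypt the data.
unwrapped_key = pow(wrapped_key, D, N)
plaintext = keystream_xor(unwrapped_key.to_bytes(2, "big"), ciphertext)
```

The expensive public-key operation touches only the short session key; the bulk data goes through the cheap symmetric cipher.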

The TLS handshake proceeds as follows (the handshake messages are sent in plaintext).

ClientHello

  1. the protocol version, e.g. TLS 1.0
  2. a random number (the client random) generated by the client, which will be used to generate the session key
  3. the supported encryption methods (cipher suites), such as RSA public-key encryption
  4. the supported compression methods

ServerHello

  1. confirmation of the protocol version to use, e.g. TLS 1.0
  2. a random number (the server random) generated by the server, which will be used to generate the session key
  3. confirmation of the encryption method, such as RSA public-key encryption
  4. the server certificate, which contains the server's public key

If the server needs to confirm the client's identity, it includes an additional request asking the client to provide a client certificate. For example, financial institutions often allow only authenticated customers to connect to their network and give registered customers a USB key that contains a client certificate.

Client Response

The client verifies the server certificate. If the certificate is not signed by a trusted CA, the domain in the certificate does not match the requested domain, or the certificate has expired, the client warns the user. If the certificate is valid, the client extracts the server's public key from it and then sends the following:

  1. a random number (the so-called pre-master secret). It is encrypted with the server's public key to prevent eavesdropping. (Three random numbers are used to generate the session key so that the result is as random as possible and hard for an attacker to guess.)
  2. a cipher change notification (ChangeCipherSpec), which means that subsequent messages will be sent using the encryption method and key agreed on by both parties.
  3. a client handshake finished notification, which means the client's handshake phase has ended. It also carries a hash of all previously sent handshake content, which the server uses for verification.
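The three certificate checks at the start of this step can be sketched as a small function. This is a deliberately simplified model: a certificate is represented as a plain dict, the trust store is a hypothetical set of issuer names, and no signature is actually verified. Real clients validate the full signature chain (in Python, for instance, via the standard `ssl` module).

```python
from datetime import datetime, timezone

TRUSTED_CAS = {"Example Root CA"}  # hypothetical trust store


def certificate_problems(cert, requested_host, now):
    """Return the list of reasons a client would warn the user (empty if none)."""
    problems = []
    if cert["issuer"] not in TRUSTED_CAS:
        problems.append("untrusted issuer")
    if cert["subject"] != requested_host:
        problems.append("host name mismatch")
    if not (cert["not_before"] <= now <= cert["not_after"]):
        problems.append("outside validity period")
    return problems


cert = {
    "issuer": "Example Root CA",
    "subject": "example.com",
    "not_before": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "not_after": datetime(2025, 1, 1, tzinfo=timezone.utc),
}
ok = certificate_problems(cert, "example.com", datetime(2024, 6, 1, tzinfo=timezone.utc))
bad = certificate_problems(cert, "other.example", datetime(2025, 6, 1, tzinfo=timezone.utc))
```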

Server Response

The server decrypts the pre-master secret with its private key and uses the three random numbers to generate the session key. It then sends:

  1. a cipher change notification (ChangeCipherSpec).
  2. a server handshake finished notification, which means the server's handshake phase has ended. It also carries a hash of all previously sent handshake content, which the client uses for verification.
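How three random numbers turn into one shared key can be sketched with the pseudorandom function that TLS 1.2 defines (an HMAC-SHA256 expansion). Note the assumption: the text above uses TLS 1.0 as its example version, which uses a different hash combination, so this shows the TLS 1.2 flavour of the same idea. The function names are chosen for this example.

```python
import hashlib
import hmac
import os


def p_sha256(secret, seed, length):
    """HMAC-SHA256 expansion: A(i) = HMAC(secret, A(i-1)), output HMAC(secret, A(i) + seed)."""
    out, a = b"", seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def master_secret(pre_master, client_random, server_random):
    """Derive the 48-byte master secret from the three random values."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master, seed, 48)


client_random = os.urandom(32)  # sent in plaintext in ClientHello
server_random = os.urandom(32)  # sent in plaintext in ServerHello
pre_master = os.urandom(48)     # sent encrypted with the server's public key

# Both sides run the same computation on the same three inputs.
client_side = master_secret(pre_master, client_random, server_random)
server_side = master_secret(pre_master, client_random, server_random)
```

The actual encryption keys are then expanded from this master secret; an eavesdropper who sees only the two plaintext randoms cannot reproduce it without the pre-master secret.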

From this point on, the communication is ordinary HTTP, encrypted with the symmetric session key.

Asymmetric encryption is used only during the handshake.

The whole process is as follows:

Because the asymmetric key pair is used only once per handshake, a site owner (e.g. a bank) that wants to use a CDN service without handing its private key to the CDN provider can keep the key on its own server and perform only the private-key operations there, while the CDN provider handles the rest of the process. In that setup, the bank's server only needs to handle step 4.

Eavesdropping

The handshake is sent in plaintext, so it is easy to eavesdrop on. An eavesdropper can learn the chosen encryption method and two of the three random numbers. Whether the communication is actually secure therefore depends on whether the pre-master secret can be cracked.

In theory, as long as the server's public key is long enough, such as 2048 bits, cracking the pre-master secret is computationally infeasible. If you want even stronger security, you can use the Diffie-Hellman algorithm instead: the two sides only exchange DH parameters, and each computes the same pre-master secret on its own, so the secret itself never crosses the network.
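A toy Diffie-Hellman exchange shows how both sides arrive at the same secret from public parameters. The numbers are the usual small textbook values (p = 23, g = 5) and are far too small for real use; real deployments use large groups or elliptic curves.

```python
# Public parameters, visible to any eavesdropper.
P, G = 23, 5

# Private values, never transmitted.
client_private, server_private = 4, 3

# Each side sends only its public value.
client_public = pow(G, client_private, P)
server_public = pow(G, server_private, P)

# Each side combines its own private value with the other's public value.
client_secret = pow(server_public, client_private, P)
server_secret = pow(client_public, server_private, P)
```

Both `client_secret` and `server_secret` come out as the same number, which then plays the role of the pre-master secret.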

Resume the session

If the connection is dropped, a new handshake is needed. There are two ways to resume the previous session instead of performing a full handshake:

  1. Session ID: When the client reconnects, it sends the session ID to the server, and the server uses it to look up the session key in memory. The drawback is that the session ID is stored on only one server; if the client's request lands on a different server, the session cannot be resumed.
  2. Session Ticket: When the client reconnects, it sends the session ticket (which the server issued at the end of the previous TLS handshake) to the server. The server decrypts the ticket with its ticket key and recovers the session key.
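The contrast between the two approaches can be modelled with two toy servers. The assumptions are loud here: the `Server` class is hypothetical, both servers are assumed to share one ticket key, and the ticket "encryption" is a made-up XOR pad plus an HMAC tag standing in for a real authenticated cipher.

```python
import hashlib
import hmac
import secrets


class Server:
    def __init__(self, ticket_key):
        self.sessions = {}            # session ID -> session key (local memory only)
        self.ticket_key = ticket_key  # assumed to be shared across servers

    def new_session(self, session_key):
        session_id = secrets.token_hex(8)
        self.sessions[session_id] = session_key
        return session_id, self.issue_ticket(session_key)

    def resume_by_id(self, session_id):
        return self.sessions.get(session_id)

    def issue_ticket(self, session_key):
        nonce = secrets.token_bytes(16)
        pad = hashlib.sha256(self.ticket_key + nonce).digest()
        sealed = bytes(a ^ b for a, b in zip(session_key, pad))  # toy encryption
        tag = hmac.new(self.ticket_key, nonce + sealed, hashlib.sha256).digest()
        return nonce + sealed + tag

    def resume_by_ticket(self, ticket):
        nonce, sealed, tag = ticket[:16], ticket[16:-32], ticket[-32:]
        expected = hmac.new(self.ticket_key, nonce + sealed, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # forged or corrupted ticket
        pad = hashlib.sha256(self.ticket_key + nonce).digest()
        return bytes(a ^ b for a, b in zip(sealed, pad))


shared_ticket_key = secrets.token_bytes(32)
server_a, server_b = Server(shared_ticket_key), Server(shared_ticket_key)

session_key = secrets.token_bytes(32)
session_id, ticket = server_a.new_session(session_key)

by_id_same = server_a.resume_by_id(session_id)    # found in server A's memory
by_id_other = server_b.resume_by_id(session_id)   # server B never saw this ID
by_ticket_other = server_b.resume_by_ticket(ticket)
```

With a ticket, the session state travels with the client, so any server holding the ticket key can resume the session.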

SSL and TLS

In brief, TLS is the successor to SSL. SSL (Secure Sockets Layer) is deprecated and almost never used today; TLS (Transport Layer Security) is the current standard. For historical reasons a TLS certificate is still often called an SSL certificate, even though the two protocols are not the same.

References

This article is also posted on my blog, feel free to check the latest revision: Introduction to the HTTP and HTTPS Protocol
