Making sense of HTTPS

Pandula Weerasooriya
12 min read · May 20, 2023

Image by Freepik

I’ve tried to understand the HTTPS protocol properly for years. Videos, blog posts, threads, you name it. Yet it took me a long time to develop the right intuition and mental model for what happens behind the scenes of HTTPS. The simple reason is that I was overwhelmed by numerous unfamiliar concepts all at once: encryption, certificates, hashes, SSL/TLS, public/private keys, and certificate authorities. Even if you manage to understand each concept individually, it is difficult to form a comprehensive mental framework that encompasses all of them.

In this article, I’ve tried to build the intuition methodically by introducing only what is required at each stage. We’ll peel the onion that is HTTPS and PKI (public key infrastructure) once and for all, starting from the innermost layer and working our way out towards the outer layers.

First, we need to briefly understand a few key ideas.

Encryption

Encryption is a method of securing information by transforming data into an unreadable format using an algorithm and a secret key. This makes the data incomprehensible to unauthorized individuals who may gain access to it. Only those with the correct key can decrypt the data and convert it back into its original form. Note that the secret key is simply a sequence of bytes; its allowed length depends on the algorithm (AES, for example, uses 128-, 192-, or 256-bit keys).

For example, when you encrypt the word “HELLO” with the key “K3Y12345” and the AES algorithm, the result is a string of unreadable bytes, something like “A2B89C7D1E23F4G6” (an illustrative value, not real AES output). Only someone who knows the exact key and algorithm that were used can decrypt it back.
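To make the idea concrete, here is a minimal sketch of symmetric encryption using only the Python standard library. It is a toy XOR stream cipher, not AES (the standard library does not ship AES), and the `keystream` and `xor_cipher` helpers are names I made up for illustration. The point is only that the same key both scrambles and unscrambles the data.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` pseudo-random bytes (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = b"K3Y12345"
ciphertext = xor_cipher(key, b"HELLO")            # unreadable bytes
plaintext = xor_cipher(key, ciphertext)           # back to b"HELLO"
wrong_guess = xor_cipher(b"WRONGKEY", ciphertext) # wrong key gives garbage
```

Do not use this construction for anything real; it skips everything a vetted cipher such as AES provides.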

Why do we need encryption

Let’s visualize why we need encryption. If I access an unsafe website that still uses the HTTP protocol instead of HTTPS, anyone on my network can sniff the data packets travelling between my device and the internet. Below is an example of sniffing data coming from http://www.example.com using Wireshark.

Request payload is fully readable

However, when accessing websites where the communication is encrypted (HTTPS), both requests and responses are encrypted. Here’s an example using Wireshark for Amazon.

Check how the highlighted bytes are all gibberish text

Hashing

Hashing is the process of converting data of any size into a fixed-size sequence of characters called a hash value or hash code, which is unique for all practical purposes. It is a one-way function, meaning it is easy to compute the hash value from the data, but extremely difficult (nearly impossible) to retrieve the original data from the hash value alone. Hashing is commonly used to verify the integrity of data, securely store passwords, and quickly search and compare large sets of data.

A common example is when you receive a checksum with a downloaded file. The download manager hashes the received data itself and compares the published hash with the computed one. If the two hashes match, it’s a sign that the data has been received completely and without corruption.
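The checksum comparison described above can be sketched in a few lines of Python. SHA-256 is my choice of hash function here, and `verify_download` is a hypothetical helper name, not something a particular download manager exposes.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify_download(data: bytes, published_checksum: str) -> bool:
    """Hash the received bytes and compare with the checksum the site published."""
    return sha256_hex(data) == published_checksum.lower()

original = b"pretend this is the content of a large file"
published = sha256_hex(original)           # what the website would publish

corrupted = original[:-1] + b"X"           # a single byte changed in transit
intact_ok = verify_download(original, published)
corrupted_ok = verify_download(corrupted, published)
```

Changing even one byte produces a completely different hash, which is why the comparison catches corrupted downloads.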

Notice that one of the differences between hashing and encryption is that encrypted data can be decrypted, but a hash value cannot be used to retrieve the original data. We’ll soon see how hashing fits into the HTTPS flow.

Symmetric encryption

The definition of symmetric encryption is the same as the generic encryption we described earlier: data is encrypted and decrypted using a single shared key and an algorithm. So, why call it symmetric? Well, there is an asymmetric version of encryption that uses a key pair, which we will discuss next. But before we do that, let’s see why we need another form of encryption.

If the client (browser) and the server have access to the same key and know the encryption algorithm, nothing stops them from encrypting their end-to-end communication using this shared key. However, the million-dollar question is how to share this symmetric key between the two. Physically sharing the key is not a scalable solution when a server has to deal with millions of unknown users every day. Therefore, the key exchange needs to happen over the Internet. This is where asymmetric encryption comes in.

Asymmetric encryption

Asymmetric encryption, also known as public-key cryptography, is a cryptographic method that uses two distinct but mathematically related keys: a public key and a private key. The public key is freely distributed and used for encryption, while the private key remains secret and is used for decryption. Below is an example of a private key and its matching public key generated using openssl. Note that the private key is generated first and the public key is then derived from it.

command: openssl genrsa -aes256 -out private.pem
command: openssl rsa -in private.pem -outform PEM -pubout -out public.pem

The following three points of asymmetric encryption should be ingrained in your memory:

  • Data will be encrypted by one key and decrypted by the other. For example, a file encrypted by a private key can only be read using its corresponding public key, and vice versa.
  • Private keys are expected to be kept confidential and securely stored, akin to being hidden away in a highly protected location reminiscent of Mount Doom. On the other hand, public keys are openly accessible and visible to all, much like being displayed in an agora, allowing anyone to use them for encryption or other purposes.
  • You can cryptographically verify, using a public key, that a piece of data was signed by the corresponding private key, and thereby validate the integrity of that data.

Let’s look at the 3rd point more thoroughly. Below is a diagram of a typical signature flow utilizing asymmetric encryption and hashing.

Before transferring the data, the data owner hashes it using a hash function and then encrypts the resulting hash value using their private key. This encrypted hash (the signature) is appended to the transmitted data. The third party who receives the data uses the same hashing algorithm to compute a hash of the received data, and compares it with the decrypted version of the received encrypted hash. Notice how we are using the public key for the decryption. If the hashes do not match, it is a clear indication that the data has been tampered with and its integrity has been compromised.
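Here is a rough sketch of that signature flow using textbook RSA and nothing but the Python standard library. The key is built from two small, well-known primes that I picked for illustration; real keys are 2048 bits or more and real signatures add padding, so treat this as a teaching aid under those assumptions. The names `digest`, `sign`, and `verify` are mine.

```python
import hashlib

# Toy RSA key pair built from two small, well-known primes.
# Real keys are 2048 bits or more; these values are for illustration only.
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q                            # modulus, shared by both keys
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

def digest(data: bytes) -> int:
    """Hash the data with SHA-256 and reduce it to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data: bytes) -> int:
    """Data owner: 'encrypt' the hash with the private key."""
    return pow(digest(data), D, N)

def verify(data: bytes, signature: int) -> bool:
    """Receiver: 'decrypt' the signature with the public key and compare hashes."""
    return pow(signature, E, N) == digest(data)

message = b"transfer 100 to Alice"
signature = sign(message)

untouched_ok = verify(message, signature)
tampered_ok = verify(b"transfer 900 to Alice", signature)
```

Only the holder of `D` can produce a signature that checks out, while anyone who knows `N` and `E` can verify it.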

Now that we have briefly discussed some of the core PKI components, it’s time to look at the foundation of HTTPS, which is the key exchange and the subsequent encryption process.

Key exchange

Here’s a diagram describing the initial process of establishing an HTTPS connection (also known as the TLS handshake). Keep in mind that we are omitting a critical piece (certificates) for the moment, but it’s vital to learn this before delving into the missing piece.

Created using Mermaid.JS

The TLS (Transport Layer Security) handshake is a process that occurs at the beginning of a TLS session between a client (such as a web browser) and a server. It establishes a secure connection and facilitates the exchange of cryptographic information necessary for secure communication. Let’s dissect this diagram step by step.

  1. The client initiates the connection with a hello message.
  2. The server replies with a hello of its own, along with its public key.
  3. The client generates a new symmetric encryption key and encrypts that new key with the server’s public key.
  4. The client sends this encrypted key to the server. The server takes the encrypted key and decrypts it using its corresponding private key. Without this private key, even if someone intercepted the encrypted key during transmission, they would not be able to decrypt it.
  5. We now possess a key that was successfully and securely shared between the client and the server. Only those two parties in the entire world are aware of it. Thus, this symmetric key will be used for all subsequent communication between the two.

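Notice that asymmetric encryption is only used for the key exchange; the actual data transfer happens over symmetric encryption. (Newer TLS versions usually agree on the key with a Diffie-Hellman exchange rather than by encrypting it with the server’s public key, but the principle is the same: asymmetric cryptography bootstraps a shared symmetric key.)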
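The handshake steps above can be sketched with textbook RSA numbers. The key values and function names below are assumptions for illustration, and the "symmetric key" is just a small random number so that it fits inside the toy modulus.

```python
import secrets

# Toy RSA key pair for the server (textbook-sized numbers, illustration only).
SERVER_PUBLIC = (3233, 17)      # (modulus, public exponent), sent in the server hello
SERVER_PRIVATE = (3233, 2753)   # (modulus, private exponent), never leaves the server

def client_create_key(server_public):
    """Steps 3 and 4: make a fresh symmetric key and encrypt it for the server."""
    n, e = server_public
    session_key = secrets.randbelow(n - 2) + 2    # random value in 2..n-1
    return session_key, pow(session_key, e, n)

def server_recover_key(encrypted_key, server_private):
    """Step 4: only the private key holder can undo the encryption."""
    n, d = server_private
    return pow(encrypted_key, d, n)

client_key, on_the_wire = client_create_key(SERVER_PUBLIC)
server_key = server_recover_key(on_the_wire, SERVER_PRIVATE)
# Both sides now hold the same value and can switch to symmetric encryption.
```

An eavesdropper only ever sees `SERVER_PUBLIC` and `on_the_wire`, neither of which is enough to recover the key without the private exponent (at realistic key sizes).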

Now that we have completed the key exchange part, can we call it a day and go home? Let’s examine some critical flaws in the preceding sequence and how certificates/certificate-authorities can be used to mitigate them.

Man-in-the-Middle (MITM) Attacks

Imagine an attacker intercepting your initial handshake with a public bank server (let’s say HSBC bank) and doing the following:

  • They intercept the server’s response and send their own public key to the client.
  • The client has no way of verifying that this public key belongs to the domain (https://www.hsbc.com) it wants to access. So, the client creates the symmetric key, encrypts it with the attacker’s public key, and sends it back.
  • The attacker intercepts this encrypted key, uses their own private key to decrypt it, reads the data, and sends back their own response.
  • Now you are liable to send critical information to an unknown party, all the while believing that you are safe under the guise of HTTPS.
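As a rough simulation of this attack, the sketch below reuses textbook RSA with two toy key pairs. All key values and names are made up for illustration. The last step, where the attacker re-encrypts the key for the real server to stay unnoticed, is my own assumption about how such an attacker might proceed.

```python
# Toy key pairs (textbook-sized numbers, illustration only).
SERVER_PUBLIC, SERVER_PRIVATE = (3233, 17), (3233, 2753)
ATTACKER_PUBLIC, ATTACKER_PRIVATE = (2773, 17), (2773, 157)

def encrypt(value, public_key):
    n, e = public_key
    return pow(value, e, n)

def decrypt(value, private_key):
    n, d = private_key
    return pow(value, d, n)

session_key = 1234   # the symmetric key the client generated

# The attacker replaced the server's public key, and the client cannot tell.
key_seen_by_client = ATTACKER_PUBLIC
on_the_wire = encrypt(session_key, key_seen_by_client)

# The attacker decrypts the session key with their own private key...
stolen_key = decrypt(on_the_wire, ATTACKER_PRIVATE)

# ...and (assumption) forwards it to the real server to stay invisible.
forwarded = encrypt(stolen_key, SERVER_PUBLIC)
server_key = decrypt(forwarded, SERVER_PRIVATE)
```

Client, server, and attacker all end up with the same key, so every later "encrypted" message is readable by the attacker.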

How can we make sure the domain we are accessing is legitimate? In other words, how can we trust a website/organization? This is where certificate authorities come into play.

Certificate Authorities

Certificate Authorities (CAs) are trusted entities that issue digital certificates to verify the authenticity and integrity of data exchanged over the internet. They play a crucial role in establishing secure communication by confirming the identity of individuals, organizations, or websites through cryptographic techniques. A few examples include GlobalSign, Comodo, DigiCert, GoDaddy, etc.

CAs issue digital certificates to websites or organizations. Keep in mind that certificate issuance happens ahead of time, not during each connection; a certificate is issued once and then reused until it expires. We’ll cover the issuing process in more detail later.

So, what’s included in these mysterious certificates? Here are the most important fields:

  • Issuer details: the issuer’s name and country. The issuer in this case is the CA.
  • Subject details: the subject is the organization the certificate is issued to. This includes the organization name, DNS name, address, etc.
  • Validity period: certificates are set to expire after a few months or years, depending on the type of certificate.
  • Certificate signature: the certificate is cryptographically signed by the CA using its own private key. The browser knows the CA’s public key and uses it to verify that the signature and the certificate are valid.
  • Domain’s public key: once the browser has verified the integrity of the certificate, it extracts the domain’s public key from the certificate to do the key exchange that we discussed previously.

The certificate for medium.com, the website you are reading this on, is shown below.

You can view the certificate of any HTTPS website through the browser.
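If you would rather inspect a certificate from code, Python’s standard `ssl` module can fetch one. The sketch below assumes network access for `fetch_certificate`; `summarize` is a hypothetical helper of mine that picks out the fields listed earlier from the dictionary that `getpeercert()` returns.

```python
import socket
import ssl

def fetch_certificate(hostname: str, port: int = 443) -> dict:
    """Open a TLS connection and return the server's certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()

def summarize(cert: dict) -> dict:
    """Pick out the fields discussed above from a getpeercert()-style dict."""
    def common_name(rdns):
        # Names are stored as a tuple of tuples of (field, value) pairs.
        for rdn in rdns:
            for field, value in rdn:
                if field == "commonName":
                    return value
        return None

    return {
        "subject": common_name(cert.get("subject", ())),
        "issuer": common_name(cert.get("issuer", ())),
        "valid_from": cert.get("notBefore"),
        "valid_until": cert.get("notAfter"),
        "dns_names": [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"],
    }

# Usage (requires network access):
# print(summarize(fetch_certificate("medium.com")))
```

Note that `getpeercert()` only returns the parsed fields after the certificate has passed validation, which is the default behaviour of `create_default_context()`.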

Complete TLS exchange

Here’s an improved version of the TLS handshake.

Created using Mermaid.JS

The above diagram is very similar to the previous one, except for the change introduced with certificates. Instead of directly sharing the public key, the server shares it via the certificate to establish trust between the two parties.

Take a moment to understand how certificates prevent man-in-the-middle attacks. Attackers can’t forge a new certificate for the domain you want to access, because they can’t sign it without a valid CA private key. If they intercept the request and send you a valid certificate for a different domain, the browser will quickly catch the mismatch and reject the connection. And if they simply pass along the server’s genuine certificate, they gain nothing, because they can’t decrypt the symmetric key without the server’s private key.
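To see why forging fails, here is a toy model of a CA signing a certificate and a browser checking it. It is a sketch under heavy assumptions: textbook RSA without padding, a CA key built from two small well-known primes, and a "certificate" that contains only a subject and a public key. All names and key values are hypothetical.

```python
import hashlib

# Toy CA key pair (illustrative only; real CA keys are far larger).
CA_P, CA_Q = 2**61 - 1, 2**31 - 1
CA_N, CA_E = CA_P * CA_Q, 65537
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))   # known only to the CA

def fingerprint(subject: str, public_key: tuple) -> int:
    """Hash the fields the CA vouches for, reduced to fit the toy modulus."""
    payload = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % CA_N

def ca_issue(subject: str, public_key: tuple) -> dict:
    """CA side: sign the subject and its public key with the CA private key."""
    signature = pow(fingerprint(subject, public_key), CA_D, CA_N)
    return {"subject": subject, "public_key": public_key, "signature": signature}

def browser_accepts(cert: dict, requested_domain: str) -> bool:
    """Browser side: check the CA signature, then check the domain matches."""
    expected = fingerprint(cert["subject"], cert["public_key"])
    signature_ok = pow(cert["signature"], CA_E, CA_N) == expected
    return signature_ok and cert["subject"] == requested_domain

SERVER_KEY = (3233, 17)      # hypothetical server public key
ATTACKER_KEY = (2773, 17)    # hypothetical attacker public key

genuine = ca_issue("www.hsbc.com", SERVER_KEY)
# Attack 1: keep the genuine signature but swap in the attacker's key.
forged = dict(genuine, public_key=ATTACKER_KEY)
# Attack 2: present a validly signed certificate for a different domain.
other_domain = ca_issue("attacker.example", ATTACKER_KEY)

results = [
    browser_accepts(genuine, "www.hsbc.com"),
    browser_accepts(forged, "www.hsbc.com"),
    browser_accepts(other_domain, "www.hsbc.com"),
]
```

The forged certificate fails because changing the public key changes the hash the signature was computed over, and the second one fails because its subject does not match the domain the browser asked for.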

If you’ve read this far, you should have a solid understanding of how HTTPS operates at this point. However, there are still a few small items that deserve coverage for completeness. So, keep reading if you’re interested!

Root CA and chain of trust

We have seen that clients can trust organizations through certificates because they trust CAs. But how did clients come to trust CAs in the first place?

If you have checked the digital certificate of any website, you’d have seen that it links to more than one CA. Typically, the certificate of an organization is signed by what’s known as an intermediate CA. The intermediate CA carries its own certificate, and the issuer of that certificate is called a Root CA.

In the image below, we can see that the certificate of the intermediate CA (Cloudflare’s) is issued by another CA called Baltimore CyberTrust Root CA.

Intermediate CA’s certificate

Let’s look at what this Root CA’s certificate looks like.

Root CA’s certificate

Notice how the subject name and the issuer name are identical, indicating that this is a self-signed certificate. In other words, this certificate has been signed by the Root CA’s own private key. Now, why should our browser trust this certificate, when anyone can create a certificate and sign it themselves? The answer is deceptively simple: the Root CA’s certificate (and with it, its public key) is pre-installed and trusted by default in web browsers, operating systems, and other software.

Here’s how the above Root CA’s certificate is shown on my Mac.

On macOS, you can check it via the Keychain Access program

So, the browser verifies the website’s certificate using the intermediate CA’s public key, then verifies the intermediate CA’s certificate using the Root CA’s public key, and finally checks that the Root CA’s certificate is one it already trusts. A chain may contain several intermediate CAs, and the chain of trust is built link by link between the website’s certificate and the Root CA.
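Walking a chain of trust can be modelled in the same toy style. The sketch below assumes textbook RSA with small well-known primes, certificates reduced to a subject, an issuer, and a public key, and a trust store holding a single made-up root. None of the names correspond to real CAs.

```python
import hashlib

E = 65537  # public exponent shared by all toy keys

def make_keypair(p: int, q: int) -> dict:
    """Build a toy RSA key pair from two primes (illustration only)."""
    n = p * q
    return {"public": (n, E), "private": (n, pow(E, -1, (p - 1) * (q - 1)))}

def fingerprint(subject: str, public_key: tuple, n: int) -> int:
    payload = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % n

def issue(subject, public_key, issuer_name, issuer_private) -> dict:
    """Sign (subject, public_key) with the issuer's private key."""
    n, d = issuer_private
    signature = pow(fingerprint(subject, public_key, n), d, n)
    return {"subject": subject, "issuer": issuer_name,
            "public_key": public_key, "signature": signature}

def signed_by(cert: dict, issuer_public: tuple) -> bool:
    n, e = issuer_public
    expected = fingerprint(cert["subject"], cert["public_key"], n)
    return pow(cert["signature"], e, n) == expected

root_key = make_keypair(2**89 - 1, 2**61 - 1)
inter_key = make_keypair(2**31 - 1, 2**19 - 1)

root_cert = issue("Toy Root CA", root_key["public"], "Toy Root CA", root_key["private"])  # self-signed
inter_cert = issue("Toy Intermediate CA", inter_key["public"], "Toy Root CA", root_key["private"])
site_cert = issue("www.example.com", (3233, 17), "Toy Intermediate CA", inter_key["private"])

# Stands in for the roots pre-installed in the browser or operating system.
TRUST_STORE = {"Toy Root CA": root_key["public"]}

def chain_is_trusted(chain: list) -> bool:
    """`chain` is ordered from the website's certificate up to the root."""
    for cert, issuer_cert in zip(chain, chain[1:]):
        if cert["issuer"] != issuer_cert["subject"]:
            return False
        if not signed_by(cert, issuer_cert["public_key"]):
            return False
    root = chain[-1]
    trusted_key = TRUST_STORE.get(root["subject"])
    return trusted_key == root["public_key"] and signed_by(root, trusted_key)

trusted = chain_is_trusted([site_cert, inter_cert, root_cert])
```

A self-signed root that is not in `TRUST_STORE` fails the last check, no matter how valid its own signature is.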

Cipher suite

PKI is vast and contains many nuances and advanced cryptographic algorithms. A cipher suite is how the client and server agree on the combination of encryption algorithms and cryptographic protocols used to secure the communication between them. It determines the methods and parameters for key exchange, authentication, encryption, and message integrity.

In simpler terms, a cipher suite is like a toolbox containing different tools (algorithms) that can be used to secure and protect information during transmission. It specifies how data will be encrypted, how the encryption keys will be exchanged, and how the integrity of the data will be verified.

For example, a common cipher suite used in the SSL/TLS protocol might include the following components:

  1. Key Exchange Algorithm: Determines how the client and server securely agree on encryption keys, such as Diffie-Hellman (DH) or Elliptic Curve Diffie-Hellman (ECDH).
  2. Encryption Algorithm: Specifies the algorithm used to encrypt the data, such as Advanced Encryption Standard (AES) or the older Triple Data Encryption Standard (3DES).
  3. Message Authentication Code (MAC) Algorithm: Provides integrity checking for the transmitted data, ensuring that it has not been tampered with. Examples include HMAC-SHA256 and the legacy HMAC-MD5.
  4. Hash Algorithm: Determines the hash function used during the handshake, for purposes such as key derivation and message integrity checks. Examples include SHA-256 and SHA-384.

The specific combination of these algorithms in a cipher suite depends on the security requirements and compatibility of the communicating entities. The chosen cipher suite must be supported by both the client and server to establish a secure and mutually understood way of protecting the data exchanged between them.
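Cipher suite names encode these choices directly. As a small sketch, the hypothetical `parse_cipher_suite` helper below splits a TLS 1.2-style name into its parts. It assumes the common `TLS_<key exchange>_<authentication>_WITH_<cipher>_<hash>` naming pattern and does not handle the shorter TLS 1.3 names.

```python
def parse_cipher_suite(name: str) -> dict:
    """Split a TLS 1.2-style cipher suite name into its components."""
    handshake, _, bulk = name.partition("_WITH_")
    if not bulk:
        raise ValueError(f"not a TLS 1.2-style cipher suite name: {name}")

    handshake_parts = handshake.split("_")[1:]     # drop the leading "TLS"
    key_exchange = handshake_parts[0]
    # Names such as TLS_RSA_WITH_... use the same algorithm for both roles.
    authentication = handshake_parts[1] if len(handshake_parts) > 1 else key_exchange

    *cipher_parts, hash_name = bulk.split("_")
    return {
        "key_exchange": key_exchange,
        "authentication": authentication,
        "encryption": "_".join(cipher_parts),
        "hash": hash_name,
    }

suite = parse_cipher_suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256")
# suite == {"key_exchange": "ECDHE", "authentication": "RSA",
#           "encryption": "AES_128_GCM", "hash": "SHA256"}
```

Reading the example name left to right: ECDHE for the key exchange, RSA for authenticating the server, AES-128 in GCM mode for encrypting the data, and SHA-256 as the hash.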

Given our newfound understanding of key exchange, certificates, and cipher suites, the TLS handshake is shown in its entirety here.

The only new parts are:

  • The client sends a ChangeCipherSpec message to inform the server that subsequent messages will be encrypted.
  • The server receives the ChangeCipherSpec message, switches to the negotiated encryption parameters, and acknowledges the client’s Finished message.

SSL and TLS

Finally, before wrapping up, what are SSL (Secure Sockets Layer) and TLS (Transport Layer Security)? Are they the same or two different technologies?

Simply put, both are cryptographic protocols used to secure communication over networks, and SSL was the predecessor to TLS. Because of SSL’s known security vulnerabilities and weaker encryption algorithms, every SSL version is now deprecated and TLS should be used instead. TLS provides stronger security and better compatibility with modern systems and applications, even though the name “SSL” is still often used informally for both.
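In practice, following this advice usually means telling your TLS library which protocol versions it may accept. A minimal sketch with Python’s standard `ssl` module is below; choosing TLS 1.2 as the floor is my assumption, so adjust it to your own requirements.

```python
import ssl

# Start from the library's recommended defaults
# (certificate validation and hostname checking are enabled).
context = ssl.create_default_context()

# Refuse anything older than TLS 1.2, which rules out every SSL version.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

The resulting `context` can then be used wherever the library accepts an SSL context, for example when wrapping a socket.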

PKI underpins much of the modern internet, and it can be tricky to wrap your head around. I hope you are now equipped with enough knowledge to handle the internet-security-related problems that come your way.
