The Magic Behind Visiting Websites

We have all had times when we struggled to visit some websites. I deal with the same thing. Sometimes a specific website simply will not open, and I have no idea why.

This introduction is going to be a long one. I am going to talk about my frustration with the complexity of the internet and my motivation for sharing this post, so be brave.


When I was accessing my site, https://alfianfirmansyah.com, from a mobile browser during early development, it sometimes failed to load the first time but worked after several refreshes. Even though I am a DevOps professional, finding the root cause still made me a bit dizzy, or maybe it just showed that I am a real noob, somehow.

I remember trying different mobile browsers: Chrome, the native Samsung browser, and several others available in the Play Store. I tried them all, but I still wasn't able to access the site. I switched my mobile data between Telkomsel and XL, back and forth, twenty times in a minute. But no, the site still was not accessible. Furthermore, when I shared it in a LinkedIn post, the thumbnail image was somehow missing, and no meta image or OpenGraph tag showed up in it.

I needed to fix this at all costs, or I might ruin my reputation!

(It definitely sounds frustrating, really, honestly 😹)

Nonetheless, after researching for almost two consecutive days, it turned out there was a problem with my TLS version. I was using Cloudflare to manage my TLS certificate, and TLS 1.3 was causing problems with the mobile browser; I had no idea why. After I changed the setting back to TLS 1.2, dang, it worked. My gut feeling is that my mobile browser version did not support TLS 1.3 and was pretty much obsolete at that time, so I needed to update the browser. It also turned out that a WordPress guy had the same problem as I did, except that in his case the client was the LinkedIn bot: https://noobient.com/2023/04/05/linkedin-bot-just-lost-tls-1-3-support/

Please don't be intimidated by my complex problem above. Even though I have solved many DevOps problems, this one encouraged and motivated me to go back and strengthen my fundamental knowledge again. Let's be noobs together.

The Internet Runs on Magic

Let's interpret the simplified animation above. The internet has complex mechanisms with lots of layers, just like a sandwich 🥪, yet when we type "https://www.google.com" in our browser, it works seamlessly. How does that happen?

DNS

The magic of the internet starts with DNS (Domain Name System). If we run our own server, it usually only has a public IP address, such as 34.5.6.7. Since we are a lazy species and find numbers hard to remember, DNS lets us use easily memorable domain names instead. If we type www.google.com, DNS servers translate the website name into an IP address, helping our browser find and load it. This is called DNS resolution. In www.google.com, www is a subdomain, google is the domain name (the second-level domain), and com is the top-level domain, abbreviated as TLD.

Well, here's how:

Fig. DNS Resolution Process

The detailed DNS resolution process:

  1. User Enters a Web Address: We type a website address into our browser.
  2. Browser Sends a Query: Our browser asks the network for the IP address of that website.
  3. Recursive DNS Server: The request goes to a recursive DNS server, usually one managed by our internet provider. If this server already knows the IP address, it sends it back to our browser.
  4. Querying Other Servers: If the DNS server doesn’t know the IP address, it asks other servers in this order:
    • Root Servers: Direct the request to the correct top-level domain server.
    • TLD Servers: Direct the request to the correct authoritative server.
    • Authoritative Servers: Provide the actual IP address.
  5. Caching: The recursive DNS server caches the IP address for a limited time (the record's TTL), so it can respond faster to future requests.
  6. Error Handling: If the authoritative server can’t find the IP address, an error message is returned.
  7. Quick Process: This entire process happens almost instantly.

Now we have the IP address of google.com from the DNS server.
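The resolution steps above can be sketched as a toy simulation. This is only an illustration under loud assumptions: the three dictionaries are made-up, in-memory stand-ins for the root, TLD, and authoritative servers, the names `RecursiveResolver` and `upstream_lookups` are hypothetical, and cache expiry is not modelled at all. A real resolver speaks the DNS protocol over the network.

```python
# Toy, in-memory stand-ins for the real DNS hierarchy (all data is made up).
ROOT_SERVERS = {"com": "tld-server-com"}
TLD_SERVERS = {"tld-server-com": {"example.com": "ns-example"}}
AUTHORITATIVE_SERVERS = {"ns-example": {"www.example.com": "34.5.6.7"}}


class RecursiveResolver:
    """Plays the role of the recursive DNS server in steps 3 to 6."""

    def __init__(self):
        self.cache = {}
        self.upstream_lookups = 0  # how many times we walked the hierarchy

    def resolve(self, hostname: str) -> str:
        if hostname in self.cache:  # step 3: answer from the cache if known
            return self.cache[hostname]

        self.upstream_lookups += 1
        labels = hostname.split(".")
        tld = labels[-1]
        zone = ".".join(labels[-2:])
        try:
            tld_server = ROOT_SERVERS[tld]               # root -> TLD server
            auth_server = TLD_SERVERS[tld_server][zone]  # TLD -> authoritative
            ip = AUTHORITATIVE_SERVERS[auth_server][hostname]  # actual answer
        except KeyError:
            # step 6: nobody in the hierarchy knows this name
            raise LookupError(f"cannot resolve {hostname}") from None

        self.cache[hostname] = ip  # step 5: remember it for next time
        return ip
```

Calling `resolve("www.example.com")` twice on the same resolver walks the hierarchy only once; the second answer comes straight from the cache.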

TCP/IP

Second, we will focus on TCP/IP. We already got the IP address from the DNS resolution above, but what on earth is TCP/IP, actually?

IP (Internet Protocol)

IP is like the address on the letters, making sure they go to the right places.

TCP (Transmission Control Protocol)

Think of TCP as the mailman who makes sure all our letters and replies get delivered safely and in the right order to our house.

Spongebob is way too excited, though.

A detailed explanation of TCP/IP:

  1. Requesting a Connection: When we type "google.com" in our browser, it asks the server to open a connection using TCP/IP.
  2. Agreeing to Connect: The server agrees and sends a message back, like a digital handshake. This is the three-way handshake: the browser sends SYN, the server replies with SYN-ACK, and the browser answers with ACK.
  3. Asking for the Webpage: After the handshake, our browser requests the webpage we want (like Google’s homepage), using TCP to make sure the request is sent and received correctly.
  4. Receiving the Webpage: The server sends back the webpage’s code, also using TCP to make sure it arrives safely.
  5. Putting It All Together: Our browser takes the code and shows us the webpage on our screen. It also uses TCP/IP to get any other things the webpage needs, like images.
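To make the connect, request, and reply steps a bit more concrete, here is a minimal sketch that runs entirely on localhost. The function name `run_echo_exchange` and the "reply to:" format are hypothetical choices for this example, and the tiny server is not a real web server. The SYN, SYN-ACK, ACK handshake itself is performed by the operating system when the client connects.

```python
import socket
import threading


def _recv_all(sock: socket.socket) -> bytes:
    """Read until the other side signals it has finished sending."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def run_echo_exchange(message: bytes) -> bytes:
    """Start a one-shot TCP server on localhost, send it a message, return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve_once():
        conn, _addr = server.accept()  # the handshake is complete at this point
        with conn:
            request = _recv_all(conn)
            conn.sendall(b"reply to: " + request)

    thread = threading.Thread(target=serve_once)
    thread.start()

    # Connecting triggers the SYN, SYN-ACK, ACK handshake inside the OS.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        reply = _recv_all(client)

    thread.join()
    server.close()
    return reply
```

TCP takes care of ordering and retransmission underneath `sendall` and `recv`, which is why the code never has to think about lost or shuffled packets.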

Firewall

As the owners of our own house, we decide who may enter. The house has a set of security rules, some implicit and some explicitly visible to all visitors, and a firewall is simply the security guard who enforces them. If a request follows the security rules set by the firewall, our browser can access the website.

However, if it doesn’t meet these rules, access is blocked, and our browser cannot reach the site. Firewalls play an essential role in securing our network from unwanted or harmful content. Only the toughest ones can pass, like Spongebob.

A detailed explanation of firewalls:

Firewalls are usually divided into two levels:

  • VPC firewall level: This has a set of ACL (Access Control List) rules that block groups of IP addresses, such as known spammers.
  • WAF (Web Application Firewall) level: This blocks requests based on the IP address or on custom rules for the path, query string, headers, or user agent, matched with a regex or a custom matcher.
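The two levels can be sketched with a few lines of rule matching. Everything here is hypothetical and for illustration only: the blocked network, the path pattern, the user-agent pattern, and the function name `is_allowed` are assumptions, not rules from any real firewall product.

```python
import ipaddress
import re

# Hypothetical rules, for illustration only.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # ACL-style rule
BLOCKED_PATH = re.compile(r"^/(wp-admin|\.env)")             # WAF-style rule
BLOCKED_USER_AGENT = re.compile(r"badbot", re.IGNORECASE)    # WAF-style rule


def is_allowed(client_ip: str, path: str, user_agent: str) -> bool:
    """Return True if the request passes both the ACL and the WAF rules."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in network for network in BLOCKED_NETWORKS):
        return False  # blocked at the network level
    if BLOCKED_PATH.search(path) or BLOCKED_USER_AGENT.search(user_agent):
        return False  # blocked at the application level
    return True
```

The network-level check only looks at the source address, while the application-level check needs to see the request itself, which is why a WAF sits closer to the application.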

SSL/TLS encryption

We are likely familiar with both http and https, but we will focus on https, since we are visiting "https://www.google.com". HTTPS (Hypertext Transfer Protocol Secure) is an extension of the HTTP protocol, but the two differ in how they handle security. By default, HTTPS uses port 443 and HTTP uses port 80. We can usually ignore the port: it is not written out explicitly, as in https://www.google.com:443, because modern browsers apply the default and hide it for simplicity.

Fig. Browser https and http protocol illustration
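The default-port behaviour can be sketched with the standard library URL parser. The helper name `effective_port` and the `DEFAULT_PORTS` table are hypothetical and only cover the two schemes discussed here.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:          # port written explicitly in the URL
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # otherwise fall back to the default
```

So `effective_port("https://www.google.com")` and `effective_port("https://www.google.com:443")` give the same answer, which is why the browser can safely hide the port.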

As I mentioned, the difference between them is encryption. Over HTTPS, our payload (the message or information) is encrypted while it travels to the server, so its contents are readable only by the client and the server. If hackers intercept the traffic, they cannot read it. Encrypting and decrypting take some time, so there is a small performance cost. HTTP, on the other hand, has no encryption, and it is sometimes still used for internal service communication to avoid that overhead.

Moreover, the terms SSL and TLS sometimes confuse us. They are closely related but not identical: SSL (Secure Sockets Layer) and TLS (Transport Layer Security) are both encryption protocols used to secure the data transmitted over HTTPS. TLS is the successor to SSL, with newer cryptographic algorithms and different creators, but the same mission. SSL is quite old and now deprecated, although the name is still commonly used when people actually mean TLS.

💡
Taher Elgamal led the development of SSL and released SSL 2.0 publicly in 1995. The purpose of SSL was to keep communication secure over the World Wide Web. After SSL moved through various iterations, Tim Dierks and Christopher Allen created TLS 1.0 in 1999 as the successor to SSL 3.0. (Ref: https://www.ssldragon.com/blog/ssl-vs-tls-certificates/#:~:text=Without%20an%20SSL%2FTLS%20certificate,while%20SSL%20is%20now%20deprecated.)

Encryption is set up with the help of a certificate. The certificate can be installed at different levels, such as the load balancer, the reverse proxy, or the web server of our application. Therefore the TLS connection can be terminated (decrypted) at different levels as well.

When our browser connects to Google's server via HTTPS, both our browser and Google's server initially agree on the version of SSL/TLS to utilise. Following this agreement, they establish a secure, encrypted channel through which data can be transmitted.
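The version agreement can be seen from the client side with Python's standard `ssl` module. This is a minimal sketch, not how a browser is implemented: the helper name `make_client_context` is hypothetical, and pinning the maximum to TLS 1.2 simply imitates a client that does not speak TLS 1.3, like the one in my story above.

```python
import ssl


def make_client_context(max_version: ssl.TLSVersion) -> ssl.SSLContext:
    """Build a client TLS context that offers TLS 1.2 up to max_version."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.maximum_version = max_version
    return context


# A client that refuses to go above TLS 1.2 (assumption: an older browser).
tls12_only = make_client_context(ssl.TLSVersion.TLSv1_2)
```

If such a context wraps a real connection with `wrap_socket(sock, server_hostname=...)`, the socket's `version()` method reports which version both sides settled on. When the server only accepts versions the client does not offer, the handshake fails, which is roughly the kind of mismatch I ran into.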

Load Balancer

In a real scenario, a lot of traffic hits google.com: billions of requests every single day, which works out to many thousands every second. One server cannot possibly handle that, so we need a way to distribute the traffic so that the work is shared evenly among all the servers.

Fig. Load Balancer forwards traffic to the healthy server

If one server dies, the load balancer fails the traffic over to the healthy servers. Meanwhile, self-healing runs on the unhealthy server, and once it recovers, it receives traffic again according to the load balancer's configuration.
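Distribution and failover can be sketched with a simple round-robin rotation. This is one possible strategy chosen for illustration, not how any particular load balancer product works, and the class and method names (`RoundRobinBalancer`, `mark_down`, `mark_up`) are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:  # e.g. after self-healing has finished
            self.healthy.add(server)

    def next_server(self):
        # One full lap over the rotation is enough to find a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

With servers "a", "b", and "c", marking "b" down makes the rotation alternate between "a" and "c"; marking it up again puts it back into the rotation.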

Fig. Load Balancer helps server to scale the traffic correctly to a brand-new server

A load balancer works between layers 3 and 7 of the OSI model, that is, from the network and transport layers up to the application layer. The most common kinds are the layer 4 (TCP/UDP) and layer 7 (HTTP/HTTPS) load balancers. Remember, however, that an HTTP load balancer (L7) is more CPU- and memory-intensive, since it does routing and other feature computation under the hood, such as rate limiting, buffering, and logging. An L4 load balancer, meanwhile, does little more than forward and distribute TCP traffic to the target machine and port.

Web Server and Reverse Proxy

A web server functions like a restaurant host. It handles our order (request for a web page) and ensures we receive the correct dish (web page) in return.

When we want to visit "google.com," Google's server receives the request from the load balancer. The server then assembles the web page we requested, including the HTML, CSS, and JavaScript that make it functional.

Fig. Website components

Once the web server has prepared the page, it sends it back to the load balancer. The load balancer then delivers it to our browser, which uses these elements (HTML, CSS, and JavaScript) to display the web page we wanted.

Modern JavaScript frameworks such as Next.js and Nuxt.js don't need a separate web server, as they can listen on a port and act as a web server themselves, a so-called application server. For instance, a Node.js application often uses Express.js to listen on a port. For languages such as PHP and Python, however, we typically add an external server to serve the content, such as Apache (httpd) for PHP or Gunicorn for Python.

In essence, a web server is very close to the web application itself as it helps the application to serve the contents to the end user.
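To show how little a basic web server needs, here is a minimal sketch using Python's built-in `http.server` module. It is only meant for illustration and local experiments, not production. The page content and the names `PageHandler`, `make_server`, and `fetch_homepage_once` are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Hello from a tiny web server</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Take the order (the request) and return the dish (the page).
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port: int = 0) -> HTTPServer:
    """Create the server; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), PageHandler)


def fetch_homepage_once():
    """Start the server in the background, fetch "/" once, then stop it."""
    server = make_server()
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Running `make_server(8000).serve_forever()` instead keeps the server alive so the page can be opened in a browser at http://127.0.0.1:8000/.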

How about the reverse proxy?

A reverse proxy shares the same idea: it can also be configured to serve web pages as HTML, CSS, JS, or any other MIME type. However, reverse proxies are more sophisticated at caching and compression, and they are more often responsible for SSL termination than the web server is.

Fig. A Reverse Proxy sits before Web Servers (right)

A reverse proxy is a server that acts as a gatekeeper, sitting in front of the web server. A reverse proxy such as NGINX typically offers more options for implementing security: it masks the identity of the backend servers and protects them from direct attacks with features such as rate limiting and other security rules.

Both reverse proxy and web server are like siblings.

Application Server and Database

Let's talk about where the frontend and the backend meet: the application server and the database.

As the request comes to the application server, it will be processed by the frontend first. To get the dynamic data, the frontend will send a request to the backend, which will query the database for the data.

Fig. CSS Jenga War between frontend and backend

The fetched data format usually depends on the design of the application. It can be plain text/string, an array, JSON (the popular format), or YAML (the less popular format).

Fig. The backend has successfully debugged some CSS 😃

Then the frontend processes this data again to build a table or any other content made up of that data for the end user. Of course we need to beautify the data with CSS, but the backend will definitely hate it.
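The round trip from database to rendered table can be sketched in a few lines. This is an illustration under assumptions: the `posts` table, its two rows, and the function names are made up, an in-memory SQLite database stands in for the real one, and a real frontend would normally do the rendering in JavaScript.

```python
import html
import json
import sqlite3


def fetch_posts_as_json(connection: sqlite3.Connection) -> str:
    """Backend step: query the database and serialize the rows as JSON."""
    rows = connection.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    return json.dumps([{"id": row[0], "title": row[1]} for row in rows])


def render_table(payload: str) -> str:
    """Frontend step: turn the JSON payload into an HTML table."""
    posts = json.loads(payload)
    body = "".join(
        f"<tr><td>{p['id']}</td><td>{html.escape(p['title'])}</td></tr>"
        for p in posts
    )
    return f"<table>{body}</table>"


# Made-up sample data in an in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO posts (title) VALUES (?)", [("Hello",), ("DNS",)])
```

Here `render_table(fetch_posts_as_json(db))` produces a plain table with two rows; the CSS that makes it pretty is left to whoever loses the Jenga war.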

End-user Rendering

When our browser gets a response from the web server, it’s like getting a box with all the parts of a puzzle. Inside are the HTML, CSS, and JavaScript files that make up the webpage.

Here’s what happens next:

  1. Arranging Content: Our browser places text and images where they belong.
  2. Styling Everything: It makes everything look good using CSS.
  3. Making It Interactive: If there’s JavaScript, our browser runs it, enabling buttons and dynamic features.

Fig. A website rendered successfully

After doing all this, we see the complete webpage on our screen. Now we can click links, fill out forms, and interact with the page, just like finishing a puzzle and bringing it to life.

The life of a website is a miracle, akin to magic.

Final Thoughts

Finally, this lengthy post must come to an end here.

I personally think that all of those things are like magic: with just a fingertip, we can now access almost anything on the internet within milliseconds, almost instantly. The complexities of the internet can feel like a modern puzzle, as I discovered with my site, https://alfianfirmansyah.com, facing issues ranging from browser compatibility to TLS version complications. This journey highlights the sophisticated processes behind the scenes, from DNS resolution translating domain names into IP addresses, to TCP/IP protocols ensuring reliable data transmission, firewalls securing networks, SSL/TLS encryption safeguarding data exchanges, and load balancers managing traffic efficiently.

As long as we understand most of the internet's layers, we will appreciate the importance of fundamental knowledge and a learning mindset to truly grasp the "magic" of the internet.

I appreciate your visit! Cheers~

Alfian Firmansyah

Jakarta, Indonesia