What happens when you type google.com in your browser and press Enter
So you heard about this school called and you want to learn more about it, so you do what every smart human being would do: you google it. Or perhaps you already have the web address, so you type google.com into your web browser and, after a very short wait…
Simple right? Not exactly.
Several processes happen behind the scenes after you press Enter. They can be divided as follows:
STEP ONE: DNS RESOLUTION
What is DNS resolution?
Every domain name, like google.com, has an IP address associated with it. It is easier for humans to remember a string of names than a string of numbers, and this is why DNS is such an important part of our everyday browsing. The content of any domain name we search for is hosted on a remote server somewhere, and each web server (and indeed any host connected to the internet) has a unique IP address, a sequence of numbers such as 13.35.121.36. Given a domain name in textual form, translating it to an IP address is a process known as DNS resolution or DNS lookup; here DNS stands for Domain Name System. During DNS resolution, the program that wishes to perform this translation (in our case, the web browser, working through the operating system) contacts a DNS server that returns the translated IP address. (source)
When we press Enter after typing the domain name, the web browser first checks its own cache and then the operating system's cache to see whether the IP address of the requested domain is already available. If the IP isn’t found:
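- The web browser (through the operating system) queries the Resolving Name Server (RNS), which is usually operated by your internet service provider, for the IP. If the RNS doesn’t know the IP, it proceeds to what is called the ROOT (the root name servers).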
- The ROOT also doesn’t know the IP of the domain name, but it gives the RNS the location of the Top Level Domain (TLD) name servers for the requested domain. In this case, the TLD required is .com. The RNS stores the information it gets from the ROOT in its cache and proceeds to query the TLD name servers for the IP.
- The TLD name servers don’t know the IP either, but they know which Authoritative Name Server (ANS) is responsible for the domain (this information is supplied through a domain name registrar when the domain is registered), and they point the RNS to it.
- The ANS is the last stop: it gives the RNS the correct IP for the server hosting the requested domain. The RNS returns the IP address to the browser, and the address is cached (by the RNS and by the operating system) so that the next time you type the same domain name, the whole process above doesn’t need to be repeated until the cached entry expires.
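The lookup chain above can be sketched as a small Python simulation. It is a toy model, not a real resolver: the server names (`tld-com-server`, `ans-google-server`) and the dictionaries standing in for them are hypothetical, and 203.0.113.10 comes from an address range reserved for documentation, not Google's real IP. A real resolver exchanges DNS messages over the network (usually UDP port 53) instead of reading Python dictionaries.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the servers described above.
# The ROOT only knows which TLD server handles each top-level domain.
ROOT_SERVER: Dict[str, str] = {"com": "tld-com-server"}
# Each TLD server only knows which Authoritative Name Server (ANS) holds a domain.
TLD_SERVERS: Dict[str, Dict[str, str]] = {
    "tld-com-server": {"google.com": "ans-google-server"},
}
# Each ANS holds the actual IP address (203.0.113.10 is a documentation address).
AUTHORITATIVE_SERVERS: Dict[str, Dict[str, str]] = {
    "ans-google-server": {"google.com": "203.0.113.10"},
}

# Stand-in for the operating system's DNS cache.
os_cache: Dict[str, str] = {}


def resolve(domain: str) -> Tuple[str, List[str]]:
    """Return (ip, stops), where stops lists every place the lookup visited."""
    stops: List[str] = []

    # 1. Check the local cache first; a hit ends the lookup immediately.
    if domain in os_cache:
        stops.append("os-cache")
        return os_cache[domain], stops

    # 2. Cache miss: the resolving name server asks the ROOT for the TLD server.
    tld = domain.rsplit(".", 1)[-1]  # "google.com" -> "com"
    stops.append("root")
    tld_server = ROOT_SERVER[tld]

    # 3. The TLD server points to the authoritative name server.
    stops.append(tld_server)
    ans_server = TLD_SERVERS[tld_server][domain]

    # 4. The ANS returns the IP, which is cached for the next lookup.
    stops.append(ans_server)
    ip = AUTHORITATIVE_SERVERS[ans_server][domain]
    os_cache[domain] = ip
    return ip, stops
```

Calling `resolve("google.com")` the first time visits `root`, `tld-com-server` and `ans-google-server`; calling it again returns the same address with only `os-cache` in the list of stops, which is the saving caching provides. Real caches also expire entries after each record's time-to-live (TTL), which this sketch ignores.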
Now that the web browser has an address, it still needs a way to reach that address, which takes us to the next step.
STEP TWO: TCP/IP
TCP (Transmission Control Protocol) is a standard that defines how to establish and maintain a network conversation via which application programs can exchange data. TCP works with the Internet Protocol (IP), which defines how computers send packets of data to each other. Together, TCP and IP are the basic rules defining the Internet. (source).
The TCP/IP model can also be described as the way data packets leave one computer (called the client) and retrieve the required data from another computer (called the server). These packets need to know where they are going, how to get there, and how to return to their origin safely. Let’s take a deeper dive into the layers of the TCP/IP model:
- Application Layer: This is the layer the web browser interacts with. It includes protocols such as HTTP, SMTP, and FTP. For this write-up, we are using HTTP, the protocol used for visiting websites.
- Transport Layer: This is where TCP lives, along with the User Datagram Protocol (UDP), which is often used for real-time applications such as games and video streaming. The application layer talks to the transport layer through what are called ports; HTTP uses port 80 by default. Once TCP gets the data from the application layer, it splits the data into packets (segments), which are sent over the internet through routers that forward them along available routes. TCP puts a header on each packet containing information, such as a sequence number, that the receiving computer uses to reassemble the packets in order and to make sure the complete data was received.
- Internet Layer: This layer uses the Internet Protocol (IP) to attach the origin and destination IP addresses to each packet, which routers use to forward the packet toward its destination.
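- Network Layer (also called the Network Access or Link Layer): The individual packets are then sent over the physical network, for example Ethernet or Wi-Fi. Because every packet carries the addresses added by the layers above, the response finds its way back to the originating computer, and the information doesn’t end up on another computer, even one on the same network.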
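The way the layers above wrap and unwrap data can be sketched as a toy Python model. It assumes a fixed 8-byte chunk size, hypothetical documentation addresses (192.0.2.1 for the client, 203.0.113.10 for the server) and a hypothetical client port 51234; real TCP headers also carry acknowledgement numbers, checksums and flags, and real TCP retransmits lost segments, none of which is modelled here.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class TcpSegment:
    """Transport layer: ports plus a sequence number for reassembly."""
    src_port: int
    dst_port: int
    seq: int          # byte offset of this chunk within the whole message
    payload: bytes


@dataclass
class IpPacket:
    """Internet layer: origin and destination IP addresses wrapped around a segment."""
    src_ip: str
    dst_ip: str
    segment: TcpSegment


def send(data: bytes, src_ip: str, dst_ip: str, src_port: int,
         dst_port: int = 80, chunk_size: int = 8) -> List[IpPacket]:
    """Split application data into segments and wrap each one in an IP packet."""
    packets = []
    for offset in range(0, len(data), chunk_size):
        segment = TcpSegment(src_port, dst_port, offset, data[offset:offset + chunk_size])
        packets.append(IpPacket(src_ip, dst_ip, segment))
    return packets


def receive(packets: List[IpPacket], my_ip: str) -> bytes:
    """Keep only packets addressed to this host, then reorder them by sequence number."""
    mine = [p.segment for p in packets if p.dst_ip == my_ip]
    mine.sort(key=lambda s: s.seq)
    return b"".join(s.payload for s in mine)


request = b"GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n"
packets = send(request, "192.0.2.1", "203.0.113.10", src_port=51234)
random.shuffle(packets)  # packets may take different routes and arrive out of order
print(receive(packets, "203.0.113.10") == request)  # True
```

The sequence number is what makes arrival order irrelevant: however the packets are shuffled, sorting by `seq` restores the original bytes.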
Even though STEP TWO above looks like the completion of the cycle, we can take a deeper dive into some of the concepts raised in STEP TWO and some other layers/concepts that are worth considering. These include:
HTTP
Stands for Hypertext Transfer Protocol. As mentioned earlier, it is the protocol responsible for serving website content over the internet. HTTP functions as a request–response protocol in the client–server computing model.
When a client wants to browse a website, it first sends the server a request, known as an HTTP message. The server prepares a response and sends it back.
Sample Request
GET /pages/index.html HTTP/1.0
Host: www.google.com
Accept: text/html
Accept-language: en-us
Sample Response
HTTP/1.1 200 OK
Cache-Control: max-age=0, private, must-revalidate
Content-Length: 40063
Content-Type: text/html; charset=utf-8
Date: Mon, 06 Aug 2018 03:37:20 GMT
ETag: W/"9bcc557b638e3603a22705356988b51a"
Server: nginx/1.10.2
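To watch the same request–response exchange without depending on google.com, the sketch below uses Python's standard `http.server` and `http.client` modules against a throwaway server on 127.0.0.1. The handler class `IndexHandler`, the page body and the helper `fetch` are hypothetical illustrations; `http.client` adds the `Host` header on its own.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple


class IndexHandler(BaseHTTPRequestHandler):
    """Answers every GET with a small HTML page, mimicking the sample response above."""

    def do_GET(self) -> None:
        body = b"<html><body>Hello from a local test server</body></html>"
        self.send_response(200)  # status line (HTTP/1.0 by default) plus Server and Date headers
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


def fetch(path: str = "/pages/index.html") -> Tuple[int, Optional[str], bytes]:
    """Start a local server, send one GET request to it, and return the response parts."""
    server = HTTPServer(("127.0.0.1", 0), IndexHandler)  # port 0: let the OS pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path, headers={"Accept": "text/html", "Accept-Language": "en-us"})
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

`fetch()` returns the status code, the `Content-Type` header and the body, the same three pieces visible in the sample response.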
The biggest challenge with HTTP, however, is that information sent over this protocol is unencrypted. Remember that a request to a web server passes through many different routers on its way to its destination and back, so anyone in the middle (a "man-in-the-middle" attacker) can read or hijack that information and use it against its owner.
SSL/TLS/HTTPS
Secure Sockets Layer (SSL) is a cryptographic protocol that enables secure communications over the Internet. SSL was originally developed by Netscape and released as SSL 2.0 in 1995. A much improved SSL 3.0 was released in 1996. Current browsers support neither SSL 2.0 nor SSL 3.0.
Transport Layer Security (TLS) is the successor to SSL. TLS 1.0 was defined in RFC 2246 in January 1999. The differences between TLS 1.0 and SSL 3.0 were significant enough that they did not interoperate, although TLS 1.0 did include a way to downgrade the connection to SSL 3.0. TLS 1.1 (RFC 4346, April 2006), TLS 1.2 (RFC 5246, August 2008) and TLS 1.3 (RFC 8446, August 2018) are the later editions in the TLS family. Current browsers support TLS 1.2 and 1.3; TLS 1.0 and 1.1 have been disabled and were formally deprecated in RFC 8996 (March 2021).
The Transport Layer Security (TLS) Handshake Protocol is responsible for the authentication and key exchange necessary to establish or resume secure sessions. When establishing a secure session, the Handshake Protocol manages cipher suite negotiation, authentication of the server (and optionally the client), and the exchange of session key information. A more detailed discussion can be found here.
Hypertext Transfer Protocol Secure (HTTPS), or “HTTP Secure,” is an application-specific implementation that combines the Hypertext Transfer Protocol (HTTP) with SSL/TLS. HTTPS is used to provide encrypted communication with, and secure identification of, a web server. It uses port 443 by default.
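The handshake described above is what Python's standard `ssl` module runs when it wraps a TCP socket. The sketch below is an illustration under stated assumptions: the helper names `make_client_context` and `tls_handshake` are hypothetical, the TLS 1.2 floor is a policy choice rather than a requirement, and calling `tls_handshake` needs network access.

```python
import socket
import ssl
from typing import Tuple


def make_client_context() -> ssl.SSLContext:
    """Client context that verifies certificates and hostnames and refuses anything below TLS 1.2."""
    context = ssl.create_default_context()  # loads the system's trusted CA certificates
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def tls_handshake(host: str, port: int = 443) -> Tuple[str, str]:
    """Open a TCP connection, run the TLS handshake, and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # wrap_socket performs the handshake: cipher suite negotiation,
        # server authentication via its certificate, and session key exchange.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version(), tls_sock.cipher()[0]
```

With network access, `tls_handshake("www.google.com")` returns the negotiated protocol version (for example `'TLSv1.3'`) and the cipher suite name; the exact values depend on the server and on the local OpenSSL build. If the certificate does not match the hostname, `wrap_socket` raises an error instead of returning.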
FIREWALL
In computing, a firewall is a network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules. A firewall typically establishes a barrier between a trusted internal network and untrusted external network, such as the Internet. (sources: Oppliger, Rolf (May 1997). “Internet Security: FIREWALLS and BEYOND”. Communications of the ACM. 40 (5): 94. doi:10.1145/253769.253802., https://en.wikipedia.org/wiki/Firewall_(computing) )
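One common way to evaluate "predetermined security rules" is first match wins, with a default policy for traffic no rule mentions. The sketch below assumes that design; the `Rule` fields and the `WEB_SERVER_RULES` policy are hypothetical, and real firewalls also match on IP addresses and connection state (so replies to permitted traffic get through), which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "deny"
    direction: str              # "in" (incoming) or "out" (outgoing)
    protocol: str               # "tcp" or "udp"
    port: Optional[int] = None  # None matches any port


def decide(rules: List[Rule], direction: str, protocol: str, port: int,
           default: str = "deny") -> str:
    """Return the action of the first matching rule, or the default policy."""
    for rule in rules:
        if (rule.direction == direction and rule.protocol == protocol
                and (rule.port is None or rule.port == port)):
            return rule.action
    return default


# Hypothetical policy for a web server: accept HTTP and HTTPS, allow outgoing DNS,
# and block everything else through the default "deny".
WEB_SERVER_RULES = [
    Rule("allow", "in", "tcp", 80),
    Rule("allow", "in", "tcp", 443),
    Rule("allow", "out", "udp", 53),
]
```

With this policy, `decide(WEB_SERVER_RULES, "in", "tcp", 443)` returns `"allow"`, while an incoming connection to port 22 falls through to the default and returns `"deny"`.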
SSL/TLS and HTTPS were designed to increase privacy when sensitive data is communicated over the internet. HTTPS was initially reserved mostly for financial transactions, but it is now used by websites of all kinds.
LOAD BALANCER/WEB SERVERS
Finally, we take a look at what happens on the server side of this request. If many users try to access information on google.com at the same time, problems can arise beyond the security concerns covered so far. The web server is responsible for serving the content requested by the client, and domains expecting high traffic avoid overloading any single machine by running multiple web servers. This works by placing a load balancer at the IP address that all clients connect to; the load balancer then forwards each request to one of the web servers, which serves the client the requested data.
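The paragraph above does not say how the load balancer picks a server. Round robin, which hands requests to each server in turn, is one of the simplest strategies (others include least connections or hashing the client's IP), so the sketch below assumes it. The class name and the backend addresses, taken from the private 10.0.0.0/8 range, are hypothetical.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Hands each incoming request to the next web server in a fixed rotation."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend server")
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._cycle)


# Hypothetical pool of three web servers behind one public address.
balancer = RoundRobinBalancer(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
counts: Dict[str, int] = {}
for _ in range(9):  # nine visitors arriving at about the same time
    server = balancer.pick()
    counts[server] = counts.get(server, 0) + 1
print(counts)  # {'10.0.0.11': 3, '10.0.0.12': 3, '10.0.0.13': 3}
```

Round robin spreads requests evenly only when every request costs roughly the same; a production balancer would also stop sending traffic to servers that fail health checks.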
Web Server vs Application Server[2]
A web server’s fundamental job is to accept and fulfill requests from clients for static content from a website (HTML pages, files, images, video, and so on). The client is almost always a browser or mobile application, and the request takes the form of a Hypertext Transfer Protocol (HTTP) message, as does the web server’s response. Examples of web servers include nginx and Apache (apache2).
An application server’s fundamental job is to provide its clients with access to what is commonly called business logic, which generates dynamic content; that is, it’s code that transforms data to provide the specialized functionality offered by a business, service, or application. An application server’s clients are often applications themselves, and can include web servers and other application servers. Communication between the application server and its clients might take the form of HTTP messages, but that is not required as it is for communication between web servers and their clients. Many other protocols are popular, including the variants of CGI.[2] Examples of app servers include PHP app servers.
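The "business logic" side can be made concrete with a tiny application written against WSGI, Python's standard interface between servers and application code (PEP 3333). The greeting logic and the helper `call_app` are hypothetical; `call_app` invokes the application directly, the way a WSGI-capable server would, so the example runs without any network. Unlike a static HTML file, the page it returns is built fresh for every request.

```python
import html
from typing import Callable, Iterable, List, Tuple
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

StartResponse = Callable[[str, List[Tuple[str, str]]], None]


def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Hypothetical business logic: greet the visitor named in the query string."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["visitor"])[0]
    body = f"<p>Hello, {html.escape(name)}!</p>".encode("utf-8")  # escape user input
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(query: str) -> Tuple[str, bytes]:
    """Call the app as a WSGI server would and collect the status and body."""
    environ: dict = {"QUERY_STRING": query}
    setup_testing_defaults(environ)  # fills in the other keys a WSGI environ needs
    captured: List[str] = []

    def start_response(status: str, headers: List[Tuple[str, str]]) -> None:
        captured.append(status)

    body = b"".join(app(environ, start_response))
    return captured[0], body
```

`call_app("name=Ada")` returns `("200 OK", b"<p>Hello, Ada!</p>")`. In a typical deployment, a web server such as nginx serves static files itself and forwards requests like this one to the application server.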
DATABASE
This write-up is in no way exhaustive, but one key part that we haven’t discussed is the database. A database is a data structure that stores organized information. Most databases contain multiple tables, which may each include several different fields. For example, a company database may include tables for products, employees, and financial records. Each of these tables would have different fields that are relevant to the information stored in the table. (source:https://techterms.com/definition/database).
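The company example can be tried out with a relational database. The sketch uses Python's built-in `sqlite3` module because it needs no separate server (MySQL, a client–server system, is used the same way through SQL, though the dialects differ slightly); the helper `build_company_db`, the table fields and the rows are hypothetical.

```python
import sqlite3


def build_company_db() -> sqlite3.Connection:
    """Create an in-memory database with two tables, each with its own fields."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    with conn:  # commits the transaction when the block succeeds
        conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL)")
        conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL, role TEXT)")
        conn.executemany("INSERT INTO products (name, price) VALUES (?, ?)",
                         [("Keyboard", 49.99), ("Monitor", 189.0)])
        conn.executemany("INSERT INTO employees (name, role) VALUES (?, ?)",
                         [("Amina", "engineer"), ("Kofi", "support")])
    return conn


conn = build_company_db()
# Parameterized query: the value 100 is passed separately, never pasted into the SQL text.
rows = conn.execute("SELECT name FROM products WHERE price < ? ORDER BY name", (100,)).fetchall()
print(rows)  # [('Keyboard',)]
```

Passing values through `?` placeholders instead of string formatting is the usual defence against SQL injection.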
Types of Database Management Systems
There are four structural types of database management systems:
- Hierarchical databases.
- Network databases.
- Relational databases.
- Object-oriented databases.
A database server is a server that houses a database application providing database services to other computer programs or to computers, as defined by the client–server model. Examples include MySQL, a free, open-source relational database management system found in many web infrastructure setups. You can learn more about this here.
References: