The overly complex website security

Introduction to website infrastructure

It is hard to argue that the web is not one of the most prolific fields in computer science. No matter what technology comes in the future, the web is sure to stick around. The web provides us with convenient applications, called websites, that change how we work and entertain ourselves. However, the design of the web was flawed. Over the years, industry and academia have come up with plenty of solutions to mitigate that flawed design. In this post, I try to introduce the overly complex topic of website security. I discuss how websites are made secure or insecure and what we can do to protect our websites.

Static websites and HTML

A website is built using HTML, a descriptive (markup) language that instructs the browser how to draw objects. These objects can be shapes or text. HTML was designed to be relatively easy to understand, but its syntax is very hard to parse correctly. For context, HTML is constructed from nestable tags. A tag comes as a pair of the form <tag>body</tag>: the opening and closing tags must have the same name, and the closing tag must start with </. The body is usually text, a nested tag, or a list of nested tags.

It seems clean and easy to read, but inconsistencies can appear if it is not written carefully. Take the example below: how would it make sense?

<a>
<b>
<c>
<d>
</a>
</c>
</b>

HTML content is supposed to form a tree (as in the tree data structure), where a parent node contains multiple children. An inconsistent document like the example above cannot be reduced to a tree. However, nothing is worse than a website that does not work, which is why browsers tolerate these inconsistencies by making assumptions about what the author meant.
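
To see why the example cannot become a tree, here is a minimal sketch of a strict nesting check that uses a stack. The `checkNesting` helper is my own illustration, not something a browser exposes, and it assumes bare tags only (no attributes, no self-closing elements). Real browsers do the opposite of failing: they recover from the error.

```javascript
// Hypothetical helper: checks whether bare <name> and </name> tags nest properly.
function checkNesting(html) {
  const stack = []; // names of the tags that are currently open
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)>/g;
  for (const match of html.matchAll(tagPattern)) {
    const isClosing = match[1] === "/";
    const name = match[2];
    if (!isClosing) {
      stack.push(name);
      continue;
    }
    const expected = stack.pop();
    if (expected === undefined) {
      return { ok: false, reason: `found </${name}> but no tag is open` };
    }
    if (expected !== name) {
      return { ok: false, reason: `found </${name}> while <${expected}> is still open` };
    }
  }
  if (stack.length > 0) {
    return { ok: false, reason: `<${stack[stack.length - 1]}> is never closed` };
  }
  return { ok: true, reason: "" };
}

// The inconsistent example from above, written on one line.
console.log(checkNesting("<a><b><c><d></a></c></b>"));
// A properly nested document for comparison.
console.log(checkNesting("<a><b><c></c></b></a>"));
```

A strict checker like this rejects the page outright, which is exactly the behavior browsers avoid.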

The dynamic web

The web was static in its early days. It was merely text file transfer with an interface. The text files are HTML, transferred over the network, and the browser draws the HTML on the screen. Writing a series of HTML files by hand is inefficient, and a server can only serve the HTML content that already exists. So instead of transferring existing HTML files, people created programs that generate HTML files and transfer them. This is the first form of the dynamic web, and it led to the rise of PHP and server-side rendering.

From another perspective, the web was extended with a Turing-complete language called JavaScript. The purpose of this language is to redraw the HTML or perform complex drawing tasks. jQuery was created and widely used to modify rendered HTML. As websites evolved, people started to favor client-side rendering, where the page is generated by running JavaScript on the client. The web cannot work without data. In the era of client-side rendering, data is delivered through APIs. These APIs are designed to be lightweight and to respond quickly.

In the following sections, we go through the details of both server-side rendering and client-side rendering.

PHP and the rise of server-side rendering

PHP was one of the first widely used (industrial) ways to build a web server program, that is, a program that generates HTML files and transfers them. The program logic is very simple: it takes the request submitted by the user and responds with an HTML file corresponding to that request. Requests usually contain a piece of information related to the user to tell the web server who is making the request.

Generating HTML files on the server side is resource-heavy. It may not be suitable for websites with thousands of visits per second, especially without brute-force protection. However, designing and building a server-side-rendered website is very easy. Because the web server requires quick access to all information to generate HTML files efficiently, all resources are stored close together, usually on the same server. This results in a simpler design, and a website with server-side rendering is probably easier to set up.
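
The request-to-HTML flow described above can be sketched in a few lines. This is an illustration written in JavaScript rather than PHP, and the `users` map, the `userId` field, and `renderPage` are all assumptions of mine standing in for a real database and a real request object.

```javascript
// Hypothetical in-memory "database"; a real server would query a database here.
const users = new Map([
  ["42", { name: "Alice", posts: ["Hello web", "Tags & trees"] }],
]);

// Escape characters that have a special meaning in HTML so user data
// cannot inject tags into the generated page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Takes a simplified request object and returns the complete HTML page as text.
function renderPage(request) {
  const user = users.get(request.userId);
  if (!user) {
    return "<html><body><h1>Unknown user</h1></body></html>";
  }
  const items = user.posts.map((post) => `<li>${escapeHtml(post)}</li>`).join("");
  return `<html><body><h1>Posts by ${escapeHtml(user.name)}</h1><ul>${items}</ul></body></html>`;
}

console.log(renderPage({ userId: "42" }));
```

Notice that the server does all the work for every request, which is where the resource cost comes from.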

JavaScript and the rise of client-side rendering

JavaScript enables the website to edit itself. The idea of client-side rendering is to build the HTML content tree in the browser. For generic content, the tree can be built by following fixed rules, but for personalized content, things get a little tricky. Because personalized content depends on the user, the website must generate pages with the user's data. So the JavaScript code must either store all users' data or fetch the current user's data when the page renders. Of course, storing all users' data in a small script is impossible, so fetching the user's data is the standard approach for client-side rendering.
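
Here is a rough sketch of that idea: the generic part of the tree follows a fixed rule, and the personalized part is filled in from fetched data. Everything here is hypothetical (`node`, `buildPageTree`, `serialize`, and the `fetchUser` stand-in for a network call). In a real browser you would build the tree with DOM calls such as `document.createElement` instead of plain objects.

```javascript
// A node of the content tree: a tag name plus children (strings or other nodes).
function node(tag, ...children) {
  return { tag, children };
}

// Generic content follows a fixed rule; the personalized part depends on user data.
function buildPageTree(user) {
  return node(
    "main",
    node("h1", "Welcome"),
    node("p", `Signed in as ${user.name}`),
    node("ul", ...user.items.map((item) => node("li", item)))
  );
}

// Serialize the tree to text so the result can be inspected outside a browser.
// Escaping is left out to keep the sketch short.
function serialize(tree) {
  if (typeof tree === "string") return tree;
  const body = tree.children.map(serialize).join("");
  return `<${tree.tag}>${body}</${tree.tag}>`;
}

// Stand-in for a network call to the API server (hypothetical data).
function fetchUser(userId) {
  return Promise.resolve({ id: userId, name: "Alice", items: ["inbox", "settings"] });
}

fetchUser("42").then((user) => console.log(serialize(buildPageTree(user))));
```

The script itself contains no user data at all; it only knows how to ask for it.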

User data is served by another service, commonly known as an API server. This service provides the additional data that the client side requests for rendering. In the past, there was no "correct" (standard) way for the client and the server to communicate about this additional data. Fortunately, we now have a more standardized and widely adopted way of communicating, called a RESTful API.

A RESTful API is not the only way to communicate, but it is widely used. The concept is very simple: resources are categorized, and each category is accessed through its own endpoint. The client then uses HTTP methods (GET, POST, PUT, DELETE, ...) on that endpoint to read or modify the resources.
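
As a small sketch of "one endpoint per category, HTTP methods to act on it", here is a handler for a made-up `/books` category. The resource name, the in-memory `books` map, and `handleBooks` are assumptions for illustration. A real API server would sit behind an HTTP library and a database.

```javascript
// Hypothetical in-memory resource category: "books".
const books = new Map([[1, { id: 1, title: "HTML basics" }]]);
let nextId = 2;

// Maps (method, path, body) to a { status, body } result for the /books endpoint.
function handleBooks(method, path, body) {
  const match = path.match(/^\/books(?:\/(\d+))?$/);
  if (!match) return { status: 404, body: null };
  const id = match[1] === undefined ? null : Number(match[1]);
  const hasTitle = Boolean(body) && typeof body.title === "string";

  // Requests on the whole category: list it or add to it.
  if (id === null) {
    if (method === "GET") return { status: 200, body: [...books.values()] };
    if (method !== "POST") return { status: 405, body: null };
    if (!hasTitle) return { status: 400, body: null };
    const created = { id: nextId++, title: body.title };
    books.set(created.id, created);
    return { status: 201, body: created };
  }

  // Requests on a single resource inside the category.
  if (!books.has(id)) return { status: 404, body: null };
  if (method === "GET") return { status: 200, body: books.get(id) };
  if (method === "PUT") {
    if (!hasTitle) return { status: 400, body: null };
    const updated = { id, title: body.title };
    books.set(id, updated);
    return { status: 200, body: updated };
  }
  if (method === "DELETE") {
    books.delete(id);
    return { status: 204, body: null };
  }
  return { status: 405, body: null };
}

console.log(handleBooks("GET", "/books/1", null));
```

The same path means different things depending on the method, which is what keeps the number of endpoints small.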

Requests and responses

The whole web is a bunch of requests and responses going back and forth. The client asks for something (the request) and the server provides it (the response). The concept is simple, but the implementation is complex. Let's examine these in detail to understand more about the communication between the client and the server.

This skips a lot of background material, but the least we should know is that communication over a network (the Internet included) is done by transmitting bits and bytes. When the bytes represent text, they are usually encoded in a well-known scheme, most commonly ASCII or UTF-8.

The most common form of request and response is HTTP. HTTP (at least up to version 1.1) is designed as text for ease of inspection. However, HTTP is not the only form of request and response. Other forms, which we call protocols from now on, include FTP for file transfer and SSH for remote server connections, to name a few. Many protocols other than HTTP, SSH for example, are designed around binary (byte-level) messages rather than text.

In general, requests and responses are structured so that both sides can understand the message. There are many ways to design them. The rule of thumb when designing such a protocol is that "a request can only be understood in a single way". A single meaning lets the server respond correctly to each request.
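
To make the text form of HTTP concrete, here is a simplified sketch that splits a raw HTTP/1.1 request into its parts. `parseHttpRequest` is a hypothetical helper that assumes well-formed input. Real servers have to handle many more cases, and disagreements between parsers on those cases are one way a request ends up with more than one meaning.

```javascript
// Parses a raw HTTP/1.1 request into its parts. Simplified: it assumes
// well-formed input and keeps only the last value of a repeated header.
function parseHttpRequest(raw) {
  // An empty line separates the head (request line + headers) from the body.
  const [head, ...bodyParts] = raw.split("\r\n\r\n");
  const body = bodyParts.join("\r\n\r\n");
  const [requestLine, ...headerLines] = head.split("\r\n");
  const [method, target, version] = requestLine.split(" ");
  const headers = {};
  for (const line of headerLines) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip lines that are not "name: value"
    const name = line.slice(0, separator).trim().toLowerCase();
    headers[name] = line.slice(separator + 1).trim();
  }
  return { method, target, version, headers, body };
}

const raw = "GET /posts/1 HTTP/1.1\r\nHost: website.com\r\nAccept: text/html\r\n\r\n";
console.log(parseHttpRequest(raw));
```

Because the message is plain text, you can read the request line and headers directly, which is the "ease of inspection" mentioned above.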

Database

SQL (relational) databases are the most widely used kind of database. They store data as tables with relationships between them and are easily queried with a declarative language. The database is hosted on a server and accepts connections so that another process, usually the web server (backend), can query it.

The alternative often presented as an "upgrade" of SQL is NoSQL, where data is no longer structured in tables. Because SQL is strictly associated with tables, NoSQL databases have to use a different syntax for querying and modifying data.

Recently, GraphQL has also been on the rise. Strictly speaking, GraphQL is a query language for APIs rather than a database, but it allows combining data from multiple sources (databases included) into one big graph rather than "tables with connections".

Regardless of the type, a database must expose its data through some interface. If the process it talks to is on the same server (same machine), methods such as IPC (inter-process communication), file-based sockets (where a file path acts as the communication endpoint), or port-based sockets (networking that stays inside the machine) can be used. But databases are usually hosted somewhere else (on another machine), and in that case the only way to connect is through the network.

Server

Let's say we build a "functional" client-side-rendering website. We need these components:

Backend
Frontend
Database

We start small by having only one server for these three components. It works by running the database, running the backend, and hosting the frontend through a web server. Let's stop for a bit and talk about frontend hosting.

Frontend hosting

For a dynamic website with client-side rendering, the frontend is a bunch of HTML, CSS, and JavaScript files. They must be hosted somewhere so that they can be sent to the client. This can be achieved by setting up a web server in one of these ways:

Running a "simple" web server

Most programming languages have a "simple" web server that can be used to host these files. After all, serving files is the basic feature of a web server.

Running a "complex" web server

More general-purpose web servers, such as NGINX, can also be used. These web servers allow for more sophisticated setups, but as said previously, the basic job of a web server is file transfer, and most (if not all) web servers support it. They also allow routing by address, so that api.website.com and website.com can be served separately.

Embedding it in the backend web server

The backend itself is also a web server, so the frontend resources can be embedded in it and served at a path under /.
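
The routing decision behind the last two options can be sketched as a single function: API traffic goes to the backend logic, and everything else is answered with a frontend file. The `/api/` prefix, the in-memory `staticFiles` map, and the `route` function are assumptions of mine. A real setup would read files from disk or leave this to a server such as NGINX.

```javascript
// Hypothetical frontend files, kept in memory so the sketch is self-contained.
const staticFiles = new Map([
  ["/index.html", "<html><body>Hello</body></html>"],
  ["/app.js", "console.log('frontend');"],
]);

// Picks a Content-Type from the file extension (only the types used here).
function contentType(filePath) {
  if (filePath.endsWith(".html")) return "text/html";
  if (filePath.endsWith(".js")) return "text/javascript";
  return "application/octet-stream";
}

// Decides what to send back for a request, based on host name and path.
function route(host, path) {
  // API traffic: either a dedicated host name or an assumed /api/ path prefix.
  if (host === "api.website.com" || path.startsWith("/api/")) {
    return { status: 200, type: "application/json", body: JSON.stringify({ path }) };
  }
  // Everything else is treated as a request for a frontend file.
  const filePath = path === "/" ? "/index.html" : path;
  if (staticFiles.has(filePath)) {
    return { status: 200, type: contentType(filePath), body: staticFiles.get(filePath) };
  }
  return { status: 404, type: "text/plain", body: "Not found" };
}

console.log(route("website.com", "/"));
console.log(route("api.website.com", "/users/1"));
```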

Multiple servers

Of course, for high-demand applications, one server hosting everything is not optimal. In these cases the resources are spread across multiple servers (machines). For the frontend and (sometimes) the backend, the content is usually the same on every server, so it can simply be replicated. But for databases (which the backend queries and modifies), scattered data is a big problem. Fortunately, there is now a solution, often referred to as sharding.

Although multiple servers are used, only one server handles any given request. To decide which server should handle a request, an intermediate server is used. This intermediate server is often called a "load balancer"; it tracks the number of concurrent requests on each server and forwards the request to the closest server or to a server that is less busy.
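
One simple way to decide which machine holds a given record is to hash its key. The sketch below is only one possible scheme, with made-up shard names and an FNV-1a style hash that I picked for brevity. Its weakness is that changing the number of shards moves most keys, which is why real systems often use something like consistent hashing instead.

```javascript
// Simple, deterministic string hash (FNV-1a style, 32-bit, over UTF-16 code units).
function hashKey(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash;
}

// The same key always maps to the same shard, so reads find what writes stored.
function pickShard(key, shards) {
  return shards[hashKey(key) % shards.length];
}

// Hypothetical shard names.
const shards = ["db-0", "db-1", "db-2"];
for (const key of ["user:1", "user:2", "user:3"]) {
  console.log(key, "->", pickShard(key, shards));
}
```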
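
The "pick the server that is less busy" rule can be sketched as follows. This only covers the counting part, a strategy commonly called least connections. The class name and server names are hypothetical, and a real load balancer would also forward the traffic and check that servers are alive.

```javascript
// Tracks in-flight requests per server and picks the least busy one.
class LoadBalancer {
  constructor(serverNames) {
    if (serverNames.length === 0) throw new Error("at least one server is required");
    this.active = new Map(serverNames.map((name) => [name, 0]));
  }

  // Pick the server with the fewest in-flight requests and count the new one.
  // Ties go to the server that was listed first.
  acquire() {
    let best = null;
    for (const [name, count] of this.active) {
      if (best === null || count < this.active.get(best)) best = name;
    }
    this.active.set(best, this.active.get(best) + 1);
    return best;
  }

  // Call this when the server has finished handling a request.
  release(name) {
    this.active.set(name, Math.max(0, this.active.get(name) - 1));
  }
}

const balancer = new LoadBalancer(["server-a", "server-b"]);
const first = balancer.acquire();
const second = balancer.acquire();
console.log(first, second);
balancer.release(first);
balancer.release(second);
```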

CDN

A Content Delivery Network (CDN) is often used to host static resources across a large network of servers. This big network allows these static resources to be delivered quickly.

Between the server and the client

The backend is often designed as a functional program (no state). Basically, whatever the input (request) is, the output (response) depends on that input only. Following this design, if the request does not include the user's data, the server has no way of identifying the user.

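// Placeholder helpers; a real backend would replace these with its own logic.
function getUser(request) { return request.user ?? null; }
function getRequestData(request) { return request.data ?? {}; }
function performWebLogic(user, query) { return { user, query }; }

// No state is kept between calls: the response depends only on the request.
function process(request) {
    const user = getUser(request);
    const query = getRequestData(request);
    const response = performWebLogic(user, query);
    return response;
}

function webserver(request) {
    const response = process(request);
    return response;
}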

Traditionally, cookies have been used for this. A series of characters (often a long random string) is given to the user, and subsequent requests include it to identify the user making the request.
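
Here is a minimal sketch of how the user could be looked up from a cookie. The cookie name `session`, the `sessions` map, and the shape of the request object are assumptions. The session ID is just a key the server uses to find what it knows about the user.

```javascript
// Hypothetical session store: session ID -> user.
const sessions = new Map([["s3cr3t-id", { name: "alice" }]]);

// Parses a Cookie header such as "theme=dark; session=abc" into an object.
function parseCookies(header) {
  const cookies = {};
  for (const part of (header || "").split(";")) {
    const separator = part.indexOf("=");
    if (separator === -1) continue; // ignore empty or malformed parts
    const name = part.slice(0, separator).trim();
    cookies[name] = part.slice(separator + 1).trim();
  }
  return cookies;
}

// Returns the user behind the request, or null if the session is unknown.
function getUser(request) {
  const cookies = parseCookies(request.headers.cookie);
  return sessions.get(cookies.session) || null;
}

console.log(getUser({ headers: { cookie: "theme=dark; session=s3cr3t-id" } }));
console.log(getUser({ headers: {} }));
```

Anyone who learns the session ID can act as that user, so it must be hard to guess and hard to steal.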

However, because secure session handling is hard to implement correctly, one of the most commonly used methods to identify a user right now is the JSON Web Token (JWT).

This is a short introduction to the bigger problem of authentication and authorization in web applications.
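
To show what such a token looks like, here is a sketch that builds and decodes one. A token in the usual compact form is three base64url segments joined by dots: a header, a payload, and a signature. This sketch assumes Node.js (it uses `Buffer`), the helper names are mine, and the signature below is a placeholder. It deliberately does NOT verify the signature, which a real server must always do, ideally with a maintained library.

```javascript
// Encode text as base64url (base64 with "-" and "_" and without "=" padding).
function base64UrlEncode(text) {
  return Buffer.from(text, "utf8")
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Decode a base64url segment back to text.
function base64UrlDecode(segment) {
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  return Buffer.from(base64, "base64").toString("utf8");
}

// Splits a token into its three parts and decodes the two JSON parts.
// WARNING: this only reads the token; it does not check the signature.
function decodeJwt(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("expected three dot-separated parts");
  const [header, payload, signature] = parts;
  return {
    header: JSON.parse(base64UrlDecode(header)),
    payload: JSON.parse(base64UrlDecode(payload)),
    signature,
  };
}

// Hypothetical token; the signature is a placeholder, not a real signature.
const token = [
  base64UrlEncode(JSON.stringify({ alg: "HS256", typ: "JWT" })),
  base64UrlEncode(JSON.stringify({ sub: "42", name: "alice" })),
  "signature-placeholder",
].join(".");

console.log(decodeJwt(token).payload);
```

Unlike a session ID, the token carries the user data itself, so the server needs the signature to trust it.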

More

DNS
Browser
JavaScript
WebAssembly
Web3
Cryptography
Authentication and Authorization

UwU. See you again in other posts.