Introduction
This page lists various terms related to infrastructure and other applications needed to support the deployment and hosting of a web application. Since this book focuses on backend development, it may not use all of these terms. Nonetheless, it is important for a backend software developer to be aware of these terms and what they mean.
Server
See article about server on Wikipedia. A server is a combination of software and hardware that reads an incoming request and returns a response. The entity that sends a request to a server is called a "client", which is most likely either a web browser or some other software application making a server-to-server web request. A server listens for incoming client requests on the server port and creates a dedicated connection between itself and the client, using the client's address and port (reference: here). This connection is available only to that specific client, and different connections are made concurrently for different client requests. When the response is available, the server sends it back to the client over the dedicated client connection. A server can be configured to forward any incoming request to a "bundled" web application deployed on it. This way, the application provides the business-specific logic on how to handle the request, while the server just provides a common framework for handling the incoming web request and the outgoing web response.
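As an illustration, here is a minimal sketch of this request/response cycle using Python's built-in http.server module; the port number and response text are arbitrary choices for the example, and a production setup would put dedicated server software in front of the application rather than code like this.

    # Minimal sketch: listen on a port, read the incoming request, return a response.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class EchoHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = f"You requested {self.path}".encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)          # send the response back to the client

    if __name__ == "__main__":
        # Listen for incoming client requests on port 8000.
        HTTPServer(("0.0.0.0", 8000), EchoHandler).serve_forever()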
Cloud computing
See article about cloud computing on Wikipedia and on Microsoft Azure. Cloud computing is a relatively recent development where a cloud provider can "lease" out computers or other hardware, or computers with an operating system and some pre-installed software, network capacity, etc. A business can lease these resources, deploy its business-specific web application code onto these servers and make it available for public use. The advantage of doing so is that the business is freed from the responsibility of buying, updating and maintaining the server hardware and software, and from starting a new server if another server goes down. This "leasing" of resources is itself done over the internet, so the business does not know, nor does it need to know, where the actual servers are physically located. This is why the resources are said to be "in the cloud". There are usually three models of cloud service: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The key difference between these is how much of the resource management is done by the business compared to how much is done by the cloud provider (reference: here, here and here). To get an idea of the various cloud products that are available, check out the cloud-based offerings from AWS here.
Load balancer
See article about load balancer on Wikipedia, Citrix and Nginx. A load balancer, simply put, balances the load of incoming requests across different backend servers. The main goal is to have each server receive a uniform workload for request processing. This ensures an optimal user experience by not having users wait simply because their request got routed to a busy server while another server is free. Different routing strategies can be configured to distribute load and to enable various response caching strategies suited to business needs. Additionally, HTTPS termination can also be done at this layer before the request is sent to a server. A load balancer can also send periodic health checks to the backing servers and stop routing traffic to inactive servers.
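A minimal sketch of the round-robin routing idea behind load balancing is shown below; the server addresses are placeholders, and a real load balancer handles health checks, caching and HTTPS termination as described above rather than relying on explicit calls like these.

    # Toy round-robin balancer: spread requests evenly over the healthy servers.
    from itertools import cycle

    class RoundRobinBalancer:
        def __init__(self, servers):
            self.servers = list(servers)
            self._pool = cycle(self.servers)

        def mark_unhealthy(self, server):
            # A real load balancer would do this based on failed health checks.
            self.servers.remove(server)
            self._pool = cycle(self.servers)

        def next_server(self):
            # Each incoming request is handed to the next server in the rotation.
            return next(self._pool)

    balancer = RoundRobinBalancer(["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"])
    for _ in range(4):
        print(balancer.next_server())   # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1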
Content delivery network, or CDN
See articles about content delivery network, or CDN, at Wikipedia and CloudFlare. A CDN is a geographically distributed network of servers that serve static web resources (like HTML files, stylesheets, JavaScript files, images, etc.) to geographically distributed end users. In doing so, it enables high availability of the data (i.e., ensuring data remains available even if certain servers become unavailable for whatever reason) and improves the speed of downloading the resource (because it is faster for users to download content from a CDN server that is physically closer to them). By offloading the task of serving static web resources, which by definition do not change over time or do so very infrequently, the servers hosting the business application free up additional capacity they can use to handle user requests. For this reason, certain web application frameworks directly suggest using CDNs for serving static content, for example, the Django framework makes this suggestion in its docs.
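As a sketch of what that looks like in practice, a Django project can point its static-file URL at a CDN host; the hostname below is a placeholder for illustration, not a real endpoint.

    # Django settings.py sketch: serve static assets from a CDN host.
    # "static.example-cdn.com" is a placeholder, not a real endpoint.
    STATIC_URL = "https://static.example-cdn.com/static/"

    # A template reference like {% static "css/site.css" %} then resolves to
    # https://static.example-cdn.com/static/css/site.css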
Denial of Service, or DoS attack
See articles about Denial of Service, or DoS attack, at Wikipedia, this article and at AWS. A denial-of-service, or DoS, attack is a cyber-attack in which the attacker seeks to make a web resource unavailable to its intended users by encumbering the server hosting the resource to such a degree that it is unable to respond to valid requests. It is typically accomplished by flooding the targeted machine or resource with superfluous requests in an attempt to overload systems and prevent some or all legitimate requests from being fulfilled. As an example, think of it like driving with a passenger who keeps asking "are we there yet?", to which you, the driver, need to respond. However, the passenger's intention is not to get the answer but to keep asking so frequently that you cannot focus on driving, thereby causing you to stop driving. A distributed denial of service, or DDoS, attack is when many such systems perform the attack, even though the attack itself could have been initiated by a single entity. Since the requests in a DDoS attack come from different IP addresses, it is much harder to deal with than a DoS attack. Unlike other attacks that exploit weaknesses in code or in user authentication/authorization (see security glossary), a DoS or DDoS attack seeks to make the entire service unavailable for all users. Combating a DDoS attack is something that cannot be done within the web application and involves investing in infrastructure components that can absorb such attacks. For this reason, it is discussed under the infrastructure glossary. One way to handle the attack is to use load balancers with criteria to drop requests that are likely spurious, like ones coming from certain IP addresses or physical regions, or arriving at a much higher than baseline rate. DDoS mitigation is another service provided by cloud-based providers and CDNs.
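To make the "higher than baseline rate" criterion concrete, the sketch below shows a toy per-IP rate limiter; the window size and request threshold are assumed values, and real mitigation happens in load balancers, CDNs or dedicated scrubbing services rather than in application code like this.

    # Toy sliding-window rate limiter keyed by client IP address.
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 1.0
    MAX_REQUESTS_PER_WINDOW = 100      # assumed baseline; tune per service

    _recent = defaultdict(deque)       # ip -> timestamps of recent requests

    def allow_request(client_ip):
        now = time.monotonic()
        window = _recent[client_ip]
        # Discard timestamps that have fallen out of the window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_REQUESTS_PER_WINDOW:
            return False               # likely spurious: drop the request
        window.append(now)
        return True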
Data storage
Since the topic of data storage is large enough to warrant its own treatment, it is discussed separately under the data storage glossary.
Code repository
See article about code repository on Wikipedia, BitBucket and in a blog at HUSPI. As a business evolves, the code for the business application will also change. The simplest way to think of a code repository is as a database for the enterprise codebase, in which every code change is versioned in chronological order. It promotes collaboration by enabling different team members to simultaneously retrieve the codebase, make updates to it, and then push the updates back to the repository without undoing the changes made by other team members. In particular, note that the version control system can be either distributed (like Git or Mercurial) or centralized (like Subversion or CVS). Having worked on both, I personally prefer the distributed system.
Continuous Delivery, or CD, and Continuous Integration, or CI
See article about continuous delivery on ContinuousDelivery, Wikipedia and AWS. For a team to provide continuous delivery of the web application, it must ensure that the code available in the repository is always in a state where it can be deployed to the production environment without any issues. An important step towards this goal is to have the developers merge their code into the same code repository and keep that code free of conflicts or errors, i.e., have continuous integration. See article about continuous integration on Wikipedia, Atlassian and AWS. Within the repository itself, conventions can be used to handle parallel development, merge features and identify code for release, for example, using trunk, branch and tag folders in Subversion, or using Git flow, etc. Having automated tests helps ensure that new code is free of unexpected issues and does not break any existing behavior. Jenkins can be used to build "artifacts" (like bundled JavaScript code, .war files, etc.) with every commit of new code, verifying that the build process completes without failure and that all automated tests pass successfully. Once an artifact is built, it can be deployed to a server, from where it is made available for public use. Static analysis, mentioned in the testing glossary, also helps with the CI/CD process by ensuring that erroneous and non-standard new code does not make it into the main codebase.
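For example, the automated tests that the CI server runs on every commit can be as simple as the unit test sketched below; apply_discount is a hypothetical application function, included only so the example is self-contained.

    # test_pricing.py -- the kind of automated test a CI server runs on each commit.
    import unittest

    def apply_discount(price, percent):
        # Hypothetical application code under test.
        return round(price * (1 - percent / 100), 2)

    class ApplyDiscountTest(unittest.TestCase):
        def test_ten_percent_off(self):
            self.assertEqual(apply_discount(200.0, 10), 180.0)

        def test_no_discount(self):
            self.assertEqual(apply_discount(99.99, 0), 99.99)

    if __name__ == "__main__":
        unittest.main()

If a test like this fails, the build is marked broken and the artifact is not deployed, which is what keeps the repository in an always-deployable state.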
Virtualization versus Containerization
For articles on virtualization, see Wikipedia, VMWare and OpenSource. To enable full resource utilization, virtualization runs multiple copies of the same or different operating systems on the same hardware. As a side benefit, the system software and all installations can now be copied and easily moved to new hardware. Contrast this with containerization, where different applications are deployed on hardware sharing the same OS, but in separate, independent "containers" that are isolated from each other. For articles on containerization, see Wikipedia, Docker and Citrix. Docker and Kubernetes (which is a container orchestrator) are two common applications used in containerization. As containerization has developed, it has taken over some of the use cases that were previously handled via virtualization. However, the two can be used in tandem. See one of the blogs discussing the similarities and differences between the two.
System Monitoring
In the web application housekeeping glossary section on healthcheck monitoring, it is mentioned that healthcheck information can be collected for the various resources used by the web application. This is an example of system monitoring, wherein an array of tools is instrumented / configured to monitor system performance. This helps identify whether the servers are performing in an expected manner, i.e. expected levels of CPU usage, network usage, no memory leaks, etc. Commonly used tools are statsd, Graphite, Nagios, Datadog, etc. Monitoring memory usage helps identify if the code has memory leaks. Monitoring network activity is one of the best ways to identify the onset of a DDoS attack.
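As an illustration of what such instrumentation looks like from the application side, the sketch below emits counter and timer metrics in the plain statsd text protocol over UDP; the host, port and metric names are assumptions for the example, and in practice a statsd client library would be used instead of raw sockets.

    # Sketch: emit metrics in the statsd text protocol ("name:value|type") over UDP.
    import socket

    STATSD_ADDR = ("127.0.0.1", 8125)   # conventional statsd host/port
    _sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(metric):
        # Counter, e.g. "webapp.requests:1|c"
        _sock.sendto(f"{metric}:1|c".encode("ascii"), STATSD_ADDR)

    def timing(metric, milliseconds):
        # Timer, e.g. "webapp.response_time:42|ms"
        _sock.sendto(f"{metric}:{milliseconds}|ms".encode("ascii"), STATSD_ADDR)

    incr("webapp.requests")             # count a handled request
    timing("webapp.response_time", 42)  # record how long it took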
SLO, SLI and Error budget
See articles on Atlassian about SLO, SLI and Error budget. These are performance indicators used to determine whether the business application (i.e. the service) is behaving in the manner expected by the business. Once the application makes it to the production environment and is available to users, service level objectives (SLOs) must be identified such that they mirror business objectives. System monitoring tools can be used to obtain service level indicators (SLIs), i.e. the actual performance level of the service. The difference between the SLI and the SLO can be used to define an error budget, i.e. how much error a service can absorb before it fails to meet the SLO. Depending on business requirements, there can be multiple SLOs.
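As a worked example with assumed numbers, an availability SLO of 99.9% over a 30-day window leaves an error budget of roughly 43 minutes of downtime:

    # Error budget implied by an availability SLO over a 30-day window.
    slo = 0.999                       # target: available 99.9% of the time
    window_minutes = 30 * 24 * 60     # 43,200 minutes in the window

    error_budget_minutes = (1 - slo) * window_minutes
    print(round(error_budget_minutes, 1))   # 43.2 minutes of allowed downtime

    # If the measured SLI (say, 99.95% availability) stays above the SLO,
    # budget remains and the team can afford to take more release risk.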