Introduction
This page lists various terms that relate to web application housekeeping tasks. These are one-off or recurring tasks, often run as cron jobs, that do not directly contribute to processing the current request and forming a response. However, they may prepare the application for future business processing or clear stale data from past processing, and so they are essential to the application.
Cron job
See the article about cron jobs on Wikipedia. Cron jobs are periodic tasks that are executed based on a schedule. They can be used to run arbitrary housekeeping tasks, either on demand or periodically.
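As an illustration, the sketch below uses Java's ScheduledExecutorService as an in-process analogue of a cron entry, running a hypothetical cleanup task once an hour (roughly what a "0 * * * *" crontab line would do at the operating-system level). The task name and the interval are assumptions made for the example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HourlyCleanupJob {

    public static void main(String[] args) {
        // Single-threaded scheduler dedicated to housekeeping work.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Run the (hypothetical) cleanup task every hour, starting one hour from now,
        // similar to a "0 * * * *" crontab entry but managed inside the application.
        scheduler.scheduleAtFixedRate(
                HourlyCleanupJob::purgeStaleSessions, 1, 1, TimeUnit.HOURS);
    }

    private static void purgeStaleSessions() {
        // Placeholder for real housekeeping logic, e.g. deleting expired session rows.
        System.out.println("Purging stale sessions at " + java.time.Instant.now());
    }
}
```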
Daemon thread
Daemon threads, by definition, are threads that are typically started quietly when the application first starts; they run continuously in the background and are terminated automatically when the application exits, or when the thread has no more code to run. Repeating for emphasis, note that a daemon thread ends when the application exits, not when the request that created the thread completes. Since they are long-running threads, they can be used to continuously poll a queue for new tasks and execute any that become available. On the other side of the queue, the application only needs to push new tasks, and they will be executed asynchronously by the daemon thread.
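A minimal sketch of this pattern in Java is shown below; the task type and the queue contents are assumptions made for the example. The worker thread is marked as a daemon, so it keeps draining the queue for as long as the application runs and stops automatically when the JVM exits.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BackgroundTaskWorker {

    // Request handlers push work onto this queue; the daemon thread drains it.
    private static final BlockingQueue<Runnable> TASKS = new LinkedBlockingQueue<>();

    public static void start() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    // Blocks until a task is available, then runs it.
                    TASKS.take().run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }, "housekeeping-worker");
        worker.setDaemon(true); // ends with the application, not with a request
        worker.start();
    }

    // Called from request-processing code; returns immediately.
    public static void submit(Runnable task) {
        TASKS.add(task);
    }
}
```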
Startup tasks
These refer to tasks that must be executed when an application starts up. The most common example is caching data that is known not to change over time but is read many times. If such tasks are configured, ensure that the overall infrastructure is also set up so that the application does not start receiving incoming requests until after the startup tasks are complete.
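For instance, the sketch below warms a reference-data cache before the HTTP listener is opened; the `loadCountryCodes()` data source and the `startHttpListener()` call are hypothetical placeholders standing in for the real framework-specific pieces.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Application {

    // Read-mostly reference data cached once at startup.
    private static final Map<String, String> COUNTRY_CODES = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        // Startup task: warm the cache before any request can arrive.
        COUNTRY_CODES.putAll(loadCountryCodes());

        // Only now start accepting incoming requests (hypothetical method).
        startHttpListener();
    }

    private static Map<String, String> loadCountryCodes() {
        // Placeholder for a database or file read of data that rarely changes.
        return Map.of("IN", "India", "US", "United States");
    }

    private static void startHttpListener() {
        // Placeholder for the framework-specific call that binds the HTTP port.
        System.out.println("Server is now accepting requests");
    }
}
```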
Healthcheck monitoring
A web application may use multiple external services or resources while it is running. The most common is a database. It may also use third-party APIs, a file storage system, message queues such as Kafka, etc. It is good practice to continuously monitor the health of all these resources to ensure that they are up and running and that their performance is not degraded. This monitoring task can be set up when the web application first starts. A monitor can also be set up to periodically check the health of the server that hosts the web application itself. Alerts can be configured to notify the relevant personnel if the healthcheck results for any of the resources fall below an expected level.
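A simplified sketch of such a monitor for a database is shown below; the JDBC URL, the probe interval and the alerting step are assumptions, and a real deployment would usually rely on a dedicated monitoring product rather than hand-rolled code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DatabaseHealthMonitor {

    private static final String JDBC_URL = "jdbc:postgresql://localhost:5432/app"; // assumed

    public static void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Probe the database every 30 seconds.
        scheduler.scheduleAtFixedRate(DatabaseHealthMonitor::checkDatabase, 0, 30, TimeUnit.SECONDS);
    }

    private static void checkDatabase() {
        try (Connection connection = DriverManager.getConnection(JDBC_URL)) {
            if (!connection.isValid(5)) {          // 5-second validation timeout
                alert("Database connection is not valid");
            }
        } catch (Exception e) {
            alert("Database is unreachable: " + e.getMessage());
        }
    }

    private static void alert(String message) {
        // Placeholder: page the on-call engineer, post to a chat channel, etc.
        System.err.println("HEALTHCHECK ALERT: " + message);
    }
}
```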
Application monitoring
Application monitoring refers to the various utilities that allow someone to understand whether the web application is processing requests as expected, or to reconstruct the commands that were executed by the web application before it errored. Even though software code is opened for public use only after it has been extensively tested, unexpected issues still occur, so it becomes necessary to identify their root cause so that they can be remediated quickly. Generally, different types of machine "logs" are used for such monitoring. Note that these are different from the healthcheck monitors discussed above.
Audit log or Audit trail
See the article about audit trails on Wikipedia. These are a record of the changes made to database table entries, serving as evidence of which modifications were made to a record, by which user, and at what time. Although the Wikipedia article identifies it as a security-related task, it is more in line with "compliance" and "dispute resolution"; security is something that must happen well before the code flow reaches the point where an audit entry is made. A business that is past the startup phase should also define the period for which audit entries are kept, otherwise these entries will keep growing in an unbounded manner.
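As a rough sketch, the helper below writes one audit entry per change into an assumed `audit_log` table with columns for the table name, record id, acting user, action and timestamp; the table layout and the source of the JDBC connection are assumptions made for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

public class AuditLogger {

    private static final String INSERT_SQL =
            "INSERT INTO audit_log (table_name, record_id, user_id, action, changed_at) "
          + "VALUES (?, ?, ?, ?, ?)";

    // Records who changed which row and when; called alongside the business update.
    public static void record(Connection connection, String tableName, long recordId,
                              String userId, String action) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setString(1, tableName);
            statement.setLong(2, recordId);
            statement.setString(3, userId);
            statement.setString(4, action);
            statement.setTimestamp(5, Timestamp.from(Instant.now()));
            statement.executeUpdate();
        }
    }
}
```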
Logging
Generating application logs is one of the best ways to understand the state of request processing. Logs can be written at different log levels, ranging from TRACE to FATAL. See these articles at Stack Overflow, IBM and Medium to understand the different log levels and when to use them. Generally, a log entry is made when an error occurs, at which time the corresponding error stacktrace must also be captured and logged. It is also a common and important practice to enable only INFO and higher-level logging in production environments, to prevent unnecessary slowdowns and to avoid generating excessive logs. This also matters from a security perspective, because DEBUG or TRACE logs can give away too much information about the inner workings of the application if an unauthorized user gets hold of the logs.
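The snippet below, using the widely adopted SLF4J API, illustrates logging at different levels and capturing the stacktrace when an error is caught; whether the DEBUG line actually appears is controlled by the logging configuration, which in production would typically be set to INFO or higher. The class and method names are made up for the example.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    public void processPayment(String orderId) {
        log.debug("Starting payment processing for order {}", orderId); // suppressed in production
        try {
            chargeCard(orderId);
            log.info("Payment completed for order {}", orderId);
        } catch (Exception e) {
            // Passing the exception as the last argument logs the full stacktrace.
            log.error("Payment failed for order {}", orderId, e);
        }
    }

    private void chargeCard(String orderId) {
        // Placeholder for the real payment-gateway call.
    }
}
```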
Distributed tracing
See the articles about distributed tracing at Opentracing, Splunk and Lightstep. More often than not, modern web applications involve a single request being processed by multiple components, including different microservices, third-party APIs and UI calls. Distributed tracing refers to methods of observing a request as it propagates through distributed systems. This reveals how a set of services coordinate to handle individual user requests and is extremely useful for identifying the cause of a request failure, or even a slowdown, as the request moves through distributed systems. To this end, OpenTelemetry can be incorporated into the code. OpenTelemetry consists of software and tools that assist in generating and capturing telemetry data from cloud-native software. It provides the libraries, agents, and other components needed to capture "telemetry" from your business application to enable its management and debugging. Specifically, OpenTelemetry captures metrics, distributed traces, resource metadata, and logs, and then sends this data to configured backends for processing. It is a relatively new project formed through a merger of the OpenTracing and OpenCensus projects.
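A minimal sketch using the OpenTelemetry Java API is shown below. It assumes that an SDK and exporter have already been configured elsewhere (so that GlobalOpenTelemetry returns a working instance), and that the instrumentation name, span name and attribute reflect a hypothetical order-lookup operation.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class OrderLookup {

    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("com.example.orders"); // instrumentation name is assumed

    public String findOrder(String orderId) {
        // Start a span for this unit of work; it joins the trace propagated with the request.
        Span span = tracer.spanBuilder("findOrder").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            span.setAttribute("order.id", orderId);
            return queryDatabase(orderId);
        } catch (Exception e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR);
            throw e;
        } finally {
            span.end();
        }
    }

    private String queryDatabase(String orderId) {
        // Placeholder for the real downstream call, which would appear as a child span.
        return "order-" + orderId;
    }
}
```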
Data privacy update
Modern data-protection laws like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) allow, among other things, a user to obtain all the data that an organization has collected about them and to have that data deleted. This processing is important, but it can take a long time to complete. It is a good example of processing that can be done in the background.
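As a sketch, the service below accepts a hypothetical data-erasure request, submits the long-running work to a background executor, and returns a job id immediately so the user is not kept waiting; the job-id scheme, pool size and the `eraseUserData` step are assumptions made for the example.

```java
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PrivacyRequestService {

    // Background pool for long-running GDPR/CCPA jobs.
    private static final ExecutorService BACKGROUND = Executors.newFixedThreadPool(2);

    // Called from the request handler; responds immediately with a tracking id.
    public String requestErasure(String userId) {
        String jobId = UUID.randomUUID().toString();
        BACKGROUND.submit(() -> eraseUserData(userId, jobId));
        return jobId;
    }

    private void eraseUserData(String userId, String jobId) {
        // Placeholder: delete or anonymize the user's data across all data stores,
        // then record completion so the user can be notified.
        System.out.println("Job " + jobId + " finished erasing data for " + userId);
    }
}
```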
OpenAPI and Swagger
REST, being an architecture and not a specification, does not require fixed interfaces and URLs, but instead promotes API discovery via HATEOAS. However, this paradigm interferes with many use cases where a client wants advance knowledge of the available REST endpooints, which the business agrees to provide as part of a service but without access to the source code. The OpenAPI specification standardizes the format in which this information can be shared. Swagger is a set of open-source tools built around the OpenAPI specification that help with designing, building, documenting and consuming REST APIs. Thus, for modern web applications exposing services via REST endpoints, providing Swagger documents and keeping them up to date should be seen as almost a requirement.
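As an illustration, the sketch below annotates a JAX-RS endpoint with annotations from the io.swagger.v3.oas.annotations package, which tools in the Swagger ecosystem can scan to produce the OpenAPI document. The resource path, descriptions and response body are assumptions, and whether the javax or jakarta namespace applies depends on the runtime in use.

```java
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/orders")
public class OrderResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    @Operation(summary = "Fetch an order",
               description = "Returns the order with the given identifier, if it exists")
    @ApiResponse(responseCode = "200", description = "The order was found")
    public String getOrder(@PathParam("id") long id) {
        // Placeholder body; the generated OpenAPI document comes from the annotations above.
        return "{\"id\": " + id + "}";
    }
}
```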