A lightweight auditable config system

Any complex, flexible piece of software relies on configs to customize its behavior. In this article we look at the different types of config and when to use each. We also touch on a lightweight, auditable runtime configuration system that has been in production for several years, configuring millions of web requests per day.

Static config

The simplest and most predictable configs are the ones that are statically baked into the code. These are usually easy to unit test because they don’t change after the software is deployed.

Static configs provide great predictability. For example, during an incident one can deploy the last known good commit with the guarantee that the last known good configs will be used as well. This reduces the mean time to resolution (MTTR).

Whenever possible, we recommend this type of config, but it has its limitations.
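To make this concrete, here is a minimal sketch of what a static config could look like. The class name, the setting names and their values are hypothetical; the point is only that the values live in the code, cannot be mutated at runtime, and are trivial to unit test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticConfig:
    """Values baked into the code; they change only with a new commit."""
    retry_backoff_seconds: float = 0.5  # hypothetical setting
    max_retries: int = 3                # hypothetical setting


CONFIG = StaticConfig()


def retry_delays(config: StaticConfig = CONFIG) -> list[float]:
    """Return one delay per retry, doubling each time."""
    return [config.retry_backoff_seconds * (2 ** i) for i in range(config.max_retries)]
```

Because the dataclass is frozen, an accidental assignment such as `CONFIG.max_retries = 5` raises an error instead of silently changing behavior in production.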

Deployment config

Deployment configs work best for variables that are tightly coupled to the concept of a deployment. For example:

The Twelve-Factor App methodology recommends using environment variables for everything that is likely to vary between deployments.

Deployment configs provide great reliability. For example, one can point the software to a different database that is used for testing purposes without having to rebuild the software. Testing the service against a sample load can increase the mean time between failures (MTBF).
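A minimal sketch of reading deployment config from environment variables might look like the following. The variable names `DATABASE_URL` and `LOG_LEVEL` are assumptions made for illustration; the idea is to read them once at startup and fail fast when a required value is missing.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DeploymentConfig:
    database_url: str
    log_level: str


def load_deployment_config(env: Mapping[str, str] = os.environ) -> DeploymentConfig:
    """Read deployment config once at startup; fail fast if a value is missing."""
    try:
        database_url = env["DATABASE_URL"]  # hypothetical variable name
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set for this deployment") from None
    return DeploymentConfig(
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "INFO"),  # hypothetical, optional
    )
```

Passing the environment in as a parameter keeps the function easy to test: a test can hand it a plain dictionary that points at the test database without touching the real process environment.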

Runtime config

The business logic may need to change on the fly without having to rebuild and redeploy the software. This can include:

There are many ways to configure the software at runtime. For example:

Runtime configs provide great flexibility at the expense of predictability and reliability. It is best to minimize their usage and resort to static or deployment configs whenever possible.

For example, if a feature flag is set for all requests, it’s better to convert it to a static config. Some tools make it easy to spot such redundant configs, while others require manually digging through the code.
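If the tooling does not surface redundant flags, a small script can. The sketch below assumes a hypothetical schema in which each flag maps to the percentage of requests it is enabled for; a real config will likely look different, so treat this as an illustration of the idea rather than a drop-in tool.

```python
def redundant_flags(flags: dict[str, int]) -> list[str]:
    """Return the flags enabled for 100% of requests, sorted by name.

    `flags` maps a flag name to a rollout percentage (hypothetical schema).
    """
    return sorted(name for name, percent in flags.items() if percent >= 100)


# Hypothetical runtime config: only "new_checkout" is on for everyone.
example_flags = {"new_checkout": 100, "dark_mode": 25, "beta_search": 0}
candidates = redundant_flags(example_flags)
```

Here `candidates` contains only `new_checkout`, which makes it a candidate for conversion to a static config.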

A lightweight implementation

Although configurability can increase software complexity, the config system is not a very complex piece of software.

Let’s say we have a few microservices all relying on a common config. They need to fetch the config at runtime with the following requirements:

Tech Stack

JSON is a common format that works across architectures. It is safe to assume that our microservices will be able to fetch and parse configs in this format.

Git is the most common version control system and is familiar to most developers who build services and set those configs. If the configs are a set of JSON files, it is easy to keep track of who changed what and when. The config can live in a Git repo, and all changes can come in the form of a pull request (PR).
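Keeping the configs as JSON files in a repo also makes it cheap to check them automatically on every PR. The following is a rough sketch of such a check, assuming the files sit under a single directory; the function name and the directory layout are our own illustration, not a description of a specific setup.

```python
import json
from pathlib import Path


def find_invalid_json(config_dir: Path) -> list[str]:
    """Return one message per JSON file under config_dir that fails to parse."""
    errors = []
    for path in sorted(config_dir.rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            # Report the file relative to the config directory plus the parser message.
            errors.append(f"{path.relative_to(config_dir)}: {exc.msg} (line {exc.lineno})")
    return errors
```

A CI job could call this function on the repo's config directory and fail the build when the returned list is not empty, so a malformed file never reaches the services.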

AWS is one of the most popular cloud platforms, and Simple Storage Service (S3) is one of its oldest and most reliable services. It is accessible across regions and can be configured as a simple web server that serves JSON files. ETags are supported out of the box, which allows the clients (those microservices) to save resources by fetching and parsing the config only when it has actually changed. With WAF one can control which services have access to the configs.
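The ETag mechanism can be sketched on the client side as a conditional GET: the client sends the last ETag it saw in an `If-None-Match` header and the server answers `304 Not Modified` when nothing changed. The names `http_fetch` and `ConfigClient` below are hypothetical, and the transport is injectable so the logic can be exercised without a network.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# Takes (url, last_etag) and returns (status, etag, body). Illustrative type alias.
Fetch = Callable[[str, Optional[str]], Tuple[int, Optional[str], bytes]]


def http_fetch(url: str, etag: Optional[str]) -> Tuple[int, Optional[str], bytes]:
    """Conditional GET: send If-None-Match so an unchanged config costs no body."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status, response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 Not Modified as an HTTPError
            return 304, etag, b""
        raise


class ConfigClient:
    """Caches the parsed config and re-parses only when the ETag changes."""

    def __init__(self, url: str, fetch: Fetch = http_fetch) -> None:
        self.url = url
        self.fetch = fetch
        self.etag: Optional[str] = None
        self.config: dict = {}

    def refresh(self) -> bool:
        """Return True if a new config was loaded, False if it was unchanged."""
        status, etag, body = self.fetch(self.url, self.etag)
        if status == 304:
            return False
        self.config = json.loads(body)  # parse first, so a bad body keeps the old config
        self.etag = etag
        return True
```

A service would call `refresh()` on a timer and keep reading `client.config` in between. If the new body fails to parse, the exception propagates and the previously loaded config stays in place.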

Architecture

At a high level, the architecture looks like this:

[Figure: a rough sketch of the high-level architecture]

The workflow for changing a config is:

In our experience, it takes less than 1 minute from the moment the config is merged to master until it is available in production. If the microservices poll for an ETag change every 3 minutes, the maximum time it takes for a config to be “live” is 4 minutes. The poll interval can of course be reduced, or an SQS queue can be used together with SNS to notify the consumers of the change, which dramatically reduces the time it takes for the config to be “live”.
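The worst case is simply the publish delay plus one full poll interval, since a service may have polled just before the new config landed. A tiny sketch of that arithmetic, using the numbers above (the function name is our own):

```python
def worst_case_live_seconds(publish_seconds: int, poll_interval_seconds: int) -> int:
    """Upper bound on the time from merge until every poller has seen the change."""
    return publish_seconds + poll_interval_seconds


# 1 minute to publish plus a 3 minute poll interval.
current = worst_case_live_seconds(publish_seconds=60, poll_interval_seconds=180)

# Polling every 30 seconds instead (a hypothetical setting).
faster = worst_case_live_seconds(publish_seconds=60, poll_interval_seconds=30)
```

With these inputs `current` is 240 seconds (4 minutes) and `faster` is 90 seconds, which shows that below a certain poll interval the publish step becomes the dominant cost.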

Practical tips

Conclusion

In general, it is best to minimize runtime configuration because it is hard (if not impossible) to guarantee correctness for all permutations of config.

If the config is growing wild, it is usually a symptom of organizational issues. One can hardly solve organizational issues with technical solutions. An experienced PM who is good with stakeholder management should be able to shield the team from unnecessary flexibility. It is the PM’s job to settle conflicting requirements and distill the implementation requirements. Keep the configs limited to what moves the needle on business metrics, and justify each one against the cost of flexibility.

Knowledge Worker, MSc Systems Engineering, Tech Lead, Web Developer