
Writing Load Tests

My recent work has revolved around writing load tests, and I wanted to take a few minutes to distill my thoughts and learnings about the topic.

Having successfully implemented load tests for multiple workloads at this point, I feel like I can say that load tests are not strictly necessary, but they are very telling. The “not strictly necessary” part comes from the lack of load tests I see surrounding production workloads and the lack of blog posts of this nature. Load tests seem to be one of those tech mysteries that large tech companies write about but most non-FAANG companies do not implement, probably because they are viewed strictly as overhead by finance. They are not required to test functionality or deploy new code to production, and while they may not tell developers whether their feature works properly, they do tell developers whether their feature works at scale. Just about anyone can write code that works given enough time, but writing code that scales to the point we want is different. That is where load tests become just as valuable as other types of testing like functional or end-to-end tests.

Testing how far to scale is also an important point to dwell on. Do not configure load tests to test only where you expect traffic to be in production. Test a system far beyond that and see how it reacts. If getting to production scale is already challenging, then on-call engineers are going to be utilized more than they should be. Servers should not fall over, databases should not lock up, and end-user latency should still be within acceptable ranges when you hit production load. Each product and workload will be different, but let’s say you can test up to 150% of what you expect in production before your system starts having problems. That is better than 100% and gives breathing room to further optimize before those levels of traffic are hit. The goal is to find limits before they are reached in production.
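As a rough illustration of what such a ramp might look like, here is a minimal sketch using Locust's LoadTestShape; the user counts, durations, and the 1,000-user production baseline are made-up placeholders, and the shape would run alongside a user class that defines the actual requests (like the one sketched below).

```python
# A minimal sketch of ramping past expected production load, assuming Locust's
# LoadTestShape API. The numbers are placeholders: if production peaks around
# 1,000 concurrent users, keep pushing well beyond that.
from locust import LoadTestShape


class RampPastProduction(LoadTestShape):
    stages = [
        {"until": 300, "users": 1000, "spawn_rate": 50},  # ~100% of expected production
        {"until": 600, "users": 1500, "spawn_rate": 50},  # ~150%
        {"until": 900, "users": 2000, "spawn_rate": 50},  # keep going until something breaks
    ]

    def tick(self):
        run_time = self.get_run_time()
        for stage in self.stages:
            if run_time < stage["until"]:
                return stage["users"], stage["spawn_rate"]
        return None  # returning None ends the test
```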

As far as what to test under load, I prefer functional tests and user flows. Make sure repeatedly hitting endpoints does not cause an increase in latency or affect other parts of the system. Testing user flows is the better of the two because it exercises individual endpoints and the functionality between them. If one endpoint updates an entity but that update is not seen in a subsequent GET, then we need to mark a failure and investigate why it happened. Testing user flows is probably more difficult than testing individual functionality though, so under a time crunch, simple functional load testing works better than nothing. Either way, I like to send requests to the endpoints I am testing in the same ratios that I see in production or expect to see in production. Knowing those ratios requires production metrics or estimates from leaders; they are likely to be fairly intuitive though.
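For example, a hedged sketch of a Locust user flow might look like the following; the endpoints, payload shape, and 8:1 task weighting are hypothetical stand-ins for whatever ratios you actually see in production.

```python
# A sketch of weighted tasks and a user flow, assuming Locust's HttpUser API.
# The /items endpoints, payload shape, and weights are hypothetical.
from locust import HttpUser, task, between


class ApiUser(HttpUser):
    wait_time = between(1, 3)  # pause between tasks like a real user would

    @task(8)  # weight roughly matching the read-heavy ratio seen in production
    def list_items(self):
        self.client.get("/items")

    @task(1)
    def create_then_verify(self):
        # A small user flow: create an entity, then confirm a subsequent GET sees it.
        created = self.client.post("/items", json={"name": "load-test-item"})
        item_id = created.json().get("id")
        with self.client.get(
            f"/items/{item_id}", name="/items/[id]", catch_response=True
        ) as check:
            if check.status_code != 200 or check.json().get("name") != "load-test-item":
                check.failure("update not visible in subsequent GET")
            else:
                check.success()
```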

Another style of testing that can be ramped up to load testing is end-to-end testing (which might look similar to the aforementioned user flows). If you are load testing a system end-to-end and you do not own all of its services, make sure that any external services know you are testing. I have taken down other services because they could not handle the load that I wanted to support. Granted, this only happened in testing environments. Doing something like this could even turn out to be a net positive if those external services needed to optimize and did not know it beforehand, but common practice is to purposefully scale down lower environments to save costs. Just make sure everyone is in the loop and is not surprised by a sudden flood of traffic. If downstream components do not want to handle the load in lower environments, create mocks if possible. Mocks will isolate the test to your service and eliminate bottlenecks that are not caused by your service.
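A mock does not need to be fancy to serve that purpose. Here is a minimal standard-library sketch of a downstream mock; the endpoint and the canned payload are hypothetical.

```python
# A minimal sketch of a mock for a downstream dependency, using only the
# standard library. It returns a canned JSON response immediately so the mock
# itself never becomes the bottleneck; the payload is hypothetical.
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MockDownstream(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"sku": "ABC123", "in_stock": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # stay quiet under heavy load


if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), MockDownstream).serve_forever()
```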

Some useful concepts and tools to utilize while testing and optimizing are profiling and tracing. Profiling can (and probably will) slow down your services. Use profiling to find inefficient code, but do not expect to scale as well as normal whenever a service is being actively profiled. Having a good tracing service in place will provide loads of good information even if profiling is skipped. Most tracing libraries and services can be configured to take samples so that not every request has the added overhead, which makes tracing in production possible as well as during testing. Tracing both service response times and outgoing requests can reveal some of the largest slow-downs. Both tools should be used where needed, but there is too much information for me to go into more depth without exploding the scope of this post. There are many posts online that would love to tell you more about these subjects.
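As one possible setup, here is a hedged sketch of ratio-based sampling with the OpenTelemetry Python SDK; the 10% sample rate and the span names are arbitrary examples, and a real deployment would export to a tracing backend rather than the console.

```python
# A sketch of sampled tracing, assuming the opentelemetry-sdk package. The 10%
# ratio is an arbitrary example; the console exporter stands in for a real backend.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

provider = TracerProvider(sampler=TraceIdRatioBased(0.10))  # sample roughly 10% of traces
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)


def handle_request():
    # Wrap both the service handler and its outgoing calls in spans so the
    # biggest slowdowns show up in the trace.
    with tracer.start_as_current_span("handle_request"):
        with tracer.start_as_current_span("downstream_call"):
            pass  # call the downstream dependency here
```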

So far, I have mostly talked about running load tests to optimize code, but there is another important piece to the scaling puzzle: infrastructure. Load tests are just as much for infrastructure configuration as they are for code. Look into scale-up and scale-down thresholds to make sure your system scales early enough without wasting resources. If your system runs in the cloud, make sure that instances are correctly sized. Check that network calls are not slow due to a networking error and make sure you are handling connections correctly. For example, do you close connections in a timely manner or leave them hanging after they are no longer in use? Also, check your database. Turn on proper logging to see if any queries are bogging your database down and need to be optimized. Every aspect of a system can be scrutinized during load tests.
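As a small illustration of the connection point, reusing a pooled connection and releasing it when the work is done might look like this with the requests library; the URL is a placeholder.

```python
# A sketch of connection hygiene, assuming the requests library. A shared
# Session reuses pooled connections instead of opening a new one per call,
# and leaving the with-block closes them instead of leaving them hanging.
import requests


def fetch_statuses(urls):
    with requests.Session() as session:
        return [session.get(url, timeout=5).status_code for url in urls]


if __name__ == "__main__":
    print(fetch_statuses(["https://example.com/health"] * 10))  # placeholder URL
```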

Lastly, make sure you can clean up what you produce during a load test. Lots of database entries and files might be written. If there is an easy way to clean these up, then I suggest doing so, or you will soon be paying for storage you do not need. Make sure that extra instances that no longer need to be running are not still up. Better yet, tear down the whole load testing environment if possible to maximize cost savings. I am sure there are other things I cannot think of right now that we do not want lingering around after a test stops.
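As one hedged example, if a test run wrote its artifacts to S3 under a per-run prefix, cleanup could be a short script like this; the bucket name, prefix layout, and run ID are hypothetical.

```python
# A cleanup sketch using boto3, assuming test artifacts were written to S3
# under a per-run prefix. The bucket, prefix, and run ID are hypothetical.
import boto3


def cleanup_run(bucket: str, run_id: str) -> None:
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=f"load-tests/{run_id}/"):
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            s3.delete_objects(Bucket=bucket, Delete={"Objects": objects})


cleanup_run("my-load-test-artifacts", "run-2021-10-01")
```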

After your load test has been run, document statistics for the run somewhere. Having benchmarks to look back at is important because they can tell us if we introduced unoptimized code or show us how far we have come. Also, having some sort of success metric and tolerance around these tests makes it possible to run them automatically and produce a pass or fail signal as part of a CI/CD pipeline or code health metric.
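One way to produce that signal is a small check that compares the run's statistics against thresholds and exits non-zero when a tolerance is exceeded; the stats file format and the limits below are assumptions, not a standard.

```python
# A minimal sketch of turning load test stats into a CI pass/fail signal.
# The JSON stats format and the threshold values are assumptions.
import json
import sys

THRESHOLDS = {"p95_latency_ms": 500, "error_rate": 0.01}


def check(stats_path: str) -> int:
    with open(stats_path) as f:
        stats = json.load(f)  # e.g. {"p95_latency_ms": 420, "error_rate": 0.002}
    failures = [
        f"{name}={stats.get(name)} exceeds {limit}"
        for name, limit in THRESHOLDS.items()
        if stats.get(name, float("inf")) > limit
    ]
    for failure in failures:
        print(f"FAIL: {failure}")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(check(sys.argv[1]))
```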

As I said in the beginning, this is what I have mostly been focused on for the past few months, which is why there has been a slowdown in code-related blog posts. I hope I have helpfully summarized my learnings, and please let me know if you have further questions.

One more iteration down the line from this, I would be curious to implement a load testing service using serverless infrastructure. I have not heard of this being done before, and I wonder how it would do. The hardest part I foresee is tracking requests and responses and continually triggering requests. One option I have thought about is having a central API/service that would track metrics around requests and responses. On startup, that central service could kick off the initial flood of requests by putting messages on an SQS queue that is read by a Lambda. Each Lambda would then perform a test, like a full user flow, and report back to the central service with its metrics. Putting another message on the queue after a test could be managed either by the central service (since it already keeps track of other metrics) or as the final step of a worker Lambda (there would be danger here with it being “recursive” in a sense). The cyclical messages would stop at the end of a timer or once a target number of requests had been reached.
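If I ever prototype it, the worker side might look roughly like the sketch below; the queue URL, metrics endpoint, payload shape, and stop condition are all hypothetical, and the requests dependency would need to be bundled with the Lambda.

```python
# A rough sketch of the worker Lambda described above, assuming an SQS trigger,
# a central metrics API, and environment variables pointing at both. Every
# endpoint, payload shape, and name here is hypothetical.
import json
import os
import time

import boto3
import requests

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]
METRICS_API = os.environ["METRICS_API"]


def run_user_flow(base_url: str) -> dict:
    # One full user flow: create an entity, then confirm a subsequent GET sees it.
    start = time.time()
    created = requests.post(f"{base_url}/items", json={"name": "load-test"}, timeout=10)
    fetched = requests.get(f"{base_url}/items/{created.json()['id']}", timeout=10)
    return {
        "duration_ms": int((time.time() - start) * 1000),
        "success": created.ok and fetched.ok,
    }


def handler(event, context):
    for record in event["Records"]:  # standard SQS-triggered Lambda event shape
        job = json.loads(record["body"])
        metrics = run_user_flow(job["base_url"])
        # Report back to the central service, which aggregates results and
        # decides when the test should stop.
        requests.post(f"{METRICS_API}/results", json=metrics, timeout=10)
        if not job.get("stop", False):
            # Re-enqueue to keep the cycle going; the central service flips the
            # stop flag when the timer or request budget runs out. This is the
            # "recursive" part that needs a hard stop condition.
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(job))
```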

Categories: dev | meta | ops