Automated deployment pipelines empower teams to ship better software faster. The best pipelines do more than deploy software; they also ensure the entire system is regression-free. Our deployment pipelines must keep up with the shifting realities in software architecture. Applications are increasingly composed of more moving parts across more complex infrastructures. As a result, they must be thoroughly tested before going into production, and this process should support rather than inhibit speed, reliability, and maintenance efforts.

This post outlines the processes for evaluating where, what, and how to test different aspects of software throughout the deployment pipeline with the ultimate goal of reducing production regressions. Before we dive in, you might also want to watch our on-demand webinar on the Recipe for DevOps Success.

Scaling the Abstraction Ladder to Applications

The deployment pipeline’s goal is simple: ensure regression-free code deploys to production. This goal’s corollary requires proving correctness at the earliest possible stage. Unit tests verify correctness at the class level. Correct units are a prerequisite to integration tests. This follows because a broken unit naturally breaks an integrated system. The same logic applies to independent components (imagine big blocks in your architecture diagram) in a larger application.

Unit testing is useful, but integration tests hold the real power. Integration tests and all their aliases such as “acceptance”, “end-to-end”, or “system”, ultimately assert that the entire system works as expected. Integration tests are not a panacea. They are more difficult to write, execute slower, and are more prone to failure at higher abstraction levels.

The test pyramid encapsulates this idea. Put simply, lower level abstractions are easier and faster to test. Higher level tests are more difficult and the tests run slower. Different aspects of the software may only be tested at different levels in the pyramid. A small utility library may only require unit tests at the bottom level. A SPA (single-page application) written in JavaScript requires integration tests running in a real browser (potentially headless). The pyramid helps relate these concepts to prepare a balanced test portfolio. The pyramid is akin to a deployment pipeline in that way. The goal for each level is to maximize the likeliness of success at the subsequent level. Martin Fowler explains this:

I always argue that high-level tests are there as a second line of test defense. If you get a failure in a high level test, not just do you have a bug in your functional code, you also have a missing or incorrect unit test. Thus I advise that before fixing a bug exposed by a high level test, you should replicate the bug with a unit test. Then the unit test ensures the bug stays dead.

This logic applies to individual systems and larger applications. Thus, the question facing developers is: how do we identify and eliminate regressions at each stage in the deployment pipeline?

The answer lies in the relationship between components, applications, and their exposed interfaces. Applications are composed of one or more components. A SPA has two components: the browser application and the network services it requires. In this sense, the components are “units” and the application is an “integration”, so the individual components must be vetted at their boundary before testing an integrated application. So then, what’s the interface between these components and the application’s user interface? Let’s consider a typical application.

How to Design a Deployment Pipeline for an Application

Consider a typical product architecture. There’s a backend API that’s consumed by one or more web applications and a mobile application. This scenario describes one application with three components. Here’s a top-down outline for this application’s deployment pipeline:

  • Verify the backend, web, and mobile applications are functionality correct in isolation
  • Deploy the backend to a test environment
  • Deploy the web application to a test environment
  • Verify the web and backend applications integrate correctly in the test environment
  • Verify the mobile application functions correctly against the backend running in the test environment
  • If everything works, then promote to the next step in the pipeline

Unpacking the first point reveals the second layer of testing. Verifying each component requires:

  • Its own mix of integration and unit tests
  • Static linting and analysis
  • Tests for behavior in subsequent pipeline stages

Let’s assume the web and mobile applications use HTTP for backend calls. The application’s user interface is a browser or mobile device. Adding these facts to the component’s functional requirements defines the pipeline stage and its associated tests.

Step One: Vet Components

Every component requires its own test portfolio. The hypothetical web service requires tests to hit each endpoint and verify a proper response. This may or may not happen over a network. This process also requires the low-level tests for internal helpers views, models, database queries, and whatever other internals are needed to produce a functioning web service. The choices of stack, language, or framework drive the test suite implementation.

Vetting components also requires static analysis and linting. This can eliminate regressions that may not be caught otherwise. Pay close attention to both static configuration files that may only be parsed at runtime or in a particular stage, and syntax or structural errors which can accidentally slip into these types of files and for which can cause failures much later in the pipeline.

Lastly, there are tests to verify fitness for the next step in the pipeline. This area should cover boot scripts (e.g. does the server script actually start the server on the correct port?), smoke tests (e.g. does the web server start and handle incoming requests), and functionality (e.g. does the configured load balancer health check request work as expected?). Testing regressions at this level can save costly failures much later on in the pipeline. Pay extra close attention to code or behavior that’s not explicitly consumed by the component in question, but is consumed further down the pipeline (such as a load balancer health check request).

Scaling the Ladder to Applications

Now that the pipeline has verified all the components, it’s time to integrate them in a test environment in topological order. The deployment pipeline for the hypothetical application would deploy the backend services and run a test against that environment.

The test pyramid is relevant here. These tests should be kept to a minimum in number and should focus on known regressions or minimum viable functionality. Also bear in mind that tests should capture regressions only. They should be run at this level in the pyramid and because this is the first point at which component code executes on production-like infrastructure, the tests should confirm that each component integrates correctly with the hosting infrastructure. Firing off some curls to key endpoints may be enough for a quick thumbs-up or -down. Those requests ensure the application is at least running correctly in that stage’s infrastructure. Note: this is a great place to use a shared API client library. These tests are not an exhaustive functional suite since that’s already been verified at a component level.

Next, it’s time to integrate the browser and mobile components with the backend running in the test environment. Again, the goal is to test drive the minimum viable functionality and known regressions through the user-facing interface. For the browser application, it may be enough to run tests via Selenium to log in and navigate through the most critical flows. The same approach applies to the mobile application.

A second series of tests may run through flows across the browser and mobile application. This test may log in into the mobile application, create some data, then access it from the web application. However, such a test sits atop the test pyramid so it should only be introduced if there are regressions only testable at this level. This type of test would likely be designed to find data contract violations or inconsistencies that should have been tested at the unit level.

There are always trade-offs. Initially, tests at this level may be kept to a minimum, but the business may decide that other critical flows (such as purchasing a product on an e-commerce site) should be covered across all integrated components. The trade-off is more effort in writing tests and slower pipeline runs, but also in ultimately fewer regressions. Every case is different, but the guiding principle of starting with as few tests as possible and adding tests when production regressions are found is a great starting point.

Future Considerations

Building a robust deployment pipeline is challenging, particularly in setting shared ownership for the automation and tests themselves. DevOps requires shared ownership and a robust pipeline cannot function without it. Developers need to agree on shared tools and approaches before proceeding. Progress will stall without buy-in.

Teams must also consider ways to manage test data. In a local development environment, it’s easy to wipe a database between tests or clear an internal data store. The story isn’t as straightforward in an integration testing environment. Teams need to decide how they’ll load and manage test data for integration tests in later pipeline stages.

Reliability and speed are ongoing considerations. Test flakes are more likely when there are more moving parts, especially across networks, cloud providers, and through GUIs. Flakey tests should be taken seriously and thus repaired and scraped. Keeping them as-is will undermine confidence in the entire process. The same goes for keeping things speedy. Engineers love faster pipelines, so teams must stay vigilant on cutting down times by removing unneeded tests and parallelizing pipeline steps and tests suites.