When I first join or inherit a team, I look at the engineering capabilities that the team and its applications have. There are several ways to slice and dice these capabilities, but for the purposes of this article, I’ll use the groupings I’ve found best for ordering them by importance and team maturity.
Level 0 – table stakes
- Version control – all code should be in a centralized code repository with versioning
- Trunk-based development – simplify development by avoiding long-running environment or release branches
- Pull requests and code reviews – all code changes should be made via peer-reviewed pull requests
- Automated unit and integration tests – always make time to write automated tests, as they self-document the code’s expected behavior and make refactoring easier
- Automated tests on pull requests – engineers should still run tests locally, but it’s important to have basic continuous integration that runs the full test suite and reports pass/fail on each pull request so the tests are exercised often
Level 1 – intermediate continuous integration
- Automated regression tests – automated tests of user-facing functionality ensure you’re not introducing feature regressions; start with the most important happy paths, add them to your continuous integration process, and grow from there
- Code coverage measured and enforced – implement a mechanism to measure your automated test coverage and set the minimum required coverage to the current value so it can’t decrease
- Code coverage > 95% – keep adding tests to close coverage gaps, raising the minimum coverage as you go, until each application codebase is above 95%
- Static code analyzer – implement a tool that reviews your code and enforces a community-based style guide and consistent code formatting
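To make the coverage ratchet concrete, here is a minimal Python sketch. It assumes your coverage tool already reports a single percentage and that the current minimum is stored somewhere the build can read and update (a file or CI variable, for example); `check_coverage` and `TARGET_COVERAGE` are hypothetical names, not part of any particular tool.

```python
from __future__ import annotations

# Level 1 goal; the ratchet works toward this over time.
TARGET_COVERAGE = 95.0


def check_coverage(current: float, minimum: float) -> tuple[bool, float]:
    """Return (passed, new_minimum) for a coverage ratchet.

    The check fails if current coverage is below the stored minimum.
    When it passes, the minimum moves up to the current value, so the
    floor can only go up over time.
    """
    if current < minimum:
        return False, minimum
    return True, max(minimum, current)


if __name__ == "__main__":
    passed, floor = check_coverage(current=82.4, minimum=81.0)
    print(passed, floor)  # True 82.4
    print(floor >= TARGET_COVERAGE)  # False: keep adding tests
```

In practice, the build would fail when `passed` is false and persist the new floor when it rises. Many coverage tools also have a built-in minimum threshold setting, which handles the enforcement half but not the automatic raise.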
Level 2 – continuous delivery
- Automated staging deployment – once pull requests are merged, the changes should be considered safe for release and automatically deployed to a pre-production (staging) environment
- Automated smoke tests – run a set of tests against the staging environment as a final check before a release (these can often be a subset of the automated regression tests, configured to run against staging)
- Automated production build process – if you’re able to perform any of the production release steps without affecting production, you should automate those to occur after a successful smoke test run
- Automated production deployment – releases should be able to occur at the push of a button
- No-downtime deployments – once all the steps are automated, work towards being able to release at any time of day during the workweek
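The ordering in Level 2 can be sketched as a pipeline that stops at the first failing stage and pauses before production until someone pushes the button. This is a minimal Python sketch under assumptions: each stage is a callable that returns True on success, and `Stage`, `run_pipeline`, and the stage names are hypothetical stand-ins for whatever your CI/CD system provides.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]       # returns True on success
    needs_approval: bool = False  # True for the push-button production deploy


def run_pipeline(stages: list[Stage], release_approved: bool = False) -> list[str]:
    """Run stages in order and return the names of those that succeeded.

    Stops at the first failing stage, and stops before any stage that
    needs approval unless the release has been approved.
    """
    completed: list[str] = []
    for stage in stages:
        if stage.needs_approval and not release_approved:
            print(f"Waiting for approval before {stage.name}")
            break
        if not stage.run():
            print(f"{stage.name} failed; stopping pipeline")
            break
        completed.append(stage.name)
    return completed


if __name__ == "__main__":
    # Hypothetical stages; each lambda stands in for a real deployment step.
    pipeline = [
        Stage("deploy_staging", lambda: True),
        Stage("smoke_tests", lambda: True),
        Stage("build_production", lambda: True),
        Stage("deploy_production", lambda: True, needs_approval=True),
    ]
    print(run_pipeline(pipeline))
```

Without approval, the run above completes staging, smoke tests, and the production build, then waits; calling it again with `release_approved=True` is the "push of a button."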
Level 3 – observability
- Infrastructure dashboard – build out dashboards that monitor your ecosystem’s infrastructure; there are lots of tools that do this out of the box these days
- Infrastructure alerts – determine which metrics you care about and which thresholds you need to know about, then set up alerts to the proper people so no one has to stare at dashboards
- Application dashboard – create dashboards that monitor your API ins and outs as well as custom application events
- Application alerts – after watching metrics move or not move for a while, create similar thresholds and alerting for your applications
- Centralized logging – being able to triage issues across your application ecosystem is critically important, and putting all your logs in one location makes that much easier; it also makes it easier to use log parsing and security tools
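For the alerting items, the core idea is a set of rules that pair a metric, a threshold, and who to notify. The sketch below is a minimal Python illustration; the metric names, thresholds, and channel names are hypothetical, and a real monitoring tool would typically evaluate rules over time windows rather than a single sample.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str       # hypothetical metric name, e.g. "cpu_percent"
    threshold: float  # alert when the metric rises above this value
    notify: str       # hypothetical destination, e.g. "#infra-oncall"


def evaluate(rules: list[AlertRule], sample: dict[str, float]) -> list[tuple[str, str]]:
    """Return (destination, message) for each rule whose metric exceeds its threshold.

    Metrics missing from the sample are skipped rather than treated as alerts.
    """
    alerts: list[tuple[str, str]] = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append((rule.notify, f"{rule.metric}={value} exceeds {rule.threshold}"))
    return alerts


if __name__ == "__main__":
    rules = [
        AlertRule("cpu_percent", 85.0, "#infra-oncall"),
        AlertRule("p95_latency_ms", 500.0, "#api-team"),
    ]
    for destination, message in evaluate(rules, {"cpu_percent": 92.0, "p95_latency_ms": 120.0}):
        print(destination, message)
```

The same rule shape works for infrastructure and application metrics; only the metric source and the people notified change.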
Level 4 – advanced continuous integration
- Dependency scanning – assuming your applications have external dependencies, you need an automated way to know if those dependencies need to be updated for security or other reasons
- Vulnerability scanning – implement a tool that scans your code for common vulnerabilities (some static code analyzers handle this, though specialized tools are often better)
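Dependency scanning boils down to comparing what you depend on against known-bad versions. Real scanners use curated advisory databases and full version semantics; this minimal Python sketch assumes a hypothetical advisory feed that maps each package to its minimum safe version, and it only handles plain dotted numeric versions like `1.10.0`. The package names are placeholders.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_outdated(pinned: dict[str, str], advisories: dict[str, str]) -> list[str]:
    """List pinned dependencies older than the advisory's minimum safe version.

    `advisories` is a hypothetical feed mapping package name -> minimum safe version.
    Only plain dotted numeric versions are supported.
    """
    flagged: list[str] = []
    for name, version in sorted(pinned.items()):
        minimum = advisories.get(name)
        if minimum is not None and parse_version(version) < parse_version(minimum):
            flagged.append(f"{name} {version} is below minimum safe version {minimum}")
    return flagged


if __name__ == "__main__":
    pinned = {"libfoo": "1.2.0", "libbar": "3.0.0"}  # hypothetical packages
    advisories = {"libfoo": "1.10.0", "libbar": "2.9.9"}
    for finding in find_outdated(pinned, advisories):
        print(finding)  # libfoo 1.2.0 is below minimum safe version 1.10.0
```

Comparing integer tuples rather than strings matters here: as plain strings, "1.2.0" sorts after "1.10.0", which would hide the outdated dependency.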
Level 5 – advanced continuous delivery
- Dynamic code analyzer – rather than scanning your code itself, dynamic analyzers probe your websites and APIs while they are running to catch different types of issues and exploits; run them quarterly or annually until they’re fully integrated into your continuous delivery process
- Automated deployment notification – start with a simple chat notification of success/failure and work towards including information about the pull requests included in the release
- ChatOps production diff – when applications can release independently and without a schedule, it’s important to know when production is falling behind; create a simple chat command that displays which changes aren’t in production yet so they’re easy to keep track of
- ChatOps release trigger – rather than push-button deployments, the ideal is a chat command that performs the release (restricted to the appropriate people)
- Server configuration management – managing a stable and consistent application ecosystem is much easier with centralized server control and configuration
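The ChatOps production diff can be quite small. This minimal Python sketch assumes you have the trunk’s commits in order (oldest first) and know which commit is currently deployed to production; in a real setup, the pending list would more likely come from something like `git log <deployed-sha>..main`. The `/prod-diff` command name and the function names are hypothetical.

```python
from __future__ import annotations


def production_diff(trunk: list[str], deployed: str) -> list[str]:
    """Return the trunk commits that are newer than the deployed commit.

    `trunk` is ordered oldest -> newest.
    """
    try:
        index = trunk.index(deployed)
    except ValueError:
        raise ValueError(f"deployed commit {deployed!r} not found on trunk") from None
    return trunk[index + 1:]


def handle_chat_command(text: str, trunk: list[str], deployed: str) -> str | None:
    """Respond to the hypothetical "/prod-diff" command; ignore anything else."""
    if text.strip() != "/prod-diff":
        return None
    pending = production_diff(trunk, deployed)
    if not pending:
        return "Production is up to date."
    return "Not yet in production:\n" + "\n".join(f"- {commit}" for commit in pending)


if __name__ == "__main__":
    # Hypothetical short SHAs with PR titles, oldest first.
    trunk = ["a1b2c3 Add login page", "d4e5f6 Fix typo", "0789ab Add search"]
    print(handle_chat_command("/prod-diff", trunk, trunk[0]))
```

Hooking `handle_chat_command` up to an actual chat platform depends on that platform’s bot API, which is left out here.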