Engineering capabilities

When I first join or inherit a team, I look at the engineering capabilities that the team and applications have. There are several ways to slice and dice this list, but for the purposes of this article, I’ll talk about what I’ve found to be the best groupings for importance and team maturity.

Level 0 – table stakes

Version control – all code should be in a centralized code repository with versioning
Trunk-based development – simplify development by avoiding long-running environment or release branches
Pull requests and code reviews – all code changes should be made via peer-reviewed pull requests
Automated unit and integration tests – always make time for writing automated tests as they self-document the code’s expected behavior and allow for easier refactoring
Automated tests on pull requests – engineers should still run tests locally, but basic continuous integration should run all tests and report pass/fail on each pull request so the tests are exercised often

Level 1 – intermediate continuous integration

Automated regression tests – automated tests of user-facing functionality ensure you’re not introducing feature regressions; start with the most important happy paths, add them to your continuous integration process, and grow from there
Code coverage measured and enforced – implement a mechanism to measure your automated testing code coverage and set the minimum coverage to whatever the current number is to ensure it doesn’t decrease
Code coverage > 95% – continue adding tests to close coverage gaps and raising the minimum coverage until each application codebase is above 95%
Static code analyzer – implement a tool that reviews your code and enforces a community-based style guide and consistent code formatting
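
The coverage ratchet described in this level (start the minimum at the current number, then raise it as coverage improves) might look something like the sketch below. The function name `check_coverage` and its parameters are illustrative assumptions, not the API of any particular coverage tool; most coverage tools expose an equivalent minimum-threshold setting.

```python
from typing import Tuple


def check_coverage(current: float, floor: float, target: float = 95.0) -> Tuple[bool, float]:
    """Return (passed, new_floor) for a coverage ratchet.

    The check fails if coverage drops below the floor. When coverage rises,
    the floor rises with it (capped at the target) so gains are locked in.
    """
    if current < floor:
        return False, floor
    new_floor = max(floor, min(current, target))
    return True, new_floor


# Hypothetical numbers: coverage rose from a floor of 80.0 to 82.4.
passed, floor = check_coverage(current=82.4, floor=80.0)
print(passed, floor)
```

Capping the floor at the target is a design choice: once a codebase is past 95%, small fluctuations above that line no longer fail the build.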

Level 2 – continuous delivery

Automated staging deployment – once pull requests are merged, they should be considered safe for release and auto-deploy to a pre-production environment
Automated smoke tests – some tests should be run against the staging environment as a final check before a release is considered ready (these can often be a subset of the automated regression tests, configured to run against staging)
Automated production build process – if you’re able to perform any of the production release steps without affecting production, you should automate those to occur after a successful smoke test run
Automated production deployment – releases should be able to occur at the push of a button
No-downtime deployments – once all the steps are automated, work towards being able to release at any time of day during the workweek
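
The ordering in this level (staging deployment, then smoke tests, then the production build) amounts to a gated pipeline in which a failure stops everything after it. A minimal sketch, assuming each stage can be represented as a function that returns True on success; the stage names and the `run_pipeline` helper are hypothetical, and real stages would call your own deployment tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns a log of "name: ok" / "name: FAILED" entries. Stages after a
    failure never run, so a failed smoke test blocks the production build.
    """
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break
    return log


# Hypothetical stages; the lambdas stand in for real deployment steps.
stages = [
    ("deploy to staging", lambda: True),
    ("smoke tests", lambda: False),  # simulate a failing smoke test
    ("production build", lambda: True),
]
print(run_pipeline(stages))
```

With the simulated smoke test failure, the production build stage is never reached.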

Level 3 – observability

Infrastructure dashboard – build out dashboards that monitor your ecosystem’s infrastructure; there are lots of tools that do this out of the box these days
Infrastructure alerts – determine what metrics you care about, what thresholds you need to know about, and route alerts to the proper people so no one has to stare at dashboards
Application dashboard – create dashboards that monitor your API ins and outs as well as custom application events
Application alerts – after watching metrics move or not move for a while, create similar thresholds and alerting for your applications
Centralized logging – being able to triage your application ecosystem is critically important, and putting all your logs in one location makes that much easier; it also makes it easier to use log parsing and security tools
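
The alerting items in this level boil down to three decisions per rule: which metric, what threshold, and who gets told. Here is a small sketch of that idea, under the assumption that metrics arrive as a simple name-to-value mapping; the metric names, thresholds, and recipient names are all made up for illustration, and in practice a monitoring tool would handle this for you.

```python
from typing import Dict, List, NamedTuple


class AlertRule(NamedTuple):
    metric: str       # metric name, e.g. "cpu_percent"
    threshold: float  # alert when the value exceeds this
    notify: str       # who should hear about it


def evaluate(rules: List[AlertRule], metrics: Dict[str, float]) -> List[str]:
    """Return one message per rule whose metric is above its threshold."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append(f"@{rule.notify}: {rule.metric}={value} exceeds {rule.threshold}")
    return alerts


# Hypothetical rules and readings.
rules = [
    AlertRule("cpu_percent", 90.0, "infra-oncall"),
    AlertRule("api_error_rate", 0.05, "app-oncall"),
]
print(evaluate(rules, {"cpu_percent": 97.5, "api_error_rate": 0.01}))
```

Only the CPU rule fires here, so only the infrastructure on-call is notified and nobody has to watch a dashboard to find out.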

Level 4 – advanced continuous integration

Dependency scanning – assuming your applications have external dependencies, you need an automated way to know if those dependencies need to be updated for security or other reasons
Vulnerability scanning – implement a tool that scans your code for common vulnerabilities (some static code analyzers handle this though specialized tools are often better)
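
At its core, a dependency scan compares the versions you have pinned against the versions known to be safe. The sketch below shows that comparison only; it assumes plain dotted numeric versions, and the package names, versions, and the `min_safe` mapping are hypothetical rather than real advisories. A real scanner would pull this data from an advisory feed and handle richer version formats.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "1.4.2" into (1, 4, 2). Assumes plain dotted numeric versions."""
    return tuple(int(part) for part in text.split("."))


def outdated(pinned: Dict[str, str], min_safe: Dict[str, str]) -> List[str]:
    """List dependencies pinned below the minimum safe version."""
    findings = []
    for name, version in sorted(pinned.items()):
        safe = min_safe.get(name)
        if safe is not None and parse_version(version) < parse_version(safe):
            findings.append(f"{name} {version} -> upgrade to >= {safe}")
    return findings


# Hypothetical data: neither the names nor the versions are real advisories.
pinned = {"examplelib": "1.4.2", "otherlib": "2.0.0"}
min_safe = {"examplelib": "1.10.0"}
print(outdated(pinned, min_safe))
```

Versions are compared as tuples of integers rather than as strings, because a string comparison would wrongly rank "1.4.2" above "1.10.0".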

Level 5 – advanced continuous delivery

Dynamic code analyzer – rather than scanning the code itself, dynamic analyzers probe your websites and APIs while they are running to catch different types of issues and exploits; run this quarterly or annually until it’s fully integrated into your continuous delivery process
Automated deployment notification – start with a simple chat notification of success/failure and work towards including information about the pull requests included in the release
ChatOps production diff – when applications can release independently and without a schedule, it’s important to know when production is falling behind; create a simple chat command to display what changes aren’t in production yet to make this easier to keep track of
ChatOps release trigger – rather than push-button deployments, the ideal is just a chat command that performs the release (with proper restrictions on who can run it)
Server configuration management – managing a stable and consistent application ecosystem will be much easier with centralized server control and configuration
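
The logic behind a ChatOps production diff can be quite small: given the ordered merge history of trunk and the change currently deployed, report everything merged since. This is a sketch under assumptions, not a chat integration; the `pending_changes` function and the pull request identifiers are hypothetical, and a real command would read the history from your version control system and the deployed version from your release tooling.

```python
from typing import List


def pending_changes(trunk: List[str], deployed: str) -> List[str]:
    """Return trunk entries merged after the deployed one, oldest first.

    `trunk` is the merge history in order, oldest first. Raises ValueError
    if the deployed entry is not on trunk, since the diff would be meaningless.
    """
    if deployed not in trunk:
        raise ValueError(f"{deployed!r} is not on trunk")
    return trunk[trunk.index(deployed) + 1:]


# Hypothetical pull request identifiers, oldest first.
trunk = ["PR-101", "PR-102", "PR-103", "PR-104"]
print(pending_changes(trunk, "PR-102"))
```

An empty result means production is fully caught up with trunk.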