Change failure rate
What you define as a change failure can vary from project to project. It can be as broad as a change causing a hard-down incident or as fine as a business metric deviating from its norm. Sleuth allows users to flexibly define what failure means to their projects via deploy verification and impact tracking.
Change failure rate measures the percentage of deployed changes that leave their target environments in a state of failure. Along with MTTR, Change failure rate is a measure of the quality, or stability, of your software delivery capability.
"Failure" is defined differently by different organizations (and even within a single organization), and Sleuth allows you to capture your own definition of failure for each project you manage in Sleuth (see Setting up Change failure rate below for details on capturing your organization's definition of "failure" within Sleuth). At a high level, Sleuth evaluates the specific Impact Source integrations you've set up for a given project, then calculates Change failure rate by dividing the number of deploys that met your change failure sensitivity (the failure level configured for the project) by the total number of deploys in the period.
For example, if you've set up the PagerDuty integration as an impact source, your team had one incident during the report period that spanned two deploys, and you made a total of 20 deploys in that period, your Change failure rate is 2 / 20 = 10%.
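The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, not Sleuth's implementation: the Deploy class and its failed flag are hypothetical stand-ins for "this deploy met the project's failure level", and returning 0.0 for a period with no deploys is an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class Deploy:
    deploy_id: str
    failed: bool  # hypothetical: True if an impact source pushed this deploy to the failure level


def change_failure_rate(deploys: list[Deploy]) -> float:
    """Return the fraction of deploys in the period that were failures (0.0 to 1.0)."""
    if not deploys:
        return 0.0  # assumed convention: no deploys means no measurable failures
    failed = sum(1 for d in deploys if d.failed)
    return failed / len(deploys)


# The example above: 20 deploys, one PagerDuty incident that spanned two of them.
deploys = [Deploy(f"d{i}", failed=(i in (7, 8))) for i in range(20)]
print(f"{change_failure_rate(deploys):.0%}")  # 10%
```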
For more on how Sleuth measures Change failure rate, and for best practices on determining what failure means to you, check out Sleuth CTO Don Brown explaining it in detail in this SleuthTV episode!
Sleuth CTO Don Brown explains how Sleuth measures Change failure rate
- Incidents - any deploy with a status of Incident. Sleuth provides integrations with PagerDuty, Statuspage, and many more, and we're continuously adding new integrations based on customer demand. See Integrations for an up-to-date list of those we currently support.
Sleuth supports feature flags as a first-class form of change. Because feature flag changes can cause failures just as code changes can, they are included in your Change failure rate calculations. Sleuth's deploy verification applies to flag changes in the same way it applies to code deploys.
Every deployment, feature flags included, has an advanced setting that excludes it from impact collection. If this setting is enabled for a feature flag, changes to that flag will not affect your Change failure rate.
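As a rough sketch of how flag changes and the exclusion setting could interact in the calculation: code deploys and feature flag changes are counted together, and anything excluded from impact collection is skipped. The exclude_from_impact field name is hypothetical, and we assume an excluded change is dropped from both the numerator and the denominator, which is one reading of "will not affect your Change failure rate".

```python
from dataclasses import dataclass


@dataclass
class Change:
    kind: str    # "code" or "feature_flag"; both are treated as changes
    failed: bool  # True if deploy verification put this change at the failure level
    exclude_from_impact: bool = False  # hypothetical name for the advanced setting


def change_failure_rate(changes: list[Change]) -> float:
    # Assumption: excluded changes are left out of the numerator and the denominator.
    counted = [c for c in changes if not c.exclude_from_impact]
    if not counted:
        return 0.0
    return sum(c.failed for c in counted) / len(counted)


changes = [
    Change("code", failed=False),
    Change("code", failed=True),
    Change("feature_flag", failed=True),  # a failing flag change counts like a failing deploy
    # Impact would have marked this flag change as failed, but it is excluded.
    Change("feature_flag", failed=True, exclude_from_impact=True),
]
print(round(change_failure_rate(changes), 3))  # 0.667 (2 of 3 counted changes failed)
```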
Sleuth's Change failure rate is configured and calculated at the Project level, and Sleuth also provides visibility into change failure for individual Teams (i.e., across all projects to which a team has contributed). By default, Sleuth considers any deploy marked as Unhealthy a failure. You can change the failure level in your project settings. If you would like to count only Incidents as failure, for example, then set the failure level to
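The failure level acts as a threshold: a deploy counts as a failure when its health is at or beyond the configured level. The sketch below assumes a simple ordering of Healthy, Unhealthy, and Incident, consistent with counting only Incidents when the level is raised; the Health enum and is_failure function are hypothetical, and Sleuth's actual set of health levels may differ.

```python
from enum import IntEnum


class Health(IntEnum):
    # Hypothetical ordering from least to most severe; Sleuth may define other levels.
    HEALTHY = 0
    UNHEALTHY = 1
    INCIDENT = 2


def is_failure(health: Health, failure_level: Health = Health.UNHEALTHY) -> bool:
    """A deploy is a failure when its health is at or beyond the project's failure level."""
    return health >= failure_level


statuses = [Health.HEALTHY, Health.UNHEALTHY, Health.INCIDENT, Health.HEALTHY]
print(sum(is_failure(s) for s in statuses))  # 2 with the default Unhealthy level
print(sum(is_failure(s, failure_level=Health.INCIDENT) for s in statuses))  # 1: only Incidents
```

Raising the threshold never adds failures, it only removes the less severe ones, which is why counting only Incidents yields a lower (or equal) rate than the default.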
Sleuth's deploy verification allows you to integrate error trackers, such as Sentry and Rollbar; metrics trackers, such as AWS CloudWatch and Datadog; and incident trackers, such as Statuspage and PagerDuty (see Integrations for a full list of currently supported integrations). When Sleuth auto-verifies a deploy as Unhealthy, that deploy is considered a failure. Manually setting a deploy to Unhealthy also counts as a failure. Sleuth also supports code deploy rollbacks, and Rolled back deploys count as change failures too.
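The three paths to failure described above (auto-verification, a manual health setting, and a rollback) can be combined into a single check. This is a sketch under assumptions: the DeployRecord fields are hypothetical, the FAILING set reflects the default Unhealthy failure level, and we assume a manual health setting takes precedence over the auto-verified one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeployRecord:
    auto_health: str                      # result of deploy verification, e.g. "Healthy"
    manual_health: Optional[str] = None   # hypothetical: set when a user overrides health by hand
    rolled_back: bool = False             # True if the deploy was rolled back


FAILING = {"Unhealthy", "Incident"}  # default failure level: Unhealthy or worse


def counts_as_failure(d: DeployRecord) -> bool:
    # Assumption: a manual health setting overrides the auto-verified result.
    health = d.manual_health if d.manual_health is not None else d.auto_health
    # Rolled back deploys count as failures regardless of their health.
    return d.rolled_back or health in FAILING


deploys = [
    DeployRecord("Healthy"),
    DeployRecord("Unhealthy"),                           # auto-verified as Unhealthy
    DeployRecord("Healthy", manual_health="Unhealthy"),  # marked Unhealthy by hand
    DeployRecord("Healthy", rolled_back=True),           # rolled back
]
print([counts_as_failure(d) for d in deploys])  # [False, True, True, True]
```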
When configuring Change failure rate, you'll want to determine what failure means to your project. Sleuth is flexible and allows you to define whatever failure criteria work for your projects. Once configured at the project level, Change failure rate is also viewable by contributing teams. Just keep in mind that the failure data Sleuth provides is only as good as the data coming in from your impact sources.