Change failure rate
What counts as a change failure varies from team to team. It can be as broad as a change causing a hard-down incident or as fine as a business metric deviating from its norm. Sleuth lets teams define what failure means to them via deploy verification and impact tracking.

Change failure breakdowns

The Sleuth project metrics dashboard shows the total number of deploys that were deemed a failure in the period. We also provide a detailed breakdown of deploys by the type of failure. Failure types currently supported in Sleuth are:

Feature flags and change failure rate

Sleuth supports feature flags as a first-class form of change. Because feature flag changes have just as much power to cause failure as code changes, they are included in your change failure rate calculations. Sleuth's deploy verification applies to flag changes in the same way it applies to code deploys.
Every deployment, feature flags included, has an advanced setting that allows you to exclude it from impact collection. If this setting is enabled, feature flag changes will not affect your change failure rate.
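As a rough illustration of how excluding a flag from impact collection could change the number, here is a minimal Python sketch. It is not Sleuth's implementation: the `Change` class, its field names, and the assumption that an excluded change is dropped from both the failure count and the total are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    kind: str                          # "code" or "feature_flag" (hypothetical labels)
    failed: bool                       # True if the change was deemed a failure
    exclude_from_impact: bool = False  # hypothetical name for the advanced setting


def change_failure_rate(changes):
    """Return failed / total over the changes that take part in impact collection."""
    # Assumption: excluded changes are left out of numerator and denominator alike.
    counted = [c for c in changes if not c.exclude_from_impact]
    if not counted:
        return 0.0
    return sum(c.failed for c in counted) / len(counted)


changes = [
    Change("code", failed=False),
    Change("code", failed=False),
    Change("code", failed=True),
    Change("feature_flag", failed=True),
]

# With the flag change counted: 2 failures out of 4 changes.
rate_all = change_failure_rate(changes)

# With the flag change excluded: 1 failure out of 3 changes.
changes[3].exclude_from_impact = True
rate_excluded = change_failure_rate(changes)
```

In this sketch the failing flag change moves the rate from one half to one third once it is excluded, which is why the setting is worth reviewing for flags that change often.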

Setting up change failure

Sleuth calculates change failure at the Project level. By default, Sleuth counts any deploy marked as Unhealthy as a failure. You can change the failure level in your project settings. If your team would like to count only Incidents as failures, set the failure level to Incident.
Sleuth's deploy verification allows you to integrate error trackers, such as Sentry and Rollbar; metrics trackers, such as AWS CloudWatch and Datadog; and incident trackers, such as Statuspage and PagerDuty (coming soon). When Sleuth auto-verifies a deploy as Unhealthy, that deploy is considered a failure. A deploy that you manually set to Unhealthy is also considered a failure. Sleuth also supports code deploy rollbacks, and rolled-back deploys count as change failures.
When configuring change failure rate, first determine what failure means to your team. Sleuth is flexible and supports most definitions your team can conceive of, but keep in mind that the failure data you get out is only as good as the data you put in.
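The failure-level rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sleuth's code: the `Health` enum, the `HEALTHY` value, the ordering of the levels, and the choice to count a rollback as a failure regardless of the configured level are all hypothetical.

```python
from enum import IntEnum


class Health(IntEnum):
    # Assumed ordering from least to most severe; only Unhealthy and
    # Incident are named in the text, HEALTHY is added for illustration.
    HEALTHY = 0
    UNHEALTHY = 1
    INCIDENT = 2


def is_change_failure(health, failure_level=Health.UNHEALTHY, rolled_back=False):
    """Decide whether a deploy counts toward change failure.

    A deploy counts if its health is at or above the project's failure
    level, or if it was rolled back (assumed to count at any level).
    """
    return rolled_back or health >= failure_level


# Default failure level: an Unhealthy deploy is a failure.
default_unhealthy = is_change_failure(Health.UNHEALTHY)

# Failure level set to Incident: the same Unhealthy deploy no longer counts.
incident_only_unhealthy = is_change_failure(Health.UNHEALTHY, failure_level=Health.INCIDENT)

# A rolled-back deploy counts even if it was otherwise healthy.
healthy_rollback = is_change_failure(Health.HEALTHY, rolled_back=True)
```

Whether a status came from auto-verification or was set manually makes no difference in this sketch, since both paths end in the same health value.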