What counts as a change failure varies from team to team. It can be as broad as a change causing a hard-down incident or as fine-grained as a business metric deviating from its norm. Sleuth lets teams define what failure means to them via deploy verification and impact tracking.
The Sleuth project metrics dashboard shows the total number of deploys that were deemed a failure in the period. We also provide a detailed breakdown of deploys by the type of failure. Failure types currently supported in Sleuth are:
Incidents - any deploy with a status of Incident. Integrations with PagerDuty, Statuspage and more are coming soon to automate the discovery of incidents.
Rolled back - any code deploy that was detected to be rolled back
Sleuth supports feature flags as a first-class form of change. Because feature flag changes have just as much power to cause failure as code changes, they are included in your change failure rate calculations. Sleuth's deploy verification applies to flag changes in the same way it applies to code deploys.
Sleuth's change failure is calculated at the Project level. By default, Sleuth considers any deploy marked as Unhealthy a failure. You can change the failure level in your project settings. If your team would like to count only Incidents as failures, raise the failure level accordingly.
Sleuth's deploy verification allows you to integrate error trackers, such as Sentry and Rollbar; metrics trackers, like AWS CloudWatch and Datadog; and incident trackers, like Statuspage and PagerDuty (coming soon). When Sleuth auto-verifies a deploy as Unhealthy, that deploy is considered a failure. Manually setting a deploy to Unhealthy also counts as a failure. Sleuth also supports code deploy rollbacks: Rolled back deploys count as change failures too.
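To make the counting rules concrete, here is a minimal sketch of how a change failure rate could be derived from them. It is not Sleuth's implementation or API: the `Health` ordering, the `Change` record, and both functions are hypothetical names introduced for illustration. It also assumes that an Incident counts as a failure when the failure level is Unhealthy, and that a rolled back deploy counts regardless of the failure level.

```python
from dataclasses import dataclass
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical severity ordering (an assumption); higher means worse."""
    HEALTHY = 0
    UNHEALTHY = 1
    INCIDENT = 2


@dataclass(frozen=True)
class Change:
    """One change in a project: a code deploy or a feature flag change."""
    name: str
    health: Health
    rolled_back: bool = False
    # Informational only: flag changes are counted exactly like code deploys.
    is_flag_change: bool = False


def is_failure(change: Change, failure_level: Health = Health.UNHEALTHY) -> bool:
    """A change fails if it reaches the failure level or was rolled back."""
    return change.health >= failure_level or change.rolled_back


def change_failure_rate(
    changes: list[Change], failure_level: Health = Health.UNHEALTHY
) -> float:
    """Fraction of changes counted as failures; 0.0 when there are no changes."""
    if not changes:
        return 0.0
    failed = sum(is_failure(c, failure_level) for c in changes)
    return failed / len(changes)


changes = [
    Change("deploy-1", Health.HEALTHY),
    Change("deploy-2", Health.UNHEALTHY),
    Change("deploy-3", Health.INCIDENT),
    Change("deploy-4", Health.HEALTHY, rolled_back=True),
    Change("flag-1", Health.HEALTHY, is_flag_change=True),
]

# Default failure level: Unhealthy, Incident, and rolled back changes all count.
default_rate = change_failure_rate(changes)
# Stricter failure level: only Incidents (plus rollbacks) count.
incident_only_rate = change_failure_rate(changes, Health.INCIDENT)

print(f"default: {default_rate:.0%}, incidents only: {incident_only_rate:.0%}")
```

With the sample data, three of the five changes fail at the default level and two fail when only Incidents count. The healthy flag change still adds to the denominator in both cases, which is how including flag changes affects the rate.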