Sleuth Documentation
HomeBlogSupportSign up
  • Getting started
  • Navigating Sleuth
  • DORA metrics
    • Deploy frequency
    • Change lead time
    • Change failure rate
    • MTTR
    • Interpreting Metrics in Sleuth
  • Deployment tracking
    • Organization
      • Labels
      • Trends
      • Compare
      • Search
      • Status
    • Projects
      • Issue trackers
    • Environments
    • Code deployments
      • Creating a deployment
      • How to register a deploy
      • Rollbacks
      • Automatic tagging
      • Deployment locking
      • Environment drift
      • Move code deployments
      • Search everything
    • Feature flags
    • Manual changes
    • Deploys
    • Teams
  • Work in Progress
  • Goals
  • Sleuth Automations
    • Automations Marketplace
      • Installing Automations
        • Installing PR "Update" Automations
      • Editing and uninstalling Automations
      • Smart suggestions
      • Understanding efficacy
    • Custom Automations
      • Automations Cookbook
      • Webhook Actions
      • Trigger Build Actions
        • Bitbucket Pipelines
        • CircleCI
        • Github Actions
        • Jenkins
  • Slack & Email Notifications
  • Auto-verify deploys
    • Anomaly detection
    • Error impact
    • Metric impact
  • Ignoring pull requests
  • Slack mission control
    • Approvals
    • Project notifications
    • Personal notifications
    • Search Sleuth in Slack
    • Project/Deployment history
    • Developer standup
  • Sleuth API
    • Deploy Registration
    • Deploy import
    • Manual Change
    • Custom Incident Impact Registration
    • Custom Metric Impact Registration
    • Deprecation information
    • GraphQL Queries
    • GraphQL Mutations
    • Query batching
  • Integrations
    • About Integrations...
    • Code integrations (read-only)
      • Azure DevOps
      • Bitbucket
      • GitHub
      • GitLab
      • Custom Git
      • Terraform Cloud
    • Code integrations (write)
    • Feature flag integrations
      • LaunchDarkly
    • Impact integrations
      • Error trackers
        • Bugsnag
        • Honeybadger
        • Rollbar
        • Sentry
      • Metric trackers
        • AppDynamics
        • AWS CloudWatch
        • Custom
        • Datadog
        • Jira metrics (Cloud / Data Center)
        • NewRelic
        • SignalFx
      • Incident tracker integrations
        • Blameless
        • PagerDuty
        • Datadog Monitors
        • Statuspage
        • Opsgenie
        • Jira (Cloud/Data Center)
        • FireHydrant
        • Rootly
        • ServiceNow
        • Custom
          • Grafana OnCall
      • CI/CD builds
        • Azure Pipelines
        • Bitbucket Pipelines
        • Buildkite
        • CircleCI
        • GitHub Actions
        • GitLab CI/CD Pipelines
        • Jenkins
    • Sleuth DORA App for Slack
    • Microsoft Teams integration
    • CI/CD integrations
      • Azure Pipelines
      • Bitbucket Pipelines
      • Buildkite
      • CircleCI
      • Github Actions
      • GitLab CI/CD Pipelines
      • Jenkins
    • Issue tracker integrations
      • Jira Cloud
      • Jira Data Center
      • Linear
      • Shortcut
    • Fixing broken integrations
  • Pulse
    • Welcome to Pulse docs
    • Quick Start setup guide
    • Beginner tutorials
      • 1. How to create a Teamspace
      • 2. How to create a Review
      • 3. How to create a Survey
  • Features
    • Reviews
      • Review workflow
      • Review templates
      • Widgets and Sections
        • Widget type
      • Review settings
    • Surveys
      • Survey Workflow
    • Teamspaces
    • Inbox
    • AI assistant
    • General settings
      • Users and Teams
      • Investment mix
  • Settings
    • Organization settings
      • Details
      • Authentication
        • SAML 2.0 Setup
          • Okta Configuration
          • Azure AD Configuration
          • PingIdentity Configuration
      • Access Tokens
      • Members
      • Team Settings
      • Billing
    • Project settings
      • Details
      • Slack settings
      • Environment settings
      • Code deployment settings
      • Feature flag settings
      • Impact settings
    • Account settings
      • Account settings
      • Notifications settings
      • Identities settings
    • Role Based Access Control
  • Resources
    • FAQ
    • Sleuth TV
    • Purchasing
    • About Sleuth...
Powered by GitBook
On this page
  • MTTR breakdowns
  • Feature flags and MTTR
  • Setting up MTTR
  • Further Reading

Was this helpful?

  1. DORA metrics

MTTR

PreviousChange failure rateNextInterpreting Metrics in Sleuth

Last updated 2 years ago

Was this helpful?

Mean time to recovery or MTTR is defined in Sleuth as the time a project spends in a failure state. Along with , MTTR is a measure of the quality, or stability of your software delivery capability.

When Sleuth detects that an Impact Source is failing (e.g. an incident in PagerDuty or an elevated metric in Datadog), it creates a failure period that tracks the details of that failure along with its start and end period. When calculating the MTTR for a date range, Sleuth accounts for all of the failure periods that occurred in that range and produces the average.

For example, if you have three incidents that happened in the date period you're inspecting, one lasting 1 hour, one lasting 2 hours and one lasting 3 hours. Your MTTR will be: (1 + 2 + 3) / 3 = 2 hours.

For more on how Sleuth measures MTTR, check out Sleuth CTO, Don Brown, explaining it in detail in this SleuthTV episode!

MTTR breakdowns

Feature flags and MTTR

Every deployment, feature flags included, has an advanced setting that allows you to exclude it from impact collection. If this is enabled then feature flags will not affect your MTTR.

Setting up MTTR

Further Reading

For a real-world example of how Sleuth helps you measure and drive down your MTTR, let's say you make a deploy that adds 25% to your database CPU. Assume that Sleuth is tracking this impact and determines that the deploy is Unhealthy. Your team has setup in Sleuth, and as a result your mean time to discovery (or MTTD) is basically zero. Your team jumps into action and initiates a rollback which takes 25 minutes to complete. Once your rollback is deployed, Sleuth sees that your database CPU has gone back down to normal and auto-verifies the deploy as Healthy. Your MTTR in this scenario would be 25 minutes, the amount of time it took for your team to return your project to a healthy state.

Sleuth's and dashboards show the total time spent in a failure state in the period. We also provide a detailed breakdown of the time spent in each type of failure. Failure types currently supported in Sleuth are:

Incidents - any deploy with a status of Incident - Sleuth provides integrations with PagerDuty, Statuspage, and many more, and we're continuously adding new integrations per customer demand. See for an up-to-date list of those we currently support.

Rolled back - any code deploys that were

Unhealthy - any configured and that has determined a deploy is Unhealthy

Ailing - any configured and that has determined a deploy is Ailing

Sleuth as a first class form of change. Because feature flag changes have just as much power to affect failure and recovery as code changes feature flag changes are included in your MTTR calculations. Sleuth's applies to flag changes in the same way it applies to code deploys.

Because MTTR is so closely tied to Change failure rate, please see to configure MTTR.

For additional information on how Sleuth calculates and presents MTTR and other DORA metrics throughout its various dashboards and views, see .

slack notifications
Project Metrics
Team Metrics
Integrations
detected to be rolled back
impact sources
deploy verification
impact sources
deploy verification
supports feature flags
deploy verification
Interpreting metrics in Sleuth
Change failure rate
Sleuth CTO Don Brown explains how Sleuth measures MTTR
setting up change failure rate