Imagining better flaky test management

30 June 2019

If we imagined better tooling for managing flaky tests, what could we come up with? One thing I have noticed in working with polyglot codebases, which often come with Docker, is that while each language/framework has its own tools for testing, there are some recurring issues around flaky tests that are common to all. If it is a common problem, maybe one day there will be tooling around it that is as common as run-of-the-mill unit testing tools.

Flaky tests have costs, and some are easy to measure:

Direct cost of engineering time: every rerun is 30 minutes you are paying an engineer to stare at a screen.
Indirect cost of risk: engineers start ignoring warning signals that could be real issues, which could mean ignoring a warning sign that a plane is about to crash
Indirect cost of tech debt: a flaky test suite is much more likely to become derelict and be discarded

The upsides of fixing this are harder to measure, but they are part of building high-velocity teams:

Culture of quality: a really great developer experience around testing makes it that much easier to build in quality
Faster time to ship: reducing the barriers to getting code to production is how we get value into customers' hands

If we assume that everyone wants to write awesome tests but barriers get in the way, which barriers come to mind?

Hard to collect data
Hard to identify the root cause
Incorrectly ‘resolving’ (e.g. it's been working this week)
Quarantining is hard/becomes permanent

These are some features a solution could have that seem achievable, grouped by the barrier they address:

Hard to collect data
- Groups data from flaky failures
- Collects data from all branches
Hard to identify the root cause
- Automatically diagnoses some issues (e.g. order dependence)
Incorrectly ‘resolving’
- Tracks issues over time
- Assigns flaky tests to engineers
Quarantining
- Automatically removes tests from quarantine
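
To make a couple of these features concrete, here is a rough Python sketch of how grouping failures across branches and automatically releasing tests from quarantine might work. Everything in it is an assumption made for illustration: the RunRecord shape, the function names, the rule that a test is flaky when it both passes and fails on the same commit, and the min_passes threshold. It is not how any particular tool does it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One execution of one test (hypothetical shape, for illustration only)."""
    test_id: str
    commit: str   # commit SHA the run was executed against
    branch: str
    passed: bool


def find_flaky_tests(runs):
    """Return ids of tests that both passed and failed on the same commit.

    Runs from every branch are pooled, so a failure on a feature branch
    counts as evidence when the same commit passed elsewhere.
    """
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.test_id, run.commit)].add(run.passed)
    return {
        test_id
        for (test_id, _commit), seen in outcomes.items()
        if seen == {True, False}
    }


def ready_to_release(runs, quarantined, min_passes=20):
    """Return quarantined tests whose latest `min_passes` runs all passed.

    Assumes `runs` is ordered oldest to newest.
    """
    history = defaultdict(list)
    for run in runs:
        if run.test_id in quarantined:
            history[run.test_id].append(run.passed)
    return {
        test_id
        for test_id, results in history.items()
        if len(results) >= min_passes and all(results[-min_passes:])
    }


runs = [
    RunRecord("test_checkout", "abc123", "main", True),
    RunRecord("test_checkout", "abc123", "feature-x", False),
    RunRecord("test_login", "abc123", "main", True),
    RunRecord("test_login", "def456", "main", False),
]
print(find_flaky_tests(runs))  # {'test_checkout'}
```

In this sketch test_login is not flagged, because its failure happened on a different commit and could be a real regression. Requiring a streak of passes before release is one way to avoid the "it's been working this week" trap; the right threshold would depend on how often the test actually runs.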

If some of these sound interesting to you, check out this website: http://testrecall.com

Do you have any tools or patterns for addressing flaky tests?