Experiment: an operation or procedure carried out under controlled conditions in order to discover an unknown effect or law, to test or establish a hypothesis, or to illustrate a known law.
Experiments are a great way to try new things. To get the most out of experiments, teams need to apply some rigour to ensure accurate results.
An experiment should aim to answer a question. Do more users buy when the call-to-action button is labelled “buy now” or “add to cart”? Are there fewer tickets needing rework after 30-minute or 1-hour discovery sessions? Do we shorten developer cycle times if we require engineers to review pull requests at least twice a day?
Each experiment needs to answer one question. A laundry list of questions suggests the scope is too broad. Pick a question and decide the fastest way to answer it.
The conditions an application runs under are constantly changing. Tightly controlled conditions improve the likelihood of reliable results. Control what you can and document what you can’t, including any impact it may have on the results.
Each experiment should be run in isolation. Changing multiple things at the same time makes it very difficult to tell which change caused any difference in the results, which can make those results unreliable.
Each experiment should be timeboxed. Some may run for a short period, like an A/B test that runs for a few days. Changes to team workflows might take a few weeks before the results are clear. An experiment needs to run long enough to gather sufficient data to evaluate the results, but it shouldn’t drag on. The team needs to agree on the timeline before the experiment starts.
The most important component of any experiment is the evaluation criteria. To keep the team honest, the criteria need to be documented before the experiment starts. This ensures the whole team is aligned and everyone can participate in the evaluation.
For an A/B test, the criteria may be as simple as “did sales increase by 10%?” Workflow-based experiments are likely to be assessed against DORA metrics. The definition of success needs to state which metrics will be used when making the decision.
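To make the A/B criterion concrete, here is a minimal Python sketch that records a success threshold before the test runs and then checks the result against it. The names `SuccessCriteria`, `relative_lift` and `meets_criteria` are hypothetical, and the sketch assumes sales are summarised as a single total per group. It checks only the raw lift; it does not test whether the difference is statistically significant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Hypothetical record of the criteria, agreed and frozen before the experiment starts."""
    metric: str               # e.g. "sales"
    min_relative_lift: float  # 0.10 means "at least a 10% increase"


def relative_lift(control: float, variant: float) -> float:
    """Relative change of the variant over the control, e.g. 0.12 for +12%."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (variant - control) / control


def meets_criteria(criteria: SuccessCriteria, control: float, variant: float) -> bool:
    """True if the variant beat the control by at least the agreed lift.

    Assumption: one total per group; no significance test is performed.
    """
    return relative_lift(control, variant) >= criteria.min_relative_lift


if __name__ == "__main__":
    # Criteria written down before the test: "did sales increase by 10%?"
    criteria = SuccessCriteria(metric="sales", min_relative_lift=0.10)
    print(meets_criteria(criteria, control=1000.0, variant=1120.0))  # True (+12%)
    print(meets_criteria(criteria, control=1000.0, variant=1050.0))  # False (+5%)
```

Freezing the dataclass is a small nudge towards the same discipline the post describes: once the criteria are recorded, they aren’t quietly changed after the results come in.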
There are three possible decisions the team can make based on the data. If the experiment fails, the team abandons the change it was testing. If it passes, the team continues with it. If the results are inconclusive, the team can elect to extend the experiment, but the extension should be shorter than the original timebox and should only happen once.
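The three-way decision can be written down as a small rule. This Python sketch uses hypothetical `Outcome` and `Decision` enums and a `decide` function. The post only says an extension should happen once, so treating a second inconclusive result as a fail is an assumption made here. The hypothetical `extension_days` helper simply rejects an extension that isn’t shorter than the original timebox.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    INCONCLUSIVE = "inconclusive"


class Decision(Enum):
    CONTINUE = "keep doing it"
    ABANDON = "abandon the change"
    EXTEND = "extend the experiment once"


def decide(outcome: Outcome, already_extended: bool) -> Decision:
    """Map an experiment outcome to the team's decision."""
    if outcome is Outcome.PASSED:
        return Decision.CONTINUE
    if outcome is Outcome.FAILED:
        return Decision.ABANDON
    # Inconclusive: the experiment may be extended at most once.
    if not already_extended:
        return Decision.EXTEND
    # Assumption: a second inconclusive result is treated like a fail.
    return Decision.ABANDON


def extension_days(original_days: int, proposed_days: int) -> int:
    """Accept a proposed extension only if it is shorter than the original timebox."""
    if not 0 < proposed_days < original_days:
        raise ValueError("an extension must be shorter than the original timebox")
    return proposed_days


if __name__ == "__main__":
    print(decide(Outcome.INCONCLUSIVE, already_extended=False).value)  # extend the experiment once
    print(decide(Outcome.INCONCLUSIVE, already_extended=True).value)   # abandon the change
    print(extension_days(original_days=14, proposed_days=7))           # 7
```

Writing the rule down like this, before the experiment starts, leaves no room to keep extending an experiment until it produces the answer someone hoped for.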
Sometimes useful lessons come out of an experiment even though it fails to meet the definition of success. These lessons are a good starting point for a new, better-defined experiment.
Properly structured experiments are a powerful tool to help teams and organisations improve. Without some rigour around experiments, it is difficult, if not impossible, to determine whether they’re successful. This leads to people making decisions based on gut reactions and their own biases. 🌊