Shaking out test interdependence

Tags: Software, Python, Testing, Intermediate Python

I’ve been battling bugs recently; this one came up and it seemed like a fun exercise in test suite hygiene. I take the view that test hygiene is everyone’s responsibility regardless of seniority, and this is a simple enough subject that engineers of any skill level can put the ideas here into practice.

Flaky tests are the bane of many a developer’s existence; worse still are test interdependencies. Let’s say we’ve got the following code: a function that marks a data structure as completed.

from datetime import datetime

def notify(id, name):
	...

def mark_complete(data):
	data['complete'] = True
	data['completed_at'] = datetime.utcnow()
	notify(data['id'], data['name'])
	return data

And then an accompanying test:

from uuid import uuid4

DATA = {
	"id": uuid4()
	"name": "potato"
}

def test_mark_complete():
	completed_data = mark_complete(DATA)
	assert completed_data['complete'] is True
	assert completed_data['completed_at'] is not None

The above test checks the functionality fine: we take an existing data structure and check that we can insert data into it successfully. The problem is that we’re leaving state behind. The module-level DATA dict now includes a complete and a completed_at key for the rest of the test run. Right now this leftover state causes no side effects; in the future, maybe it will…
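To make that leftover state concrete, here’s a hypothetical follow-up test; whether it passes depends entirely on whether test_mark_complete has already run and mutated the shared dict:

def test_data_is_pristine():
	# Hypothetical: passes if it runs before test_mark_complete, fails if
	# it runs after, because DATA has been mutated in place.
	assert set(DATA) == {"id", "name"}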

Let’s add some more functionality…

from uuid import uuid4

def clear_name_and_rerandomise(data):
	data['id'] = uuid4()
	del data['name']
	return data

And a test again…

def test_clear_name_and_rerandomise():
	starting_id = DATA['id']
	completed_data = clear_name_and_rerandomise(DATA)
	assert completed_data['id'] != starting_id
	assert completed_data.get('name') is None

This mutates the data object that is passed in by removing an existing field. If this test runs against the shared DATA object first, mark_complete will no longer work, because notify reaches for the now-deleted name key. This is a contrived example that wouldn’t really fool anyone, but these things happen in data-driven applications where state is stored remotely and application behaviour depends upon the data.
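To see the breakage directly, here’s a minimal sketch (assuming both functions and DATA are available in one place) of what happens when the operations run in the unlucky order:

# Run the two operations in the order the second test imposes.
clear_name_and_rerandomise(DATA)
# mark_complete now blows up, because notify() reaches for the
# 'name' key that the previous call deleted.
mark_complete(DATA)  # KeyError: 'name'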

My favourite cheap trick for identifying test interdependence is to randomise the test order. In Python this is simple enough that you can install a test runner plugin to do it for you; if you’re using pytest, then pytest-random-order is a good place to start.

If you run the tests above with the plugin installed and random ordering enabled, you’ll find they fail about 50% of the time. With only two tests it’s pretty clear what the problem is, but in a suite of 1000 tests it might just look like another flaky test. You might want to fix the seed and run the suite twice: if the same order fails deterministically both times, you’re looking at a test ordering dependency rather than genuinely flaky data.

# Create a seed.
export RANDOM_SEED=$(shuf -i 0-100000 -n 1)
# Run with the seed.
pytest --random-order-seed=$RANDOM_SEED

This isn’t the most beautiful solution, but it’s better than the alternative: building an app or test suite full of implicit dependencies and having to untangle them down the line…
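Once random ordering has flushed a culprit out, the longer-term fix is usually to stop sharing mutable state between tests at all. A minimal sketch of one approach, using a pytest fixture to build fresh data per test instead of a module-level constant:

import pytest
from uuid import uuid4

@pytest.fixture
def data():
	# A brand-new dict for every test, so nothing can leak between them.
	return {"id": uuid4(), "name": "potato"}

def test_mark_complete(data):
	completed_data = mark_complete(data)
	assert completed_data['complete'] is True
	assert completed_data['completed_at'] is not None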