Essential Python Tools #2: Factoryboy
Tags: Python, Software Engineering, Testing, factoryboy, Essential Python Tools, fixtures
This is part of an ongoing series on the essential tools that I use day to day as a Python developer and couldn’t be as productive without. This episode is on factoryboy…
What do you use for fixtures?
Traditionally I built database fixtures in one of two ways:
- Manually crafted artisan SQL commands which populated the required fields in the database.
- ‘Live’ or environmental data that was dumped from the DB into a SQL file and imported into the test database at test run startup.
I would normally use method 1 for testing repository or service methods where I needed a specific example with some required properties, and method 2 for general testing where I was checking rough results rather than specific outputs.
The problem is, both of these suck. Method 1 is time consuming, and the artisan crafted SQL queries would break frequently after database updates. Method 2 sucked because you constantly had to keep updating the ‘dev’ database file that was tested against AND find somewhere to store it.
In comes factoryboy, a tool that simplifies building preconfigured objects on demand while allowing overrides at the call level. Rather than having to hand craft objects for each test, you can create a factory that uses your base class while providing helpers to override individual fields.
Let’s build some examples. For now, to keep things simple, we’re going to ignore the connections to the database and just focus on building objects.
import uuid
from dataclasses import dataclass

@dataclass
class Item:
    id: uuid.UUID
    name: str
    price: float
    quantity: int
The above is a representation of an item and its inventory metadata. We need to link this up to a factory for creating new instances of it.
import factory
from uuid import uuid4

class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = uuid4()
    name = "potato"
    price = 1.0
    quantity = 100
Now we’ve got a factory which is linked to our item through this Meta class, and we’ve also set some defaults for all of the fields. The factory takes care of all the class instantiation and creation. Factoryboy gives us a neat little API for building instances of this with build and build_batch. Any of the attributes can be overridden by passing it as a keyword argument to the creation API.
>>> ItemFactory.build()
Item(
    id=UUID('1081b64f-2584-4eb0-83ed-5f9a8a355c74'),
    name='potato',
    price=1.0,
    quantity=100
)
That creates one. Creating a batch of 5 examples, however, gives us 5 of the same object, which is probably not what we want. We can override fields like so.
>>> ItemFactory.build(name='carrot', price=0.5)
Item(
    id=UUID('1081b64f-2584-4eb0-83ed-5f9a8a355c74'),
    name='carrot',
    price=0.5,
    quantity=100
)
Now you want your test data to be representative of your real data, and manually setting fields is going to get annoying fast. If you’re developing a sorting algorithm then you probably want data that varies, not 5 instances of the same object; that’s not going to be useful for testing your function. In comes faker.
import factory
from uuid import uuid4

class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = uuid4()
    name = factory.Faker('name')
    price = 1.0
    quantity = 100
And running it again…
>>> ItemFactory.build()
Item(
    id=UUID('1081b64f-2584-4eb0-83ed-5f9a8a355c74'),
    name='Ian Gonzalez',
    price=1.0,
    quantity=100
)
This gives us a human name, which is not what we want but good enough (if you do want to fix this then there are faker plugins out there).
Now if you’re sharp you’ve noticed there’s a bug: our UUID doesn’t change between invocations. The right hand side of each assignment in ItemFactory is evaluated once, when the class body is executed at import time, so uuid4() is called a single time and the id value is set for the life of the factory. This is fine for static values or ones using the inbuilt factory.Faker declaration; what we want, however, is a way of passing a callable which is called for every new piece of data.
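If the import time evaluation isn’t obvious, here’s a tiny sketch in plain Python with no factoryboy involved. EagerDefaults and LazyDefaults are made up names purely for illustration; the point is only the difference between storing the result of a call and storing the callable.

```python
from uuid import UUID, uuid4


class EagerDefaults:
    # uuid4() runs exactly once, while the class body is being executed.
    id: UUID = uuid4()


class LazyDefaults:
    # Store the callable itself; nothing is generated yet.
    id_factory = staticmethod(uuid4)

    @classmethod
    def new_id(cls) -> UUID:
        # The callable is invoked every time a fresh value is requested.
        return cls.id_factory()


eager_first, eager_second = EagerDefaults.id, EagerDefaults.id
lazy_first, lazy_second = LazyDefaults.new_id(), LazyDefaults.new_id()

print(eager_first == eager_second)  # the same stored value is read twice
print(lazy_first == lazy_second)    # two separate calls to uuid4()
```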
In comes LazyFunction, which takes a callable with no args and assigns whatever is returned to that field. Let’s update our factory to use that:
import random
import factory
from uuid import uuid4

class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = factory.LazyFunction(uuid4)
    name = factory.Faker('name')
    price = 1.0
    quantity = factory.LazyFunction(lambda: random.randint(0, 100))
We’re now saying that every Item has a random quantity between 0 and 100. This is great, but sometimes we might want to base our outputs upon some already existing data. In comes LazyAttribute, which works the same way except it takes a callable with a single arg, which is the half built item itself.
Let’s update this and produce a price based upon the name of the item.
class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = factory.LazyFunction(uuid4)
    name = factory.LazyFunction(lambda: random.choice(['loafers', 'trainers', 'oxfords']))
    price = factory.LazyAttribute(lambda x: 1000 if x.name == 'loafers' else 100)
    quantity = factory.LazyFunction(lambda: random.randint(0, 100))
Now loafers are a bougie shoe, so we’re charging 1000 for loafers and 100 for everything else. What can we say, we’re generous. Between these tools we can set up fairly representative test data and lots of it; our only real limitations are our source dataset[1] and the time we want our tests to take. Whereas tools like hypothesis can run based upon a period of time, here we have no such luxury as we’re explicitly setting values.
Now in the real world our types aren’t always this simple; thankfully factoryboy has an answer for that as well. Subfactories are a way of chaining relationships together. Given our above ItemFactory example, we might want to extend it to include a parent supplier.
Let’s add a supplier field to our item.
import uuid
from dataclasses import dataclass

@dataclass
class Supplier:
    id: uuid.UUID
    name: str

@dataclass
class Item:
    id: uuid.UUID
    name: str
    price: float
    quantity: int
    supplier: Supplier
Now we can join these onto our existing ItemFactory.
import random
import factory
from uuid import uuid4

class SupplierFactory(factory.Factory):
    class Meta:
        model = Supplier

    id = factory.LazyFunction(uuid4)
    name = factory.Faker('name')

class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = factory.LazyFunction(uuid4)
    name = factory.LazyFunction(lambda: random.choice(['loafers', 'trainers', 'oxfords']))
    price = factory.LazyAttribute(lambda x: 1000 if x.name == 'loafers' else 100)
    quantity = factory.LazyFunction(lambda: random.randint(0, 100))
    supplier = factory.SubFactory(SupplierFactory)
And testing it out.
>>> ItemFactory.build()
Item(
    id=UUID('ec0a3e57-ff9f-4124-a5e8-e8cbbba23ba5'),
    name='loafers',
    price=1000,
    quantity=20,
    supplier=Supplier(
        id=UUID('cd1d780c-407e-4047-a8e9-bf9e0564d210'),
        name='Lori Reyes'
    )
)
Boom. Now it’s important to recognise that supplier here is the parent object and is created first, with item being the child object. If your relationship is the other way round then factory.RelatedFactory is likely what you want: the related factory is called after the main object has been created. Depending upon the relationship you’re mapping, SubFactory will normally get you all you need. If you’ve got a one to many relationship then you’ll likely want RelatedFactoryList, which does what it says on the tin and generates multiple objects.
Now say you want to create an item whose Supplier has the name ‘Jimbob Jones’. You could create an instance from the SupplierFactory, then pass that in as a named argument. Thankfully there’s a handy syntax for shortening this.
>>> ItemFactory.build(supplier__name='Jimbob Jones')
Item(
    id=UUID('ec0a3e57-ff9f-4124-a5e8-e8cbbba23ba5'),
    name='loafers',
    price=1000,
    quantity=20,
    supplier=Supplier(
        id=UUID('cd1d780c-407e-4047-a8e9-bf9e0564d210'),
        name='Jimbob Jones'
    )
)
This works for objects nested several levels deep, so you can represent fairly complex values quickly.
Now we probably need to integrate our datastore to persist this data. All the way through this we’ve been using the build method. There is another method available: create doesn’t just build out an object, it also persists it in the datastore.
from uuid import UUID
from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

engine = create_engine("sqlite://", future=True)
session = Session(engine)

class Base(DeclarativeBase):
    pass

class Supplier(Base):
    __tablename__ = "suppliers"
    # SQLAlchemy 2.0 infers the column type from the Mapped[UUID] annotation.
    id: Mapped[UUID] = mapped_column(primary_key=True)
    name: Mapped[str]

# Create the table in the in memory database.
Base.metadata.create_all(engine)
Now we’ve got an actual persistent store, we can apply this to our data.
class SupplierFactory(factory.alchemy.SQLAlchemyModelFactory):
    id = factory.LazyFunction(uuid4)
    name = factory.Faker('name')

    class Meta:
        model = Supplier
        sqlalchemy_session = session
        sqlalchemy_session_persistence = "commit"
Now when we call SupplierFactory.create(), an object with a new id and name will be created and added to the session, which is tied to the in memory database; the session is then committed. This is SUPER handy when you’re using something like celery where a worker might not share the same database session as your test process.
Finally it’s worth talking about factory.post_generation, which can be used to perform extra steps once your data is created. A pretty common use case for this is when you’re storing data in two places. Say you store metadata in a DB and a binary blob in S3. We can extend our SupplierFactory above.
import boto3

s3_client = boto3.client('s3')

class SupplierFactory(factory.alchemy.SQLAlchemyModelFactory):
    id = factory.LazyFunction(uuid4)
    name = factory.Faker('name')

    class Meta:
        model = Supplier
        sqlalchemy_session = session
        sqlalchemy_session_persistence = "commit"

    @factory.post_generation
    def add_contract(obj, create, extracted, **kwargs):
        # We only want to add this data on a full create.
        if not create:
            return
        s3_client.put_object(
            Body='contract'.encode('utf-8'),
            Bucket='my_contract_bucket',
            Key=str(obj.id),  # name it after the ID; S3 keys must be strings
        )
Now, once our Supplier factory is called with create, we’ll also create a matching remote contract in S3.
As you can see, factoryboy has pretty much everything you need to build fully featured, representative replacements for ‘real’ data. Now you just need to go out and use it.
[1] Entropy can be a problem in some faker attributes, so much so that I often resort to using random strings combined with the attribute. This is doubly a problem when you end up using unique constraints on table columns. Faker('email'), I’m looking at you here.