Essential Python Tools #2: Factoryboy

Tags: Python, Software Engineering, Testing, factoryboy, Essential Python Tools, fixtures

This is the second in an ongoing series on the essential tools that I use day to day as a Python developer and which I couldn’t be as productive without. This episode is on factoryboy.

What do you use for fixtures?

Traditionally I built database fixtures one of two ways:

  1. Manually crafted artisan SQL commands which populated required fields into the database.
  2. ‘Live’ or environmental data that was dumped from the DB into a SQL file and imported into the test database at test-run startup.

I would normally use method 1 for testing repository or service methods where I needed a specific example with certain required properties, and method 2 for general testing where I cared more about rough results than specific outputs.

The problem is, both of these suck. Method one is time consuming and the artisan-crafted SQL queries would break frequently after database updates; the second sucked because you constantly had to keep updating this ‘dev’ database file that was tested against AND find somewhere to store it.

In comes factoryboy, a tool that simplifies building pre-configured objects on demand while allowing overrides at the call level. Rather than having to hand craft objects for each test, you can create a factory that uses your base class while providing helpers to override individual fields.

Let’s build some examples. For now, to keep things simple, we’re going to ignore the connection to the database and just focus on building objects.

import uuid
from dataclasses import dataclass

@dataclass
class Item:
    id: uuid.UUID
    name: str
    price: float
    quantity: int

The above is a representation of an item and its inventory metadata. We need to link this up to a factory for creating new instances of it.

import factory
from uuid import uuid4

class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = uuid4()
    name = "potato"
    price = 1.0
    quantity = 100

Now we’ve got a factory linked to our Item through its Meta class, and we’ve set defaults for all of the fields. The factory takes care of all the class instantiation and creation. Factoryboy gives us a neat little API for building instances with build and build_batch, and any attribute can be overridden by passing it as a keyword into the creation call.

>>>ItemFactory.build()
Item(
  id=UUID('1081b64f-2584-4eb0-83ed-5f9a8a355c74'),
  name='potato',
  price=1.0,
  quantity=100
)

Creates one. We can also create a batch with build_batch.
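Building a batch of five with our static defaults just gives us five copies of the same object (illustrative output, abbreviated):

>>>ItemFactory.build_batch(5)
[Item(id=UUID('1081b64f-2584-4eb0-83ed-5f9a8a355c74'), name='potato', price=1.0, quantity=100),
 ...]

That’s probably not what we want. We can override fields like so.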

>>>ItemFactory.build(name='carrot', price=0.5)
Item(
  id=UUID('1081b64f-2584-4eb0-83ed-5f9a8a355c74'),
  name='carrot',
  price=0.5,
  quantity=100
)

Now you want your test data to be representative of your real data, and manually setting fields is going to get annoying fast. If you’re developing a sorting algorithm then you probably want data that varies; five instances of the same object aren’t going to be useful for testing your function. In comes faker.

import factory
from uuid import uuid4

class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = uuid4()
    name = factory.Faker('name')
    price = 1.0
    quantity = 100

And running it again…

>>>ItemFactory.build()
Item(
  id=UUID('1081b64f-2584-4eb0-83ed-5f9a8a355c74'),
  name='Ian Gonzalez',
  price=1.0,
  quantity=100
)

Gives us a human name, which isn’t quite what we want but is good enough (if you do want to fix this there are [faker plugins][3] out there).
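Even without a plugin, a different built-in provider can get us closer; for example, faker’s standard 'word' provider produces lowercase dictionary words rather than human names:

name = factory.Faker('word')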

Now if you’re sharp you’ve noticed there’s a bug: our UUID doesn’t update between invocations. When the ItemFactory class is parsed, the right hand side of each assignment is evaluated immediately, so uuid4() is called once and that id value is fixed for the life of the factory. This is fine for static values or for the built-in factory.Faker declaration; what we want, however, is a way of passing a callable that gets called for every new piece of data.
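You can see the problem with a couple of builds (a quick sketch against the factory above):

>>>a = ItemFactory.build()
>>>b = ItemFactory.build()
>>>a.id == b.id
True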

In comes LazyFunction, which takes a callable with no args and assigns whatever it returns to that field. Let’s update our factory to use it:

import random
import factory
from uuid import uuid4

class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = factory.LazyFunction(uuid4)
    name = factory.Faker('name')
    price = 1.0
    quantity = factory.LazyFunction(lambda: random.randint(0, 100))

We’re now saying that every Item has a random quantity between 0 and 100. This is great, but sometimes we might want to base a field on data that already exists. In comes LazyAttribute, which works the same way except that its callable takes a single argument: the half-built item itself.

Let’s update this and produce a price based upon the name of the item.

class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = factory.LazyFunction(uuid4)
    name = factory.LazyFunction(lambda: random.choice(['loafers', 'trainers', 'oxfords']))
    price = factory.LazyAttribute(lambda x: 1000 if x.name == 'loafers' else 100)
    quantity = factory.LazyFunction(lambda: random.randint(0, 100))

Now loafers are a [bougie shoe][4], so we’re now charging 1000 for loafers and 100 for everything else. What can we say, we’re generous. Between these tools we can set up fairly representative test data, and lots of it; our only real limitations are our source dataset¹ and the time we want our tests to take. Whereas tools like hypothesis can be run for a set period of time, here we have no such luxury as we’re explicitly setting values.

Now in the real world our types aren’t always this simple, but thankfully factoryboy has an answer for that as well. Subfactories are a way of chaining relationships together. Given our ItemFactory example above, we might want to extend it to include a parent supplier.

Adding a supplier field to our item.

import uuid
from dataclasses import dataclass

@dataclass
class Supplier:
    id: uuid.UUID
    name: str

@dataclass
class Item:
    id: uuid.UUID
    name: str
    price: float
    quantity: int
    supplier: Supplier

Now we can join these onto our existing ItemFactory.

import random
import factory
from uuid import uuid4

class SupplierFactory(factory.Factory):
    class Meta:
        model = Supplier

    id = factory.LazyFunction(uuid4)
    name = factory.Faker('name')

class ItemFactory(factory.Factory):
    class Meta:
        model = Item

    id = factory.LazyFunction(uuid4)
    name = factory.LazyFunction(lambda: random.choice(['loafers', 'trainers', 'oxfords']))
    price = factory.LazyAttribute(lambda x: 1000 if x.name == 'loafers' else 100)
    quantity = factory.LazyFunction(lambda: random.randint(0, 100))
    supplier = factory.SubFactory(SupplierFactory)

And testing it out.

>>>ItemFactory.build()
Item(
    id=UUID('ec0a3e57-ff9f-4124-a5e8-e8cbbba23ba5'),
    name='loafers',
    price=1000,
    quantity=20,
    supplier=Supplier(
        id=UUID('cd1d780c-407e-4047-a8e9-bf9e0564d210'),
        name='Lori Reyes'
    )
)

Boom. Now it’s important to recognise that supplier here is the parent object and is created first, with the item being the child object. If your relationship is the other way round then factory.RelatedFactory is likely what you want: a related factory is created after the main factory. Depending upon the relationship you’re mapping, SubFactory will normally get you everything you need; if you’ve got a one-to-many relationship then you’ll likely want RelatedFactoryList, which does what it says on the tin and generates multiple objects.
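As a rough sketch of the one-to-many case (this SupplierWithItemsFactory is my own illustration, not part of the running example), each generated Item points back at the Supplier through its supplier attribute:

class SupplierWithItemsFactory(SupplierFactory):
    # After the Supplier is generated, produce three Items that reference it.
    items = factory.RelatedFactoryList(
        ItemFactory, factory_related_name='supplier', size=3
    )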

Now say you want to create an Item whose Supplier has a specific name. You could build an instance with the SupplierFactory yourself and pass it in as a named argument, but thankfully there’s a handy syntax for shortening this.

>>>ItemFactory.build(supplier__name='Jimbob Jones')
Item(
    id=UUID('ec0a3e57-ff9f-4124-a5e8-e8cbbba23ba5'),
    name='loafers',
    price=1000,
    quantity=20,
    supplier=Supplier(
        id=UUID('cd1d780c-407e-4047-a8e9-bf9e0564d210'),
        name='Jimbob Jones'
    )
)

This works for multi-nested objects too, so you can represent fairly complex values quickly.
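For instance, with a hypothetical OrderFactory that wrapped our Item in an item SubFactory (not defined in this post), the same double-underscore syntax chains down each level:

>>>OrderFactory.build(item__supplier__name='Jimbob Jones')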

Now we probably need to integrate our datastore to persist this data. All the way through this we’ve been using the build method, but there is another method available: create doesn’t just build out an object, it also persists it in the datastore.

from uuid import UUID
from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

engine = create_engine("sqlite://", future=True)
session = Session(engine)


class Base(DeclarativeBase):
    pass

class Supplier(Base):
    __tablename__ = "suppliers"

    # SQLAlchemy 2.0 maps the uuid.UUID annotation to its Uuid column type
    id: Mapped[UUID] = mapped_column(primary_key=True)
    name: Mapped[str]

# Create the table in our in-memory database so commits have somewhere to go
Base.metadata.create_all(engine)

Now that we’ve got an actual persistent store, we can apply this to our data.

class SupplierFactory(factory.alchemy.SQLAlchemyModelFactory):
    id = factory.LazyFunction(uuid4)
    name = factory.Faker('name')

    class Meta:
        model = Supplier
        sqlalchemy_session = session
        sqlalchemy_session_persistence = "commit"

Now when we call SupplierFactory.create(), an object with a new id and name is created and added to the session tied to the in-memory database, and finally the session is committed once all objects have been created. This is SUPER handy when you’re using something like celery, where a worker might not share the same database session as your test process.
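A quick sanity check (illustrative; the session’s identity map hands back the object we just persisted):

>>>supplier = SupplierFactory.create()
>>>session.get(Supplier, supplier.id) is supplier
True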

Finally it’s worth talking about factory.post_generation, which can be used to perform extra steps once your data is created. A pretty common use case for this is when you’re storing data in two places, say metadata in a DB and a binary blob in S3. We can extend our SupplierFactory above.

import boto3

s3_client = boto3.client('s3')


class SupplierFactory(factory.alchemy.SQLAlchemyModelFactory):
    id = factory.LazyFunction(uuid4)
    name = factory.Faker('name')

    class Meta:
        model = Supplier
        sqlalchemy_session = session
        sqlalchemy_session_persistence = "commit"

    @factory.post_generation
    def add_contract(obj, create, extracted, **kwargs):
        # We only want to add this data on a full create.
        if not create:
            return

        s3_client.put_object(
            Body='contract'.encode('utf-8'),
            Bucket='my_contract_bucket',
            Key=str(obj.id),  # name the object after the Supplier's ID
        )

Now once our SupplierFactory is called, we’ll create a matching remote contract in S3.

As you can see, factoryboy has pretty much everything you need to build fully featured, representative replacements for ‘real’ data. Now you just need to go out and use it.


  1. Entropy can be a problem in some faker attributes, so much so that I often resort to combining the attribute with random strings. This is doubly a problem when you end up using unique constraints on table columns. Faker('email'), I’m looking at you here. ↩︎