Factory injection, combining pytest with factory_boy


Factory injection, combining pytest with factory_boy

The most annoying in writing tests is the setup. All this boilerplate requires so much effort and thinking. You would prefer to focus on taking an action and asserting the result knowing some precondition, than diving again and again into researching how to make it happen. It would be nice to have some idea, some pattern that helps you to get started.

Normally testing tools don't provide any pattern to help you creating the test data. Developers are just writing chunks of code and create entities imperatively. Then they are trying to reuse this code, share it between different functions. But as a matter of fact these fixtures are not refactored too often as too many tests are using them. So the code gets duplicated with lots of flavors.

And in case of BDD it quickly ends up with scripting the creation of an entire hierarchy:

Scenario: Dark Helmet vs Lone Star v1

        Given I am your father's brother's nephew's cousin's former roommate

        When So what does that make us?

        Then Absolutely nothing

This fixture is a very complicated flavor. Flavors have a huge downside of encapsulating the creation code, that makes it impossible to reuse and to parametrize, so let's split it:

Scenario: Dark Helmet vs Lone Star v2
        Given there's a room
        And there's you
        And you have a father
        And your father has a brother
        And your father's brother has a nephew
        And your father's brother's nephew has a cousin
        And there's I am
        And your father's brother's nephew's cousin's lived in the room
        And I lived in the room

        When So what does that make us?

        Then Absolutely nothing

And this is getting even worse, but it is realistic! Imagine how terrible it is to reuse this setup for similar tests. But there's a solution to that - Gherkin Background section:

Feature: Spaceballs

        Background:
                Given there's a room
                And there's you
                And you have a father
                And your father has a brother
                And your father's brother has a nephew
                And your father's brother's nephew has a cousin
                And there's I am

        Scenario: Dark Helmet vs Lone Star v3
                Given your father's brother's nephew's cousin's lived in the room
                And I lived in the room

                When So what does that make us?

                Then Absolutely nothing

How is it now different from the v1? A lot more paper work and the flavor is still there. Anyway you have to address that particular person by some symbolic name, especially because these steps are executed separately and the context is shared between them.

Popular BDD tools are providing this feature, but every step has to come up with certain variable name, put it in the context, make sure to check if it was already created. It also implies that the given steps have to be in a certain order which is again too programmatic.

On the other hand most systems have a statically defined hierarchy. In case you need an object all non-nullable parents in the hierarchy have to be created, otherwise this object cannot exist. All the preconditions that make your system valid are not really worth mentioning. This is the so called just-enough-specification principle of Gherkin where you specify only the facts that you need to observe later assuming everything else simply works.

Certain flavors usually also have statically defined roles in the systems with much shorter names. To achieve that there should be some kind of pattern to make sure all of the object's dependencies are created without mentioning them explicitly.

At Paylogic we gained quite some experience by using pytest for more than 7 years, you can read some previous articles.

Dependency injection

The main difference between pytest and most of the other testing tools is dependency injection.

@pytest.fixture
def you(your_father):
        """You can't be created without your father."""
        ...

Basically the entire hierarchy is configured statically in the code as dependencies. It is easy to follow the flow by looking at definitions. Each fixture cares only about creating itself and ordering all necessary information as dependencies.

In pytest-BDD we implemented dependency injection support for the steps, so that pytest fixtures are shared among them instead of the context object that you have to feed in an imperative way.

Compared to the other testing tools code written using pytest has more declarative way rather than imperative and dependencies allow just-enough-specification.

Ideal test

What would be the ideal test? I would say the one where I mention only those attributes that make the test case, these values are defined as close as possible to the test code where I can compare them with the assertion section.

pytest has an elegant way to parametrize tests:

@pytest.mark.parametrize("book__price", [Amount("EUR", "15.20")])
@pytest.mark.parametrize(
        ("author__fee_percent", "expected_author_fee"),
        (
                (27, Amount("EUR", "4.104")),
                (12.5, Amount("EUR", "1.9")),
        ),
)
def test_unicode(book, expected_author_fee):
        """Test author fee calculation."""
        assert book.author_fee == expected_author_fee

The only downside of using pytest fixtures is finding their implementation since you don't have to import them. Fixtures can be inherited and overridden so you are not always sure in what context you are. It would be nice to have a more compact representation of their definitions or to avoid their manual definition completely.

Factory pattern

There's a great project for Python called factory_boy. It allows creating objects starting on any level of the model hierarchy, also creating all of the necessary dependencies. Factory pattern solves the problem of the encapsulation, so that any value can passed to any level.

It is declarative, compact and easy to maintain. Factories are look-alike models and provide a great overview on what values attributes will get.

There are post-generation declarations and actions that help you solve problems related to circular dependencies and to apply some side effects after the object is created.

import factory
import faker

fake = faker.Factory()


class AuthorFactory(factory.Factory):

        class Meta:
                model = Author

        name = factory.LazyAttribute(lambda f: fake.name())



class BookFactory(factory.Factory):

        class Meta:
                model = Book

        # Parent object is created at first
        author = factory.SubFactory(AuthorFactory)

        title = factory.LazyAttribute(lambda f: fake.sentence())


# Create a book with default values.
book = BookFactory()

# Create a book with specific title value.
book_with_title = BookFactory(title='pytest in a nutshell')

# Create a book parametrized with specific author name.
book_with_author_name = BookFactory(author__name='John Smith')

# Creates books with titles "Book 1", "Book 2", "Book 3", etc.
tons_of_books = BookFactory.create_batch(
        size=100000,
        title=factory.Sequence(lambda n: "Book {0}".format(n)),
)

The double underscore syntax allows you to address the attribute, the attribute of the attribute etc. This is a nice technique that helps creating complex and large datasets for various needs.

These datasets can be totally parallel hierarchies or a common parent can be passed to bind them. You can even configure certain rules about how to obtain the parent on the attribute declaration.

We use it to populate our sandbox environment database with demo data that 3rd parties can use.

Normally for well isolated test you don't need the entire hierarchy. It takes too much time and resource to create everything. It should be like a lightning that is reaching the target following the shortest path, create only what's necessary to make it work.

Also the tests that are separated in steps want to use only relevant nodes of the hierarchy, not navigating through an entire hierarchy to reach a certain instance or attribute. And this is what pytest is good at, also solving the problem of binding objects to the common parent which is an instance of a fixture in the session.

Factory vs dependency injection

So just to recap, Factory is good at compact declarative style with a good overview of what would be the values, flexibility in parametrization on any level of hierarchy.

pytest is good at delivering fully configured fixtures at any point of the test setup and parametrization of the test case.

What if there could be a way to combine those two to take advantage of:

  • Minimum objects creation in the hierarchy path.
  • Easily accessed instances of the hierarchy path by conventional names.
  • Strong convention of naming attributes for the parametrization.
  • Compact declarative notation for the models and attributes.

It is possible to parametrize anything using pytest as long as it is a fixture. It means that we need fixtures not only for the principal entities, but also for all their attributes. Since fixture names are unique within the scope of a test session and represent the same instance, the double underscore convention of factory_boy can be also applied.

pytest-factoryboy

So I made a library where I'm trying to combine the best practices of dependency injection and the factory pattern. The main idea is that the logic of creating attributes is incorporated in the factory attribute declarations and dependencies are represented by sub-factory attributes.

Basically there's no need for manual implementation of the fixtures since factories can be introspected and fixtures can be automatically generated. Only registration is needed where you can optionally choose a name for the fixture.

Factory Fixture

Factory fixtures allow using factories without importing them. Name convention is lowercase-underscore class name.

import factory
from pytest_factoryboy import register

class AuthorFactory(factory.Factory):

    class Meta:
        model = Author


register(AuthorFactory)


def test_factory_fixture(author_factory):
    author = author_factory(name="Charles Dickens")
    assert author.name == "Charles Dickens"

Basically you don't have to be bothered importing the factory at all. The only thing you should keep in mind is the naming convention to guess the name and just request it in your fixture or your test function.

Model Fixture

Model fixture implements an instance of a model created by the factory. Name convention is lowercase-underscore class name.

import factory
from pytest_factoryboy import register

@register
class AuthorFactory(Factory):

    class Meta:
        model = Author

    name = "Charles Dickens"


def test_model_fixture(author):
    assert author.name == "Charles Dickens"

Model fixtures can be registered with specific names. For example if you address instances of some collection by names like "first", "second" or of another parent as "other":

register(BookFactory)  # book
register(BookFactory, "second_book")  # second_book

register(AuthorFactory) # author
register(AuthorFactory, "second_author") # second_author

register(BookFactory, "other_book")  # other_book, book of another author

@pytest.fixture
def other_book__author(second_author):
    """Make the relation of the second_book to another (second) author."""
    return second_author
Attributes are Fixtures

There are fixtures created for factory attributes. Attribute names are prefixed with the model fixture name and double underscore (similar to the convention used by factory boy).

@pytest.mark.parametrized("author__name", ["Bill Gates"])
def test_model_fixture(author):
    assert author.name == "Bill Gates"
SubFactory

Sub-factory attribute points to the model fixture of the sub-factory. Attributes of sub-factories are injected as dependencies to the model fixture and can be overridden via parametrization.

post-generation

Post-generation attribute fixture implements only the extracted value for the post generation function.

Integration

An example of factory_boy and pytest integration.

factories/__init__.py:

import factory
from faker import Factory as FakerFactory

faker = FakerFactory.create()


class AuthorFactory(factory.django.DjangoModelFactory):

    """Author factory."""

    name = factory.LazyAttribute(lambda x: faker.name())

    class Meta:
        model = 'app.Author'


class BookFactory(factory.django.DjangoModelFactory):

    """Book factory."""

    title = factory.LazyAttribute(lambda x: faker.sentence(nb_words=4))

    class Meta:
        model = 'app.Book'

    author = factory.SubFactory(AuthorFactory)

tests/conftest.py:

from pytest_factoryboy import register

from factories import AuthorFactory, BookFactory

register(AuthorFactory)
register(BookFactory)

tests/test_models.py:

from app.models import Book
from factories import BookFactory

def test_book_factory(book_factory):
    """Factories become fixtures automatically."""
    assert isinstance(book_factory, BookFactory)

def test_book(book):
    """Instances become fixtures automatically."""
    assert isinstance(book, Book)

@pytest.mark.parametrize("book__title", ["pytest for Dummies"])
@pytest.mark.parametrize("author__name", ["Bill Gates"])
def test_parametrized(book):
    """You can set any factory attribute as a fixture using naming convention."""
    assert book.name == "pytest for Dummies"
    assert book.author.name == "Bill Gates"
Fixture partial specialization

There is a possibility to pass keyword parameters in order to override factory attribute values during fixture registration. This comes in handy when your test case is requesting a lot of fixture flavors. Too much for the regular pytest parametrization. In this case you can register fixture flavors in the local test module and specify value deviations inside register function calls.

register(AuthorFactory, "male_author", gender="M", name="John Doe")
register(AuthorFactory, "female_author", gender="F")


@pytest.fixture
def female_author__name():
    """Override female author name as a separate fixture."""
    return "Jane Doe"


@pytest.mark.parametrize("male_author__age", [42])  # Override even more
def test_partial(male_author, female_author):
    """Test fixture partial specialization."""
    assert male_author.gender == "M"
    assert male_author.name == "John Doe"
    assert male_author.age == 42

    assert female_author.gender == "F"
    assert female_author.name == "Jane Doe"
Fixture attributes

Sometimes it is necessary to pass an instance of another fixture as an attribute value to the factory. It is possible to override the generated attribute fixture where desired values can be requested as fixture dependencies. There is also a lazy wrapper for the fixture that can be used in the parametrization without defining fixtures in a module.

LazyFixture constructor accepts either existing fixture name or callable with dependencies:

import pytest
from pytest_factoryboy import register, LazyFixture


@pytest.mark.parametrize("book__author", [LazyFixture("another_author")])
def test_lazy_fixture_name(book, another_author):
    """Test that book author is replaced with another author by fixture name."""
    assert book.author == another_author


@pytest.mark.parametrize("book__author", [LazyFixture(lambda another_author: another_author)])
def test_lazy_fixture_callable(book, another_author):
    """Test that book author is replaced with another author by callable."""
    assert book.author == another_author


# Can also be used in the partial specialization during the registration.
register(AuthorFactory, "another_book", author=LazyFixture("another_author"))
Post-generation dependencies

Unlike factory_boy which binds related objects using an internal container to store results of lazy evaluation, pytest-factoryboy relies on the pytest request.

Circular dependencies between objects can be resolved using post-generation hooks/related factories in combination with passing the SelfAttribute, but in the case of pytest request fixture functions have to return values in order to be cached in the request and to become available to other fixtures.

That's why evaluation of the post-generation declaration in pytest-factoryboy is deferred until calling the test function. This solves circular dependency resolution for situations like:

o->[ A ]-->[ B ]<--[ C ]-o
|                        |
o----(C depends on A)----o

On the other hand deferring the evaluation of post-generation declarations makes their result unavailable during the generation of objects that are not in the circular dependency, but they rely on the post-generation action. It is possible to declare such dependencies to be evaluated earlier, right before generating the requested object.

from pytest_factoryboy import register


class Foo(object):

    def __init__(self, value):
        self.value = value


class Bar(object):

    def __init__(self, foo):
        self.foo = foo


@register
class FooFactory(factory.Factory):

    """Foo factory."""

    class Meta:
        model = Foo

    value = 0

    @factory.post_generation
    def set1(foo, create, value, **kwargs):
        foo.value = 1


class BarFactory(factory.Factory):

    """Bar factory."""

    foo = factory.SubFactory(FooFactory)

    @classmethod
    def _create(cls, model_class, foo):
        assert foo.value == 1  # Assert that set1 is evaluated before object generation
        return super(BarFactory, cls)._create(model_class, foo=foo)

    class Meta:
        model = Bar


register(
    BarFactory,
    'bar',
    _postgen_dependencies=["foo__set1"],
)
"""Forces 'set1' to be evaluated first."""


def test_depends_on_set1(bar):
    """Test that post-generation hooks are done and the value is 2."""
    assert depends_on_1.foo.value == 1

All post-generation/RelatedFactory attributes specified in the _postgen_dependencies list during factory registration are evaluated before the object generation.

In case you are using an ORM, SQLAlchemy is especially good with post-generation actions. It is as lazy as possible and doesn't require to bind objects by using constructors or primary keys to be generated. SQLAlchemy is your friend here.

Hooks

pytest-factoryboy exposes several pytest hooks which might be helpful for e.g. controlling database transaction, for reporting etc:

  • pytest_factoryboy_done(request) - Called after all factory based fixtures and their post-generation actions have been evaluated.

Conclusion

As pytest helps you write better programs, pytest-factoryboy helps you write better tests. Conventions and limitations allow you to focus on the test exercise and assertions rather than implementation of complicated test setups.

It is easy to parametrize particular low-level attributes of your models, but it is also easy to identify higher level flags to apply all the side effects for the certain workflows in your system.

In both cases it is not needed to navigate through files and folders, looking for specific (inherited) fixture implementations. It is enough to take a look at the factory classes to get an idea of what to expect, but in most cases it is easy to guess by combining the model name and the attribute name that you know by heart.

It is also a great help in behavioral tests to introduce homogenic given steps and avoid fixture flavors. Since you are not implementing complete fixtures manually, flavors just don't exist.

Given steps are simply specifying attribute values for the models depending on them or applying some mutations after the instance is created. And again you are operating only the attributes declared on the factory level that define your data or dispatch the workflow of your system.

Future

Is there more room to automate? Yes.

Python is an amazing dynamic language with a lot of introspection possibilities. Most of the good libraries provide tools for introspection.

In case of SQLAlchemy and if you are decorating your types with meaningful types, for example by using Email, Password, Address types from the SQLAlchemy-Utils, you could generate base classes for your factories and let faker provide you with realistic human-readable values.

There's a project that I want to continue on called Alchemyboy that allows generation of the base classes for factories based on SQLAlchemy models.


Comments