Step Testing with pytest
Cauldron includes a cauldron.steptest.create_test_fixture function that allows
steps to be "unit" tested using standard Python testing methods in pytest. This function
wraps the functionality needed for automatically setting up and tearing down the Cauldron
project state before and after each test.
A Simple Example
The code for this example can be found in the Cauldron Gallery at:
This example is highly simplified to emphasize the key concepts of step
testing. We start with a notebook containing two steps.
example-notebook
    cauldron.json
    S01-Load-Data.py
    S02-Create-Total.py
The first step loads data from a CSV file into a pandas DataFrame:
import cauldron as cd
import pandas as pd
df: pd.DataFrame = pd.read_csv('data.csv')
cd.display.table(df)
cd.shared.df = df
The second step adds a new "total" column to that data frame, computed as the
sum of two existing columns:
STEP 2:
S02-Create-Total.py
import cauldron as cd
import pandas as pd
df: pd.DataFrame = cd.shared.df
df['total'] = df['part_one'] + df['part_two']
cd.display.table(df)
cd.shared.df = df
If data.csv contains missing values in either the 'part_one' or 'part_two'
column, the new 'total' column will contain NaN for those rows. That's not
the behavior that we want. Instead, any NaN value in the 'part_one' or
'part_two' columns should be treated as zero when computing the total.
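The NaN propagation described above can be reproduced outside of Cauldron. This is a minimal sketch using a small hand-made data frame; the values are made up for illustration and are not taken from data.csv:

```python
import numpy as np
import pandas as pd

# Illustrative data only: the second row is missing its 'part_one' value.
df = pd.DataFrame({
    "part_one": [1.0, np.nan],
    "part_two": [2.0, 12.0],
})

# Plain addition propagates NaN: any row with a missing operand becomes NaN.
total = df["part_one"] + df["part_two"]

print(total.isna().tolist())
```

The first row sums normally, while the second row is NaN because one of its operands is missing.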
If you're familiar with pandas, you probably already have ideas on how to
achieve this. But first we're going to create a step unit test that validates
our solution and fails against the current code in the second step.
We begin by creating a Python file to contain our unit test.
Note: This file must be placed somewhere within the notebook folder or Cauldron
will be unable to locate the notebook and automatically initialize it
for running the tests.
For this example we will place it within a step_tests subdirectory beneath the root
notebook directory and call it ./step_tests/test_notebook.py. Inside this file
we include the following:
import cauldron as cd
from cauldron import steptest
import pandas as pd
import numpy as np

test_fixture = steptest.create_test_fixture(__file__)


def test_missing_values(tester: steptest.CauldronTest):
    """ should not have NaN values in the total column """
    cd.shared.df = pd.DataFrame(dict(
        part_one=[None],
        part_two=[12]
    ))

    tester.run_step('S02-Create-Total.py')

    df = cd.shared.df
    assert not np.isnan(df['total'].values[0])
There are many ways to run Python unit tests depending on your choice of
development tools. In this example we'll run the test from the command line
using the command:
$ python -m pytest test_notebook.py
which must be executed from within the step_tests folder within the root directory
of the notebook. The execution of this command yields the following console output:
================================ FAILURES =========================================
____________________________ test_missing_values __________________________________
tester =
...
> assert not np.isnan(df['total'].values[0])
E AssertionError: assert not True
E + where True = (nan)
E + where = np.isnan
test_notebook.py:26: AssertionError
================== 1 failed, 0 warnings in 3.27 seconds ===========================
The test has failed because the total column contains a NaN value. We can
now go back to the second step and change our code so that it handles missing
values within the source columns. The updated code looks like this:
STEP 2:
S02-Create-Total.py
import cauldron as cd
import pandas as pd
df: pd.DataFrame = cd.shared.df
df['total'] = df['part_one'].fillna(0) + df['part_two'].fillna(0)
cd.display.table(df)
cd.shared.df = df
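Filling each column separately matters when both source values are missing. As a sketch with made-up values, the comparison below contrasts the fillna(0) approach used in the step with pandas' Series.add and its fill_value argument, which still yields NaN when both operands are missing:

```python
import numpy as np
import pandas as pd

# Illustrative data only: the last row is missing both source values.
df = pd.DataFrame({
    "part_one": [1.0, np.nan, np.nan],
    "part_two": [2.0, 12.0, np.nan],
})

# Approach used in the step: replace NaN with zero in each column, then add.
filled = df["part_one"].fillna(0) + df["part_two"].fillna(0)

# Alternative: fill_value only substitutes when exactly one side is missing,
# so a row with both values missing remains NaN.
added = df["part_one"].add(df["part_two"], fill_value=0)

print(filled.isna().any(), added.isna().any())
```

If a total of zero is the desired result for rows where both parts are missing, the fillna(0) form is the one that satisfies the requirement stated earlier.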
Running the test again with these changes yields a successful output:
================== 1 passed, 0 warnings in 2.61 seconds ===========================
We now have a test that validates the desired behavior of avoiding NaN in
the 'total' column. It is good practice to run it regularly as you change
the notebook, to confirm that the code still behaves properly and that your
changes haven't caused unintended issues.
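The test above only checks that the total is not NaN. A stricter check would also assert the expected value. The sketch below illustrates that idea without Cauldron: create_total is a hypothetical helper that mirrors the logic of S02-Create-Total.py and is not part of the notebook or of the Cauldron API.

```python
import numpy as np
import pandas as pd


def create_total(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the logic in S02-Create-Total.py."""
    result = df.copy()  # leave the caller's data frame untouched
    result["total"] = result["part_one"].fillna(0) + result["part_two"].fillna(0)
    return result


def test_missing_values_are_zero():
    """A missing 'part_one' value should count as zero in the total."""
    source = pd.DataFrame({"part_one": [np.nan], "part_two": [12.0]})
    result = create_total(source)
    assert result["total"].tolist() == [12.0]
```

In a step test, the same value assertion could be made against cd.shared.df after the step has run.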