Step Testing with unittest
Cauldron includes a StepTestCase class that allows steps to be "unit"
tested using standard Python testing methods. This class includes the
additional functionality needed for automatically setting up and tearing
down the Cauldron project state before and after each test.
A Simple Example
The code for this example can be found in the Cauldron Gallery at:
This example is highly simplified to emphasize the key concepts of step
testing. We start with a notebook containing two steps.
example-notebook/
    cauldron.json
    S01-Load-Data.py
    S02-Create-Total.py
The first step loads data from a CSV file into a pandas DataFrame:
import cauldron as cd
import pandas as pd
df: pd.DataFrame = pd.read_csv('data.csv')
cd.display.table(df)
cd.shared.df = df
The second step adds a new "total" column to that data frame, computed as
the sum of two existing columns:
STEP 2:
S02-Create-Total.py
import cauldron as cd
import pandas as pd
df: pd.DataFrame = cd.shared.df
df['total'] = df['part_one'] + df['part_two']
cd.display.table(df)
cd.shared.df = df
If data.csv contains missing values in either the 'part_one' or 'part_two'
column, the new 'total' column will contain NaN for those rows. That's not
the behavior that we want. Instead, any missing value in the 'part_one' or
'part_two' column should be treated as zero when the total is computed.
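To make the problem concrete, here is a minimal standalone sketch of how NaN propagates through column addition in pandas. It runs outside of Cauldron, and the sample values are hypothetical stand-ins for the contents of data.csv, not the actual file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the second row is missing its 'part_one' value.
df = pd.DataFrame({
    'part_one': [1.0, np.nan],
    'part_two': [2.0, 12.0],
})

# Element-wise addition propagates NaN from either operand.
total = df['part_one'] + df['part_two']

# True when at least one row of the result is missing.
has_missing_total = bool(total.isna().any())
print(total)
print('total contains NaN:', has_missing_total)
```

The first row sums normally, while the second row becomes NaN even though 'part_two' holds a valid value there.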
If you're familiar with Pandas, you probably already have ideas on how to
achieve this. But first we're going to create a step unit test that will
validate our solution and fail given the current code in the second step.
We begin by creating a Python file to contain our unit test.
Note: This file must be placed somewhere within the notebook folder or Cauldron
will be unable to locate the notebook and automatically initialize it
for running the tests.
For this example we will place it within a step_tests subdirectory beneath the root
notebook directory and call it ./step_tests/test_notebook.py. Inside this file
we include the following:
import cauldron as cd
from cauldron.steptest import StepTestCase
import pandas as pd
import numpy as np


class TestNotebook(StepTestCase):
    """ Test class containing step unit tests for the notebook """

    def test_missing_values(self):
        """ should not have NaN values in the total column """
        cd.shared.df = pd.DataFrame(dict(
            part_one=[None],
            part_two=[12]
        ))

        self.run_step('S02-Create-Total.py')

        df = cd.shared.df
        self.assertFalse(np.isnan(df['total'].values[0]))
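The assertion in this test inspects only the first row of the 'total' column, which is sufficient here because the test data frame has a single row. If a test uses several rows, a column-wide check may be a better fit. The sketch below contrasts the two checks on a hypothetical series; it is independent of Cauldron and only illustrates the pandas and NumPy calls involved.

```python
import numpy as np
import pandas as pd

# Hypothetical 'total' column in which only the second row is missing.
total = pd.Series([3.0, np.nan], name='total')

# Checks the first row only, as the test above does.
first_is_nan = bool(np.isnan(total.values[0]))

# Checks every row in the column.
any_is_nan = bool(total.isna().any())

print('first row is NaN:', first_is_nan)
print('any row is NaN:', any_is_nan)
```

Inside a test method, the column-wide form could be written as self.assertFalse(df['total'].isna().any()).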
There are many ways to run Python unit tests depending on your choice of
development tools. In this example we'll run the test from the command line
using the command:
$ python -m unittest test_notebook.py
The command must be executed from within the step_tests folder beneath the
root directory of the notebook. It yields the following console output:
======================================================================
FAIL: test_missing_values (test_notebook.TestNotebook)
should not have NaN values in the total column
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\scott\cauldron\super-simple-testing\test_notebook.py", line 27, in test_missing_values
    self.assertFalse(np.isnan(df['total'].values[0]))
AssertionError: True is not false
----------------------------------------------------------------------
Ran 1 test in 3.708s
FAILED (failures=1)
The test has failed because the total column contains a NaN value. We can
now go back to the second step and change our code so that it handles missing
values within the source columns. The updated code looks like this:
STEP 2:
S02-Create-Total.py
import cauldron as cd
import pandas as pd
df: pd.DataFrame = cd.shared.df
df['total'] = df['part_one'].fillna(0) + df['part_two'].fillna(0)
cd.display.table(df)
cd.shared.df = df
Running the test again with these changes yields a successful output:
----------------------------------------------------------------------
Ran 1 test in 3.781s
OK
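It can help to see what the updated expression does on the edge cases. The standalone sketch below applies the same fillna(0) expression to a small hypothetical data frame with three rows: nothing missing, one value missing, and both values missing. It runs outside of Cauldron and the data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: none missing, one missing, both missing.
df = pd.DataFrame({
    'part_one': [1.0, np.nan, np.nan],
    'part_two': [2.0, 12.0, np.nan],
})

# Same expression as the updated step: treat missing values as zero.
df['total'] = df['part_one'].fillna(0) + df['part_two'].fillna(0)

totals = df['total'].tolist()
print(totals)  # prints [3.0, 12.0, 0.0]
```

Note that a row with both values missing produces a total of zero rather than NaN. That matches the requirement stated earlier, but it is worth keeping in mind if a fully missing row should be handled differently in your own notebook.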
We now have a test that validates the desired behavior of avoiding NaN in
the 'total' column. This unit test can be run at any time to confirm that
the code continues to behave properly as the notebook evolves. It is good
practice to run unit tests regularly so that unintended side effects of a
change are caught early.