Step Testing
Cauldron includes a StepTestCase class that allows steps to be "unit" tested using standard Python testing methods. This class includes the additional functionality needed for automatically setting up and tearing down the Cauldron project state before and after each test.
A Simple Example
This example is highly simplified to emphasize the key concepts of step testing. We start with a notebook containing two steps.
example-notebook/
    cauldron.json
    S01-Load-Data.py
    S02-Create-Total.py
The first step loads data from a CSV file into a pandas DataFrame:
STEP 1:
S01-Load-Data.py
    import cauldron as cd
    import pandas as pd

    cd.shared.df = pd.read_csv('data.csv')
The second step adds a new "total" column to that data frame, computed as the sum of two existing columns:
STEP 2:
S02-Create-Total.py
    import cauldron as cd
    import pandas as pd

    # Retrieve the stored data frame
    df = cd.shared.df  # type: pd.DataFrame

    df['total'] = df['part_one'] + df['part_two']

    # Share the updated data frame
    cd.shared.df = df
If data.csv contains missing values in either the 'part_one' or 'part_two' column, the corresponding rows of the new 'total' column will be NaN. That's not the behavior that we want. Instead, any NaN value in the 'part_one' or 'part_two' columns should be treated as zero during the summation of the total value.
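To make the problem concrete, here is a minimal sketch that reproduces it outside of Cauldron. The CSV text is a hypothetical stand-in for data.csv (the real file's contents are not shown in this example); only the column names come from the steps above.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv; the second row has no part_one value
csv_text = "part_one,part_two\n1,2\n,12\n"
df = pd.read_csv(io.StringIO(csv_text))

# Same addition that S02-Create-Total.py performs
df['total'] = df['part_one'] + df['part_two']

# The row with the missing part_one value ends up with a NaN total
print(df['total'].isna().tolist())  # prints [False, True]
```

The empty field is read as NaN, and adding NaN to any number yields NaN, which is why the second total is missing rather than 12.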
If you're familiar with Pandas, you probably already have ideas on how to achieve this. But first we're going to create a step unit test that will validate our solution and fail given the current code in the second step.
We begin by creating a Python file to contain our unit test.
Note: This file must be placed somewhere within the notebook folder or Cauldron will be unable to locate the notebook and automatically initialize it for running the tests.
For this example we will place it in the top-level notebook directory and call it test_notebook.py. Inside this file we include the following:
    import cauldron as cd
    from cauldron.steptest import StepTestCase
    import pandas as pd
    import numpy as np


    class TestNotebook(StepTestCase):
        """ Test class containing step unit tests for the notebook """

        def test_missing_values(self):
            """ should not have NaN values in the total column """

            # Assign to the shared df variable a fictional data frame with only
            # a single row in which the part_one column value is missing
            cd.shared.df = pd.DataFrame(dict(
                part_one=[None],
                part_two=[12]
            ))

            # Run the step
            self.run_step('S02-Create-Total.py')

            # Retrieve the modified data frame from the shared variables
            df = cd.shared.df

            # Confirm that the total column value is not NaN
            self.assertFalse(np.isnan(df['total'].values[0]))
There are many ways to run Python unit tests depending on your choice of development tools. In this example we'll run the test from the command line using the command:
$ python -m unittest test_notebook
which must be executed from within the root folder for our notebook. The execution of this command yields the following console output:
    ======================================================================
    FAIL: test_missing_values (test_notebook.TestNotebook)
    should not have NaN values in the total column
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "C:\Users\scott\cauldron\super-simple-testing\test_notebook.py", line 27, in test_missing_values
        self.assertFalse(np.isnan(df['total'].values[0]))
    AssertionError: True is not false

    ----------------------------------------------------------------------
    Ran 1 test in 3.708s

    FAILED (failures=1)
The test has failed because the total column contains a NaN value. We can now go back to the second step and change our code so that it handles missing values within the source columns. The updated code looks like this:
STEP 2:
S02-Create-Total.py
    import cauldron as cd
    import pandas as pd

    # Retrieve the stored data frame
    df = cd.shared.df  # type: pd.DataFrame

    df['total'] = df['part_one'].fillna(0) + df['part_two'].fillna(0)

    # Share the updated data frame
    cd.shared.df = df
Running the test again with these changes yields a successful output:
    ----------------------------------------------------------------------
    Ran 1 test in 3.781s

    OK
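The fix fills each column with zero before adding. A reader familiar with pandas might instead reach for Series.add with fill_value=0, but the two are not equivalent when both values in a row are missing. The sketch below compares them on a hypothetical data frame (the values are made up for illustration) and runs without Cauldron.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second row is missing part_one and the third
# row is missing both values
df = pd.DataFrame({
    'part_one': [1.0, np.nan, np.nan],
    'part_two': [2.0, 12.0, np.nan],
})

# Approach used in the step: replace NaN with zero before adding
filled = df['part_one'].fillna(0) + df['part_two'].fillna(0)

# Alternative: Series.add substitutes fill_value only when exactly one of
# the two operands is missing, so a row missing both values stays NaN
added = df['part_one'].add(df['part_two'], fill_value=0)

print(filled.tolist())        # prints [3.0, 12.0, 0.0]
print(added.isna().tolist())  # prints [False, False, True]
```

Since the stated requirement is that any NaN in either column be treated as zero, the fillna(0) form used in the step matches it in every case, including rows where both values are missing.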
We now have a test that validates the desired behavior of avoiding NaN in the 'total' column. This unit test can be run at any time to confirm that the code continues to behave properly as changes are made to the notebook. It is good practice to run the unit tests regularly while editing the notebook so that unintended side effects are caught early.