Pandera (dagster-pandera)

The dagster_pandera library lets Dagster users validate Pandas dataframes with Pandera, a dataframe validation library. See the guide for details.

dagster_pandera.pandera_schema_to_dagster_type(schema)

Convert a Pandera dataframe schema to a DagsterType.

The generated Dagster type is named automatically, using the schema’s title property, name property, or class name (in that order). If neither title nor name is defined, a name of the form DagsterPanderaDataframe<n> is generated.
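The fallback order can be illustrated with a small standalone sketch. It is not the library's implementation: resolve_type_name and the counter are hypothetical names introduced here, and the assumption that the counter starts at 1 is ours.

```python
from itertools import count
from typing import Optional

# Hypothetical counter for schemas that carry no usable name.
# Assumption: numbering starts at 1; the real library may differ.
_anonymous_ids = count(1)


def resolve_type_name(
    title: Optional[str],
    name: Optional[str],
    class_name: Optional[str] = None,
) -> str:
    """Return the first non-empty candidate: title, then name, then class name.

    If none is available, fall back to a generated placeholder name.
    """
    for candidate in (title, name, class_name):
        if candidate:
            return candidate
    return f"DagsterPanderaDataframe{next(_anonymous_ids)}"
```

A schema with both a title and a name would therefore be named after its title, and a schema with neither would receive a generated placeholder.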

Additional metadata is also extracted from the Pandera schema and attached to the returned DagsterType as a metadata dictionary. The extracted metadata includes:

  • Descriptions on the schema and constituent columns and checks.

  • Data types for each column.

  • String representations of all column-wise checks.

  • String representations of all row-wise (i.e. “wide”) checks.

The returned DagsterType calls the Pandera schema’s validate() method in its type check function. Validation runs in lazy mode, i.e. Pandera attempts to validate all values in the dataframe rather than stopping at the first error.

If validation fails, the returned TypeCheck object will contain two pieces of metadata:

  • num_failures: total number of validation errors.

  • failure_sample: a table containing up to the first 10 validation errors.
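To make the lazy-mode behaviour and the two metadata entries concrete, here is a minimal pure-Python sketch of the idea. It does not use Pandera or Dagster; validate_lazily, the Check alias, and the dictionary layout are hypothetical and only mirror the behaviour described above.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical check representation: (column name, predicate on a single value).
Check = Tuple[str, Callable[[Any], bool]]


def validate_lazily(
    rows: List[Dict[str, Any]],
    checks: List[Check],
    sample_size: int = 10,
) -> Dict[str, Any]:
    """Run every check on every row, collecting failures instead of raising."""
    failures: List[Dict[str, Any]] = []
    for index, row in enumerate(rows):
        for column, predicate in checks:
            if not predicate(row[column]):
                # Lazy mode: record the failure and continue with the next value.
                failures.append(
                    {"index": index, "column": column, "value": row[column]}
                )
    return {
        "success": not failures,
        "num_failures": len(failures),
        # Only the first `sample_size` failures are kept in the sample.
        "failure_sample": failures[:sample_size],
    }
```

An eager validator would stop at the first bad value. The lazy variant reports the full failure count while keeping the sample small enough to display.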

Parameters:

schema (Union[pa.DataFrameSchema, Type[pa.SchemaModel]]) – The Pandera schema to convert.

Returns:

A DagsterType constructed from the Pandera schema.

Return type:

DagsterType