API Docs

These docs aim to cover the entire public surface of the core dagster APIs, as well as public APIs from all provided libraries.

Dagster follows SemVer. We attempt to isolate breaking changes to the public APIs to minor versions (released on a roughly 12-week cadence) and announce deprecations in Slack and in the release notes for patch versions (released on a roughly weekly cadence).
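
To make the versioning policy above concrete, here is a minimal sketch that classifies an upgrade between two version strings and flags the ones that, under that policy, may include breaking changes. The helper names and the version numbers are hypothetical and are not part of any Dagster API; the sketch assumes plain `MAJOR.MINOR.PATCH` version strings.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def upgrade_kind(current: str, target: str) -> str:
    """Classify an upgrade as 'major', 'minor', 'patch', or 'none'."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt[0] != cur[0]:
        return "major"
    if tgt[1] != cur[1]:
        return "minor"
    if tgt[2] != cur[2]:
        return "patch"
    return "none"


def may_include_breaking_changes(current: str, target: str) -> bool:
    """Under the policy described above, breaking changes are isolated to
    minor (or larger) version bumps, so patch upgrades are not flagged."""
    return upgrade_kind(current, target) in {"major", "minor"}


if __name__ == "__main__":
    # Hypothetical version numbers, used only for illustration.
    print(upgrade_kind("1.3.2", "1.3.5"))
    print(may_include_breaking_changes("1.3.2", "1.4.0"))
```

A check like this could run before bumping a pinned dependency, prompting a read of the release notes whenever the minor version changes.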

Core

APIs from the core dagster package, divided roughly by topic:

  • Software-Defined Assets. An asset is an object in persistent storage, such as a table, file, or persisted machine learning model. A software-defined asset is a Dagster object that couples an asset to the function and upstream assets that are used to produce its contents.

  • Jobs APIs to define jobs that execute a set of ops with specific parameters.

  • Definitions APIs to collect definitions so that tools like the Dagster CLI or UI can load them as code locations.

  • Ops APIs to define or decorate functions as ops, declare their inputs and outputs, and compose ops with each other, as well as the datatypes that op execution can return or yield.

  • Graphs APIs to define a logical structure of ops.

  • Resources APIs to define resources which can provide specific implementations of certain functionality within a job.

  • Loggers APIs to define where logs go.

  • Config The types available to describe config schemas.

  • Types The Dagster type system helps describe and verify at runtime the values that ops accept and produce.

  • Dagster CLI Browse repositories and execute jobs from the command line.

  • Schedules & Sensors APIs to define schedules and sensors that initiate job execution, as well as some built-in helpers for common cases.

  • Partitions APIs to define partitions of the config space over which job runs can be backfilled.

  • Errors Errors thrown by the Dagster framework.

  • Execution APIs to execute and test jobs and individual ops, the execution context available to ops, job configuration, and the default executors available for executing jobs.

  • Hooks APIs to define Dagster hooks, which can be triggered on specific Dagster events.

  • IO Managers APIs to define how inputs and outputs are handled and loaded.

  • Dynamic Mapping & Collect APIs that allow graph structures to be determined at run time.

  • Job-Level Versioning & Memoization (Deprecated) Code versioning and memoization of previous outputs based upon that versioning.

  • Repositories Older APIs to define collections of jobs and other definitions that tools like the Dagster CLI or UI can load.

  • Utilities Miscellaneous helpers used by Dagster that may be useful to users.

  • Internals Core internal APIs that are important if you are interested in understanding how Dagster works with an eye towards extending it: logging, executors, system storage, the Dagster instance & plugin machinery, storage, and schedulers.

Libraries

Dagster also provides a growing set of optional add-on libraries to integrate with infrastructure and other components of the data ecosystem:

  • Airbyte (dagster-airbyte) Dagster integrations to run Airbyte jobs.

  • Airflow (dagster-airflow) Tools for compiling Dagster jobs to Airflow DAGs, and for ingesting Airflow DAGs to represent them in Dagster.

  • AWS (dagster-aws) Dagster integrations for working with AWS resources.

  • Azure (dagster-azure) Dagster integrations for working with Microsoft Azure resources.

  • Celery (dagster-celery) Provides an executor built on top of the popular Celery task queue, and an executor with support for using Celery on Kubernetes.

  • Celery+Docker (dagster-celery-docker) Provides an executor that lets Celery workers execute in Docker containers.

  • Celery+Kubernetes (dagster-celery-k8s) Provides an executor that lets Celery workers execute on Kubernetes.

  • Dask (dagster-dask) Provides an executor built on top of dask.distributed.

  • dbt (dagster-dbt) Provides ops and resources to run dbt projects.

  • Databricks (dagster-databricks) Provides ops and resources for integrating with Databricks.

  • Datadog (dagster-datadog) Provides an integration with Datadog, to support publishing metrics to Datadog from within Dagster ops.

  • Datahub (dagster-datahub) Provides an integration with Datahub, to support pushing metadata to Datahub from within Dagster ops.

  • Docker (dagster-docker) Provides components for deploying Dagster to Docker.

  • DuckDB (dagster-duckdb) Provides resources for querying DuckDB from Dagster.

  • DuckDB+Pandas (dagster-duckdb-pandas) Provides support for storing Pandas DataFrames in DuckDB.

  • DuckDB+Polars (dagster-duckdb-polars) Provides support for storing Polars DataFrames in DuckDB.

  • DuckDB+PySpark (dagster-duckdb-pyspark) Provides support for storing PySpark DataFrames in DuckDB.

  • Fivetran (dagster-fivetran) Provides ops and resources to run Fivetran syncs.

  • GCP (dagster-gcp) Dagster integrations for working with Google Cloud Platform resources.

  • GCP+Pandas (dagster-gcp-pandas) Dagster integrations for working with Google Cloud Platform resources with Pandas DataFrames. Currently contains integrations for BigQuery.

  • GCP+PySpark (dagster-gcp-pyspark) Dagster integrations for working with Google Cloud Platform resources with PySpark DataFrames. Currently contains integrations for BigQuery.

  • GE (dagster-ge) Dagster integrations for working with Great Expectations data quality tests.

  • GitHub (dagster-github) Provides a resource for issuing GitHub GraphQL queries and filing GitHub issues from Dagster jobs.

  • GraphQL (dagster-graphql) Provides resources for interfacing with a Dagster deployment over GraphQL.

  • Kubernetes (dagster-k8s) Provides components for deploying Dagster to Kubernetes.

  • Microsoft Teams (dagster-msteams) Includes a simple integration with Microsoft Teams.

  • MLflow (dagster-mlflow) Provides resources and hooks for using MLflow functionalities with Dagster runs.

  • MySQL (dagster-mysql) Includes implementations of run and event log storage built on MySQL.

  • PagerDuty (dagster-pagerduty) Provides an integration for generating PagerDuty events from Dagster ops.

  • Pandas (dagster-pandas) Provides support for using pandas DataFrames in Dagster and utilities for performing data validation.

  • Pandera (dagster-pandera) Provides support for validating pandas DataFrames using Pandera.

  • Papertrail (dagster-papertrail) Provides support for sending Dagster logs to Papertrail.

  • PostgreSQL (dagster-postgres) Includes implementations of run and event log storage built on Postgres.

  • Prometheus (dagster-prometheus) Provides support for sending metrics to Prometheus.

  • PySpark (dagster-pyspark) Provides an integration with PySpark.

  • Shell (dagster-shell) Provides utilities for issuing shell commands from Dagster jobs.

  • Slack (dagster-slack) Provides a simple integration with Slack.

  • Snowflake (dagster-snowflake) Provides resources for querying Snowflake from Dagster.

  • Snowflake+Pandas (dagster-snowflake-pandas) Provides support for storing Pandas DataFrames in Snowflake.

  • Snowflake+PySpark (dagster-snowflake-pyspark) Provides support for storing PySpark DataFrames in Snowflake.

  • Spark (dagster-spark) Provides an integration for working with Spark in Dagster.

  • SSH / SFTP (dagster-ssh) Provides an integration for running commands over SSH and retrieving / posting files via SFTP.

  • Twilio (dagster-twilio) Provides a resource for posting SMS messages from ops via Twilio.

  • Weights & Biases (dagster-wandb) Provides an integration with Weights & Biases (W&B).