Datahub (dagster-datahub)

This library provides an integration with Datahub, to support pushing metadata to Datahub from within Dagster ops.


This integration uses the Datahub Python Library. To use it, you’ll first need a running Datahub instance; see the Datahub Quickstart Guide.


dagster_datahub.DatahubRESTEmitterResource ResourceDefinition

Config Schema:
connection (dagster.StringSource):

Datahub GMS Server

token (Union[dagster.StringSource, None], optional):

Personal Access Token

connect_timeout_sec (Union[Float, None], optional):

read_timeout_sec (Union[Float, None], optional):

retry_status_codes (Union[List[dagster.IntSource], None], optional):

retry_methods (Union[List[dagster.StringSource], None], optional):

retry_max_times (Union[dagster.IntSource, None], optional):

extra_headers (Union[dict, None], optional):

ca_certificate_path (Union[dagster.StringSource, None], optional):

server_telemetry_id (Union[dagster.StringSource, None], optional):

disable_ssl_verification (dagster.BoolSource, optional):

Default Value: False
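
As a rough sketch of how these fields fit together, the keyword arguments below mirror the schema above. Only the field names come from the schema; the URL, the environment variable name `DATAHUB_TOKEN`, and the timeout and retry values are illustrative assumptions, not library defaults. The final call is shown as a comment because it requires dagster-datahub to be installed.

```python
import os

# Hypothetical values; only the field names are taken from the config schema.
rest_emitter_kwargs = {
    "connection": "http://localhost:8080",      # Datahub GMS Server (assumed local URL)
    "token": os.environ.get("DATAHUB_TOKEN"),   # Personal Access Token; None if unset
    "connect_timeout_sec": 5.0,
    "read_timeout_sec": 30.0,
    "retry_status_codes": [429, 502, 503, 504],
    "retry_max_times": 3,
    "disable_ssl_verification": False,          # same as the documented default
}

# With dagster-datahub installed, these would typically be passed as:
#   from dagster_datahub import DatahubRESTEmitterResource
#   resource = DatahubRESTEmitterResource(**rest_emitter_kwargs)
```

Reading the token from the environment keeps the secret out of source control; leaving it unset is acceptable here because `token` is optional in the schema.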

Base class for Dagster resources that utilize structured config.

This class is a subclass of both ResourceDefinition and Config.

Example definition:

class WriterResource(ConfigurableResource):
    prefix: str

    def output(self, text: str) -> None:
        print(f"{self.prefix}{text}")

Example usage:

@asset
def asset_that_uses_writer(writer: WriterResource):
    writer.output("text")

defs = Definitions(
    assets=[asset_that_uses_writer],
    resources={"writer": WriterResource(prefix="a_prefix")},
)

dagster_datahub.DatahubKafkaEmitterResource ResourceDefinition

Config Schema:
connection (strict dict):

    Config Schema:
    bootstrap (dagster.StringSource):

    Kafka Bootstrap Servers. Comma delimited

    schema_registry_url (dagster.StringSource):

    Schema Registry Location.

    schema_registry_config (dict, optional):

    Extra Schema Registry Config.

    Default Value:
    {}

topic (Union[dagster.StringSource, None], optional):

topic_routes (dict, optional):
Default Value:
{
    "MetadataChangeEvent": "MetadataChangeEvent_v4",
    "MetadataChangeProposal": "MetadataChangeProposal_v1"
}

dagster_datahub.datahub_rest_emitter ResourceDefinition

Config Schema:
connection (dagster.StringSource):

Datahub GMS Server

token (Union[dagster.StringSource, None], optional):

Personal Access Token

connect_timeout_sec (Union[Float, None], optional):

read_timeout_sec (Union[Float, None], optional):

retry_status_codes (Union[List[dagster.IntSource], None], optional):

retry_methods (Union[List[dagster.StringSource], None], optional):

retry_max_times (Union[dagster.IntSource, None], optional):

extra_headers (Union[dict, None], optional):

ca_certificate_path (Union[dagster.StringSource, None], optional):

server_telemetry_id (Union[dagster.StringSource, None], optional):

disable_ssl_verification (dagster.BoolSource, optional):

Default Value: False
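
Because several of these fields are `StringSource` or `IntSource` types, they can be given either as literal values or as an `{"env": ...}` selector that Dagster resolves from the environment at run time. The run config below is a minimal sketch of that shape; the resource key `datahub`, the URL, and the variable name `DATAHUB_TOKEN` are assumptions chosen for illustration.

```python
# Run config for a job that binds datahub_rest_emitter under the
# (hypothetical) resource key "datahub".
run_config = {
    "resources": {
        "datahub": {
            "config": {
                # Literal value for a StringSource field.
                "connection": "http://localhost:8080",
                # Env selector: the value is read from DATAHUB_TOKEN at run time.
                "token": {"env": "DATAHUB_TOKEN"},
                # IntSource fields accept a plain integer.
                "retry_max_times": 3,
            }
        }
    }
}
```

A dict like this would normally be passed as `run_config` when launching the job, or written as the equivalent YAML.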

dagster_datahub.datahub_kafka_emitter ResourceDefinition

Config Schema:
connection (strict dict):

    Config Schema:
    bootstrap (dagster.StringSource):

    Kafka Bootstrap Servers. Comma delimited

    schema_registry_url (dagster.StringSource):

    Schema Registry Location.

    schema_registry_config (dict, optional):

    Extra Schema Registry Config.

    Default Value:
    {}

topic (Union[dagster.StringSource, None], optional):

topic_routes (dict, optional):
Default Value:
{
    "MetadataChangeEvent": "MetadataChangeEvent_v4",
    "MetadataChangeProposal": "MetadataChangeProposal_v1"
}
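
The nesting of `connection` and the shape of `topic_routes` can be easier to see in a concrete config. The sketch below builds one as a plain Python dict; the broker addresses, the registry URL, and the topic name `my_mcp_topic` are hypothetical. It spells out the full route mapping rather than relying on any merge with the defaults, since the schema above does not say how partial overrides are handled.

```python
# Default routes exactly as listed in the config schema.
DEFAULT_TOPIC_ROUTES = {
    "MetadataChangeEvent": "MetadataChangeEvent_v4",
    "MetadataChangeProposal": "MetadataChangeProposal_v1",
}

kafka_emitter_config = {
    "connection": {
        # Comma-delimited list of brokers (assumed hostnames).
        "bootstrap": "broker-1:9092,broker-2:9092",
        "schema_registry_url": "http://localhost:8081",
        "schema_registry_config": {},
    },
    # Override one route explicitly and keep the other at its default.
    "topic_routes": {
        **DEFAULT_TOPIC_ROUTES,
        "MetadataChangeProposal": "my_mcp_topic",
    },
}
```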