Datahub (dagster-datahub)
This library provides an integration with Datahub, to support pushing metadata to Datahub from
within Dagster ops.
The integration is built on the Datahub Python library. Before using it, you'll
need to start a Datahub instance; see the Datahub Quickstart Guide.
-
dagster_datahub.DatahubRESTEmitterResource ResourceDefinition
-
Config Schema:
- connection (dagster.StringSource):
Datahub GMS Server
- token (Union[dagster.StringSource, None], optional):
Personal Access Token
- connect_timeout_sec (Union[Float, None], optional):
- read_timeout_sec (Union[Float, None], optional):
- retry_status_codes (Union[List[dagster.IntSource], None], optional):
- retry_methods (Union[List[dagster.StringSource], None], optional):
- retry_max_times (Union[dagster.IntSource, None], optional):
- extra_headers (Union[dict, None], optional):
- ca_certificate_path (Union[dagster.StringSource, None], optional):
- server_telemetry_id (Union[dagster.StringSource, None], optional):
- disable_ssl_verification (dagster.BoolSource, optional):
Default Value: False
Base class for Dagster resources that utilize structured config.
This class is a subclass of both ResourceDefinition and Config.
Example definition:
from dagster import ConfigurableResource

class WriterResource(ConfigurableResource):
    prefix: str

    def output(self, text: str) -> None:
        print(f"{self.prefix}{text}")
Example usage:
from dagster import Definitions, asset

@asset
def asset_that_uses_writer(writer: WriterResource):
    writer.output("text")

defs = Definitions(
    assets=[asset_that_uses_writer],
    resources={"writer": WriterResource(prefix="a_prefix")},
)
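As a rough illustration of the REST emitter config schema listed above, the following sketch assembles a plain config dict using only field names from that schema. The helper name `rest_emitter_config` and the URL `http://localhost:8080` are assumptions made for this example, not part of dagster-datahub, and only a subset of the optional fields is shown.

```python
from typing import Any, Dict, Optional


def rest_emitter_config(
    connection: str,
    token: Optional[str] = None,
    connect_timeout_sec: Optional[float] = None,
    read_timeout_sec: Optional[float] = None,
    disable_ssl_verification: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build a config dict keyed by the schema's field names."""
    if not connection:
        raise ValueError("connection (the Datahub GMS server) is required")
    config: Dict[str, Any] = {
        "connection": connection,
        "token": token,
        "connect_timeout_sec": connect_timeout_sec,
        "read_timeout_sec": read_timeout_sec,
        "disable_ssl_verification": disable_ssl_verification,
    }
    # Omit optional fields that were left unset instead of passing None.
    return {key: value for key, value in config.items() if value is not None}


# Placeholder server URL; substitute the address of your own GMS server.
example = rest_emitter_config("http://localhost:8080", read_timeout_sec=30.0)
print(example)
```

Only `connection` is required by the schema; every other field is optional, which is why unset values are dropped from the dict.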
-
dagster_datahub.DatahubKafkaEmitterResource ResourceDefinition
-
Config Schema:
- connection (strict dict):
Config Schema:
- bootstrap (dagster.StringSource):
Kafka bootstrap servers, comma-delimited.
- schema_registry_url (dagster.StringSource):
Schema Registry Location.
- schema_registry_config (dict, optional):
Extra Schema Registry Config.
Default Value:
- topic (Union[dagster.StringSource, None], optional):
- topic_routes (dict, optional):
Default Value:
{
"MetadataChangeEvent": "MetadataChangeEvent_v4",
"MetadataChangeProposal": "MetadataChangeProposal_v1"
}
Base class for Dagster resources that utilize structured config.
This class is a subclass of both ResourceDefinition and Config.
Example definition:
from dagster import ConfigurableResource

class WriterResource(ConfigurableResource):
    prefix: str

    def output(self, text: str) -> None:
        print(f"{self.prefix}{text}")
Example usage:
from dagster import Definitions, asset

@asset
def asset_that_uses_writer(writer: WriterResource):
    writer.output("text")

defs = Definitions(
    assets=[asset_that_uses_writer],
    resources={"writer": WriterResource(prefix="a_prefix")},
)
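The Kafka emitter schema above nests its connection settings under `connection` and expects the bootstrap servers as one comma-delimited string. The sketch below shows that shape as a plain dict. The helper `kafka_emitter_config`, the broker addresses, and the registry URL are hypothetical. Overrides are merged with the listed default routes explicitly, since the schema does not say whether a partial `topic_routes` dict is merged with the defaults or replaces them.

```python
from typing import Any, Dict, List, Optional

# Default routes exactly as listed in the config schema above.
DEFAULT_TOPIC_ROUTES: Dict[str, str] = {
    "MetadataChangeEvent": "MetadataChangeEvent_v4",
    "MetadataChangeProposal": "MetadataChangeProposal_v1",
}


def kafka_emitter_config(
    bootstrap_servers: List[str],
    schema_registry_url: str,
    topic_routes: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the nested config dict for the Kafka emitter."""
    if not bootstrap_servers:
        raise ValueError("at least one bootstrap server is required")
    config: Dict[str, Any] = {
        "connection": {
            # The schema expects a single comma-delimited string.
            "bootstrap": ",".join(bootstrap_servers),
            "schema_registry_url": schema_registry_url,
        }
    }
    if topic_routes:
        # Merge explicitly so routes that are not overridden keep their defaults.
        config["topic_routes"] = {**DEFAULT_TOPIC_ROUTES, **topic_routes}
    return config


# Placeholder hosts and topic name, chosen only for illustration.
kafka_example = kafka_emitter_config(
    ["broker-1:9092", "broker-2:9092"],
    "http://localhost:8081",
    topic_routes={"MetadataChangeProposal": "mcp_custom"},
)
print(kafka_example)
```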
-
dagster_datahub.datahub_rest_emitter ResourceDefinition
-
Config Schema:
- connection (dagster.StringSource):
Datahub GMS Server
- token (Union[dagster.StringSource, None], optional):
Personal Access Token
- connect_timeout_sec (Union[Float, None], optional):
- read_timeout_sec (Union[Float, None], optional):
- retry_status_codes (Union[List[dagster.IntSource], None], optional):
- retry_methods (Union[List[dagster.StringSource], None], optional):
- retry_max_times (Union[dagster.IntSource, None], optional):
- extra_headers (Union[dict, None], optional):
- ca_certificate_path (Union[dagster.StringSource, None], optional):
- server_telemetry_id (Union[dagster.StringSource, None], optional):
- disable_ssl_verification (dagster.BoolSource, optional):
Default Value: False
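Because `connection` and `token` are `dagster.StringSource` fields, their values can be given either as literal strings or as `{"env": "VAR_NAME"}`, which Dagster resolves from the environment at run time. The sketch below builds run config of that shape for this resource. The resource key `datahub`, the environment variable name `DATAHUB_TOKEN`, and the helper `rest_emitter_run_config` are assumptions chosen for the example.

```python
import json
from typing import Any, Dict


def rest_emitter_run_config(
    resource_key: str, connection: str, token_env_var: str
) -> Dict[str, Any]:
    """Hypothetical helper: run config for a job that binds datahub_rest_emitter."""
    return {
        "resources": {
            resource_key: {
                "config": {
                    "connection": connection,
                    # StringSource: read the token from an environment variable
                    # rather than embedding the secret in the config itself.
                    "token": {"env": token_env_var},
                }
            }
        }
    }


# Placeholder values; the resource key must match the key used in the job.
run_config = rest_emitter_run_config(
    "datahub", "http://localhost:8080", "DATAHUB_TOKEN"
)
print(json.dumps(run_config, indent=2))
```

Keeping the token in an environment variable means the personal access token never appears in stored run config.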
-
dagster_datahub.datahub_kafka_emitter ResourceDefinition
-
Config Schema:
- connection (strict dict):
Config Schema:
- bootstrap (dagster.StringSource):
Kafka bootstrap servers, comma-delimited.
- schema_registry_url (dagster.StringSource):
Schema Registry Location.
- schema_registry_config (dict, optional):
Extra Schema Registry Config.
Default Value:
- topic (Union[dagster.StringSource, None], optional):
- topic_routes (dict, optional):
Default Value:
{
"MetadataChangeEvent": "MetadataChangeEvent_v4",
"MetadataChangeProposal": "MetadataChangeProposal_v1"
}