Dagster is the data orchestration layer of the platform — the control center where ingestion, transformation, and quality workflows come together.
It enables observability, automation, and governance across the full data lifecycle.
Core Concepts
Factories
The Dagster implementation relies heavily on the factory pattern: users define their assets declaratively, and factory code generates the corresponding Dagster definitions and builds the asset graph. A minimal sketch follows.
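The sketch below assumes specs are plain dicts; in practice they might come from YAML or a catalog. `TABLE_SPECS` and `build_asset` are illustrative names, not the platform's actual API:

```python
from dagster import Definitions, asset

# Hypothetical declarative specs; in practice these might come from
# YAML files or a catalog rather than a hard-coded list.
TABLE_SPECS = [
    {"name": "orders", "deps": []},
    {"name": "order_summary", "deps": ["orders"]},
]

def build_asset(spec):
    """Factory: turn one declarative spec into a Dagster asset definition."""
    @asset(name=spec["name"], deps=spec["deps"])
    def _asset():
        ...  # load or derive the table here
    return _asset

# The generated assets form the graph automatically via their deps.
defs = Definitions(assets=[build_asset(s) for s in TABLE_SPECS])
```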
Assets
Assets are the building blocks of your data platform. Each asset represents a table, dataset, or derived output, in contrast to other orchestration tools that center on pipeline definitions. Assets are (see the example after this list):
- Defined with clear dependencies.
- Composed into pipelines automatically based on lineage.
- Automatically visualized in the Asset Graph.
- Materialized on demand or via automated schedules.
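For example, a minimal upstream/downstream pair using Dagster's `@asset` decorator; the names and data are illustrative:

```python
from dagster import asset

@asset
def raw_orders():
    """An upstream asset, e.g. a raw table landed by ingestion."""
    return [{"id": 1, "amount": 42.0}, {"id": 2, "amount": 17.5}]

@asset
def order_totals(raw_orders):
    """A derived asset; Dagster infers the dependency from the parameter name."""
    return sum(row["amount"] for row in raw_orders)
```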
Jobs & Runs
A job defines how assets are executed. A run is an instance of that execution, capturing all logs, metadata, and results (see the sketch after this list). You can:
- Trigger runs manually from the Dagster UI.
- Schedule them via automation conditions.
- Monitor and retry failed steps interactively.
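As a sketch, a job can be built from an asset selection with Dagster's `define_asset_job`; the asset names reference the example above:

```python
from dagster import AssetSelection, define_asset_job

# A job that materializes the two example assets; a run is created each
# time this job is triggered (manually, by a schedule, or by a sensor).
orders_job = define_asset_job(
    name="orders_job",
    selection=AssetSelection.assets("raw_orders", "order_totals"),
)
```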
Automation Conditions
Automation conditions are lightweight configurations defined in asset metadata that tell Dagster when to materialize assets automatically; a sketch of how they map onto Dagster's API follows the table. Common conditions include:
| Condition | Behavior |
|---|---|
| on_cron_no_deps | Runs on a defined cron schedule, independent of dependencies. |
| on_cron_with_deps | Runs on a schedule after upstream assets succeed. |
| on_upstream_change | Automatically triggers when an upstream asset is updated. |
| manual_only | Requires explicit user-triggered materialization. |
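One way these metadata names might map onto Dagster's built-in `AutomationCondition` constructors; the mapping itself is an assumption, not the platform's actual implementation:

```python
from dagster import AutomationCondition, asset

# Assumed mapping from the platform's metadata names to Dagster's
# built-in AutomationCondition constructors (illustrative, not exhaustive).
CONDITIONS = {
    "on_cron_no_deps": AutomationCondition.on_cron("0 2 * * *"),
    "on_upstream_change": AutomationCondition.eager(),
}

@asset(automation_condition=CONDITIONS["on_upstream_change"])
def order_summary():
    """Rematerializes automatically whenever an upstream asset updates."""
    ...
```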
Schedules & Sensors
Dagster automates when and how assets are refreshed; a code sketch follows the table.
| Type | Description | Example |
|---|---|---|
| Schedule | Runs on a fixed cron schedule | @daily, 0 2 * * * |
| Sensor | Reacts to an event (e.g., upstream completion, new file) | New S3 file, dbt model update |
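A sketch of both mechanisms using Dagster's `ScheduleDefinition` and `@sensor` APIs; the job name and landing-zone path are illustrative:

```python
import os

from dagster import (
    AssetSelection,
    RunRequest,
    ScheduleDefinition,
    SkipReason,
    define_asset_job,
    sensor,
)

refresh_job = define_asset_job("refresh_job", selection=AssetSelection.all())

# Schedule: fixed cron, e.g. every day at 02:00.
nightly_refresh = ScheduleDefinition(job=refresh_job, cron_schedule="0 2 * * *")

# Sensor: event-driven. This polls a local landing directory; an S3 or
# dbt-based check would follow the same shape.
@sensor(job=refresh_job)
def new_file_sensor(context):
    files = os.listdir("/data/landing")  # hypothetical landing path
    if not files:
        yield SkipReason("no new files yet")
        return
    for f in files:
        yield RunRequest(run_key=f)  # run_key de-duplicates repeat triggers
```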
Code Locations
- User code can be split into separate code locations, allowing different code dependencies and environments to coexist within a single asset graph (see the sketch after this list).
- Keeps teams isolated by responsibility (ingestion, transformation, etc.).
- Enables independent deployment and testing.
- Unified in the Dagster UI for complete lineage visibility.
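As a sketch, each code location typically exposes its own `Definitions` object; the file path and asset below are illustrative:

```python
# ingestion/definitions.py -- the "ingestion" code location.
# Other teams ship their own definitions modules with their own
# dependencies; all locations are registered in workspace.yaml and
# merged into one asset graph in the Dagster UI.
from dagster import Definitions, asset

@asset
def raw_orders():
    """Owned by the ingestion team; isolated environment and deploy cycle."""
    ...

defs = Definitions(assets=[raw_orders])
```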
Integration with the Platform
Dagster orchestrates across all layers (a dbt example follows the table):
| Component | Role | Dagster’s Function |
|---|---|---|
| Sling | Raw ingestion via YAML-defined replication | Schedules and monitors replication runs |
| dltHub | Python-based data extraction and generation | Executes ingestion and normalization code |
| dbt | SQL-based transformation and testing | Orchestrates dbt models and freshness runs |
| Snowflake | Central data warehouse | All assets ultimately materialize here |
| Snowpark | Python-based machine learning and analytics | Runs feature engineering and model scoring within Snowflake compute |
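As an illustration of one integration point, the dagster-dbt package can load every dbt model as a Dagster asset; the manifest path here is an assumption about the project layout:

```python
from pathlib import Path

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

# Assumed location of the manifest produced by `dbt parse` / `dbt build`.
MANIFEST = Path("target/manifest.json")

@dbt_assets(manifest=MANIFEST)
def dbt_models(context: AssetExecutionContext, dbt: DbtCliResource):
    """Each dbt model becomes a Dagster asset, materialized in Snowflake."""
    yield from dbt.cli(["build"], context=context).stream()
```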
Observability & Debugging
- Run logs show detailed execution flow per step.
- Metadata and lineage views visualize upstream/downstream dependencies.
- Auto-retries, failure notifications, and event-based triggers reduce manual overhead.
- Dagster tracks data freshness, versioning, and success status across all assets.
Why Dagster?
- Unified orchestration that glues ingestion, transformation, and warehousing resources together.
- Built-in data lineage, freshness tracking, and type safety.
- Simplified CI/CD integration and local development (dagster dev).
- First-class UI for developers and analysts alike.
Dagster connects your data sources, transformations, and consumers into a cohesive, observable, and automated data platform.
