Dagster is the data orchestration layer of the platform — the control center where ingestion, transformation, and quality workflows come together.
It enables observability, automation, and governance across the full data lifecycle.
Core Concepts
Factories
The Dagster implementation relies heavily on the factory pattern: users define their assets declaratively, and factory code generates the corresponding Dagster definitions and builds the asset graph. A minimal sketch follows.
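The sketch below assumes specs are plain dicts; in practice they might come from YAML or a catalog. `TABLE_SPECS` and `build_asset` are illustrative names, not the platform's actual API:

```python
from dagster import Definitions, asset

# Hypothetical declarative specs; in practice these might come from
# YAML files or a catalog rather than a hard-coded list.
TABLE_SPECS = [
    {"name": "orders", "deps": []},
    {"name": "order_summary", "deps": ["orders"]},
]

def build_asset(spec):
    """Factory: turn one declarative spec into a Dagster asset definition."""
    @asset(name=spec["name"], deps=spec["deps"])
    def _asset():
        ...  # load or derive the table here
    return _asset

# The generated assets form the graph automatically via their deps.
defs = Definitions(assets=[build_asset(s) for s in TABLE_SPECS])
```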
Assets
Assets are the building blocks of your data platform. Each asset represents a table, dataset, or derived output, in contrast to other orchestration tools that center on pipeline definitions. Assets are (see the example after this list):
- Defined with clear dependencies.
- Composed into pipelines automatically based on lineage.
- Automatically visualized in the Asset Graph.
- Materialized on demand or via automated schedules.
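For example, a minimal upstream/downstream pair using Dagster's `@asset` decorator; the names and data are illustrative:

```python
from dagster import asset

@asset
def raw_orders():
    """An upstream asset, e.g. a raw table landed by ingestion."""
    return [{"id": 1, "amount": 42.0}, {"id": 2, "amount": 17.5}]

@asset
def order_totals(raw_orders):
    """A derived asset; Dagster infers the dependency from the parameter name."""
    return sum(row["amount"] for row in raw_orders)
```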
Jobs & Runs
A job defines how assets are executed. A run is an instance of that execution, capturing all logs, metadata, and results (see the sketch after this list). You can:
- Trigger runs manually from the Dagster UI.
- Schedule them via automation conditions.
- Monitor and retry failed steps interactively.
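As a sketch, a job can be built from an asset selection with Dagster's `define_asset_job`; the asset names reference the example above:

```python
from dagster import AssetSelection, define_asset_job

# A job that materializes the two example assets; a run is created each
# time this job is triggered (manually, by a schedule, or by a sensor).
orders_job = define_asset_job(
    name="orders_job",
    selection=AssetSelection.assets("raw_orders", "order_totals"),
)
```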
Automation Conditions
Automation conditions are lightweight configurations defined in asset metadata that tell Dagster when to materialize assets automatically; a sketch of how they map onto Dagster's API follows the table. Common conditions include:
| Condition | Behavior |
|---|---|
| on_cron_no_deps | Runs on a defined cron schedule, independent of dependencies. |
| on_cron_with_deps | Runs on a schedule after upstream assets succeed. |
| on_upstream_change | Automatically triggers when an upstream asset is updated. |
| manual_only | Requires explicit user-triggered materialization. |
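One way these metadata names might map onto Dagster's built-in `AutomationCondition` constructors; the mapping itself is an assumption, not the platform's actual implementation:

```python
from dagster import AutomationCondition, asset

# Assumed mapping from the platform's metadata names to Dagster's
# built-in AutomationCondition constructors (illustrative, not exhaustive).
CONDITIONS = {
    "on_cron_no_deps": AutomationCondition.on_cron("0 2 * * *"),
    "on_upstream_change": AutomationCondition.eager(),
}

@asset(automation_condition=CONDITIONS["on_upstream_change"])
def order_summary():
    """Rematerializes automatically whenever an upstream asset updates."""
    ...
```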
Schedules & Sensors
Dagster automates when and how assets are refreshed; a code sketch follows the table.
| Type | Description | Example |
|---|---|---|
| Schedule | Runs on a fixed cron schedule | @daily, 0 2 * * * |
| Sensor | Reacts to an event (e.g., upstream completion, new file) | New S3 file, dbt model update |
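A sketch of both mechanisms using Dagster's `ScheduleDefinition` and `@sensor` APIs; the job name and landing-zone path are illustrative:

```python
import os

from dagster import (
    AssetSelection,
    RunRequest,
    ScheduleDefinition,
    SkipReason,
    define_asset_job,
    sensor,
)

refresh_job = define_asset_job("refresh_job", selection=AssetSelection.all())

# Schedule: fixed cron, e.g. every day at 02:00.
nightly_refresh = ScheduleDefinition(job=refresh_job, cron_schedule="0 2 * * *")

# Sensor: event-driven. This polls a local landing directory; an S3 or
# dbt-based check would follow the same shape.
@sensor(job=refresh_job)
def new_file_sensor(context):
    files = os.listdir("/data/landing")  # hypothetical landing path
    if not files:
        yield SkipReason("no new files yet")
        return
    for f in files:
        yield RunRequest(run_key=f)  # run_key de-duplicates repeat triggers
```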
Code Locations
- User code can be split into separate code locations, allowing different code dependencies and environments to coexist within a single asset graph (see the sketch after this list).
- Keeps teams isolated by responsibility (ingestion, transformation, etc.).
- Enables independent deployment and testing.
- Unified in the Dagster UI for complete lineage visibility.
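As a sketch, each code location typically exposes its own `Definitions` object; the file path and asset below are illustrative:

```python
# ingestion/definitions.py -- the "ingestion" code location.
# Other teams ship their own definitions modules with their own
# dependencies; all locations are registered in workspace.yaml and
# merged into one asset graph in the Dagster UI.
from dagster import Definitions, asset

@asset
def raw_orders():
    """Owned by the ingestion team; isolated environment and deploy cycle."""
    ...

defs = Definitions(assets=[raw_orders])
```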
Integration with the Platform
Dagster orchestrates across all layers (a dbt example follows the table):
| Component | Role | Dagster’s Function |
|---|---|---|
| Sling | Raw ingestion via YAML-defined replication | Schedules and monitors replication runs |
| dltHub | Python-based data extraction and generation | Executes ingestion and normalization code |
| dbt | SQL-based transformation and testing | Orchestrates dbt models and freshness runs |
| Snowflake | Central data warehouse | All assets ultimately materialize here |
| Snowpark | Python-based machine learning and analytics | Runs feature engineering and model scoring within Snowflake compute |
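As an illustration of one integration point, the dagster-dbt package can load every dbt model as a Dagster asset; the manifest path here is an assumption about the project layout:

```python
from pathlib import Path

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

# Assumed location of the manifest produced by `dbt parse` / `dbt build`.
MANIFEST = Path("target/manifest.json")

@dbt_assets(manifest=MANIFEST)
def dbt_models(context: AssetExecutionContext, dbt: DbtCliResource):
    """Each dbt model becomes a Dagster asset, materialized in Snowflake."""
    yield from dbt.cli(["build"], context=context).stream()
```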
Observability & Debugging
- Run logs show detailed execution flow per step.
- Metadata and lineage views visualize upstream/downstream dependencies.
- Auto-retries, failure notifications, and event-based triggers reduce manual overhead.
- Dagster tracks data freshness, versioning, and success status across all assets.
Why Dagster?
- Unified orchestration that glues ingestion, transformation, and warehousing resources together.
- Built-in data lineage, freshness tracking, and type safety.
- Simplified CI/CD integration and local development (dagster dev).
- First-class UI for developers and analysts alike.
Dagster connects your data sources, transformations, and consumers into a cohesive, observable, and automated data platform.
