Home
Dagster • dbt • Snowflake
A modern, modular data platform for end-to-end orchestration, transformation, and analytics.
What It Is
This repository is a reference implementation of a modern data stack that unifies:
- Dagster for orchestration and observability
- dbt for SQL-based transformation and testing
- Snowflake for scalable cloud warehousing
- dltHub for Pythonic ingestion and normalization of semi-structured data
- Sling for declarative configuration of replication pipelines from databases and filesystems
It’s designed for data engineers, analytics engineers, and platform engineers to work side-by-side with clear boundaries, strong governance, and simple local development.
Core Layers
Dagster — Orchestration & Lineage
Dagster ties everything together:
- Executes Sling and dltHub pipelines
- Runs dbt transformations in sequence
- Surfaces metadata, lineage, and freshness in one unified graph
➡ See Dagster Overview
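As a rough illustration of how these layers meet in one asset graph, here is a minimal sketch using Dagster's asset API. The asset names are placeholders, not assets from this repository:

```python
from dagster import AssetExecutionContext, Definitions, asset


@asset
def raw_orders(context: AssetExecutionContext) -> None:
    # In this platform, raw tables land in Snowflake via Sling or dltHub pipelines.
    context.log.info("Raw layer loaded")


@asset(deps=[raw_orders])
def stg_orders(context: AssetExecutionContext) -> None:
    # In this platform, staging and mart models are built by dbt via dagster-dbt.
    context.log.info("Staging model built")


defs = Definitions(assets=[raw_orders, stg_orders])
```

Because ingestion and transformation are both expressed as assets, Dagster can render lineage and freshness across the whole pipeline rather than per tool.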
dbt — Transformations & Marts
SQL transformations that model raw data into clean, analytics-ready marts.
Includes:
- Staging layers organized by source system
- Shared dimensions and fact models (`marts/`)
- Tests, freshness checks, and ownership metadata
➡ See dbt Overview
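For orientation, this is roughly how a dbt project is surfaced to Dagster through the dagster-dbt integration; the manifest path below is an assumption, not this repository's actual layout:

```python
from pathlib import Path

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

# Assumed location of the compiled dbt manifest; adjust to the real project path.
DBT_MANIFEST_PATH = Path("dbt") / "target" / "manifest.json"


@dbt_assets(manifest=DBT_MANIFEST_PATH)
def dbt_project_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # "dbt build" runs models, tests, seeds, and snapshots in dependency order,
    # streaming events back so Dagster records per-model materializations.
    yield from dbt.cli(["build"], context=context).stream()
```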
Sling — Declarative Replication
YAML-based configuration for extract and load (EL) pipelines.
Sling connects external databases or APIs to Snowflake’s raw layer, defining:
- Connection details (`connections.yaml`)
- Replication rules and stream settings (`replication.yaml`)
➡ See Sling Overview
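A hedged sketch of how a Sling replication file can be wired into Dagster with the dagster-sling integration; the config path is illustrative and the exact file locations in this repository may differ:

```python
from dagster import AssetExecutionContext
from dagster_sling import SlingResource, sling_assets

# Assumed path to a Sling replication config; each stream it defines
# becomes a Dagster asset that lands data in Snowflake's raw layer.
REPLICATION_CONFIG = "replication.yaml"


@sling_assets(replication_config=REPLICATION_CONFIG)
def raw_replication_assets(context: AssetExecutionContext, sling: SlingResource):
    # Run the replication and yield materialization events per stream.
    yield from sling.replicate(context=context)
```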
dltHub — Pythonic Ingestion
Lightweight Python connectors for APIs and semi-structured sources.
Each module defines:
- `data.py`: a generator yielding structured data batches
- `sources.yaml`: replication and destination mapping
Ideal for dynamic datasets or paginated APIs.
➡ See dltHub Overview
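As a sketch of the dltHub pattern described above (the API endpoint, resource name, and dataset name are placeholders, not real connectors from this repo):

```python
import dlt
import requests


@dlt.resource(name="tickets", write_disposition="append")
def tickets():
    # Yield batches of semi-structured records; dlt normalizes nested
    # fields and infers the schema before loading.
    response = requests.get("https://api.example.com/tickets", timeout=30)
    response.raise_for_status()
    yield response.json()


# Loads into Snowflake's raw layer under the given dataset (schema) name.
pipeline = dlt.pipeline(
    pipeline_name="example_tickets",
    destination="snowflake",
    dataset_name="raw_tickets",
)

if __name__ == "__main__":
    pipeline.run(tickets())
```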
Repository Layout
| Path | Owner Focus | Description |
|---|---|---|
| .dagster_home/ | Platform | Persistent storage for development settings such as telemetry defaults. |
| .devcontainer/ | Platform | Visual Studio Code Dev Container configuration for a shared local development environment. |
| .github/ | Platform | CI/CD automation (publishing docs, running checks). |
| .helm | Platform | Helm chart values for deploying to Kubernetes. |
| .mkdocs/ | Cross-functional | Markdown sources for the MkDocs site published via GitHub Pages. |
| .vscode/ | Cross-functional | Shared workspace settings, including code snippets for faster development. |
| docs/ | Cross-functional | Static documentation pages. Prefer placing documentation in the relevant area of the project; use this space for documentation that does not otherwise have an appropriate location. |
| libs/ | Cross-functional | Shared libraries for common functions between packages. |
| packages/ | --- | Code locations deployed as separate Docker images, providing environment isolation so that multiple teams can manage their own code. |
| packages/data_analytics/ | Data & Analytics | Exploratory SQL analyses and notebooks. |
| packages/data_science/ | Data & Analytics | Dagster definitions, resource configuration, and integration glue code. Includes Snowpark definitions for ML-Ops. |
| packages/data_foundation/ | --- | Foundational data assets used across the business. Contains ingestion pipelines and the main dbt project. |
| packages/data_foundation/src/data_foundation/ | Platform | Dagster definitions, resource configuration, and integration glue code. Includes Sling source connection YAML that controls raw data ingestion. |
| packages/data_foundation/dbt/ | Cross-functional | dbt project containing models, seeds, snapshots, and tests. YAML files document sources, staging models, and marts. |
| .env.example | Cross-functional | Environment variable template for local development. Copy to .env and replace the placeholder values with valid credentials. |
| Dockerfile, pyproject.toml, uv.lock | Platform | Runtime dependencies for orchestrator workers. |
| workspaces | Platform | Dagster code location configuration for local development. |
Local Development
- Consistent containerized environment across all engineers
- Fast iteration with instant feedback from Dagster & dbt
- Test and debug orchestration locally before deployment
- Safe sandboxing with isolated Snowflake schemas
- Reproducible builds via Dev Container configuration
- Preinstalled extensions for dbt, Snowflake, and Python
Governance & Observability
- YAML-first configuration for transparency and version control
- Dagster lineage graphs linking ingestion → transformation → marts
- Automated docs published via MkDocs and GitHub Pages
- Secrets securely managed via `.env` and vault integration
- CI checks for linting, unit tests, and dbt validation
Learn More
| Topic | Documentation |
|---|---|
| Sling | Extract & Load Configuration |
| dltHub | Python Connectors & API Sources |
| dbt | Transformations, Marts, and Testing |
| Dagster | Orchestration and Asset Graph |
| Local Setup | Developer Quick Start |
Contributing
- Keep YAML comments current — they document orchestration and ownership.
- Add new sources via Sling or dltHub, then create matching dbt sources.
- All production changes must include tests and docs updates.
Links
- ⚙️ Dagster Docs: docs.dagster.io
- 🧠 dbt Docs: docs.getdbt.com
- ❄️ Snowflake Docs: docs.snowflake.com
- 🔁 dltHub Docs: dlthub.com/docs
- ⚡ Sling Docs: slingdata.io/docs