Multi-Repository Code Locations
In this tutorial, you'll build a multi-repository data platform with Dagster+ that:
- Separates Analytics and ML teams into independent repositories
- Enables cross-repository asset dependencies and data sharing
- Implements shared resource configurations for seamless data flow
- Demonstrates independent deployment cycles for different teams
- Shows how to coordinate production deployments across repositories
You will learn to:
- Set up multiple code locations with independent repositories
- Configure shared storage for cross-repository asset access
- Declare and manage cross-repository asset dependencies
- Organize teams with different development and deployment schedules
- Deploy multiple code locations to Dagster+ with proper coordination
- Monitor and maintain cross-repository data pipelines
Prerequisites
To follow the steps in this tutorial, you'll need:
- Python 3.9+ installed. For more information, see the Installation guide.
- A Dagster+ account for deployment examples.
- Familiarity with Python, data pipelines, and basic machine learning concepts.
- Understanding of Git workflows and repository management.
Architecture Overview
This example demonstrates a realistic multi-team scenario with two separate repositories:
- Analytics Team Repository (repo-analytics/): Handles data ingestion, transformation, and business reporting
- ML Platform Team Repository (repo-ml/): Manages feature engineering, model training, and predictions
Although the two code locations live in separate repositories, assets in one can depend on assets in the other. This enables cross-team collaboration while keeping organizational boundaries clear.
Multi-Repository Structure
Each repository is structured as an independent Dagster project with its own configuration:
load_from:
  - python_package:
      package_name: analytics.definitions
      location_name: analytics-team
      working_directory: ./repo-analytics
  - python_package:
      package_name: ml_platform.definitions
      location_name: ml-platform
      working_directory: ./repo-ml
The workspace configuration defines two separate code locations, each pointing to a different Python package and working directory. This allows both repositories to be loaded simultaneously in a single Dagster instance while maintaining clear separation.
Cross-Repository Dependencies
The example demonstrates how assets in the ML repository can depend on assets from the Analytics repository:
- customer_features depends on customer_order_summary (from analytics)
- product_features depends on product_performance (from analytics)
These dependencies are handled through explicit asset key references and shared storage, enabling cross-team data collaboration while maintaining repository independence.
Shared Resource Configuration
Both repositories use a shared I/O manager configuration to enable cross-repository asset dependencies:
# Add shared I/O manager for cross-repository access
resources={
    **defs_from_folder.resources,
    "io_manager": dg.FilesystemIOManager(base_dir="~/Documents/dagster_shared_assets"),
},
Because both code locations point FilesystemIOManager at the same base directory, an asset materialized in one repository can be loaded by an asset in the other. In production, this local directory would typically be replaced with cloud storage such as S3 or GCS.
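The mechanism can be illustrated with a simplified, standard-library-only model. This is not Dagster's implementation: SharedPickleStore is a hypothetical stand-in that writes one pickle file per asset key under a base directory, which is roughly how a filesystem-backed I/O manager lets two separate processes exchange values.

```python
import pickle
import tempfile
from pathlib import Path


class SharedPickleStore:
    """Toy stand-in for a filesystem I/O manager: one pickle file per asset key."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir).expanduser()
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, asset_key):
        return self.base_dir / asset_key

    def save(self, asset_key, value):
        with self._path(asset_key).open("wb") as f:
            pickle.dump(value, f)

    def load(self, asset_key):
        with self._path(asset_key).open("rb") as f:
            return pickle.load(f)


# A temporary directory plays the role of the shared base directory.
shared_dir = tempfile.mkdtemp()

# Each code location builds its own store object, but both use the same directory.
analytics_store = SharedPickleStore(shared_dir)
ml_store = SharedPickleStore(shared_dir)

# The analytics side writes the asset; the ML side reads it back by key.
analytics_store.save("customer_order_summary", [{"customer_id": 1, "order_count": 3}])
summary = ml_store.load("customer_order_summary")
print(summary)
```

If the two stores were configured with different directories, the load on the ML side would fail with FileNotFoundError. The same applies when the two repositories' I/O manager settings drift apart.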
Next steps
- Continue this tutorial with Analytics Repository