Multi-Repository Code Locations

In this tutorial, you'll build a multi-repository data platform with Dagster+ that:

Separates Analytics and ML teams into independent repositories
Enables cross-repository asset dependencies and data sharing
Implements shared resource configurations for seamless data flow
Demonstrates independent deployment cycles for different teams
Shows how to coordinate production deployments across repositories

You will learn to:

Set up multiple code locations with independent repositories
Configure shared storage for cross-repository asset access
Declare and manage cross-repository asset dependencies
Organize teams with different development and deployment schedules
Deploy multiple code locations to Dagster+ with proper coordination
Monitor and maintain cross-repository data pipelines

Prerequisites

To follow the steps in this tutorial, you'll need:

Python 3.9+ installed. For more information, see the Installation guide.
A Dagster+ account for deployment examples.
Familiarity with Python, data pipelines, and basic machine learning concepts.
Understanding of Git workflows and repository management.

Architecture Overview

This example demonstrates a realistic multi-team scenario with two separate repositories:

Analytics Team Repository (repo-analytics/): Handles data ingestion, transformation, and business reporting
ML Platform Team Repository (repo-ml/): Manages feature engineering, model training, and predictions

Despite being in separate repositories, assets in one code location can depend on assets from another code location, enabling cross-team collaboration while maintaining clear organizational boundaries.

Multi-Repository Structure

Each repository is structured as an independent Dagster project with its own configuration:

workspace.yaml
load_from:
  - python_package:
      package_name: analytics.definitions
      location_name: analytics-team
      working_directory: ./repo-analytics
  - python_package:
      package_name: ml_platform.definitions
      location_name: ml-platform
      working_directory: ./repo-ml

The workspace configuration defines two separate code locations, each pointing to a different Python package and working directory. This allows both repositories to be loaded simultaneously in a single Dagster instance while maintaining clear separation.

Cross-Repository Dependencies

The example demonstrates how assets in the ML repository can depend on assets from the Analytics repository:

customer_features depends on customer_order_summary (from analytics)
product_features depends on product_performance (from analytics)

The ML assets reference their upstream assets by asset key, and both repositories read and write materialized values through the same storage location. This lets the teams share data while each repository stays independently versioned and deployed.

Shared Resource Configuration

Both repositories use a shared I/O manager configuration to enable cross-repository asset dependencies:

repo-analytics/src/analytics/definitions.py
        # Add shared I/O manager for cross-repository access
        resources={
            **defs_from_folder.resources,
            "io_manager": dg.FilesystemIOManager(base_dir="~/Documents/dagster_shared_assets"),
        },

Because both repositories point FilesystemIOManager at the same base directory, an asset materialized in one repository can be loaded by an asset in the other. In production, this would typically be replaced with cloud storage such as S3 or GCS.
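Since both repositories must agree on the base directory, it can help to resolve it in one place rather than repeating a literal path. The helper below is a minimal sketch of that idea: the function name `shared_base_dir` and the environment variable `DAGSTER_SHARED_ASSETS_DIR` are hypothetical, and `~` is expanded explicitly so the result does not depend on each code location's working directory.

```python
import os
from pathlib import Path

# Default taken from the tutorial snippet; the override variable is hypothetical.
DEFAULT_SHARED_DIR = "~/Documents/dagster_shared_assets"
ENV_VAR = "DAGSTER_SHARED_ASSETS_DIR"


def shared_base_dir(env=None) -> str:
    """Return an absolute path for the shared asset directory.

    Reads the override from `env` (defaults to os.environ) and falls back to
    the tutorial's default location under the user's home directory.
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, DEFAULT_SHARED_DIR)
    # Expand "~" and normalize to an absolute path so that both code
    # locations resolve to the same directory.
    return str(Path(raw).expanduser().resolve())
```

Each repository could then pass `base_dir=shared_base_dir()` to `dg.FilesystemIOManager`, and a deployment could point both at a different location by setting the variable once.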

Next steps

Continue this tutorial with Analytics Repository
