MADSci Data Manager

Handles capturing, storing, and querying data generated during experiments, including both JSON values and files.

MADSci Data Manager Diagram

Features

Installation

See the main README for installation options. This package is available as:

Dependencies: MongoDB database, optional MinIO/S3 storage (see example_lab)
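
If you are not using the example_lab compose stack, the backing services can be started individually. A minimal sketch using the standard upstream images (container names, image tags, and data paths are illustrative; MinIO's default minioadmin credentials match the example_lab console login):

# Start MongoDB (required) on the default port 27017
docker run -d --name madsci-mongo -p 27017:27017 mongo

# Start MinIO (optional, S3-compatible object storage; API on 9000, console on 9001)
docker run -d --name madsci-minio -p 9000:9000 -p 9001:9001 \
  minio/minio server /data --console-address ":9001"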

Usage

Quick Start

Use the example_lab as a starting point:

# Start with working example
docker compose up  # From repo root
# Data Manager available at http://localhost:8004/docs

# Or run standalone
python src/madsci_data_manager/madsci/data_manager/data_server.py
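
Once the container or standalone server is running, the interactive API docs at the URL above make a quick smoke test:

# Confirm the Data Manager is responding (serves the interactive API docs page)
curl -s http://localhost:8004/docs | head -n 5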

Manager Setup

For custom deployments, see example_data.manager.yaml for configuration options.

Data Client

Use DataClient to store and retrieve experimental data:

from madsci.client.data_client import DataClient
from madsci.common.types.datapoint_types import DataPoint, DataPointTypeEnum

client = DataClient(data_server_url="http://localhost:8004")

# Store JSON data
value_dp = DataPoint(
    label="Temperature Reading",
    data_type=DataPointTypeEnum.JSON,
    value={"temperature": 23.5, "unit": "Celsius"}
)
submitted = client.submit_datapoint(value_dp)

# Store files
file_dp = DataPoint(
    label="Experiment Log",
    data_type=DataPointTypeEnum.FILE,
    path="/path/to/data.txt"
)
submitted_file = client.submit_datapoint(file_dp)

# Retrieve data
retrieved = client.get_datapoint(submitted.datapoint_id)

# Save file locally
client.save_datapoint_value(submitted_file.datapoint_id, "/local/save/path.txt")
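
As a quick sanity check, the retrieved record should round-trip what was submitted. The snippet below reuses the submitted and retrieved objects from the example above and assumes JSON datapoints expose their payload through the same value field used at submission:

# Verify the JSON datapoint round-trips (objects from the example above)
assert retrieved.datapoint_id == submitted.datapoint_id
assert retrieved.label == "Temperature Reading"
print(retrieved.value)  # expected: {"temperature": 23.5, "unit": "Celsius"}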

Examples: See example_lab/notebooks/experiment_notebook.ipynb for data management workflows.

Storage Configuration

Local Storage (Default)

Object Storage (S3-Compatible)

Supports cloud and self-hosted storage providers:

Benefits:

Quick Setup

# Use example_lab with pre-configured MinIO
docker compose up  # From repo root
# MinIO Console: http://localhost:9001 (minioadmin/minioadmin)

Configuration Examples

AWS S3:

from madsci.common.types.datapoint_types import ObjectStorageSettings

aws_config = ObjectStorageSettings(
    endpoint="s3.amazonaws.com",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=True,
    default_bucket="my-bucket",
    region="us-east-1"
)
client = DataClient(object_storage_settings=aws_config)

Google Cloud Storage:

gcs_config = ObjectStorageSettings(
    endpoint="storage.googleapis.com",
    access_key="YOUR_HMAC_ACCESS_KEY",
    secret_key="YOUR_HMAC_SECRET",
    secure=True,
    default_bucket="my-gcs-bucket"
)
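
For the self-hosted MinIO bundled with example_lab, an equivalent configuration might look like this (the API endpoint on port 9000 and the bucket name are assumptions; the console shown above runs on 9001):

# Self-hosted MinIO (example_lab defaults; bucket name is illustrative)
minio_config = ObjectStorageSettings(
    endpoint="localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False,
    default_bucket="madsci-data"
)
client = DataClient(object_storage_settings=minio_config)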

Direct Object Storage DataPoints

from madsci.common.types.datapoint_types import DataPoint, DataPointTypeEnum

storage_dp = DataPoint(
    label="Large Dataset",
    data_type=DataPointTypeEnum.OBJECT_STORAGE,
    path="/path/to/data.parquet",
    bucket_name="my-bucket",
    object_name="datasets/data.parquet",
    custom_metadata={"version": "v2.1"}
)
uploaded = client.submit_datapoint(storage_dp)
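
The uploaded datapoint can then be fetched and downloaded with the same client calls shown earlier, assuming client is the object-storage-configured DataClient from above:

# Retrieve the record and download a local copy (local path is illustrative)
fetched = client.get_datapoint(uploaded.datapoint_id)
client.save_datapoint_value(uploaded.datapoint_id, "/local/copy/data.parquet")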

Authentication: Use IAM users/service accounts with appropriate storage permissions. See cloud provider documentation for detailed setup.

Database Migration Tools

MADSci Data Manager includes automated MongoDB migration tools that handle schema changes and version tracking for the data management system.

Features

Usage

Standard Usage

# Run migration for data database (auto-detects schema file)
python -m madsci.common.mongodb_migration_tool --database madsci_data

# Migrate with explicit database URL
python -m madsci.common.mongodb_migration_tool --db-url mongodb://localhost:27017 --database madsci_data

# Use custom schema file
python -m madsci.common.mongodb_migration_tool --database madsci_data --schema-file /path/to/schema.json

# Create backup only
python -m madsci.common.mongodb_migration_tool --database madsci_data --backup-only

# Restore from backup
python -m madsci.common.mongodb_migration_tool --database madsci_data --restore-from /path/to/backup

# Check version compatibility without migrating
python -m madsci.common.mongodb_migration_tool --database madsci_data --check-version

Docker Usage

When running in Docker containers, use docker-compose to execute migration commands:

# Run migration for data database in Docker
docker-compose run --rm data-manager python -m madsci.common.mongodb_migration_tool --db-url 'mongodb://mongodb:27017' --database 'madsci_data' --schema-file '/app/madsci/data_manager/schema.json'

# Create backup only in Docker
docker-compose run --rm data-manager python -m madsci.common.mongodb_migration_tool --db-url 'mongodb://mongodb:27017' --database 'madsci_data' --schema-file '/app/madsci/data_manager/schema.json' --backup-only

# Check version compatibility in Docker
docker-compose run --rm data-manager python -m madsci.common.mongodb_migration_tool --db-url 'mongodb://mongodb:27017' --database 'madsci_data' --schema-file '/app/madsci/data_manager/schema.json' --check-version

Server Integration

The Data Manager server automatically checks for version compatibility on startup. If a mismatch is detected, the server will refuse to start and display migration instructions:

DATABASE INITIALIZATION REQUIRED! SERVER STARTUP ABORTED!
The database exists but needs version tracking setup.
To resolve this issue, run the migration tool and restart the server.
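
In practice, resolving this means running the migration command shown above against the data database and then restarting the service, for example:

# Apply the migration, then restart the Data Manager service
python -m madsci.common.mongodb_migration_tool --database madsci_data
docker compose restart data-manager  # service name as used in the Docker examples above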

Schema File Location

The migration tool automatically searches for schema files in:

Backup Location

Backups are stored in .madsci/mongodb/backups/ with timestamped filenames:

Requirements