# MADSci Agents Guidelines

This file provides guidance to coding agents when working with code in this repository.

## Repository Overview

MADSci (Modular Autonomous Discovery for Science) is a Python-based framework for scientific laboratory automation and experimentation. The codebase is organized as a monorepo with multiple packages under the `src/` directory, each providing different components of the MADSci ecosystem.

## Key Concepts

### ULID (Universally Unique Lexicographically Sortable Identifier)
MADSci uses **ULID** for all ID generation throughout the system instead of traditional UUIDs. ULIDs provide:
- Better database index performance than random UUIDs (new keys are time-ordered rather than scattered)
- Lexicographic sortability
- Timestamp-based ordering
- Universal uniqueness

When generating new IDs in MADSci code, always use `new_ulid_str()` from `madsci.common.utils`.
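
The sortability property can be seen in a minimal, stdlib-only sketch of the ULID layout (illustrative only — real MADSci code should always call `new_ulid_str()`):

```python
import os
import time

# Crockford Base32 alphabet used by the ULID spec (I, L, O, U are excluded)
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def toy_ulid() -> str:
    """Illustrative ULID-style ID: 48-bit millisecond timestamp + 80 random
    bits, encoded as 26 Base32 characters. Not MADSci's implementation."""
    value = (int(time.time() * 1000) << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 130 bits / 5 bits per character = 26 characters
        chars.append(ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


earlier = toy_ulid()
time.sleep(0.002)
later = toy_ulid()
assert earlier < later  # IDs created later sort later, lexicographically
```

Because the timestamp occupies the high-order bits, plain string comparison orders IDs by creation time — which is why ULID primary keys index well.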

## Architecture

The system follows a microservices architecture with the following main components:

### Core Components
- **madsci_common**: Shared types, utilities, and base classes used across all components
- **madsci_client**: Client libraries for interacting with MADSci services
- **madsci_squid**: Central lab configuration manager and dashboard provider (Lab Manager)
- **madsci_node_module**: Framework for creating laboratory instrument nodes
- **madsci_experiment_application**: Class for managing automated and autonomous experiments using MADSci-powered labs

### Manager Services
- **madsci_event_manager**: Distributed event logging and querying (Port 8001)
- **madsci_experiment_manager**: Experimental runs and campaigns management (Port 8002)
- **madsci_resource_manager**: Laboratory resource and inventory tracking (Port 8003)
- **madsci_data_manager**: Data capture, storage, and querying (Port 8004)
- **madsci_workcell_manager**: Workflow coordination and scheduling (Port 8005)
- **madsci_location_manager**: Laboratory location management, resource attachments, and node-specific references (Port 8006)

### Frontend
- **ui/**: Vue 3 + Vuetify dashboard for lab management and monitoring

## Development Commands

This project uses PDM for Python dependency management and `just` for task running.

**When to use PDM vs pip:**
- **PDM**: Use for development work on the MADSci codebase itself (modifying source code, running tests, contributing)
- **pip**: Use for installing released MADSci packages in your own projects or when using MADSci components as dependencies

### Setup
```bash
just init                    # Initialize project and install all dependencies
```

### Testing
```bash
pytest                       # Run all tests
just test                    # Alternative way to run tests
just coverage                # Run tests with coverage report
just coverage-html           # Generate HTML coverage report
just coverage-xml            # Generate XML coverage report (for CI)
```

### Code Quality
```bash
just checks                  # Run pre-commit checks (ruff, formatting, etc.)
ruff check                   # Run linter manually
ruff format                  # Format code manually
```

### Docker & Services
```bash
just build                   # Build docker images
just up                      # Start example lab services
just down                    # Stop services and remove containers
```

### Frontend Development
```bash
cd ui/
yarn dev                     # Start Vue development server
yarn build                   # Build for production
```

### Python Package Management
```bash
pdm install                  # Install default dependencies
pdm install -G:all           # Install all dependency groups
pdm build                    # Build python packages
```

## Configuration

Configuration uses a layered approach with a hierarchical precedence system:

1. **`settings.yaml`** — Default configuration values, version-controlled, self-documenting with comments. Uses prefixed keys (e.g., `event_server_url`, `workcell_database_name`).
2. **`.env`** — Secrets and environment-specific overrides only, gitignored. Uses prefixed env var keys (e.g., `RESOURCE_DB_URL`).
3. **Environment variables** — Override both files; highest precedence.
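
The precedence can be pictured as a simple layered merge, sketched below with hypothetical keys and hostnames (MADSci's actual loading is handled by its settings classes, not this helper):

```python
def resolve(yaml_defaults: dict, dotenv_overrides: dict, env_overrides: dict) -> dict:
    """Later layers win: settings.yaml < .env < environment variables."""
    merged: dict = {}
    for layer in (yaml_defaults, dotenv_overrides, env_overrides):
        merged.update(layer)
    return merged


config = resolve(
    {"event_server_url": "http://localhost:8001/", "event_database_name": "events"},
    {"event_database_name": "events_dev"},            # .env override
    {"event_server_url": "http://lab-server:8001/"},  # env var wins over both
)
assert config == {
    "event_server_url": "http://lab-server:8001/",
    "event_database_name": "events_dev",
}
```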

See `docs/Configuration.md` for comprehensive configuration options. Note that this file is automatically generated and should not be edited directly.

### Prefixed Alias System

Each manager settings class uses `prefixed_alias_generator()` from `base_types.py` to support prefixed keys in YAML:
- Code uses unprefixed field names: `server_url`, `database_name`
- YAML/exported settings use prefixed keys: `event_server_url`, `event_database_name`
- Env vars use the env prefix: `EVENT_SERVER_URL`, `EVENT_DATABASE_NAME`
- `model_dump(by_alias=True)` produces prefixed keys (for shared YAML export)
- `model_dump()` produces unprefixed field names (for internal use)
- Fields with explicit `validation_alias` or `alias` are not affected by the generator
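
The general idea can be sketched with Pydantic v2's built-in `alias_generator` (a simplified stand-in — the real `prefixed_alias_generator()` in `base_types.py` also handles env prefixes and respects explicit aliases):

```python
from pydantic import BaseModel, ConfigDict


class EventSettingsSketch(BaseModel):
    """Hypothetical settings class: code uses unprefixed field names, while
    serialized/validated keys carry the manager prefix."""

    model_config = ConfigDict(
        alias_generator=lambda field_name: f"event_{field_name}",
        populate_by_name=True,  # allow construction by field name too
    )

    server_url: str = "http://localhost:8001"
    database_name: str = "events"


s = EventSettingsSketch(event_database_name="my_events")  # prefixed key accepted
assert s.model_dump(by_alias=True) == {
    "event_server_url": "http://localhost:8001",
    "event_database_name": "my_events",
}
assert s.model_dump()["database_name"] == "my_events"  # unprefixed internally
```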

Key configuration patterns:
- Each manager has its own settings class with environment prefix (e.g., `WORKCELL_`, `EVENT_`, `LOCATION_`)
- Server URLs default to localhost with specific ports
- Database connections default to MongoDB/PostgreSQL on localhost
- File storage paths default to `~/.madsci/` subdirectories

### Settings Directory (Walk-Up Discovery)

Walk-up file discovery is **always active**, starting from the current working directory by default. Each filename in `yaml_file`, `json_file`, `toml_file`, and `env_file` tuples is searched independently by walking up the directory tree, so `settings.yaml` can resolve to a shared lab root while `node.settings.yaml` resolves in the node-specific directory.

**Starting directory override:**
- `_settings_dir` keyword argument: `MySettings(_settings_dir="/opt/my-lab/nodes/arm")`
- `MADSCI_SETTINGS_DIR` environment variable
- `--settings-dir` CLI option on `madsci start` and `madsci config export`

**Walk-up boundaries** — walk-up stops at the first boundary encountered:
- A `.madsci/` directory (project root sentinel). The directory containing `.madsci/` is searched, but parents above it are not.
- A `.git/` directory (secondary project root boundary). The directory containing `.git/` is searched, but parents above it are not.
- The user's home directory (`Path.home()`). The home directory itself is searched, but parents above it are not.
- The filesystem root.
- The `max_levels` limit (default: 10).

```
/opt/my-lab/                     # Shared config found via walk-up
├── .madsci/                     # Sentinel — walk-up stops here
├── settings.yaml                # node_name, lab URLs, etc.
├── .env                         # Shared secrets
└── nodes/robot-arm/             # _settings_dir or CWD points here
    ├── node.settings.yaml       # Node-specific settings
    └── .env                     # Node-specific secrets
```

**Precedence** is unchanged: CLI args > init kwargs > env vars > .env > file secrets > JSON > TOML > YAML. Walk-up only affects *where* the files are found, not their priority.
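
The discovery behavior above can be sketched as follows (an illustrative approximation, not MADSci's implementation; the helper name `walk_up_for` is hypothetical):

```python
from pathlib import Path
from typing import Optional


def walk_up_for(filename: str, start: Path, max_levels: int = 10) -> Optional[Path]:
    """Search for `filename` from `start` toward the root, stopping at the
    documented boundaries. A boundary directory is itself searched before
    the walk stops."""
    current = start.resolve()
    home = Path.home().resolve()
    for _ in range(max_levels):
        candidate = current / filename
        if candidate.is_file():
            return candidate
        # Stop after searching a directory that contains a boundary marker
        if (current / ".madsci").is_dir() or (current / ".git").is_dir():
            return None
        # Stop after searching the home directory or the filesystem root
        if current == home or current.parent == current:
            return None
        current = current.parent
    return None
```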

### `.madsci/` Sentry Directory (`sentry.py`)

All `.madsci/` directory path resolution is centralized in `madsci.common.sentry`. This module is the canonical place for resolving where PIDs, logs, backups, and other runtime state should be stored.

**Resolution algorithm** (`find_madsci_dir`):
1. Walk up from start directory looking for an existing `.madsci/` directory
2. If none found, look for `.git/` as a secondary boundary and return `{git_parent}/.madsci/`
3. Fall back to `~/.madsci/`
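
The three-step algorithm can be sketched as follows (illustrative; `find_madsci_dir_sketch` is a hypothetical stand-in for the real `find_madsci_dir`):

```python
from pathlib import Path


def find_madsci_dir_sketch(start_dir: Path) -> Path:
    """Sketch of the documented resolution order, not the real
    madsci.common.sentry implementation."""
    current = start_dir.resolve()
    git_parent = None
    while True:
        if (current / ".madsci").is_dir():
            return current / ".madsci"       # 1. existing .madsci wins
        if git_parent is None and (current / ".git").is_dir():
            git_parent = current             # remember the first .git boundary
        if current.parent == current:
            break                            # reached the filesystem root
        current = current.parent
    if git_parent is not None:
        return git_parent / ".madsci"        # 2. sibling of the .git directory
    return Path.home() / ".madsci"           # 3. user-level fallback
```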

**Key functions:**
- `find_madsci_dir(start_dir, auto_create)` — canonical resolution
- `get_madsci_subdir(name, start_dir, create)` — get a subdirectory within `.madsci/`
- `get_global_madsci_subdir(name, create)` — always uses `~/.madsci/` (for user-level resources like templates)
- `ensure_madsci_dir(path)` — scaffold a `.madsci/` directory with standard subdirs

When adding new code that reads/writes from `.madsci/`, always use `sentry.py` functions instead of constructing paths manually.

## Development Patterns

### Manager Implementation
Each manager service follows this pattern:
1. Settings class inheriting from `MadsciBaseSettings`
2. Server class inheriting from `AbstractManagerBase` with FastAPI endpoints
3. Client class for programmatic interaction
4. Database models (SQLModel for PostgreSQL, Pydantic for MongoDB)

The `AbstractManagerBase` class provides:
- Common functionality for all managers (settings, logging, CORS middleware)
- Standard endpoints (settings, health)
- FastAPI app configuration and server lifecycle management
- Generic typing for settings class (`AbstractManagerBase[SettingsT]`)
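
The generic settings-typing idea can be sketched with hypothetical stand-ins (the real classes are `MadsciBaseSettings` and `AbstractManagerBase`, which also wire up FastAPI, CORS, and lifecycle management):

```python
from typing import Generic, TypeVar

from pydantic import BaseModel

SettingsT = TypeVar("SettingsT", bound=BaseModel)


class ManagerBaseSketch(Generic[SettingsT]):
    """Toy stand-in for AbstractManagerBase[SettingsT]; names are illustrative."""

    def __init__(self, settings: SettingsT) -> None:
        self.settings = settings

    def health(self) -> dict:
        return {"status": "ok"}            # standard health payload (illustrative)

    def get_settings(self) -> dict:
        return self.settings.model_dump()  # standard settings payload (illustrative)


class EventManagerSettings(BaseModel):     # stand-in for MadsciBaseSettings
    server_url: str = "http://localhost:8001/"
    database_name: str = "events"


manager = ManagerBaseSketch[EventManagerSettings](EventManagerSettings())
assert manager.get_settings()["database_name"] == "events"
```

Typing the base class over its settings lets each concrete manager get fully typed access to its own settings fields.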

### Type System
- All types are defined in `madsci_common/types/`
- Uses Pydantic v2 for data validation and serialization
- SQLModel for database ORM with PostgreSQL
- Enum types for status and state management
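
A minimal sketch of the pattern, using hypothetical type names (real types live under `madsci_common/types/`):

```python
from enum import Enum

from pydantic import BaseModel


class NodeStatus(str, Enum):
    """Illustrative status enum in the style described above."""

    IDLE = "idle"
    BUSY = "busy"
    ERROR = "error"


class NodeInfoSketch(BaseModel):
    """Hypothetical Pydantic v2 model; not an actual MADSci type."""

    node_id: str
    status: NodeStatus = NodeStatus.IDLE


info = NodeInfoSketch(node_id="abc123", status="busy")  # string coerced to enum
assert info.status is NodeStatus.BUSY
assert info.model_dump(mode="json")["status"] == "busy"
```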

### ID Generation
- **ULID (Universally Unique Lexicographically Sortable Identifier)** is used for all IDs throughout MADSci
- ULIDs index and sort better than random UUIDs while remaining universally unique
- When generating new IDs, use `new_ulid_str()` from `madsci.common.utils`
- Example usage: `resource_id = new_ulid_str()`

### Database Patterns

MADSci uses two database systems optimized for different use cases:

#### Database Types
- **PostgreSQL**: Used by Resource Manager for relational data with strict schemas
- **MongoDB**: Used by Event, Data, Experiment, and Workcell Managers for flexible document storage

#### Backup and Restore

All backup tools are centralized in `madsci_common` for maximum reusability:

```python
from pathlib import Path

from pydantic import AnyUrl

# PostgreSQL backups
from madsci.common.backup_tools import PostgreSQLBackupTool
from madsci.common.types.backup_types import PostgreSQLBackupSettings

settings = PostgreSQLBackupSettings(
    db_url="postgresql://localhost/resources",
    backup_dir=Path("./backups"),
    max_backups=10,
    validate_integrity=True
)
backup_tool = PostgreSQLBackupTool(settings)
backup_path = backup_tool.create_backup("pre_migration")

# MongoDB backups
from madsci.common.backup_tools import MongoDBBackupTool
from madsci.common.types.backup_types import MongoDBBackupSettings

settings = MongoDBBackupSettings(
    mongo_db_url=AnyUrl("mongodb://localhost:27017"),
    database="events",
    backup_dir=Path("./backups"),
    max_backups=10
)
backup_tool = MongoDBBackupTool(settings)
backup_path = backup_tool.create_backup("hourly")
```

**CLI Usage:**
```bash
# Unified CLI (auto-detects database type)
madsci-backup create --db-url postgresql://localhost/resources
madsci-backup create --db-url mongodb://localhost:27017/events

# Database-specific CLIs
madsci-postgres-backup create --db-url postgresql://localhost/resources
madsci-mongodb-backup create --mongo-url mongodb://localhost:27017 --database events
```

#### Database Connections

**PostgreSQL** (using SQLModel):
```python
from sqlmodel import Session, create_engine

engine = create_engine(
    db_url,
    pool_size=20,        # Connection pool size
    pool_pre_ping=True   # Verify connections before use
)

with Session(engine) as session:
    # Perform operations
    session.commit()
```

**MongoDB** (using pymongo):
```python
from pymongo import MongoClient

with MongoClient(mongo_url) as client:
    db = client[database_name]
    collection = db[collection_name]
    # Perform operations
```

#### Database Migrations

**PostgreSQL migrations** (Resource Manager):
- Uses Alembic for schema version management
- Automatic backups before migrations
- Auto-restore on migration failure
```bash
python -m madsci.resource_manager.migration_tool --db-url postgresql://localhost/resources
```

**MongoDB migrations** (per manager):
- Handle index creation and schema validation
- Manager-specific migration tools
- Automatic pre-migration backups

#### Best Practices

1. **Always use ULID for IDs**: `resource_id = new_ulid_str()`
2. **Backup before migrations**: Automatic with migration tools
3. **Use connection pooling**: Configure appropriate pool sizes
4. **Environment variables for config**: Never hardcode connection strings
5. **Validate backups**: Use `validate_backup_integrity()` for critical backups
6. **Test migrations first**: Always test in development before production


### Node Development
Laboratory instruments implement the Node interface:
1. Inherit from `AbstractNodeModule`
2. Implement required action methods
3. Define node configuration in YAML
4. Use REST endpoints for communication

### Logging and Context Management

MADSci provides a hierarchical logging context system for structured logging across components.

#### EventClient Context System
Use the context system for hierarchical logging that automatically propagates context:

```python
from madsci.common.context import event_client_context, get_event_client

# Establish context at entry points (scripts, CLI commands, experiment runs)
with event_client_context(name="my_operation", experiment_id="exp-123") as logger:
    logger.info("Starting operation")

    # Nested context adds more metadata
    with event_client_context(name="substep", step_id="step-1") as step_logger:
        step_logger.info("Executing substep")  # Includes both experiment_id and step_id

# In library code, use get_event_client() to inherit context
def utility_function():
    logger = get_event_client()  # Uses context if available, creates new if not
    logger.info("Utility running")
```

#### Structured Logging Best Practices
```python
# Good: Structured logging with kwargs
logger.info(
    "Workflow step completed",
    event_type=EventType.WORKFLOW_STEP_COMPLETE,
    workflow_id=workflow.workflow_id,
    step_index=step_index,
    duration_ms=elapsed_ms,
)

# Bad: F-string formatting (data not queryable)
logger.info(f"Workflow {workflow_id} step {step_index} completed in {elapsed_ms}ms")
```

#### Context Decorators
Decorators are available for functions and classes:

```python
from madsci.common.context import with_event_client, event_client_class

@with_event_client(name="my_workflow", workflow_id="wf-123")
def my_workflow(event_client=None):  # an EventClient is injected by the decorator
    event_client.info("Running workflow")

@event_client_class(component_type="processor")
class DataProcessor:
    def process(self, data):
        self.event_client.info("Processing", data_size=len(data))
```

See [docs/guides/logging.md](docs/guides/logging.md) for comprehensive logging documentation.

### OpenTelemetry Integration

MADSci includes OpenTelemetry support for distributed tracing, metrics, and log correlation.

#### Configuration
Enable OTEL per-manager via environment variables:
```bash
EVENT_OTEL_ENABLED=true
EVENT_OTEL_SERVICE_NAME="madsci.event"
EVENT_OTEL_EXPORTER="otlp"
EVENT_OTEL_ENDPOINT="http://localhost:4317"
EVENT_OTEL_PROTOCOL="grpc"
```

#### Using Tracing in Code
```python
from madsci.common.otel import span_context, with_span, traced_class

# Context manager for spans
with span_context("process_data", attributes={"data.size": 100}) as span:
    result = process(data)
    span.set_attribute("result.count", len(result))

# Decorator for functions
@with_span(name="fetch_user")
def get_user(user_id: str):
    return api.fetch(user_id)

# Class decorator for automatic method tracing
@traced_class(attributes={"component": "data_processor"})
class DataProcessor:
    def process(self, data):
        return transform(data)
```

See [docs/guides/observability.md](docs/guides/observability.md) for the full observability stack setup.

### Ownership Context

MADSci tracks ownership metadata (user, experiment, workflow, etc.) throughout the system:

```python
from madsci.common.ownership import ownership_context, get_current_ownership_info

with ownership_context(experiment_id="exp-123", workflow_id="wf-456") as info:
    # All operations within this context include ownership metadata
    print(info.experiment_id)  # "exp-123"
```

### Testing
- Uses pytest with in-memory database handlers for most tests (no Docker required)
- Docker is only needed for end-to-end tests against the full service stack
- Database handler abstractions (`db_handlers/`) provide injectable in-memory implementations for all database backends
- Component tests are located in each package's `tests/` directory
- **IMPORTANT**: Use `pytest` directly instead of `python -m pytest` for running tests

## File Structure Conventions

```
src/madsci_*/
├── madsci/package_name/               # Python package code
│   └── *_server.py                    # FastAPI server (for managers)
├── tests/                             # Package-specific tests
├── README.md                          # Package documentation
└── pyproject.toml                     # Package dependencies
src/madsci_client/
└── madsci/client/*_client.py          # Client implementations
src/madsci_common/
└── madsci/common/types/*_types.py     # Pydantic data models and enums
```

## Important Notes

- Python 3.10+ required
- Docker required for running services and end-to-end tests (most unit/integration tests use in-memory handlers)
- Pre-commit hooks enforce code quality standards
- The project is currently in beta with potential breaking changes
- Each package can be used independently or composed together
- Use PDM virtual environments for development isolation
- **IMPORTANT**: If Python commands fail with missing-module errors, make sure the correct virtual environment is activated.
- **IMPORTANT**: Use `yarn` for managing Node.js dependencies in the `ui/` directory, not npm
- Use ruff's autofix and autoformatting (`ruff check --fix`, `ruff format`) before manually fixing linter errors, especially whitespace-related ones.
- Always use Pydantic's `AnyUrl` to store URLs, and note that `AnyUrl` always ensures a trailing forward slash
- Imports should generally be done at the top of the file, unless there are circular dependencies or other factors which require localized importing.
- **IMPORTANT**: do not use noqa's or modify the configuration of linters or checks to bypass linter errors without the users _EXPLICIT_ permission.
- **IMPORTANT**: Use `./.scratch/` for any temporary files, test outputs, or scratch work. Do NOT use `/tmp` or other system-level temporary directories.
