Lab Operator Guide

This guide is for the Lab Operator persona: those who run and maintain MADSci-powered self-driving laboratories.

Guide Contents

  1. Daily Operations - Starting, monitoring, and stopping the lab

  2. Monitoring & Health Checks - Using TUI, CLI, and observability tools

  3. Backup & Recovery - Database backups and disaster recovery

  4. Troubleshooting - Common issues and solutions

  5. Updates & Maintenance - Upgrading MADSci and dependencies

Who is a Lab Operator?

A Lab Operator:

Quick Reference

Starting the Lab

# Start all services (recommended)
cd my_lab
madsci start -d

# Alternative: using Docker Compose directly
docker compose up -d

# Start a single manager
madsci start manager event -d

# Start a node
madsci start node ./my_node.py -d

# Verify everything is running
madsci status

# Watch logs
madsci logs --follow

Checking Health

# Quick status check
madsci status

# Detailed diagnostics
madsci doctor

# Launch TUI for monitoring
madsci tui

Stopping the Lab

# Stop all services (recommended)
madsci stop

# Stop a specific manager or node
madsci stop manager event
madsci stop node my_node

# Alternative: using Docker Compose directly
docker compose stop              # Graceful stop (preserves data)
docker compose down              # Stop and remove containers
docker compose down -v           # Full cleanup (WARNING: deletes data)

Backups

# Quick backup
madsci-backup create --db-url mongodb://localhost:27017 --output ./backups

# Full backup with verification
madsci-backup create --db-url mongodb://localhost:27017 --output ./backups --validate

Viewing Logs

# All services
madsci logs --follow

# Specific service
madsci logs workcell_manager --tail 100

# Filter by level
madsci logs --level error --since 1h

Key Concepts

Service Types

| Type | Examples | Purpose |
|------|----------|---------|
| Infrastructure | MongoDB, PostgreSQL, Redis, MinIO | Data storage |
| Managers | Event, Experiment, Resource, Workcell | Coordination |
| Nodes | temp_sensor, robot_arm | Instruments |

Ports Reference

| Service | Port | URL |
|---------|------|-----|
| Lab Manager | 8000 | http://localhost:8000 |
| Event Manager | 8001 | http://localhost:8001 |
| Experiment Manager | 8002 | http://localhost:8002 |
| Resource Manager | 8003 | http://localhost:8003 |
| Data Manager | 8004 | http://localhost:8004 |
| Workcell Manager | 8005 | http://localhost:8005 |
| Location Manager | 8006 | http://localhost:8006 |
| MongoDB | 27017 | mongodb://localhost:27017 |
| PostgreSQL | 5432 | postgresql://localhost:5432 |
| Redis | 6379 | redis://localhost:6379 |
| MinIO | 9000/9001 | http://localhost:9000 |
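
The manager ports listed above can be turned into a quick reachability check. The following is a minimal sketch and not part of MADSci: the function names `manager_port`, `check_port`, and `report_ports` are hypothetical, and the check assumes the default ports and a lab running on localhost. It relies on bash's `/dev/tcp` redirection, so it only confirms that something accepts TCP connections on a port, not that the manager behind it is healthy.

```shell
#!/usr/bin/env bash
# Sketch: map manager names to the default ports from the table above
# and test whether each port accepts TCP connections on localhost.

# Hypothetical helper: print the default port for a manager name.
manager_port() {
  case "$1" in
    lab)        echo 8000 ;;
    event)      echo 8001 ;;
    experiment) echo 8002 ;;
    resource)   echo 8003 ;;
    data)       echo 8004 ;;
    workcell)   echo 8005 ;;
    location)   echo 8006 ;;
    *)          return 1 ;;
  esac
}

# Hypothetical helper: succeed if host:port accepts a TCP connection.
# Uses bash's /dev/tcp redirection; in shells without it, every port
# is reported as closed.
check_port() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Print one line per manager: name, port, and "open" or "closed".
report_ports() {
  local name port
  for name in lab event experiment resource data workcell location; do
    port="$(manager_port "$name")"
    if check_port localhost "$port"; then
      echo "$name $port open"
    else
      echo "$name $port closed"
    fi
  done
}
```

Running `report_ports` after `madsci start -d` gives a quick view of which managers are listening. For anything beyond reachability, `madsci status` and `madsci doctor` remain the better tools.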

Health Check Endpoints

All managers expose:

Log Levels

| Level | Meaning |
|-------|---------|
| DEBUG | Detailed diagnostic information |
| INFO | General operational events |
| WARNING | Something unexpected but not critical |
| ERROR | Something failed |
| CRITICAL | System is in a critical state |
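
The levels above are ordered by severity, which makes "this level or worse" filtering straightforward. The sketch below is an illustration only: `filter_min_level` is a hypothetical helper, and it assumes each log line contains the level name as a separate whitespace-delimited token (the actual MADSci log format may differ). Lines without a recognizable level token are dropped.

```shell
# Hypothetical helper: read log lines on stdin and print only those whose
# level is at least as severe as the given minimum (default: INFO).
# Assumes the level appears as its own whitespace-separated token.
filter_min_level() {
  local min
  min="$(printf '%s' "${1:-INFO}" | tr '[:lower:]' '[:upper:]')"
  awk -v min="$min" '
    BEGIN {
      # Severity order taken from the Log Levels table.
      rank["DEBUG"] = 1; rank["INFO"] = 2; rank["WARNING"] = 3
      rank["ERROR"] = 4; rank["CRITICAL"] = 5
      if (!(min in rank)) {
        print "unknown level: " min > "/dev/stderr"
        bad = 1
        exit 2
      }
    }
    {
      # Use the first token that names a level; skip lines with none.
      for (i = 1; i <= NF; i++) {
        if ($i in rank) {
          if (rank[$i] >= rank[min]) print
          next
        }
      }
    }
    END { if (bad) exit 2 }
  '
}
```

One possible use is piping Docker Compose output through it, for example `docker compose logs --no-color workcell_manager | filter_min_level warning`.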

Common Tasks

Restarting a Single Service

# Restart workcell manager
docker compose restart workcell_manager

# View its logs
docker compose logs -f workcell_manager

Checking Why a Service Failed

# View recent logs
docker compose logs --tail 50 <service_name>

# Check container status
docker compose ps

# Inspect container
docker inspect <container_id>
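
The full `docker inspect` output is long, and the exit code is often the first thing worth reading. The following sketch pulls out the exit code and the out-of-memory flag and adds a short hint. The helper names `explain_exit_code` and `container_exit_summary` are hypothetical. The hints reflect general shell and Docker conventions (exit codes above 128 mean 128 plus a signal number), not anything specific to MADSci.

```shell
# Hypothetical helper: translate a container exit code into a short hint.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "killed by SIGKILL (docker kill, stop timeout, or out of memory)" ;;
    143) echo "terminated by SIGTERM (typical for docker stop)" ;;
    *)
      # Codes above 128 conventionally mean 128 + signal number.
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $(( $1 - 128 ))"
      else
        echo "application error"
      fi
      ;;
  esac
}

# Hypothetical helper: summarize why a container stopped.
# Takes a container name or ID as shown by `docker compose ps`.
container_exit_summary() {
  local container="$1" code oom
  code="$(docker inspect --format '{{.State.ExitCode}}' "$container")" || return 1
  oom="$(docker inspect --format '{{.State.OOMKilled}}' "$container")" || return 1
  echo "exit code ${code}: $(explain_exit_code "$code"); OOMKilled=${oom}"
}
```

An "application error" result usually means the answer is in the service's own logs, so `docker compose logs --tail 50 <service_name>` is the next step.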

Checking Port Usage

# See what's using ports
lsof -i :8000-8006

# Or with netstat
netstat -tuln | grep -E ':800[0-6][[:space:]]'

Emergency Shutdown

# If compose is unresponsive, stop all containers
docker stop $(docker ps -q --filter "network=madsci")

# Force stop if needed
docker kill $(docker ps -q --filter "network=madsci")
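
Both commands above fail with a usage error when no containers match the filter, because the command substitution expands to nothing. A minimal sketch of a guarded version follows. `emergency_stop` is a hypothetical helper, and the default network name `madsci` is taken from the commands above and may differ in your deployment (check `docker network ls`).

```shell
# Hypothetical helper: stop every running container attached to a Docker
# network, then force-kill any that are still running afterwards.
# Usage: emergency_stop [network_name]   (defaults to "madsci")
emergency_stop() {
  local network="${1:-madsci}" ids remaining
  ids="$(docker ps -q --filter "network=${network}")"
  if [ -z "$ids" ]; then
    echo "no running containers on network ${network}"
    return 0
  fi
  # Word splitting is intentional: $ids holds one container ID per line.
  # Give each container 10 seconds to shut down gracefully.
  # shellcheck disable=SC2086
  docker stop -t 10 $ids
  # Anything still listed did not stop; force it.
  remaining="$(docker ps -q --filter "network=${network}")"
  if [ -n "$remaining" ]; then
    # shellcheck disable=SC2086
    docker kill $remaining
  fi
}
```

The graceful stop runs first so that databases get a chance to flush to disk. `docker kill` is reserved for containers that ignore the stop request.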

Prerequisites