This guide is for the Lab Operator persona - those who run and maintain MADSci-powered self-driving laboratories.
Guide Contents
Daily Operations - Starting, monitoring, and stopping the lab
Monitoring & Health Checks - Using TUI, CLI, and observability tools
Backup & Recovery - Database backups and disaster recovery
Troubleshooting - Common issues and solutions
Updates & Maintenance - Upgrading MADSci and dependencies
Who is a Lab Operator?
A Lab Operator:
Starts and stops the lab services
Monitors system health and status
Performs backups and maintenance
Troubleshoots issues when they arise
Coordinates with Equipment Integrators and Experimentalists
Quick Reference
Starting the Lab
# Start all services (recommended)
cd my_lab
madsci start -d
# Alternative: using Docker Compose directly
docker compose up -d
# Start a single manager
madsci start manager event -d
# Start a node
madsci start node ./my_node.py -d
# Verify everything is running
madsci status
# Watch logs
madsci logs --follow
Checking Health
# Quick status check
madsci status
# Detailed diagnostics
madsci doctor
# Launch TUI for monitoring
madsci tui
Stopping the Lab
# Stop all services (recommended)
madsci stop
# Stop a specific manager or node
madsci stop manager event
madsci stop node my_node
# Alternative: using Docker Compose directly
docker compose stop # Graceful stop (preserves data)
docker compose down # Stop and remove containers
docker compose down -v # Full cleanup (WARNING: also removes volumes, deleting the data stored in them)
Backups
# Quick backup
madsci-backup create --db-url mongodb://localhost:27017 --output ./backups
# Full backup with verification
madsci-backup create --db-url mongodb://localhost:27017 --output ./backups --validate
Viewing Logs
# All services
madsci logs --follow
# Specific service
madsci logs workcell_manager --tail 100
# Filter by level
madsci logs --level error --since 1h
Key Concepts
Service Types
| Type | Examples | Purpose |
|---|---|---|
| Infrastructure | MongoDB, PostgreSQL, Redis, MinIO | Data storage |
| Managers | Event, Experiment, Resource, Workcell | Coordination |
| Nodes | temp_sensor, robot_arm | Instruments |
Ports Reference
| Service | Port | URL |
|---|---|---|
| Lab Manager | 8000 | http:// |
| Event Manager | 8001 | http:// |
| Experiment Manager | 8002 | http:// |
| Resource Manager | 8003 | http:// |
| Data Manager | 8004 | http:// |
| Workcell Manager | 8005 | http:// |
| Location Manager | 8006 | http:// |
| MongoDB | 27017 | mongodb://localhost:27017 |
| PostgreSQL | 5432 | postgresql://localhost:5432 |
| Redis | 6379 | redis://localhost:6379 |
| MinIO | 9000/9001 | http:// |
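When scripting against the managers, it can help to turn the default ports in the table into base URLs. The sketch below is illustrative only: the `manager_url` helper, the short manager names, and the `localhost` default host are assumptions introduced here and are not part of MADSci. The ports are copied from the table rather than read from your lab's configuration.

```shell
#!/bin/sh
# Default manager ports, copied from the Ports Reference table.
# The short names (lab, event, ...) are hypothetical labels for this sketch.
MANAGER_PORTS="lab:8000 event:8001 experiment:8002 resource:8003 data:8004 workcell:8005 location:8006"

# manager_url NAME [HOST]
# Prints the base URL for a manager, or returns 1 for an unknown name.
# HOST defaults to localhost (an assumption for a single-machine lab).
manager_url() {
    name=$1
    host=${2:-localhost}
    for entry in $MANAGER_PORTS; do
        if [ "${entry%%:*}" = "$name" ]; then
            printf 'http://%s:%s\n' "$host" "${entry##*:}"
            return 0
        fi
    done
    echo "unknown manager: $name" >&2
    return 1
}
```

A call such as `manager_url workcell lab-server` would print a base URL for the Workcell Manager on a host named `lab-server` (a made-up hostname); adjust the port list if your deployment overrides the defaults.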
Health Check Endpoints
All managers expose:
GET /health - Basic health check
GET /info - Service information
GET /status - Detailed status
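The `/health` endpoint lends itself to a simple readiness poll after starting services. The following is a minimal sketch, assuming that a healthy manager answers `GET /health` with an HTTP 2xx status and an unhealthy or unreachable one does not. The response body format is not assumed, and the function names and the `HEALTH_TIMEOUT` variable are hypothetical.

```shell
#!/bin/sh
# check_health BASE_URL
# Succeeds if GET BASE_URL/health returns an HTTP 2xx status within the timeout.
# Only the status code is inspected; the body is discarded.
check_health() {
    curl --fail --silent --show-error --max-time "${HEALTH_TIMEOUT:-3}" \
        --output /dev/null "$1/health"
}

# wait_healthy BASE_URL [ATTEMPTS] [DELAY_SECONDS]
# Polls the health endpoint until it succeeds or the attempts run out.
wait_healthy() {
    url=$1
    attempts=${2:-10}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if check_health "$url" 2>/dev/null; then
            echo "healthy: $url"
            return 0
        fi
        # Pause between attempts, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "unhealthy after $attempts attempts: $url" >&2
    return 1
}
```

For example, `wait_healthy http://localhost:8001 15 2` would poll the Event Manager's default port up to 15 times, 2 seconds apart, assuming the manager runs on the local machine.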
Log Levels
| Level | Meaning |
|---|---|
| DEBUG | Detailed diagnostic information |
| INFO | General operational events |
| WARNING | Something unexpected but not critical |
| ERROR | Something failed |
| CRITICAL | System is in a critical state |
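To get a quick sense of how noisy a lab has been, log lines can be tallied by level. This is a rough sketch that assumes each log line contains its level as a standalone uppercase word; the actual MADSci log line format is not specified here, so treat the matching rule as an assumption. The `count_levels` function is a hypothetical helper.

```shell
#!/bin/sh
# count_levels
# Reads log lines on stdin and prints "LEVEL count" for each of the five levels.
# A line is counted once, under the first level (in table order) that it mentions.
count_levels() {
    awk '
        BEGIN { n = split("DEBUG INFO WARNING ERROR CRITICAL", levels, " ") }
        {
            for (i = 1; i <= n; i++) {
                # Match the level name only when it is not part of a longer uppercase word.
                if ($0 ~ ("(^|[^A-Z])" levels[i] "([^A-Z]|$)")) {
                    count[levels[i]]++
                    break
                }
            }
        }
        END {
            for (i = 1; i <= n; i++) printf "%s %d\n", levels[i], count[levels[i]]
        }
    '
}
```

It could be fed from the log command shown earlier, for instance `madsci logs --since 1h | count_levels`.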
Common Tasks
Restarting a Single Service
# Restart workcell manager
docker compose restart workcell_manager
# View its logs
docker compose logs -f workcell_manager
Checking Why a Service Failed
# View recent logs
docker compose logs --tail 50 <service_name>
# Check container status
docker compose ps
# Inspect container
docker inspect <container_id>
Checking Port Usage
# See what's using ports
lsof -i :8000-8006
# Or with netstat
netstat -tuln | grep -E '800[0-6]'
Emergency Shutdown
# If compose is unresponsive, stop all containers
docker stop $(docker ps -q --filter "network=madsci")
# Force stop if needed
docker kill $(docker ps -q --filter "network=madsci")
Prerequisites
Docker and Docker Compose installed
Access to the lab's compose.yaml and .env files
Understanding of basic Docker commands
SSH access to the lab server (if remote)
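The tool and file prerequisites above can be verified with a short pre-flight script. The sketch below is one possible approach, not a MADSci command: `check_prereqs` is a hypothetical helper, and it assumes the lab directory holds `compose.yaml` and `.env` at its top level and that Compose is installed as the `docker compose` plugin.

```shell
#!/bin/sh
# check_prereqs [LAB_DIR]
# Prints one line per check and returns non-zero if any check fails.
# LAB_DIR defaults to the current directory.
check_prereqs() {
    lab_dir=${1:-.}
    failed=0

    # Tools: the docker CLI and the Compose plugin.
    if command -v docker >/dev/null 2>&1; then
        echo "ok: docker found"
        if docker compose version >/dev/null 2>&1; then
            echo "ok: docker compose available"
        else
            echo "missing: docker compose plugin"
            failed=1
        fi
    else
        echo "missing: docker"
        failed=1
    fi

    # Lab configuration files.
    for f in compose.yaml .env; do
        if [ -r "$lab_dir/$f" ]; then
            echo "ok: $lab_dir/$f readable"
        else
            echo "missing: $lab_dir/$f"
            failed=1
        fi
    done

    return "$failed"
}
```

Running `check_prereqs my_lab` before the first start gives a quick list of what still needs to be installed or copied into place.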