This notebook demonstrates best practices for backing up and migrating MADSci databases.
Table of Contents¶
Prerequisites¶
Before running this notebook:
Start the MADSci databases:
docker compose up -d mongodb postgresEnsure you’re in the MADSci root directory
Have
madsci-commoninstalled with backup dependencies
1. Setup and Prerequisites ¶
First, let’s import the necessary libraries and check our environment.
import sys
from pathlib import Path
from datetime import datetime
from pydantic import AnyUrl
# Import MADSci backup tools
from madsci.common.backup_tools import PostgreSQLBackupTool, MongoDBBackupTool
from madsci.common.types.backup_types import (
PostgreSQLBackupSettings,
MongoDBBackupSettings,
)
print("✓ MADSci backup tools imported successfully")# Configuration
BACKUP_BASE_DIR = Path("../../backups/notebook_examples")
BACKUP_BASE_DIR.mkdir(parents=True, exist_ok=True)
# Database connection strings
POSTGRES_URL = "postgresql://madsci:madsci@localhost:5432/resources"
MONGODB_URL = AnyUrl("mongodb://localhost:27017")
print(f"Backup directory: {BACKUP_BASE_DIR.absolute()}")
print(f"PostgreSQL URL: {POSTGRES_URL}")
print(f"MongoDB URL: {MONGODB_URL}")2. Database Connectivity ¶
Before backing up, let’s verify we can connect to all databases.
# Test PostgreSQL connection
from sqlmodel import create_engine, text
try:
engine = create_engine(POSTGRES_URL, pool_pre_ping=True)
with engine.connect() as conn:
result = conn.execute(text("SELECT version()"))
version = result.fetchone()[0]
result = conn.execute(text("SELECT pg_size_pretty(pg_database_size(current_database()))"))
size = result.fetchone()[0]
print("✓ PostgreSQL Connection Successful")
print(f" Version: {version.split(' ')[1]}")
print(f" Database Size: {size}")
except Exception as e:
print(f"✗ PostgreSQL Connection Failed: {e}")# Test MongoDB connection
from pymongo import MongoClient
databases = ["events", "data", "experiments", "workflows"]
for db_name in databases:
try:
client = MongoClient(str(MONGODB_URL), serverSelectionTimeoutMS=5000)
client.admin.command("ping")
db = client[db_name]
stats = db.command("dbStats")
print(f"✓ MongoDB '{db_name}' Connection Successful")
print(f" Collections: {stats['collections']}")
print(f" Size: {stats['dataSize'] / (1024 * 1024):.2f} MB")
client.close()
except Exception as e:
print(f"✗ MongoDB '{db_name}' Connection Failed: {e}")3. Creating Backups ¶
Now let’s create backups of our databases using both programmatic and CLI approaches.
3.1 PostgreSQL Backup (Programmatic)¶
# Configure PostgreSQL backup settings
pg_settings = PostgreSQLBackupSettings(
db_url=POSTGRES_URL,
backup_dir=BACKUP_BASE_DIR / "postgres",
max_backups=5, # Keep last 5 backups
validate_integrity=True, # Validate with SHA256
backup_format="custom", # Compressed custom format
)
print("PostgreSQL Backup Configuration:")
print(f" Backup directory: {pg_settings.backup_dir}")
print(f" Max backups: {pg_settings.max_backups}")
print(f" Validation: {pg_settings.validate_integrity}")
print(f" Format: {pg_settings.backup_format}")# Create PostgreSQL backup tool
pg_backup_tool = PostgreSQLBackupTool(pg_settings)
# Create backup
print("Creating PostgreSQL backup...")
try:
backup_path = pg_backup_tool.create_backup("notebook_demo")
print(f"✓ Backup created successfully: {backup_path}")
print(f" Size: {backup_path.stat().st_size / 1024:.2f} KB")
except Exception as e:
print(f"✗ Backup failed: {e}")3.2 MongoDB Backup (Programmatic)¶
# Configure MongoDB backup settings
mongo_settings = MongoDBBackupSettings(
mongo_db_url=MONGODB_URL,
database="madsci_events",
backup_dir=BACKUP_BASE_DIR / "mongodb" / "madsci_events",
max_backups=5,
validate_integrity=True,
collections=None, # Backup all collections
)
print("MongoDB Backup Configuration:")
print(f" Database: {mongo_settings.database}")
print(f" Backup directory: {mongo_settings.backup_dir}")
print(f" Collections: {'All' if mongo_settings.collections is None else ', '.join(mongo_settings.collections)}")# Create MongoDB backup tool
mongo_backup_tool = MongoDBBackupTool(mongo_settings)
# Create backup
print("Creating MongoDB backup...")
try:
backup_path = mongo_backup_tool.create_backup("notebook_demo")
print(f"✓ Backup created successfully: {backup_path}")
# Get backup size
import subprocess
size_output = subprocess.check_output(["du", "-sh", str(backup_path)]).decode()
print(f" Size: {size_output.split()[0]}")
except FileNotFoundError:
print("⚠️ mongodump not found. See troubleshooting section.")
except Exception as e:
print(f"✗ Backup failed: {e}")3.3 Using CLI Commands¶
Alternatively, you can use CLI commands for quick backups:
# PostgreSQL CLI backup
!madsci-backup create --db-url postgresql://madsci:madsci@localhost:5432/resources \
--backup-dir ../../backups/cli_examples --name cli_demo 2>&1 | tail -20# MongoDB CLI backup
!madsci-backup create --db-url mongodb://localhost:27017/madsci_events \
--backup-dir ../../backups/cli_examples --name cli_demo 2>&1 | tail -204. Validating Backups ¶
Always validate backups to ensure they can be restored successfully.
# List available PostgreSQL backups
print("Available PostgreSQL Backups:")
print("=" * 60)
backups = pg_backup_tool.list_available_backups()
for i, backup in enumerate(backups, 1):
print(f"\n{i}. {backup.backup_path.name}")
print(f" Created: {backup.created_at}")
print(f" Size: {backup.backup_size / 1024:.2f} KB")
print(f" Checksum: {backup.checksum[:16]}...")
print(f" Valid: {'✓' if backup.is_valid else '✗'}")
if not backups:
print("No backups found")# Validate specific backup
if backups:
latest_backup = backups[0]
print(f"Validating backup: {latest_backup.backup_path.name}")
is_valid = pg_backup_tool.validate_backup_integrity(latest_backup.backup_path)
if is_valid:
print("✓ Backup integrity verified!")
print(" - SHA256 checksum matches")
print(" - Metadata is valid")
print(" - File is not corrupted")
else:
print("✗ Backup validation failed!")
print(" - Do not use this backup for restore")
print(" - Create a new backup")4.1 Batch Validation¶
Validate all backups at once:
print("Validating All PostgreSQL Backups:")
print("=" * 60)
valid_count = 0
invalid_count = 0
for backup in backups:
is_valid = pg_backup_tool.validate_backup_integrity(backup.backup_path)
status = "✓ Valid" if is_valid else "✗ Invalid"
print(f"{status}: {backup.backup_path.name}")
if is_valid:
valid_count += 1
else:
invalid_count += 1
print("\nSummary:")
print(f" Valid backups: {valid_count}")
print(f" Invalid backups: {invalid_count}")
if invalid_count > 0:
print("\n⚠️ Warning: Some backups are invalid. Delete and recreate them.")5. Database Migrations ¶
Database migrations handle schema changes and version upgrades.
5.1 PostgreSQL Migrations with Alembic¶
The Resource Manager uses Alembic for PostgreSQL migrations with automatic backups:
# Check current database version
from sqlmodel import create_engine, text
engine = create_engine(POSTGRES_URL)
try:
with engine.connect() as conn:
# Check if schema_version table exists
result = conn.execute(text("""
SELECT EXISTS (
SELECT FROM information_schema.tables
WHERE table_name = 'madsci_schema_version'
)
"""))
table_exists = result.fetchone()[0]
if table_exists:
result = conn.execute(text("SELECT version, applied_at FROM madsci_schema_version ORDER BY applied_at DESC LIMIT 1"))
row = result.fetchone()
if row:
print(f"Current Schema Version: {row[0]}")
print(f"Last Updated: {row[1]}")
else:
print("Schema version table exists but is empty")
else:
print("Schema version table does not exist (fresh database)")
except Exception as e:
print(f"Error checking schema version: {e}")5.2 Running Migrations¶
Migrations should be run from the command line with proper error handling:
Running PostgreSQL Migrations¶
From MADSci root directory:
python -m madsci.resource_manager.migration_tool --db-url postgresql://madsci:madsci@localhost:5432/resourcesThis will:
Check current schema version
Create automatic backup (pre_migration)
Apply Alembic migrations
Update schema version
Auto-restore if migration fails
⚠️ Always backup before running migrations in production!
5.3 Pre-Migration Checklist¶
import subprocess
print("Pre-Migration Checklist")
print("=" * 60)
# 1. Check database connectivity
try:
engine = create_engine(POSTGRES_URL)
with engine.connect() as conn:
conn.execute(text("SELECT 1"))
print("✓ 1. Database connectivity verified")
except Exception as e:
print(f"✗ 1. Database connectivity failed: {e}")
# 2. Check available backups
backups = pg_backup_tool.list_available_backups()
if backups and backups[0].is_valid:
latest = backups[0]
time_diff = datetime.now() - latest.created_at
print(f"✓ 2. Recent valid backup exists ({time_diff.seconds // 60} minutes old)")
else:
print("✗ 2. No recent valid backup found - create one before migrating!")
# 3. Check disk space
try:
import shutil
stats = shutil.disk_usage(BACKUP_BASE_DIR)
free_gb = stats.free / (1024**3)
if free_gb > 1.0:
print(f"✓ 3. Sufficient disk space available ({free_gb:.2f} GB free)")
else:
print(f"⚠️ 3. Low disk space: {free_gb:.2f} GB free")
except Exception as e:
print(f"✗ 3. Could not check disk space: {e}")
# 4. Check for active connections
try:
with engine.connect() as conn:
result = conn.execute(text("""
SELECT count(*)
FROM pg_stat_activity
WHERE datname = current_database()
AND pid != pg_backend_pid()
"""))
active_conns = result.fetchone()[0]
if active_conns == 0:
print("✓ 4. No active database connections (safe to migrate)")
else:
print(f"⚠️ 4. {active_conns} active connections detected")
except Exception as e:
print(f"✗ 4. Could not check active connections: {e}")
print("\n" + "=" * 60)
print("Migration readiness: Check all items above before proceeding")6. Restore Procedures ¶
Learn how to restore from backups in case of issues.
6.1 PostgreSQL Restore (Programmatic)¶
⚠️ Warning: This will overwrite the current database!
PostgreSQL Restore Procedure:
1. List available backups:
The code above shows available backups with their creation dates and validation status.
2. Choose a backup and validate it:
is_valid = pg_backup_tool.validate_backup_integrity(backup_path)3. Restore from backup:
⚠️ Warning: This will overwrite the current database!
pg_backup_tool.restore_from_backup(backup_path)4. Verify restoration:
Test database connectivity and data integrity after restoration.
To actually restore, run:
if backups and input('Type YES to restore: ') == 'YES':
pg_backup_tool.restore_from_backup(backups[0].backup_path)6.2 MongoDB Restore¶
MongoDB Restore Procedure:
1. List available backups:
The code above shows available MongoDB backups.
2. Restore to original database:
mongo_backup_tool.restore_from_backup(backup_path)3. Or restore to different database (for testing):
mongo_backup_tool.restore_from_backup(backup_path, target_database='events_restored')⚠️ Always test restore to a separate database first!
6.3 Testing Restore Procedures¶
Best practice: Regularly test your restore procedures!
Restore Testing Workflow:
1. Create a test database:
docker compose exec -T postgres psql -U madsci -c 'CREATE DATABASE resources_test;'2. Restore backup to test database:
pg_backup_tool.restore_from_backup(backup_path, target_database='resources_test')3. Verify data integrity:
Connect to test database and run verification queries
Check row counts, verify critical data exists
4. Clean up test database:
docker compose exec -T postgres psql -U madsci -c 'DROP DATABASE resources_test;'✓ If test restore succeeds, your backups are reliable!
7. Best Practices ¶
Follow these best practices for reliable database operations.
7.1 Backup Best Practices¶
Backup Best Practices¶
Frequency
Development: Daily backups
Production: Hourly backups + daily off-site copies
Before migrations: Always create a backup
Retention
Keep last 7 days of hourly backups
Keep last 30 days of daily backups
Keep monthly backups for 1 year
Validation
Validate backups immediately after creation
Test restore procedures weekly
Monitor backup success/failure
Storage
Store backups on separate disk from database
Copy critical backups to off-site location
Automation
Use cron jobs or systemd timers
Alert on backup failures
Monitor disk space
Documentation
Document backup/restore procedures
Keep restore instructions accessible
Test procedures with team members
7.2 Migration Best Practices¶
Migration Best Practices¶
Pre-Migration
Test migration in development environment first
Create verified backup before migration
Review migration scripts
Plan maintenance window
Notify team members
During Migration
Stop application services
Monitor migration progress
Keep logs of migration process
Be ready to rollback if needed
Post-Migration
Verify schema changes applied correctly
Test application functionality
Check data integrity
Monitor for errors
Create post-migration backup
Rollback Plan
Know how to restore from pre-migration backup
Have rollback procedure documented
Test rollback in development
Keep backup accessible during migration
8. Troubleshooting ¶
Common issues and their solutions.
8.1 Common Issues¶
Common Issues and Solutions¶
1. Version Mismatch Error¶
Error:
pg_dump: error: server version: 17.7; pg_dump version: 14.19
pg_dump: error: aborting because of server version mismatchSolution:
Run backup commands inside Docker containers:
docker compose exec -T postgres pg_dump -U madsci -d resources -Fc > backup.dumpOr install matching pg_dump version on host
2. Tool Not Found¶
Error:
mongodump: command not foundSolution:
Install MongoDB Database Tools:
# macOS brew install mongodb-database-tools # Ubuntu/Debian sudo apt-get install mongodb-database-toolsOr use Docker-based backups
3. Permission Denied¶
Error:
Permission denied: backup.dumpSolution:
mkdir -p backups
chmod 755 backups4. Connection Refused¶
Error:
connection to server at "localhost", port 5432 failedSolution:
# Check if databases are running
docker compose ps postgres mongodb
# Start databases if needed
docker compose up -d postgres mongodb5. Backup Validation Failed¶
Error:
Backup validation failed: checksum mismatchSolution:
Backup may be corrupted
Delete corrupted backup
Create new backup
Check disk for errors
Summary¶
This notebook covered:
✅ Testing database connectivity
✅ Creating backups (programmatic and CLI)
✅ Validating backup integrity
✅ Database migration procedures
✅ Restore workflows
✅ Best practices
✅ Troubleshooting common issues
Key Takeaways:
Always backup before migrations
Validate backups immediately after creation
Test restore procedures regularly
Use Docker-based backups for containerized databases
Automate backups with cron/systemd
Store backups securely and redundantly
Next Steps:
Set up automated backup schedule
Create backup retention policy
Document your specific backup/restore procedures
Test disaster recovery scenarios