How to Recover from System Failures¶
Step-by-step recovery procedures for database loss, credential recovery, MinIO failures, and full system rebuilds.
Prerequisites¶
- Access to the server or new infrastructure
.env.prodfile (or knowledge of theJWT_SECRET_KEY)- Docker and Docker Compose installed
- Backup timestamp to restore from (if restoring data)
Scenario 1: Full System Recovery¶
Use when recovering to a new server or after complete data loss.
Step 1: Prepare Infrastructure¶
# Clone repository
git clone https://gitlab.example.com/qa/playwright-agent.git /opt/playwright-agent
cd /opt/playwright-agent
# Restore .env.prod from secure backup
cp /secure-backup/.env.prod .env.prod
# OR recreate from template
cp .env.prod.example .env.prod
# Edit with your values
Step 2: Start Core Services¶
# Start database and MinIO first
docker compose --env-file .env.prod -f docker-compose.prod.yml up -d db minio
# Wait for services to be healthy
docker compose --env-file .env.prod -f docker-compose.prod.yml ps
Step 3: Restore from Backup¶
# List available local backups
make restore-list
# Restore from local backup
make restore TS=20260115_143022
# Or restore from MinIO
make restore-from-minio TS=20260115_143022
Step 4: Apply Migrations¶
Step 5: Start Application¶
Scenario 2: Database Recovery Only¶
Use when the database is corrupted but files are intact.
Step 1: Stop the Backend¶
Step 2: Restore Database¶
docker compose --env-file .env.prod -f docker-compose.prod.yml run --rm --profile restore restore bash -c "
apk add --no-cache postgresql15-client
export PGPASSWORD=\$POSTGRES_PASSWORD
gunzip -c /backups/20260115_143022_db.sql.gz | psql -h db -U playwright -d playwright_agent
"
Step 3: Restart Services¶
docker compose --env-file .env.prod -f docker-compose.prod.yml up -d backend frontend
make health-check
Scenario 3: MinIO Storage Recovery¶
Use when MinIO data is lost but local backups exist.
# Stop MinIO
docker compose --env-file .env.prod -f docker-compose.prod.yml stop minio
# Remove corrupted volume
docker volume rm playwright-agent_minio_data
# Restart MinIO (fresh volume)
docker compose --env-file .env.prod -f docker-compose.prod.yml up -d minio
# Re-populate MinIO with a fresh backup
make backup-full
Scenario 4: Lost JWT_SECRET_KEY¶
Danger
All encrypted credentials (TestRail API keys, Jira tokens, dashboard-stored passwords) are unrecoverable without the original JWT_SECRET_KEY.
Step 1: Generate a New Key¶
Step 2: Update Configuration¶
Update JWT_SECRET_KEY in .env.prod with the new value.
Step 3: Restart and Re-enter Credentials¶
Users must re-enter all stored credentials through the dashboard Settings page.
Step 4: Prevent Future Loss¶
Back up .env.prod to a password manager or secure vault immediately.
Scenario 5: Lost Database Password¶
# Reset PostgreSQL password
docker compose --env-file .env.prod -f docker-compose.prod.yml exec db \
psql -U postgres -c "ALTER USER playwright WITH PASSWORD 'new-password';"
# Update .env.prod with new password
# Restart
make prod-restart
Scenario 6: Schema-Only Recovery¶
Use when you have a fresh database but no data backup. Alembic migrations recreate the schema from scratch.
# Stop backend
docker compose --env-file .env.prod -f docker-compose.prod.yml stop backend
# Apply all migrations
make db-upgrade
# Restart
docker compose --env-file .env.prod -f docker-compose.prod.yml start backend
# Create admin user (no users exist after schema-only recovery)
docker compose --env-file .env.prod -f docker-compose.prod.yml exec backend \
python orchestrator/scripts/create_admin.py \
--email admin@yourcompany.com \
--password YourSecurePassword
Note
Schema-only recovery restores the table structure, not data. Use Scenario 1 or 2 to restore data from a backup.
Post-Recovery Verification¶
After any recovery, verify these items:
1. Service Health¶
2. User Authentication¶
curl -X POST http://localhost:8001/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com","password":"your-password"}'
# Should return JWT token, not error
3. Credential Decryption¶
- Log in to the dashboard
- Navigate to Settings > Credentials
- Verify stored credentials are readable (not error)
4. Data Integrity¶
5. Run a Test¶
docker compose --env-file .env.prod -f docker-compose.prod.yml exec backend \
python orchestrator/cli.py specs/example-test.md
Preventive Measures¶
- Back up
.env.prodseparately -- store in password manager - Monitor backup age -- alert if backup is older than 48 hours
- Test restores regularly -- monthly restore to a test environment
- Keep backup-scheduler running -- daily automated backups at 2 AM
Critical Information Reference¶
| Secret | Purpose | Recovery |
|---|---|---|
.env.prod | All secrets | Must be backed up separately |
JWT_SECRET_KEY | Decrypts credentials | Without it, credentials are unrecoverable |
POSTGRES_PASSWORD | Database access | Can be reset with server access |
MINIO_ROOT_PASSWORD | MinIO access | Can be reset with server access |
Verification¶
Confirm recovery is complete:
make health-checkpasses all endpoints- Login and token refresh work
- Encrypted credentials are decryptable
- Test execution completes
- Backup service is running and creating new backups
Related Guides¶
- Company Deployment -- initial deployment setup
- Deployment -- all deployment modes
- Troubleshooting -- diagnose issues before recovery
- Credential Management -- re-enter credentials after key loss