
Runbooks

This document provides runbooks for common operational procedures including releases, incidents, backups, and upgrades.

Releases

We follow a structured release train approach:

  1. Trigger Release Please

    • Merge main into the release branch.
    • Let Release Please generate the changelog and bump the version.
  2. Tag & Artifacts

    • CI tags the repo with the version (e.g. v1.2.3).
    • CI builds the container images and pushes them to the registry.
  3. Docs Versioning

    • Use mike to version and publish the docs:
      mike deploy --update-aliases 1.2 latest
      mike set-default latest
      git push origin gh-pages
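
The docs version passed to mike (1.2) appears to be the major.minor part of the release tag (v1.2.3). Below is a minimal POSIX sh sketch of how CI might derive it and run step 3. It assumes tags always have the form vMAJOR.MINOR.PATCH. `run`, `docs_version`, and `publish_docs` are hypothetical helper names, and `DRY_RUN=1` prints the commands instead of executing them.

```sh
#!/bin/sh
# Hypothetical helpers; not part of Release Please or mike.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Map a release tag to the docs version: v1.2.3 -> 1.2.
# Fails on anything that is not vMAJOR.MINOR.PATCH.
docs_version() {
  if ! printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "unexpected tag format: $1" >&2
    return 1
  fi
  v=${1#v}        # strip the leading "v": 1.2.3
  echo "${v%.*}"  # drop ".PATCH": 1.2
}

# Step 3 for a given tag: deploy the docs version, move the "latest"
# alias to it, make "latest" the default, and push gh-pages.
publish_docs() {
  version=$(docs_version "$1") || return 1
  run mike deploy --update-aliases "$version" latest || return 1
  run mike set-default latest || return 1
  run git push origin gh-pages
}
```

For example, `DRY_RUN=1 publish_docs v1.2.3` prints the mike and git commands for docs version 1.2 without touching gh-pages.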

Incidents

  • Check the API logs for errors on the /stream endpoint.
  • Restart the affected API pods.
  • Validate NATS connection health.
  • Inspect the server pod/container logs.
  • Restart the pod in Kubernetes.
  • Verify database connectivity.
  • Monitor NATS JetStream metrics.
  • Check the outbox relay worker logs.
  • Scale out API server instances if needed.
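
Most of these checks map onto standard kubectl, pg_isready, and nats CLI commands. The sketch below rests on several assumptions. The namespace, the deployment names `api` and `outbox-relay`, the JetStream stream `EVENTS`, and the database host are hypothetical placeholders, not values taken from this project. Set `DRY_RUN=1` to print the commands before running them.

```sh
#!/bin/sh
# Placeholder settings (hypothetical); override via the environment.
NS=${NS:-default}
API_DEPLOY=${API_DEPLOY:-api}
RELAY_DEPLOY=${RELAY_DEPLOY:-outbox-relay}
NATS_STREAM=${NATS_STREAM:-EVENTS}
DB_HOST=${DB_HOST:-localhost}
DB_PORT=${DB_PORT:-5432}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Read-only checks; each one runs even if an earlier one fails.
triage() {
  # Recent logs from one pod of each deployment (grep for /stream errors).
  run kubectl -n "$NS" logs deployment/"$API_DEPLOY" --since=15m
  run kubectl -n "$NS" logs deployment/"$RELAY_DEPLOY" --since=15m
  # Is the database accepting connections?
  run pg_isready -h "$DB_HOST" -p "$DB_PORT"
  # JetStream stream state: message count, bytes, consumers.
  run nats stream info "$NATS_STREAM"
}

# Restart the API pods and wait until the replacements are ready.
restart_api() {
  run kubectl -n "$NS" rollout restart deployment/"$API_DEPLOY" &&
    run kubectl -n "$NS" rollout status deployment/"$API_DEPLOY" --timeout=5m
}

# Scale the API deployment to the given replica count.
scale_api() {
  run kubectl -n "$NS" scale deployment/"$API_DEPLOY" --replicas="$1"
}
```

Keeping `triage` separate from `restart_api` and `scale_api` means the read-only checks can run first, before anything in the cluster is changed.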

Backups

  • Backups: a cron job runs pg_dump and uploads the dump to S3.
  • Restore: load the dump into a new instance with psql < dump.sql.
  • The event store is append-only and is backed up by the same regular pg_dump.
  • Point-in-time recovery: use PostgreSQL WAL archiving.
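
The following is a sketch of the backup and restore steps. It assumes the AWS CLI is installed and that `DATABASE_URL`, `RESTORE_URL`, and `BACKUP_BUCKET` (all hypothetical names) point at the right places. It dumps to a local file before uploading rather than piping, so a failed pg_dump stops the upload. `psql --file` loads the plain SQL dump just as `psql < dump.sql` does. Point-in-time recovery is not covered here, since it depends on WAL archiving being configured on the server (`archive_mode`, `archive_command`).

```sh
#!/bin/sh
# Hypothetical settings; override via the environment.
BACKUP_BUCKET=${BACKUP_BUCKET:-example-db-backups}
DATABASE_URL=${DATABASE_URL:-postgres://localhost/app}
RESTORE_URL=${RESTORE_URL:-postgres://localhost/app_restore}
WORKDIR=${TMPDIR:-/tmp}

# Example crontab entry for a (hypothetical) script that calls backup,
# nightly at 02:00:
#   0 2 * * * /opt/runbooks/backup.sh

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# S3 object key for a dump taken at the given UTC timestamp.
backup_key() {
  echo "pg_dump/$1.sql"
}

# Dump to a local file, upload it, then remove the local copy.
backup() {
  ts=$(date -u +%Y%m%dT%H%M%SZ)
  file="$WORKDIR/$ts.sql"
  run pg_dump --file="$file" "$DATABASE_URL" &&
    run aws s3 cp "$file" "s3://$BACKUP_BUCKET/$(backup_key "$ts")" &&
    run rm -f "$file"
}

# Download a dump by key and load it into a new, empty instance.
# ON_ERROR_STOP makes psql exit non-zero on the first failed statement.
restore() {
  file="$WORKDIR/restore.sql"
  run aws s3 cp "s3://$BACKUP_BUCKET/$1" "$file" &&
    run psql --set=ON_ERROR_STOP=1 --file="$file" "$RESTORE_URL"
}
```

For example, `restore pg_dump/20240101T020000Z.sql` loads that dump into the instance at `RESTORE_URL`.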

Upgrades

  • Run database migrations before deploying new code.
  • Test migrations on the staging environment first.
  • Validate API compatibility with existing clients.
  • Monitor error rates after the deployment.
  • Keep a rollback plan ready.
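
One way to encode this order in a script: run migrations first, then roll out the new image, and undo the rollout if the pods do not become ready. This is a minimal sketch. `MIGRATE_BIN` (here `./scripts/migrate up`) and the deployment and container names are hypothetical placeholders for the project's actual migration tool and manifests. `kubectl rollout undo` reverts only the pods, not the schema, so this rollback is safe only if the migration stays compatible with the previous release.

```sh
#!/bin/sh
# Hypothetical settings; override via the environment.
NS=${NS:-default}
API_DEPLOY=${API_DEPLOY:-api}
API_CONTAINER=${API_CONTAINER:-api}
MIGRATE_BIN=${MIGRATE_BIN:-./scripts/migrate}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Deploy the given image: migrate, roll out, verify, roll back on failure.
deploy() {
  image=$1
  # 1. Schema first, so new code never starts against an old schema.
  run "$MIGRATE_BIN" up || return 1
  # 2. Point the deployment at the new image (starts a rolling update).
  run kubectl -n "$NS" set image deployment/"$API_DEPLOY" "$API_CONTAINER=$image" || return 1
  # 3. Wait for readiness; undo the rollout if it does not finish in time.
  if ! run kubectl -n "$NS" rollout status deployment/"$API_DEPLOY" --timeout=5m; then
    echo "rollout of $image failed; rolling back" >&2
    run kubectl -n "$NS" rollout undo deployment/"$API_DEPLOY"
    return 1
  fi
}
```

For example, `DRY_RUN=1 deploy registry.example.com/api:1.2.3` prints the three steps. The registry path is a placeholder.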