Taking Pods Offline

Use this runbook when you need to take bundled PostgreSQL pods offline during node maintenance, storage work, scaling, or manual troubleshooting. If PostgreSQL is externally managed, follow that service's own operational process instead; the Metoro chart does not manage external PostgreSQL pods.

Availability Rules

Avoid taking more than one PostgreSQL pod offline at once. With the default three-instance PostgreSQL deployment, take at most one PostgreSQL pod or node offline at a time, and wait for PostgreSQL readiness to recover before moving on to the next pod or node. CloudNativePG uses pod disruption budgets and switchover-aware behavior to protect the cluster during routine maintenance.

Keep at least one PostgreSQL instance ready. Running with a single ready instance removes standby failover capacity, but PostgreSQL stays available. Taking every PostgreSQL instance offline is a planned outage for the Metoro UI, API, authentication, configuration, and Temporal workflow persistence paths.

If the primary is affected, expect CloudNativePG to switch over to a standby where possible. A single-instance deployment has no standby to switch over to, so taking its only PostgreSQL pod offline is a PostgreSQL outage.

Before Maintenance

Check the CloudNativePG cluster:

kubectl -n metoro-hub get cluster.postgresql.cnpg.io metoro-postgresql

Check PostgreSQL pods and roles:

kubectl -n metoro-hub get pods -l cnpg.io/cluster=metoro-postgresql \
  -L cnpg.io/instanceRole
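Before taking a node offline, it can help to know whether that node hosts the current primary, because disrupting the primary is what triggers a switchover. A minimal sketch, assuming bash and the `cnpg.io/instanceRole=primary` label value shown by `-L cnpg.io/instanceRole` above; the function name `pg_node_hosts_primary` is hypothetical:

```shell
# Hypothetical helper: succeed (exit 0) if NODE currently hosts the PostgreSQL primary.
# Assumes the primary pod carries the label cnpg.io/instanceRole=primary.
pg_node_hosts_primary() {
  local node="$1" ns="${2:-metoro-hub}" cluster="${3:-metoro-postgresql}"
  local primary_nodes n
  # Space-separated node names of all pods labeled as primary (normally exactly one).
  primary_nodes=$(kubectl -n "$ns" get pods \
    -l "cnpg.io/cluster=${cluster},cnpg.io/instanceRole=primary" \
    -o jsonpath='{.items[*].spec.nodeName}')
  for n in $primary_nodes; do
    if [ "$n" = "$node" ]; then
      return 0
    fi
  done
  return 1
}

# Usage (node name is a placeholder):
# if pg_node_hosts_primary my-node-1; then echo "primary runs on this node"; fi
```

If it succeeds, draining that node will disrupt the primary, so expect CloudNativePG to move the write endpoint during that step.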

Check pod disruption budgets:

kubectl -n metoro-hub get pdb -l cnpg.io/cluster=metoro-postgresql
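Kubernetes refuses voluntary evictions, such as those issued by `kubectl drain`, for pods covered by a budget whose allowed disruptions are zero. The sketch below lists any PostgreSQL budget currently at zero so you can check which pods it covers before proceeding. It assumes bash and relies only on the standard PodDisruptionBudget `status.disruptionsAllowed` field; the function name `pg_blocking_pdbs` is hypothetical. Depending on the CloudNativePG version, one budget may cover only the primary and show zero by design, so look at its selector rather than treating every zero as a fault.

```shell
# Hypothetical helper: print the names of PostgreSQL PDBs that currently allow 0 disruptions.
pg_blocking_pdbs() {
  local ns="${1:-metoro-hub}" cluster="${2:-metoro-postgresql}"
  kubectl -n "$ns" get pdb -l "cnpg.io/cluster=${cluster}" \
    -o custom-columns=NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
    --no-headers |
    awk '$2 == "0" { print $1 }'
}

# Usage:
# pg_blocking_pdbs   # prints nothing when every budget allows at least one disruption
```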

Do not start maintenance if PostgreSQL is already degraded, no PostgreSQL pod is ready, or the remaining ready instances cannot tolerate another disruption.
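The go/no-go rule above can be scripted. A minimal sketch, assuming bash and that the CloudNativePG `Cluster` status exposes `instances` and `readyInstances` fields (recent CloudNativePG releases use them for the cluster's instance and ready counts); verify the field names against your operator version. The function name `pg_preflight` and the two-ready-instance floor are illustrative choices derived from the rules above, not part of the Metoro chart. Because of that floor, a single-instance deployment always fails the gate, which matches the rule that its only pod cannot go offline without an outage.

```shell
# Hypothetical pre-flight gate. Exit 0 only when every instance is ready and at
# least two are ready, so one instance can go offline while another stays ready.
# Assumes .status.instances and .status.readyInstances on the CNPG Cluster resource.
pg_preflight() {
  local ns="${1:-metoro-hub}" cluster="${2:-metoro-postgresql}"
  local total ready
  read -r total ready < <(kubectl -n "$ns" get cluster.postgresql.cnpg.io "$cluster" \
    -o jsonpath='{.status.instances} {.status.readyInstances}')
  # Treat missing values (field absent or kubectl error) as zero.
  total=${total:-0}
  ready=${ready:-0}
  echo "instances=${total} ready=${ready}"
  if [ "$ready" -lt "$total" ]; then
    echo "not starting: PostgreSQL is already degraded" >&2
    return 1
  fi
  if [ "$ready" -lt 2 ]; then
    echo "not starting: no ready standby can absorb another disruption" >&2
    return 1
  fi
  return 0
}

# Usage:
# pg_preflight && echo "OK to take one pod or node offline"
```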

During Maintenance

Take one pod or one node offline at a time. After each step, confirm PostgreSQL availability before moving on:

kubectl -n metoro-hub get pods -l cnpg.io/cluster=metoro-postgresql \
  -L cnpg.io/instanceRole
kubectl -n metoro-hub get cluster.postgresql.cnpg.io metoro-postgresql
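For node maintenance, the one-node-at-a-time loop can be sketched as two steps: cordon and drain a node before the work, then uncordon it and wait for the cluster to report ready before touching the next node. This assumes bash, that `kubectl drain` with `--ignore-daemonsets --delete-emptydir-data` suits your nodes, and that the CloudNativePG `Cluster` resource reports a `Ready` condition, as recent CloudNativePG releases do. The function names `pg_node_down` and `pg_node_up` and the timeouts are hypothetical.

```shell
# Hypothetical step 1: stop scheduling on NODE and evict its pods.
# kubectl drain uses the eviction API, so it honors the PostgreSQL PDBs and
# keeps retrying evictions they block until --timeout expires.
pg_node_down() {
  local node="$1"
  if [ -z "$node" ]; then
    echo "usage: pg_node_down NODE" >&2
    return 2
  fi
  kubectl cordon "$node" || return 1
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=15m || return 1
}

# Hypothetical step 2: after the maintenance work, return NODE to service and
# block until CloudNativePG reports the cluster Ready (assumed condition type).
pg_node_up() {
  local node="$1" ns="${2:-metoro-hub}" cluster="${3:-metoro-postgresql}"
  if [ -z "$node" ]; then
    echo "usage: pg_node_up NODE [NAMESPACE] [CLUSTER]" >&2
    return 2
  fi
  kubectl uncordon "$node" || return 1
  kubectl -n "$ns" wait --for=condition=Ready \
    "cluster.postgresql.cnpg.io/${cluster}" --timeout=20m
}

# Usage, one node at a time (node name is a placeholder):
# pg_node_down my-node-1 && echo "do the maintenance, then run: pg_node_up my-node-1"
```

Splitting the steps keeps the wait on the far side of the maintenance work: if the PostgreSQL pod cannot reschedule until its node returns, waiting right after the drain would only time out.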

Expected state during routine maintenance:

At least one PostgreSQL instance stays ready.
In the default three-instance deployment, at least two PostgreSQL instances should normally remain ready while one is under maintenance.
If the primary is disrupted, CloudNativePG may promote a standby and move the write endpoint.
The Metoro UI, API, authentication, configuration writes, and Temporal persistence may see reduced capacity or brief reconnects, but should not become fully unavailable unless all PostgreSQL instances are offline.
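To see whether a standby was promoted during a step, compare the primary before and after it. A minimal sketch, assuming bash and the `.status.currentPrimary` field of the CloudNativePG `Cluster` resource, which recent releases use to report the current primary pod; the function name `pg_current_primary` is hypothetical.

```shell
# Hypothetical helper: print the pod name CloudNativePG reports as current primary.
# Assumes the .status.currentPrimary field on the CNPG Cluster resource.
pg_current_primary() {
  local ns="${1:-metoro-hub}" cluster="${2:-metoro-postgresql}"
  kubectl -n "$ns" get cluster.postgresql.cnpg.io "$cluster" \
    -o jsonpath='{.status.currentPrimary}'
}

# Usage around one maintenance step:
# before=$(pg_current_primary)
# ...take one pod or node offline, wait for readiness...
# after=$(pg_current_primary)
# [ "$before" = "$after" ] || echo "primary moved from $before to $after"
```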

After Maintenance

Confirm all expected PostgreSQL pods are ready:

kubectl -n metoro-hub get pods -l cnpg.io/cluster=metoro-postgresql \
  -L cnpg.io/instanceRole

Confirm pod disruption budgets are healthy:

kubectl -n metoro-hub get pdb -l cnpg.io/cluster=metoro-postgresql

Check recent events if pods are slow to return:

kubectl -n metoro-hub get events --sort-by='.lastTimestamp'
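When a pod is slow to return, the full event list can be noisy. A sketch that keeps only Warning events whose row mentions the PostgreSQL cluster name, assuming bash and the default `kubectl get events` table layout with a single header row; the function name `pg_recent_warnings` is hypothetical.

```shell
# Hypothetical helper: show Warning events that mention the PostgreSQL cluster,
# oldest first, keeping kubectl's header row.
pg_recent_warnings() {
  local ns="${1:-metoro-hub}" cluster="${2:-metoro-postgresql}"
  kubectl -n "$ns" get events --field-selector type=Warning \
    --sort-by='.lastTimestamp' |
    awk -v name="$cluster" 'NR == 1 || index($0, name) > 0'
}

# Usage:
# pg_recent_warnings
```

Matching on the cluster name is deliberately loose: it also keeps events for other objects whose names start with the cluster name, which is usually what you want while troubleshooting.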

The expected outcome is:

At least one PostgreSQL instance stayed ready throughout the maintenance window, unless the work was planned as a PostgreSQL outage.
The default three-instance deployment returns to three ready PostgreSQL pods.
CloudNativePG reports a healthy cluster.
The Metoro apiserver and Temporal recover automatically after any brief change in PostgreSQL readiness.
