Availability Rules
Avoid taking too many ClickHouse or Keeper pods offline at once.
ClickHouse Keeper must keep quorum. With the default three Keeper pods, keep at least two Keeper pods ready. Losing Keeper quorum can block ClickHouse coordination and replica management, which can block telemetry ingestion and user queries until enough Keeper pods are healthy again.
The ClickHouseInstallation needs at least one ClickHouse pod online. Running with one ClickHouse pod removes redundancy and reduces query and ingest capacity, but it keeps the telemetry store available. Taking every ClickHouse pod offline is a planned outage for ClickHouse-backed UI, API, and ingestion paths.
For maintenance, work one pod or node at a time. Wait for Keeper quorum and at least one ready ClickHouse pod before continuing to the next pod or node.
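The quorum rule above is majority arithmetic: a Keeper ensemble of n pods needs floor(n/2) + 1 ready pods, which gives two out of the default three. The following is a minimal Python sketch of that rule combined with the "at least one ClickHouse pod" rule. The function names `keeper_quorum` and `can_take_offline` are hypothetical and chosen only for illustration.

```python
def keeper_quorum(total_keepers: int) -> int:
    """Smallest number of ready Keeper pods that still forms a majority."""
    if total_keepers < 1:
        raise ValueError("total_keepers must be at least 1")
    return total_keepers // 2 + 1


def can_take_offline(kind: str, keeper_ready: int, keeper_total: int,
                     clickhouse_ready: int) -> bool:
    """Return True if removing one ready pod of `kind` keeps both rules satisfied.

    kind is "keeper" or "clickhouse". The rules are the ones stated above:
    Keeper keeps a majority, and at least one ClickHouse pod stays ready.
    """
    if kind == "keeper":
        keeper_ready -= 1
    elif kind == "clickhouse":
        clickhouse_ready -= 1
    else:
        raise ValueError("kind must be 'keeper' or 'clickhouse'")
    return keeper_ready >= keeper_quorum(keeper_total) and clickhouse_ready >= 1


if __name__ == "__main__":
    # Default deployment: three Keeper pods, all ready, two ClickHouse pods ready.
    print(can_take_offline("keeper", 3, 3, 2))   # one Keeper can go offline
    print(can_take_offline("keeper", 2, 3, 2))   # a second Keeper cannot
```

With three Keeper pods the ensemble tolerates exactly one unavailable Keeper, which is why the procedure works one pod or node at a time.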
Before Maintenance
Check the ClickHouse pods and the ClickHouseInstallation before starting.
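One possible way to script the pod check is to count ready pods from the JSON that `kubectl get pods -o json` prints. The sketch below assumes the standard Kubernetes PodList shape, where each pod carries a `Ready` condition under `status.conditions`. The function name `count_ready` and the pod names in the sample are hypothetical.

```python
import json


def count_ready(pod_list: dict) -> tuple:
    """Return (ready, total) for a PodList as printed by `kubectl get pods -o json`."""
    items = pod_list.get("items", [])
    ready = 0
    for pod in items:
        conditions = pod.get("status", {}).get("conditions", [])
        # A pod counts as ready only if its "Ready" condition has status "True".
        if any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in conditions):
            ready += 1
    return ready, len(items)


# Sample input shaped like kubectl output; the pod names are made up.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "example-pod-0"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "example-pod-1"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}}
  ]
}
""")

if __name__ == "__main__":
    ready, total = count_ready(sample)
    print(f"{ready}/{total} pods ready")
```

In practice the input would come from something like `kubectl get pods -n <namespace> -l <label-selector> -o json`, where the namespace and label selector are placeholders that depend on the deployment.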
During Maintenance
Take one pod or one node offline at a time. After each step, confirm Keeper quorum and ClickHouse availability before moving on:
- At least two Keeper pods stay ready in the default three-Keeper deployment.
- At least one ClickHouse pod stays ready.
- ClickHouse-backed UI, API, and ingestion paths may have reduced capacity, but should not become fully unavailable.
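The "confirm before moving on" step can be automated as a polling gate that blocks until a readiness check passes or a timeout expires. This is a sketch only: `wait_until` is a hypothetical helper, the timeout and interval defaults are illustrative assumptions, and the `check` callable stands in for whatever readiness test the operator uses.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float = 600.0,
               interval_s: float = 10.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Returns True if the check passed, False if the timeout was reached.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in check: passes on the third poll.
    polls = {"n": 0}

    def ready_after_three_polls() -> bool:
        polls["n"] += 1
        return polls["n"] >= 3

    ok = wait_until(ready_after_three_polls, timeout_s=5, interval_s=0.1)
    print("safe to continue" if ok else "stop: readiness not restored")
```

If the gate returns False, the safer choice is to stop the maintenance and investigate rather than continue to the next pod or node.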
After Maintenance
Confirm all expected ClickHouse and Keeper pods are ready:
- Keeper has quorum.
- At least one ClickHouse pod stayed ready throughout the maintenance window, unless the work was planned as a ClickHouse outage.
- All expected ClickHouse and Keeper pods return to ready.
- Apiserver and Ingester recover automatically if ClickHouse readiness briefly changed.
