Use this runbook when you need to change the number of bundled ClickHouse replicas managed by the metoro-onprem chart and the Altinity ClickHouse Operator. If ClickHouse is externally managed, scale replicas through the team and tooling that operate that ClickHouse service. The Metoro chart does not manage external ClickHouse replicas. This page is about ClickHouse replicas, not ClickHouse Keeper replicas.

What To Change

The clickhouse.bundled.replicaCount Helm value controls the ClickHouse replica count. The chart renders this into spec.configuration.clusters[0].layout.replicasCount on the ClickHouseInstallation named metoro. Change the value in metoro-hub-values.yaml:
clickhouse:
  bundled:
    replicaCount: 3
Keep metoro-hub-values.yaml as the source of truth. Do not manually edit the ClickHouseInstallation for normal replica scaling.
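Before applying the change, it can help to render the chart locally and confirm the value reaches the ClickHouseInstallation. The following is a minimal sketch, not part of the official runbook: the function name render_replicas_count is hypothetical, and it assumes the chart location and version used for the hub and that the chart renders without cluster access.

```shell
# Hypothetical helper: render the chart locally and print every line that
# sets replicasCount, so the value can be checked before the upgrade.
# Assumption: the chart renders offline with only the values file as input.
render_replicas_count() {
  values_file="${1:-metoro-hub-values.yaml}"
  helm template metoro oci://quay.io/metoro/charts/metoro-onprem \
    --namespace metoro-hub \
    --version 10.0.1 \
    --values "$values_file" |
    grep -n 'replicasCount'
}

# Usage: render_replicas_count metoro-hub-values.yaml
```

If the printed value does not match clickhouse.bundled.replicaCount, check that the values file passed to the function is the one you edited.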

When To Scale

Metoro normally runs ClickHouse as one shard with three replicas in production. Three replicas give three copies of the telemetry data, and every replica can query the full dataset.

Scale up when you are moving toward the recommended three-replica production shape or restoring the replica count after a temporary downscale. Each added replica needs its own CPU, memory, hot PVC, local cache, and retained telemetry capacity. Do not go above three replicas for ordinary capacity scaling. Prefer vertical scaling first, and contact Metoro before planning unusual replica layouts or sharding.

Scale down only when you accept reduced redundancy and query capacity. Downscaling can reduce running pod cost, but it may not immediately reduce storage cost because ClickHouse PVCs use retained storage.

If replica scaling is part of node maintenance or any workflow that takes pods offline, follow Taking Pods Offline so Keeper keeps quorum and at least one ClickHouse pod stays ready.

Apply The Change

Apply the updated values with the same Helm release and chart version used for the hub:
helm upgrade --install metoro oci://quay.io/metoro/charts/metoro-onprem \
  --namespace metoro-hub \
  --version 10.0.1 \
  --values metoro-hub-values.yaml

What Happens When Scaling Up

After the Helm upgrade, the ClickHouseInstallation named metoro updates its desired replica count. The Altinity ClickHouse Operator reconciles the updated ClickHouseInstallation and creates the additional ClickHouse replica pod resources and retained PVCs. New replicas join the single ClickHouse shard and catch up with replicated data. This can temporarily increase ClickHouse, disk, object-storage, and network load while the new replicas initialize. Existing replicas should generally continue serving reads and writes while new replicas come online. If you add more than one replica, expect a gradual reconciliation rather than all new replicas becoming ready at once.
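To see how far a new replica has caught up, one option is to query ClickHouse's system.replicas table from inside the pod. This is a sketch under assumptions: the function name replica_status is hypothetical, the pod name is whatever kubectl get pods reports for the new replica, and it assumes clickhouse-client in the default container can connect without extra credentials.

```shell
# Hypothetical helper: show replication queue size and delay per table on
# one ClickHouse pod. Pass the pod name reported by `kubectl get pods`.
# Assumption: clickhouse-client works in the pod's default container
# without additional connection or authentication flags.
replica_status() {
  pod="$1"
  kubectl -n metoro-hub exec "$pod" -- clickhouse-client --query "
    SELECT database, table, is_readonly, queue_size, absolute_delay,
           total_replicas, active_replicas
    FROM system.replicas
    ORDER BY absolute_delay DESC
    LIMIT 20"
}

# Usage: replica_status <new-replica-pod-name>
```

A shrinking queue_size and an absolute_delay close to zero suggest the replica has caught up. Treat the numbers as a rough signal rather than a formal readiness check.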

What Happens When Scaling Down

After the Helm upgrade, the ClickHouseInstallation named metoro lowers its desired replica count. The Altinity ClickHouse Operator reconciles the updated ClickHouseInstallation and removes surplus ClickHouse replica pod resources. Because ClickHouse volume claim templates use retained storage, PVCs or PVs from removed replicas may remain after downscaling. Do not manually delete retained ClickHouse PVCs as part of the normal downscale runbook. Storage cost may not drop until retained volumes are intentionally cleaned up after confirming rollback is not needed. Production deployments should normally stay at three replicas. Downscale only when the operational tradeoff is understood.
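To see which volumes remain after a downscale, the sketch below lists PersistentVolumes whose claim lives in the hub namespace. The function name list_hub_pvs is hypothetical. The POLICY column is the Kubernetes PV reclaim policy, which may differ from any retention the operator applies at the PVC level, so the output shows what is left, not what is safe to delete.

```shell
# Hypothetical helper: list PVs bound to (or released from) claims in the
# hub namespace, with their reclaim policy and phase. Read-only.
list_hub_pvs() {
  ns="${1:-metoro-hub}"
  kubectl get pv \
    -o custom-columns='PV:.metadata.name,CLAIM_NS:.spec.claimRef.namespace,CLAIM:.spec.claimRef.name,POLICY:.spec.persistentVolumeReclaimPolicy,STATUS:.status.phase' |
    awk -v ns="$ns" 'NR == 1 || $2 == ns'  # keep the header and matching rows
}

# Usage: list_hub_pvs metoro-hub
```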

Verify The Rollout

Check that the ClickHouseInstallation exists and is being reconciled:
kubectl -n metoro-hub get chi metoro
Confirm the desired CHI replica count matches the new Helm value:
kubectl -n metoro-hub get chi metoro \
  -o jsonpath='{.spec.configuration.clusters[0].layout.replicasCount}{"\n"}'
Watch the ClickHouse pods while the operator reconciles:
kubectl -n metoro-hub get pods -l app=metoro-clickhouse -w
Check ClickHouse PVCs:
kubectl -n metoro-hub get pvc
Check recent events if pods or PVCs are slow to converge:
kubectl -n metoro-hub get events --sort-by='.lastTimestamp'
The expected outcome is:
  • The CHI desired replica count matches clickhouse.bundled.replicaCount.
  • Scale-up creates new metoro-clickhouse pods and new retained ClickHouse PVCs.
  • Scale-down removes surplus ClickHouse pod resources while retained PVCs may remain.
  • Existing replicas generally stay ready, though readiness may briefly change during reconciliation.
  • Apiserver and Ingester recover automatically if ClickHouse readiness briefly changes.
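The first and fourth checks above can be combined into a single comparison. This is a minimal sketch with a hypothetical function name, check_replicas_converged. It assumes the single-shard layout described on this page, one pod per replica, and that the app=metoro-clickhouse label selects only ClickHouse replica pods.

```shell
# Hypothetical helper: compare the CHI desired replica count with the
# number of ClickHouse pods whose Ready condition is True.
# Returns 0 when they match, non-zero otherwise.
check_replicas_converged() {
  ns="${1:-metoro-hub}"
  desired="$(kubectl -n "$ns" get chi metoro \
    -o jsonpath='{.spec.configuration.clusters[0].layout.replicasCount}')"
  # Print one Ready status per pod, then count the lines equal to "True".
  ready="$(kubectl -n "$ns" get pods -l app=metoro-clickhouse \
    -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
    grep -c '^True$' || true)"
  echo "desired=$desired ready=$ready"
  [ "$desired" = "$ready" ]
}

# Usage: check_replicas_converged metoro-hub && echo "replicas converged"
```

During a scale-down the ready count can briefly exceed the desired count while surplus pods terminate, so a mismatch right after the upgrade is not necessarily a failure.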