CPU Throttling Detection
Detect CPU throttling in your Kubernetes services
The CPU Throttling Detection workflow monitors your Kubernetes services for CPU throttling events and creates issues when services experience significant throttling. This helps you identify when services are being constrained by their CPU limits and take corrective action.
How it Works
The workflow monitors two key metrics:
- container_resources_cpu_throttled_seconds_total: Measures the time a container spends throttled due to CPU limits
- container_resources_cpu_usage_seconds_total: Measures the total CPU time used by the container
When the ratio of throttling time to CPU usage time meets or exceeds a configured threshold, the workflow creates an issue to alert you about potential CPU constraints.
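As a rough illustration of the ratio described above, the following sketch computes it from two samples of each counter taken at the start and end of a time window. The function name and the sample values are hypothetical, and the workflow's actual query and window handling (including counter resets) are not documented here, so treat this as an assumption-laden sketch rather than the workflow's implementation.

```python
def throttle_ratio(throttled_start, throttled_end, usage_start, usage_end):
    """Ratio of throttled time to CPU time over one window.

    Inputs are samples of the two cumulative counters (in seconds) at the
    start and end of the window. Counter resets are not handled here.
    """
    throttled = throttled_end - throttled_start  # seconds spent throttled
    used = usage_end - usage_start               # CPU seconds consumed
    if used <= 0:
        return 0.0  # no CPU usage in the window, so no meaningful ratio
    return throttled / used


# Hypothetical samples: 420 s throttled against 6000 CPU seconds used.
ratio = throttle_ratio(120.0, 540.0, 10_000.0, 16_000.0)
print(f"{ratio:.1%}")  # 7.0%
```

With the default thresholds, a ratio of 7% would fall between the medium (5%) and high (10%) thresholds.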
Configuration
The workflow can be configured with the following parameters:
Parameter | Type | Description | Default |
---|---|---|---|
mediumThrottleThreshold | float | Minimum throttling ratio (throttle time / CPU time) to create a medium severity issue | 0.05 (5%) |
highThrottleThreshold | float | Minimum throttling ratio to create a high severity issue | 0.10 (10%) |
minCpuSeconds | float | Minimum CPU seconds used in the time window before considering throttling issues | 3600 (1 hour) |
Issue Details
When an issue is created, it includes:
- The service and environment experiencing CPU throttling
- The throttling ratio (percentage of CPU time spent throttled)
- The severity level based on the throttling ratio
- A visualization showing:
  - CPU throttling over time
  - CPU usage patterns
Example Issue
Here’s an example of an issue created by the CPU Throttling Detection workflow:
Severity Levels
The workflow assigns severity levels based on the throttling ratio:
- Medium: When the throttling ratio meets or exceeds mediumThrottleThreshold (default: 5%)
- High: When the throttling ratio meets or exceeds highThrottleThreshold (default: 10%)
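The severity rules can be sketched as a small function. This is a minimal illustration only: the function name `classify_severity` is hypothetical, the defaults mirror the configuration table, and it assumes that windows with fewer than minCpuSeconds of CPU usage produce no issue and that the high threshold takes precedence over the medium one.

```python
from typing import Optional


def classify_severity(
    throttled_seconds: float,
    cpu_seconds: float,
    medium_threshold: float = 0.05,   # mediumThrottleThreshold default
    high_threshold: float = 0.10,     # highThrottleThreshold default
    min_cpu_seconds: float = 3600.0,  # minCpuSeconds default
) -> Optional[str]:
    """Return "high", "medium", or None (no issue) for one time window."""
    # Assumption: windows with too little CPU usage are skipped entirely.
    if cpu_seconds <= 0 or cpu_seconds < min_cpu_seconds:
        return None
    ratio = throttled_seconds / cpu_seconds
    if ratio >= high_threshold:
        return "high"
    if ratio >= medium_threshold:
        return "medium"
    return None


print(classify_severity(420.0, 6000.0))  # medium (ratio 7%)
print(classify_severity(900.0, 6000.0))  # high (ratio 15%)
print(classify_severity(50.0, 500.0))    # None (below minCpuSeconds)
```

Checking the high threshold first keeps the two levels mutually exclusive, since any ratio that meets the high threshold also meets the medium one.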
Understanding CPU Throttling
CPU throttling in Kubernetes can be counterintuitive. Even if your average CPU usage is under the limit, you can still experience throttling, because a CPU limit is enforced as a CPU time quota per scheduling period (Linux CFS bandwidth control) rather than as an average:
- The default quota period is 100ms
- For example, with a 50m (millicores) CPU limit:
  - The container gets a 5ms CPU quota per 100ms period
  - If the container needs more than 5ms of CPU in any 100ms period, it gets throttled
  - This happens even if the average CPU usage over longer periods is below the limit
This is particularly problematic for request-handling services because throttling manifests as increased latency.
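The quota arithmetic above can be made concrete with a deliberately simplified simulation. It assumes a single-threaded container whose unfinished work carries over to the next period. The function names and the bursty demand pattern are hypothetical, and real kernel accounting is more detailed than this model.

```python
def quota_ms(limit_millicores, period_ms=100.0):
    """CPU time budget per period for a given CPU limit."""
    return limit_millicores * period_ms / 1000.0


def count_throttled_periods(demand_ms, limit_millicores, period_ms=100.0):
    """Count periods in which the container still had work left after
    using up its quota. demand_ms lists new CPU work arriving per period."""
    quota = quota_ms(limit_millicores, period_ms)
    backlog = 0.0
    throttled = 0
    for demand in demand_ms:
        backlog += demand
        backlog -= min(backlog, quota)  # run until work or quota runs out
        if backlog > 0:
            throttled += 1              # quota exhausted with work pending
    return throttled


# One 20 ms burst of CPU work per second, with a 50m limit.
demand = [20.0] + [0.0] * 9
average_millicores = 1000.0 * sum(demand) / (len(demand) * 100.0)

print(quota_ms(50))                         # 5.0 ms per 100 ms period
print(average_millicores)                   # 20.0, well under the 50m limit
print(count_throttled_periods(demand, 50))  # 3
```

In this model the 20 ms burst is spread across four periods, so work that needs 20 ms of CPU finishes only after more than 300 ms of wall-clock time, even though average usage (20m) is well below the 50m limit.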