The CPU Throttling Detection workflow monitors your Kubernetes services for CPU throttling events and creates issues when services experience significant throttling. This helps you identify when services are being constrained by their CPU limits and take corrective action.

How It Works

The workflow monitors two key metrics:

  • container_resources_cpu_throttled_seconds_total: Measures the time a container spends throttled due to CPU limits
  • container_resources_cpu_usage_seconds_total: Measures the total CPU time used by the container

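When the ratio of throttling time to CPU usage time in a time window meets or exceeds a configured threshold, and the container has used at least minCpuSeconds of CPU in that window, the workflow creates an issue to alert you about potential CPU constraints.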
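The ratio can be illustrated with a short calculation. This is a minimal sketch, not the workflow's actual implementation. It assumes both metrics are cumulative counters (as the _total suffix conventionally indicates), so each window's value is the difference between two samples. The function name throttle_ratio is hypothetical.

```python
from typing import Optional


def throttle_ratio(throttled_start: float, throttled_end: float,
                   usage_start: float, usage_end: float) -> Optional[float]:
    """Throttled seconds divided by CPU usage seconds over one time window.

    Assumption: both metrics are cumulative counters, so the window value is
    the sample at the end of the window minus the sample at the start.
    """
    throttled = throttled_end - throttled_start  # delta of the throttled-seconds counter
    usage = usage_end - usage_start              # delta of the CPU-usage-seconds counter
    if usage <= 0:
        # No CPU used in the window, or the usage counter reset (negative delta).
        return None
    return throttled / usage


# 360 s throttled against 3,600 s of CPU use in the window gives a ratio of 0.1 (10%)
print(throttle_ratio(100.0, 460.0, 2000.0, 5600.0))
```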

Configuration

The workflow can be configured with the following parameters:

  • mediumThrottleThreshold (float, default 0.05, i.e. 5%): Minimum throttling ratio (throttle time / CPU time) to create a medium severity issue
  • highThrottleThreshold (float, default 0.10, i.e. 10%): Minimum throttling ratio to create a high severity issue
  • minCpuSeconds (float, default 3600, i.e. one CPU-hour): Minimum CPU seconds used in the time window before throttling is evaluated
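One way to picture these parameters is as a small configuration object with the documented defaults. This is a hedged sketch: the class ThrottleConfig, the function has_enough_cpu, the threshold ordering check, and the use of >= for the minCpuSeconds gate are all assumptions for illustration, not the workflow's real configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleConfig:
    # Defaults mirror the documented parameters.
    medium_throttle_threshold: float = 0.05  # mediumThrottleThreshold (5%)
    high_throttle_threshold: float = 0.10    # highThrottleThreshold (10%)
    min_cpu_seconds: float = 3600.0          # minCpuSeconds (one CPU-hour)

    def __post_init__(self) -> None:
        # Assumed sanity check: the medium threshold should not exceed the high one.
        if not 0 <= self.medium_throttle_threshold <= self.high_throttle_threshold:
            raise ValueError("expected 0 <= medium threshold <= high threshold")


def has_enough_cpu(cpu_seconds: float, config: ThrottleConfig) -> bool:
    """Assumed gate: only evaluate throttling once the window has enough CPU usage."""
    return cpu_seconds >= config.min_cpu_seconds


cfg = ThrottleConfig()
print(has_enough_cpu(5400.0, cfg))  # 1.5 CPU-hours used in the window
print(has_enough_cpu(120.0, cfg))   # too little usage to judge
```

The gate keeps nearly idle services from producing noisy issues: a few throttled seconds against a couple of minutes of CPU can yield a large ratio that says little about real impact.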

Issue Details

When an issue is created, it includes:

  • The service and environment experiencing CPU throttling
  • The throttling ratio (throttled time divided by CPU usage time, shown as a percentage)
  • The severity level based on the throttling ratio
  • A visualization showing:
    • CPU throttling over time
    • CPU usage patterns

Example Issue

Here’s an example of an issue created by the CPU Throttling Detection workflow:

Title: CPU Throttling Detected: my-service (production)

Service my-service (production environment) is experiencing severe CPU throttling (15.0% of CPU time). 
This indicates that the service is being significantly constrained by CPU limits.
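The title and ratio in the example follow a simple pattern. The sketch below reproduces them, assuming the title format shown above is fixed and the ratio is always shown with one decimal place. The helper names issue_title and ratio_percent are hypothetical, and the wording of the description for other severities is not covered here.

```python
def issue_title(service: str, environment: str) -> str:
    # Assumed to match the title format in the example issue.
    return f"CPU Throttling Detected: {service} ({environment})"


def ratio_percent(ratio: float) -> str:
    # The example shows one decimal place, e.g. "15.0%".
    return f"{ratio * 100:.1f}%"


print(issue_title("my-service", "production"))
print(ratio_percent(0.15))
```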

Severity Levels

The workflow assigns severity levels based on the throttling ratio:

  • Medium: When the throttling ratio meets or exceeds mediumThrottleThreshold (default: 5%) but is below highThrottleThreshold
  • High: When the throttling ratio meets or exceeds highThrottleThreshold (default: 10%)
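The two rules can be sketched as a single classification function that checks the higher threshold first. This is an illustrative sketch using the documented defaults; the function name severity and its return values are assumptions.

```python
from typing import Optional


def severity(ratio: float, medium: float = 0.05, high: float = 0.10) -> Optional[str]:
    """Map a throttling ratio to a severity level, or None if no issue is warranted."""
    if ratio >= high:    # meets or exceeds highThrottleThreshold
        return "high"
    if ratio >= medium:  # meets or exceeds mediumThrottleThreshold, below high
        return "medium"
    return None          # below both thresholds: no issue


for r in (0.15, 0.10, 0.07, 0.05, 0.01):
    print(r, severity(r))
```

The 15.0% ratio in the example issue falls in the high band.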

Understanding CPU Throttling

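CPU throttling in Kubernetes can be counterintuitive. Even if a container's average CPU usage is below its limit, it can still be throttled, because Kubernetes enforces CPU limits through the Linux CFS (Completely Fair Scheduler) bandwidth quota, which is applied over short periods rather than as a long-term average: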

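  1. The quota is enforced per period, and the default period is 100ms
  2. For example, with a 50m (50 millicores, or 0.05 CPU) limit:
    • The container gets 5ms of CPU time per 100ms period
    • If the container needs more than 5ms of CPU in any 100ms period, it is throttled for the rest of that period
    • This happens even if the average CPU usage over longer periods is below the limit
    • The quota is shared by all of the container's threads, so a multi-threaded container can use it up even faster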
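The effect of a short burst can be shown with a small simulation. This is a deliberately simplified model, not the kernel's scheduler: it assumes a single-threaded workload, that CPU demand arrives at the start of a period, and that unfinished work carries over to the next period. The function simulate is hypothetical.

```python
from typing import List, Tuple


def simulate(limit_millicores: int, demand_ms: List[float],
             period_ms: int = 100) -> Tuple[float, int, float]:
    """Return (quota per period in ms, throttled periods, average usage in millicores)."""
    quota_ms = limit_millicores * period_ms / 1000  # 50m -> 5.0 ms per 100 ms period
    backlog = 0.0
    throttled_periods = 0
    used_total = 0.0
    for demand in demand_ms:  # CPU work (ms) arriving at the start of each period
        backlog += demand
        used = min(backlog, quota_ms)  # can run at most quota_ms in this period
        backlog -= used
        used_total += used
        if backlog > 0:
            throttled_periods += 1  # quota exhausted while work is still pending
    avg_millicores = used_total * 1000 / (len(demand_ms) * period_ms)
    return quota_ms, throttled_periods, avg_millicores


# A single 20 ms burst followed by nine idle periods (one second total)
quota, throttled, avg = simulate(50, [20.0] + [0.0] * 9)
print(quota, throttled, avg)
```

Over the full second the container averages 20m, well under its 50m limit, yet three of its ten periods end throttled because the burst needs four periods' worth of quota.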

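This is particularly problematic for request-handling services. A throttled container cannot run again until the next period begins, so any request being processed at that moment simply waits, and throttling shows up as increased latency.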
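The latency cost can be estimated with some simple arithmetic. This is a sketch under strong assumptions: a single-threaded request that starts at a period boundary, with nothing else in the container consuming the quota. The function wall_time_ms is hypothetical.

```python
import math


def wall_time_ms(cpu_ms: float, quota_ms: float, period_ms: float = 100.0) -> float:
    """Wall-clock time for a request needing cpu_ms of CPU under a per-period quota."""
    if cpu_ms <= quota_ms:
        return cpu_ms  # fits in the first period: no throttling
    throttled_periods = math.ceil(cpu_ms / quota_ms) - 1  # periods that end throttled
    remainder = cpu_ms - throttled_periods * quota_ms     # CPU still needed in the last period
    return throttled_periods * period_ms + remainder


# 20 ms of CPU under a 50m limit (5 ms quota per 100 ms period)
print(wall_time_ms(20.0, 5.0))
```

In this model, a request that needs only 20ms of CPU takes about 305ms of wall-clock time: it runs for 5ms in each of four periods and waits out the rest of the first three.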