The OOM Detection workflow monitors your Kubernetes services for Out of Memory (OOM) events and creates a suggestion when a service is affected. This helps you identify resource constraints in your services and take corrective action.

How it Works

The workflow monitors the container_oom_kills_total metric, which is incremented each time a container in your service is killed due to an Out of Memory condition. When a service experiences at least the configured number of OOM events (minOOMEventsToCreateIssue), a suggestion is created with details about the events.
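The counting step can be illustrated with a minimal sketch. It assumes the metric is a cumulative counter sampled as (timestamp, value) pairs and that a drop in value means the counter was reset, for example after a container restart. The function names and the sample format are hypothetical, not the workflow's actual implementation; the comparison uses the "meets or exceeds" rule from the Severity Levels section.

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (unix timestamp in seconds, cumulative counter value)
Sample = Tuple[float, float]


def count_oom_events(samples: Sequence[Sample],
                     window_start: float,
                     window_end: float) -> int:
    """Sum the increases of a cumulative counter inside the time window."""
    in_window = sorted(s for s in samples if window_start <= s[0] <= window_end)
    total = 0.0
    for (_, prev), (_, curr) in zip(in_window, in_window[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # Assumption: a drop means the counter restarted from zero,
            # so the whole new value counts as an increase.
            total += curr
    return int(total)


def should_create_suggestion(oom_events: int,
                             min_oom_events_to_create_issue: int = 1) -> bool:
    """A suggestion is created once the count meets or exceeds the threshold."""
    return oom_events >= min_oom_events_to_create_issue


# Usage: one sample per hour over part of a 24-hour window.
samples = [(0, 0), (3600, 1), (7200, 3), (10800, 0), (14400, 2)]
events = count_oom_events(samples, window_start=0, window_end=86400)
create = should_create_suggestion(events)
```

In a real deployment the metrics backend would normally compute this increase for you; the sketch only shows the arithmetic.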

Configuration

The workflow can be configured with the following parameters:
Parameter                  Type     Default  Description
minOOMEventsToCreateIssue  integer  1        Minimum number of OOM events required to create a suggestion
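As an illustration only, the parameter and its default could be modelled as follows. The class name, the from_mapping helper, and the rule that the value must be at least 1 are assumptions made for this sketch; the page does not specify how the configuration is supplied or validated.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class OOMDetectionConfig:
    """Hypothetical container for the workflow's single documented parameter."""
    min_oom_events_to_create_issue: int = 1  # documented default

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "OOMDetectionConfig":
        value = raw.get("minOOMEventsToCreateIssue", 1)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("minOOMEventsToCreateIssue must be an integer")
        # Assumption: a threshold below 1 is treated as a configuration error.
        if value < 1:
            raise ValueError("minOOMEventsToCreateIssue must be at least 1")
        return cls(min_oom_events_to_create_issue=value)


default_config = OOMDetectionConfig.from_mapping({})
custom_config = OOMDetectionConfig.from_mapping({"minOOMEventsToCreateIssue": 3})
```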

Suggestion Details

When a suggestion is created, it includes:
  • The service and environment where OOM events occurred
  • The number of OOM events in the last 24 hours
  • The severity level (high if the OOM count is at least 10x the minimum threshold, medium otherwise)
  • A visualization showing:
    • OOM events over time
    • Memory usage patterns
    • Memory limits and requests

Example Suggestion

Here’s an example of a suggestion created by the OOM Detection workflow:
Title: OOMs Detected: my-service (production)

Service my-service (production environment) has experienced 12 OOM events in the last 24 hours.
High severity as the service experienced > 10x the minimum number of OOM events.

Severity Levels

The workflow assigns severity levels based on the number of OOM events:
  • Medium: The number of OOM events meets or exceeds minOOMEventsToCreateIssue but is below 10x that value
  • High: The number of OOM events is at least 10x minOOMEventsToCreateIssue
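The two rules above can be written as a small function. This is a sketch of the documented thresholds, not the workflow's source code; the function name and the choice to return None when no suggestion would be created are assumptions.

```python
from typing import Optional

HIGH_SEVERITY_MULTIPLIER = 10  # "10x" from the rules above


def classify_severity(oom_events: int,
                      min_oom_events_to_create_issue: int = 1) -> Optional[str]:
    """Return "high", "medium", or None when the threshold is not reached."""
    if oom_events < min_oom_events_to_create_issue:
        return None  # below the threshold: no suggestion is created
    if oom_events >= HIGH_SEVERITY_MULTIPLIER * min_oom_events_to_create_issue:
        return "high"
    return "medium"
```

With the default threshold of 1, one to nine OOM events map to medium and ten or more map to high. With a threshold of 5, high severity starts at 50 events.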

Best Practices

  1. Set Appropriate Thresholds: Configure minOOMEventsToCreateIssue based on your service’s characteristics. A lower threshold is more sensitive but may generate more suggestions.
  2. Monitor Memory Usage: Use the suggestion details view to understand memory usage patterns leading up to OOM events. Look for:
    • Memory usage approaching limits
    • Sudden spikes in memory usage
    • Inadequate memory limits or requests
  3. Regular Review: Regularly review OOM suggestions to identify patterns and recurring resource constraints in your services.
  4. Memory Management: When an OOM suggestion is created:
    • Review and adjust memory limits
    • Look for memory leaks
    • Consider implementing memory optimization strategies
    • Monitor memory usage trends
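The memory patterns listed under "Monitor Memory Usage" can be checked programmatically if you export the usage data. The sketch below is one possible heuristic: the function name, the 90% near-limit ratio, and the 50% sample-to-sample spike ratio are all assumptions chosen for illustration and should be tuned for your services.

```python
from typing import List, Sequence


def memory_warnings(usage_bytes: Sequence[float],
                    limit_bytes: float,
                    near_limit_ratio: float = 0.9,
                    spike_ratio: float = 0.5) -> List[str]:
    """Flag usage that approaches the limit or jumps sharply between samples."""
    warnings: List[str] = []
    if not usage_bytes or limit_bytes <= 0:
        return warnings

    # Pattern 1: peak usage is close to the configured memory limit.
    if max(usage_bytes) >= near_limit_ratio * limit_bytes:
        warnings.append("usage approaching limit")

    # Pattern 2: a large relative increase between consecutive samples.
    for prev, curr in zip(usage_bytes, usage_bytes[1:]):
        if prev > 0 and (curr - prev) / prev >= spike_ratio:
            warnings.append("sudden spike")
            break

    return warnings


steady = memory_warnings([100, 110, 120], limit_bytes=1000)
creeping = memory_warnings([800, 850, 910], limit_bytes=1000)
spiking = memory_warnings([400, 420, 950], limit_bytes=1000)
```

A service that only triggers the first warning may simply need a higher limit, while repeated spikes or steadily growing usage are more likely to point to a leak or an unbounded workload.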