OOM Detection
Detect Out of Memory (OOM) events in your Kubernetes services
The OOM Detection workflow monitors your Kubernetes services for Out of Memory (OOM) events and creates issues when services experience OOM events. This helps you identify memory-related problems in your services and take corrective action.
How it Works
The workflow monitors the container_oom_kills_total metric, which is incremented each time a container in your service is killed due to an Out of Memory condition. When the number of OOM events for a service reaches the configured minimum (minOOMEventsToCreateIssue), an issue is created with details about the events.
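The counting step can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation: the function names are hypothetical, the samples are assumed to be periodic readings of container_oom_kills_total for one service, a drop in the value is assumed to mean the counter was reset, and the threshold is treated as inclusive, matching the "meets or exceeds" wording under Severity Levels.

```python
from typing import Sequence


def count_oom_events(samples: Sequence[float]) -> int:
    """Return the total increase of a counter across consecutive samples.

    Assumption: a drop between two samples means the counter was reset
    (for example after a restart), so the new value is counted from zero.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        total += (curr - prev) if curr >= prev else curr
    return int(total)


def should_create_issue(
    samples: Sequence[float], min_oom_events_to_create_issue: int = 1
) -> bool:
    """Hypothetical check: has the service reached the configured minimum?"""
    return count_oom_events(samples) >= min_oom_events_to_create_issue


if __name__ == "__main__":
    readings = [0, 1, 1, 3]  # made-up counter readings over a window
    print(count_oom_events(readings), should_create_issue(readings))
```

With the default minimum of 1, a single increment of the counter within the window is enough for this sketch to report that an issue should be created.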
Configuration
The workflow can be configured with the following parameters:
Parameter | Type | Description | Default |
---|---|---|---|
minOOMEventsToCreateIssue | integer | Minimum number of OOM events required to create an issue | 1 |
Issue Details
When an issue is created, it includes:
- The service and environment where OOM events occurred
- The number of OOM events in the last 24 hours
- The severity level (high if the OOM count is at least 10x the minimum threshold, medium otherwise)
- A visualization showing:
  - OOM events over time
  - Memory usage patterns
  - Memory limits and requests
Example Issue
Here’s an example of an issue created by the OOM Detection workflow:
Severity Levels
The workflow assigns severity levels based on the number of OOM events:
- Medium: When the number of OOM events meets or exceeds minOOMEventsToCreateIssue but is less than 10 times that value
- High: When the number of OOM events is at least 10 times minOOMEventsToCreateIssue
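The severity rule can be expressed as a small Python sketch. The function name and the constant are hypothetical and only mirror the thresholds described in this section; the workflow's internal logic may differ.

```python
from typing import Optional

# Multiplier taken from the "10x" rule described in this section.
HIGH_SEVERITY_MULTIPLIER = 10


def classify_severity(
    oom_count: int, min_oom_events_to_create_issue: int = 1
) -> Optional[str]:
    """Return "high", "medium", or None when no issue would be created."""
    if min_oom_events_to_create_issue < 1:
        raise ValueError("min_oom_events_to_create_issue must be at least 1")
    if oom_count < min_oom_events_to_create_issue:
        return None  # below the minimum: no issue
    if oom_count >= HIGH_SEVERITY_MULTIPLIER * min_oom_events_to_create_issue:
        return "high"
    return "medium"


if __name__ == "__main__":
    for count in (0, 1, 9, 10):
        print(count, classify_severity(count))
```

For example, with minOOMEventsToCreateIssue set to 3, a service with 29 OOM events would be classified as medium and one with 30 as high.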
Best Practices
- Set Appropriate Thresholds: Configure minOOMEventsToCreateIssue based on your service’s characteristics. A lower threshold is more sensitive but may generate more issues.
- Monitor Memory Usage: Use the issue details view to understand memory usage patterns leading up to OOM events. Look for:
  - Memory usage approaching limits
  - Sudden spikes in memory usage
  - Inadequate memory limits or requests
- Regular Review: Regularly review OOM issues to identify patterns and systemic problems in your services.
- Memory Management: When OOM issues are detected:
  - Review and adjust memory limits
  - Look for memory leaks
  - Consider implementing memory optimization strategies
  - Monitor memory usage trends
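As a rough aid for spotting memory usage that approaches its limit, the following Python sketch flags samples at or above a chosen fraction of the limit. The function name, the 90% default, and the sample data are assumptions made for illustration; they are not part of the OOM Detection workflow.

```python
from typing import List, Sequence


def near_limit_samples(
    usage_bytes: Sequence[float], limit_bytes: float, warn_ratio: float = 0.9
) -> List[int]:
    """Return indices of samples whose usage is at least warn_ratio of the limit.

    warn_ratio is an illustrative default, not a documented setting.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return [
        index
        for index, usage in enumerate(usage_bytes)
        if usage / limit_bytes >= warn_ratio
    ]


if __name__ == "__main__":
    mib = 1024 * 1024
    usage = [100 * mib, 460 * mib, 480 * mib]  # made-up usage samples
    print(near_limit_samples(usage, limit_bytes=500 * mib))
```

If many samples are flagged well before an OOM event, the memory limit may be too low for the workload; if usage climbs steadily across restarts, a memory leak is a more likely cause.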