Alerts & Monitoring
Alert Examples
List of example alerts
Here are some example alerts you can set up in Metoro to monitor your Kubernetes infrastructure and applications. These examples cover various scenarios, including CPU usage, error rates, and latency.
If you are using Kubernetes ConfigMaps to manage your alerts, you can define these alerts in a ConfigMap and apply it to your cluster. Make sure to include the label metoro.io/alert: "true" in your ConfigMap.
Complete Example ConfigMap
You can use the following ConfigMap to set up all example alerts at once:
kind: ConfigMap
apiVersion: v1
metadata:
  name: alert-config
  labels:
    metoro.io/alert: "true"
data:
  alert.yaml: |
    alerts:
      - metadata:
          id: "cpu-usage-alert-001"
          name: "High CPU Usage"
          description: "Alert when CPU usage exceeds 80% for 5 minutes"
        type: timeseries
        timeseries:
          expression:
            metoroQLTimeseries:
              query: sum(container_resources_cpu_usage_seconds_total{service_name="/k8s/default/myimportantservice"}) / 60 / sum(container_resources_cpu_limit_cores{service_name="/k8s/default/myimportantservice"})
              bucketSize: 60
          evaluationRules:
            - name: "warning"
              type: static
              static:
                operators:
                  - operator: greaterThan
                    threshold: 80
                persistenceSettings:
                  datapointsToAlarm: 5
                  datapointsInEvaluationWindow: 5
                missingDatapointBehavior: notBreaching
      - metadata:
          id: "error-log-alert-001"
          name: "High Error Rate"
          description: "Alert when error logs exceed 100 in 15 minutes"
        type: timeseries
        timeseries:
          expression:
            metoroQLTimeseries:
              query: count(logs{log_level="error"})
              bucketSize: 60
          evaluationRules:
            - name: "critical"
              type: static
              static:
                operators:
                  - operator: greaterThan
                    threshold: 100
                persistenceSettings:
                  datapointsToAlarm: 15
                  datapointsInEvaluationWindow: 15
                missingDatapointBehavior: notBreaching
      - metadata:
          id: "high-latency-alert-001"
          name: "High Latency"
          description: "Alert when HTTP request duration exceeds 2 seconds for 5 minutes"
        type: timeseries
        timeseries:
          expression:
            metoroQLTimeseries:
              query: trace_duration_quantile(0.99, traces)
              bucketSize: 60
          evaluationRules:
            - name: "warning"
              type: static
              static:
                operators:
                  - operator: greaterThan
                    threshold: 2
                persistenceSettings:
                  datapointsToAlarm: 5
                  datapointsInEvaluationWindow: 5
                missingDatapointBehavior: notBreaching
      - metadata:
          id: "latency-with-notifications-001"
          name: "High Latency with Notifications"
          description: "Alert when HTTP request duration exceeds 2 seconds for 5 minutes with notifications"
        type: timeseries
        timeseries:
          expression:
            metoroQLTimeseries:
              query: trace_duration_quantile(0.99, traces)
              bucketSize: 60
          evaluationRules:
            - name: "Warning"
              type: static
              static:
                operators:
                  - operator: greaterThan
                    threshold: 2
                persistenceSettings:
                  datapointsToAlarm: 5
                  datapointsInEvaluationWindow: 5
                missingDatapointBehavior: notBreaching
              actions:
                - type: slack
                  slackDestination:
                    channel: "alerts-critical"
                    additionalMessage: "Service availability has dropped below SLA threshold!"
                - type: email
                  emailDestination:
                    emails:
                      - "oncall@example.com"
                      - "sre-team@example.com"
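To build intuition for how the threshold and persistenceSettings fields in the alerts above might interact, here is a minimal Python sketch. It is not Metoro's implementation. It assumes that bucketSize: 60 yields one datapoint per 60-second bucket, that a rule fires when at least datapointsToAlarm of the most recent datapointsInEvaluationWindow datapoints are greater than the threshold, and that missingDatapointBehavior: notBreaching means a missing datapoint counts as healthy. The function name rule_fires and the sample values are hypothetical.

```python
from typing import Optional, Sequence


def rule_fires(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    datapoints_to_alarm: int,
    datapoints_in_evaluation_window: int,
) -> bool:
    """Return True if enough recent datapoints exceed the threshold.

    Assumed semantics (an illustration, not confirmed by the docs above):
    - one datapoint per bucket,
    - only the most recent `datapoints_in_evaluation_window` points count,
    - a missing point (None) counts as not breaching, mirroring
      missingDatapointBehavior: notBreaching,
    - the comparison is strict, mirroring operator: greaterThan.
    """
    window = datapoints[-datapoints_in_evaluation_window:]
    breaching = sum(1 for v in window if v is not None and v > threshold)
    return breaching >= datapoints_to_alarm


# Hypothetical p99 latency in seconds, one value per minute.
# None marks a bucket with no data.
steady_breach = [1.2, 2.4, 2.6, 3.1, 2.2, 2.9]
with_gap = [2.4, 2.6, None, 3.1, 2.2]

# Settings from the "High Latency" alert: threshold 2, 5 of 5 datapoints.
print(rule_fires(steady_breach, 2, 5, 5))
print(rule_fires(with_gap, 2, 5, 5))
```

Under these assumptions the first call prints True, because the last five values all exceed 2, and the second prints False, because the missing bucket is treated as healthy and only four of the five datapoints breach.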