Anomaly Detection & Autonomous Investigation

Overview

Metoro’s Anomaly Detection feature automatically identifies unusual patterns in your systems without requiring you to configure explicit alert thresholds. When an anomaly is detected, Guardian AI automatically investigates to determine if there’s a real issue and what the root cause might be.

How It Works

1. Detection - Metoro continuously monitors your systems for anomalous behavior
2. Investigation - When an anomaly is detected, Guardian automatically runs an investigation
3. Analysis - Guardian determines whether the anomaly represents a real issue
4. Notification - If an issue is confirmed, Guardian posts to Slack with its findings

Types of Anomalies Detected

Currently, Metoro detects the following types of anomalies:

Anomaly Type	Description
Error Rate Spikes	Sudden increases in error rates compared to baseline

The types of anomalies detected will expand over time. Check back for updates or reach out to our team if you have specific anomaly types you’d like us to support.

Enabling Anomaly Detection

Step 1: Navigate to Settings

Go to Settings → Features → Anomaly Detection

Step 2: Enable Anomaly Detection

Toggle Enable Anomaly Detection to activate the feature.

Step 3: Configure Detection Scope

Select which services and environments should have anomaly detection enabled:

- Services - Choose specific services or select all
- Environments - Choose specific environments (e.g., prod, staging)

We recommend starting with production environments to focus on the most impactful issues.

Configuring Notifications

Autonomous Investigation uses the same flexible notification configuration as other Guardian features.

Setting Up Notification Rules

1. Navigate to Settings → Features → Autonomous Investigation
2. Click Add Notification Configuration
3. Configure:
   - Services - Which services should trigger notifications
   - Environments - Which environments should trigger notifications
   - Destination - Where to send notifications (Slack channel, webhook, etc.)

Example Configurations

Critical Services

Route anomalies for critical services to an incidents channel:

Services: payment-service, auth-service, checkout-service
Environments: prod
Destination: #incidents

All Production

Route all production anomalies to a monitoring channel:

Services: All
Environments: prod
Destination: #production-alerts

Team-Specific

Route anomalies to the owning team’s channel:

Config 1:

Services: api-gateway, api-service
Environments: prod
Destination: #backend-team

Config 2:

Services: web-frontend, mobile-bff
Environments: prod
Destination: #frontend-team
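
To reason about which channels a given anomaly would reach, the example configurations above can be modeled as a small matching function. This is only a sketch: the `NotificationConfig` class and `destinations_for` function are hypothetical names, not part of Metoro, and it assumes that every matching configuration receives the notification, which this page does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NotificationConfig:
    """Hypothetical model of one notification rule (not a Metoro API)."""
    services: Optional[frozenset]      # None stands for "All"
    environments: Optional[frozenset]  # None stands for "All"
    destination: str

    def matches(self, service: str, environment: str) -> bool:
        service_ok = self.services is None or service in self.services
        env_ok = self.environments is None or environment in self.environments
        return service_ok and env_ok


def destinations_for(configs, service: str, environment: str) -> list:
    """Return every destination whose rule matches (assumed behavior)."""
    return [c.destination for c in configs if c.matches(service, environment)]


# The example configurations from this page, expressed in the model above.
CONFIGS = [
    NotificationConfig(
        frozenset({"payment-service", "auth-service", "checkout-service"}),
        frozenset({"prod"}),
        "#incidents",
    ),
    NotificationConfig(None, frozenset({"prod"}), "#production-alerts"),
    NotificationConfig(
        frozenset({"api-gateway", "api-service"}), frozenset({"prod"}), "#backend-team"
    ),
    NotificationConfig(
        frozenset({"web-frontend", "mobile-bff"}), frozenset({"prod"}), "#frontend-team"
    ),
]

if __name__ == "__main__":
    print(destinations_for(CONFIGS, "payment-service", "prod"))
```

Under that assumption, combining a catch-all rule with a team-specific rule means the same anomaly may be posted to more than one channel, which is worth keeping in mind when layering configurations.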

How Anomaly Detection Differs from Alerts

Feature	Alerts	Anomaly Detection
Configuration	You define thresholds	Automatic baseline learning
Trigger	Fixed thresholds	Statistical anomalies
Investigation	Manual or runbook	Automatic
Best for	Known failure modes	Unknown unknowns

Anomaly Detection and Alerts are complementary. Use alerts for known failure modes with specific thresholds, and anomaly detection to catch unexpected issues.
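
The difference between a fixed threshold and a learned baseline can be illustrated with a short sketch. Metoro's actual detection method is not described on this page, so the baseline check below is a stand-in: it assumes a simple mean plus three standard deviations rule, and the function names, the 5% threshold, and the `min_std` floor are all illustrative choices.

```python
from statistics import mean, pstdev


def threshold_alert(error_rate: float, threshold: float = 0.05) -> bool:
    """Alert-style check: fire when the rate crosses a fixed, user-chosen threshold."""
    return error_rate > threshold


def baseline_anomaly(baseline_rates, evaluation_rates, sigmas: float = 3.0,
                     min_std: float = 0.001) -> bool:
    """Anomaly-style check (illustrative only, not Metoro's algorithm).

    Flags the evaluation window when its mean error rate sits more than
    `sigmas` standard deviations above the baseline window's mean.
    `min_std` keeps a perfectly flat baseline from flagging tiny changes.
    """
    base_mean = mean(baseline_rates)
    base_std = max(pstdev(baseline_rates), min_std)
    return mean(evaluation_rates) > base_mean + sigmas * base_std


if __name__ == "__main__":
    # A service that normally runs at about 1% errors jumps to about 3%.
    quiet_baseline = [0.010, 0.012, 0.009, 0.011, 0.010, 0.012]
    spike = [0.030, 0.034]
    print(threshold_alert(mean(spike)))             # below the fixed 5% threshold
    print(baseline_anomaly(quiet_baseline, spike))  # far above this service's normal
```

In this sketch a jump from roughly 1% to 3% errors stays under the fixed 5% threshold but stands out against the service's own baseline, while a service that always runs near 8% would trip the threshold without being unusual for that service.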

Per-Workload Configuration

You can customize anomaly detection behavior for individual workloads using Kubernetes annotations. This allows you to fine-tune detection windows or disable detection entirely for specific services.

Available Annotations

Annotation	Type	Default	Range	Description
`metoro.io/anomaly-detection-disabled`	string	`"false"`	`"true"`/`"false"`	Disable anomaly detection for this workload
`metoro.io/anomaly-detection-baseline-minutes`	int	`30`	`5-30`	Baseline window for calculating normal behavior
`metoro.io/anomaly-detection-evaluation-minutes`	int	`5`	`1-10`	Evaluation window compared against baseline

The evaluation window must be at most half the baseline window (e.g., if baseline is 10 minutes, evaluation can be at most 5 minutes). This ensures statistical validity of anomaly detection.
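
The ranges in the table and the half-baseline rule can be checked before a manifest is applied. The following is a minimal sketch of such a check; `validate_windows` is a hypothetical helper, not something Metoro ships, and this page does not say how Metoro itself handles invalid values.

```python
def validate_windows(baseline_minutes: int = 30, evaluation_minutes: int = 5) -> list:
    """Return a list of problems; an empty list means the pair is valid.

    Defaults mirror the documented annotation defaults (30 and 5).
    """
    problems = []
    if not 5 <= baseline_minutes <= 30:
        problems.append("baseline must be between 5 and 30 minutes")
    if not 1 <= evaluation_minutes <= 10:
        problems.append("evaluation must be between 1 and 10 minutes")
    # Evaluation window must be at most half the baseline window.
    if evaluation_minutes * 2 > baseline_minutes:
        problems.append("evaluation must be at most half the baseline")
    return problems


if __name__ == "__main__":
    # Annotation values are strings, so convert them before checking.
    print(validate_windows(int("10"), int("5")))  # valid pair from the rule's example
    print(validate_windows(int("10"), int("6")))  # violates the half-baseline rule
```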

Example: Disable Detection for a Service

For batch jobs or services where high error rates are expected:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
  annotations:
    metoro.io/anomaly-detection-disabled: "true"
spec:
  # ...

Example: Shorter Detection Window

For services where you want faster detection at the cost of potentially more false positives:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  annotations:
    metoro.io/anomaly-detection-baseline-minutes: "10"
    metoro.io/anomaly-detection-evaluation-minutes: "2"
spec:
  # ...

Example: Longer Baseline for Stable Services

For stable services where you want to reduce noise:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: core-service
  annotations:
    metoro.io/anomaly-detection-baseline-minutes: "30"
    metoro.io/anomaly-detection-evaluation-minutes: "10"
spec:
  # ...

Annotations can be placed in either metadata.annotations or spec.template.metadata.annotations. The former takes precedence if both are specified.
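
The precedence rule can be sketched as a lookup over a parsed manifest. The helper below is hypothetical and assumes precedence is applied per annotation key, so a key set only on the pod template still takes effect; the page does not say whether precedence is per key or for the whole annotation block.

```python
PREFIX = "metoro.io/anomaly-detection-"


def effective_annotations(manifest: dict) -> dict:
    """Resolve anomaly-detection annotations from a parsed workload manifest.

    Hypothetical helper: values in metadata.annotations override those in
    spec.template.metadata.annotations, key by key (an assumption).
    """
    top = (manifest.get("metadata") or {}).get("annotations") or {}
    template_meta = ((manifest.get("spec") or {}).get("template") or {}).get("metadata") or {}
    template = template_meta.get("annotations") or {}
    merged = {**template, **top}  # later entries win, so top-level overrides template
    return {key: value for key, value in merged.items() if key.startswith(PREFIX)}


if __name__ == "__main__":
    manifest = {
        "metadata": {"annotations": {PREFIX + "baseline-minutes": "10"}},
        "spec": {"template": {"metadata": {"annotations": {
            PREFIX + "baseline-minutes": "30",
            PREFIX + "evaluation-minutes": "2",
        }}}},
    }
    print(effective_annotations(manifest))
```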

Best Practices

Start with Production

Focus anomaly detection on production environments first, where issues have the most impact.

Review Investigation Quality

Periodically review Guardian’s investigations to ensure they’re finding real issues:

- Are the anomalies significant?
- Is the root cause analysis accurate?
- Provide feedback to improve detection

Combine with Alerts

Use both anomaly detection and alerts:

- Alerts for critical thresholds you always want to know about
- Anomaly detection for catching unexpected issues

Tune Notification Routing

Route notifications appropriately:

- Critical services → dedicated incident channels
- Non-critical services → general monitoring channels

Related Documentation

- Deployment Verification - Automatic verification of deployments
- AI Runbooks - Configure investigation runbooks for alerts
- Inbox - View and manage actionable items
- Alerts - Configure threshold-based alerts
