Overview

Metoro’s issue detection system continuously monitors your clusters to identify potential problems and inefficiencies. It is built around two main concepts:

  1. Issues: Concrete problems identified within cluster components
  2. Workflows: Automated processes that scan clusters to detect issues

Understanding Issues

Issues represent specific problems detected within your cluster components. Examples include:

  • Services with excessive memory allocation
  • Over-provisioned nodes with low utilization
  • High error rates (e.g., HTTP 500s) from specific containers

Each issue has the following properties:

  • Severity (Low to High): Helps prioritize resolution
  • Attributes: Identifying information (service name, namespace, etc.)
  • Measurements: Quantitative data about the issue

Each severity level corresponds to a kind of impact:

  • Low: Efficiency improvements (e.g., over-provisioned resources)
  • Medium: Performance impacts (e.g., CPU throttling)
  • High: Critical problems requiring immediate attention
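
As a rough illustration of how these properties fit together, the sketch below models an issue with a severity, identifying attributes, and measurements, then sorts issues so the most severe come first. The class, field, and metric names are hypothetical and chosen for this example only; they are not Metoro's actual data model or API.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels, ordered so that higher values are more urgent."""
    LOW = 1     # efficiency improvements, e.g. over-provisioned resources
    MEDIUM = 2  # performance impacts, e.g. CPU throttling
    HIGH = 3    # critical problems requiring immediate attention


@dataclass
class Issue:
    """Hypothetical issue record; field names are assumptions."""
    title: str
    severity: Severity
    # Identifying information such as service name or namespace.
    attributes: dict[str, str] = field(default_factory=dict)
    # Quantitative data about the issue.
    measurements: dict[str, float] = field(default_factory=dict)


def by_priority(issues: list[Issue]) -> list[Issue]:
    """Return issues sorted from most to least severe."""
    return sorted(issues, key=lambda issue: issue.severity, reverse=True)


issues = [
    Issue("Over-provisioned memory", Severity.LOW,
          {"service": "checkout", "namespace": "prod"},
          {"memory_requested_mib": 2048.0, "memory_used_mib": 310.0}),
    Issue("High HTTP 500 rate", Severity.HIGH,
          {"service": "payments", "namespace": "prod"},
          {"error_rate": 0.12}),
    Issue("CPU throttling", Severity.MEDIUM,
          {"service": "search", "namespace": "prod"},
          {"throttled_fraction": 0.4}),
]

for issue in by_priority(issues):
    print(issue.severity.name, issue.title)
```

Using an ordered enum keeps the Low-to-High ranking explicit, so sorting and filtering by severity need no extra lookup table.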

Workflows

Workflows are automated processes that continuously scan your clusters for issues. Each workflow:

  • Runs every 24 hours by default (midnight UTC)
  • Analyzes the previous day’s data
  • Automatically closes resolved issues and opens new ones when detected
  • Has configurable parameters to fit your specific needs
  • Can be triggered on demand
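
The scheduling and open/close behavior described above can be sketched as follows. This is a minimal illustration, not Metoro's implementation: the function names and the string keys used to identify issues are hypothetical, and it assumes that "previous day" means the full UTC calendar day before the run, including for on-demand runs.

```python
from datetime import datetime, timedelta, timezone


def analysis_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) window covering the UTC day before run_time.

    Assumption: the window is the previous full UTC calendar day.
    """
    run_utc = run_time.astimezone(timezone.utc)
    end = run_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=1)
    return start, end


def reconcile(open_keys: set[str], detected_keys: set[str]) -> tuple[set[str], set[str]]:
    """Compare currently open issues with those detected by the latest run.

    Returns (to_open, to_close): issues detected but not yet open, and
    open issues that were no longer detected and are treated as resolved.
    """
    to_open = detected_keys - open_keys
    to_close = open_keys - detected_keys
    return to_open, to_close


start, end = analysis_window(datetime(2024, 5, 2, 0, 0, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())

to_open, to_close = reconcile(
    open_keys={"checkout/memory", "search/cpu"},
    detected_keys={"search/cpu", "payments/errors"},
)
print(sorted(to_open), sorted(to_close))
```

In this sketch an issue that is both open and detected again stays open untouched, which matches the idea of a single issue recurring across several workflow runs.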

Metoro includes the following built-in workflows:

  • Right-Sizing Workflow: Optimizes resource allocation across your services
  • More workflows coming soon…

Managing Issues

Issues View

The issues view lists all open and closed issues. You can filter, sort, and search to find the specific issues you’d like to address.

Check out an example Issues Page

Issue Details

When you click into an issue, you can view the data that the workflow used to detect it, along with a number of related metrics.

In addition to the basic issue information, there’s also a timeline view of all workflow runs that fired for this issue. This allows you to spot recurrences and patterns.

Workflow Configuration

Workflows come with sensible defaults, but you can adjust them to match your needs through workflow settings.

Issue Muting

Muting lets you control which issues you track. To create a mute rule:

  1. Find an example issue
  2. Click “Mute Similar”
  3. Select the attributes to mute on (e.g., development environment)
  4. Apply the mute rule

This is particularly useful for:

  • Development environments with expected low utilization
  • Known exceptions to standard rules
  • Temporary suppressions during maintenance
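
One way to picture how a "Mute Similar" rule might decide which issues it covers is attribute matching: a rule holds the attributes you selected, and an issue is muted when all of them match. The sketch below assumes exactly that behavior; the function name, the rule representation, and the attribute keys are hypothetical and not taken from Metoro.

```python
def is_muted(issue_attributes: dict[str, str], mute_rules: list[dict[str, str]]) -> bool:
    """Return True if at least one non-empty rule matches the issue.

    A rule matches when every attribute it lists has the same value on
    the issue. Empty rules are ignored so they cannot mute everything.
    """
    return any(
        bool(rule) and all(
            issue_attributes.get(key) == value for key, value in rule.items()
        )
        for rule in mute_rules
    )


# Hypothetical rule: mute issues from the development environment.
rules = [{"environment": "development"}]

dev_issue = {"service": "checkout", "environment": "development"}
prod_issue = {"service": "checkout", "environment": "production"}

print(is_muted(dev_issue, rules))
print(is_muted(prod_issue, rules))
```

Selecting fewer attributes makes a rule broader, while selecting more (for example, both service and environment) narrows it to a specific exception.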