Overview
Understand the Metoro issue detection system
Metoro’s issue detection system continuously monitors your clusters to identify potential problems and inefficiencies. It operates through two main concepts:
- Issues: Concrete problems identified within cluster components
- Workflows: Automated processes that scan clusters to detect issues
Understanding Issues
Issues represent specific problems detected within your cluster components. Examples include:
- Services with excessive memory allocation
- Over-provisioned nodes with low utilization
- High error rates (e.g., HTTP 500s) from specific containers
Each issue carries three kinds of information:
- Severity (Low to High): Indicates how urgently the issue needs resolving
- Attributes: Identifying information (service name, namespace, etc.)
- Measurements: Quantitative data about the issue
Each severity level corresponds to a type of impact:
- Low: Efficiency improvements (e.g., over-provisioned resources)
- Medium: Performance impacts (e.g., CPU throttling)
- High: Critical problems requiring immediate attention
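To make the shape of an issue concrete, here is a minimal Python sketch of a record with the three fields described above. It is an illustration only, not Metoro's actual schema: the `Issue` and `Severity` names, the `title` field, and the sample services and measurements are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; integer ordering allows sorting by urgency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Issue:
    """Illustrative issue record (assumed shape, not Metoro's schema)."""
    title: str  # assumed field, added for readability
    severity: Severity
    attributes: dict[str, str] = field(default_factory=dict)
    measurements: dict[str, float] = field(default_factory=dict)


# Made-up sample data covering the three severity levels.
issues = [
    Issue("Over-provisioned memory", Severity.LOW,
          {"service": "checkout", "namespace": "prod"},
          {"memory_requested_mib": 2048.0, "memory_used_mib": 300.0}),
    Issue("HTTP 500 error rate", Severity.HIGH,
          {"service": "payments", "namespace": "prod"},
          {"error_rate": 0.12}),
    Issue("CPU throttling", Severity.MEDIUM,
          {"service": "search", "namespace": "prod"},
          {"throttled_fraction": 0.4}),
]

# Most urgent issues first.
by_priority = sorted(issues, key=lambda issue: issue.severity, reverse=True)
```

Modelling severity as an ordered enum is one possible design choice; it keeps prioritization to a single sort rather than a chain of string comparisons.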
Workflows
Workflows are automated processes that continuously scan your clusters for issues. Each workflow:
- Runs every 24 hours by default (at midnight UTC)
- Analyzes the previous day’s data
- Automatically closes resolved issues and opens new ones when detected
- Has configurable parameters so you can adapt it to your needs
- Can be triggered on demand
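The scheduling and open/close behaviour described above can be sketched roughly as follows. This is not Metoro's implementation: `previous_day_window` and `reconcile` are hypothetical helpers, issues are assumed to be identifiable by a string key, and `run_at` is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, time, timedelta, timezone


def previous_day_window(run_at: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) UTC window covering the day before run_at.

    Assumes run_at is timezone-aware.
    """
    run_at_utc = run_at.astimezone(timezone.utc)
    # Midnight UTC at the start of the run's calendar day closes the window.
    end = datetime.combine(run_at_utc.date(), time.min, tzinfo=timezone.utc)
    return end - timedelta(days=1), end


def reconcile(open_keys: set[str], detected_keys: set[str]) -> tuple[set[str], set[str]]:
    """Return (keys to open, keys to close) after a workflow run.

    Issues detected now but not yet open are opened; open issues that
    were not detected in this run are treated as resolved and closed.
    """
    to_open = detected_keys - open_keys
    to_close = open_keys - detected_keys
    return to_open, to_close
```

For example, a run at midnight UTC on 2 May would analyze the window from 1 May 00:00 UTC up to, but not including, 2 May 00:00 UTC.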
Metoro includes several built-in workflows:
- Right-Sizing Workflow: Optimizes resource allocation across your services
- More workflows coming soon…
Managing Issues
Issues View
The issues view lists all open and closed issues. You can apply filters, sort and search to identify specific issues you’d like to address.
Check out an example Issues Page
Issue Details
When you click into an issue, you can view the data that the workflow used to detect it, along with a number of related metrics.
Beyond the basic issue information, a timeline view shows every workflow run that fired for this issue, which lets you spot recurrences and patterns.
Workflow Configuration
Workflows come with sensible defaults, but you can adjust them to match your needs through workflow settings.
Issue Muting
Mute issues you don’t want to track:
- Find an example of the issue you want to mute
- Click “Mute Similar”
- Select the attributes to mute on (e.g., development environment)
- Apply the mute rule
This is particularly useful for:
- Development environments with expected low utilization
- Known exceptions to standard rules
- Temporary suppressions during maintenance
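One way to think about a mute rule is as a set of attribute values that an issue must match in full. The sketch below illustrates that idea under stated assumptions; it is not Metoro's matching logic, and the `is_muted` function, the rule format, and the `environment` attribute name are hypothetical.

```python
def is_muted(issue_attributes: dict[str, str], mute_rules: list[dict[str, str]]) -> bool:
    """Return True if any rule matches every attribute it specifies.

    An empty rule is ignored so that it cannot silently mute everything.
    """
    for rule in mute_rules:
        if rule and all(issue_attributes.get(key) == value for key, value in rule.items()):
            return True
    return False


# Hypothetical rule: mute everything detected in the development environment.
rules = [{"environment": "development"}]

dev_issue = {"service": "api", "environment": "development"}
prod_issue = {"service": "api", "environment": "production"}
```

Here `is_muted(dev_issue, rules)` is true and `is_muted(prod_issue, rules)` is false, so only the development issue would be hidden.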