Metoro’s on-premises deployment brings enterprise-grade observability directly into your infrastructure. This guide provides a comprehensive overview of the architecture, components, and deployment considerations.

Architecture Overview

Metoro consists of two main parts:

  1. Metoro Hub

    The central control plane that processes, stores, and serves observability data. In the cloud installation, this is managed by Metoro, but in the on-premises version, you deploy and manage it yourself.

  2. Metoro Agents

    Lightweight agents deployed on each Kubernetes cluster that collect telemetry and forward it to the hub. They are run in the same way as in the cloud installation.

The agent configuration is the same as in the cloud installation, so we will not cover it here. Instead, these docs focus on how to effectively deploy and manage the Metoro Hub components in your own infrastructure.
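
When wiring agents to a self-hosted hub, a quick first sanity check is that the agent clusters can reach the hub's ingest endpoint at all. A minimal sketch follows; the hostname and port are placeholders rather than Metoro defaults, so substitute the address your hub is actually exposed on.

```python
# Minimal TCP reachability check from an agent cluster to the hub.
# HUB_HOST and HUB_PORT are placeholders (assumptions), not Metoro defaults.
import socket

HUB_HOST = "metoro-hub.example.internal"  # hypothetical hub address
HUB_PORT = 443                            # hypothetical ingest port

def hub_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the hub endpoint succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(f"hub reachable: {hub_reachable(HUB_HOST, HUB_PORT)}")
```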

Core Components

Metoro Hub Components

The Metoro Hub is the brain of your observability platform, consisting of several key services:

1. API Server

  • Purpose: Serves the web UI and provides REST APIs (see the example call after this list)
  • Key Features:
    • User authentication and authorization
    • Query processing
    • Dashboard and alert management
    • Status page hosting
    • Runs AI investigations and analysis
    • Integration with external services
  • Resource Requirements: <1 CPU core, 1-2GB RAM
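
To make the REST surface concrete, here is a minimal sketch of an authenticated call against the API server. The base URL, bearer-token auth, and endpoint path are all hypothetical illustrations, not documented Metoro routes; use your installation's actual API reference.

```python
# Hypothetical authenticated GET against the API server.
# API_BASE, TOKEN, and the /api/v1/dashboards path are assumptions
# for illustration, not documented Metoro endpoints.
import json
import urllib.request

API_BASE = "https://metoro-hub.example.internal"  # placeholder hub URL
TOKEN = "REPLACE_ME"                               # placeholder API token

def api_get(path: str) -> dict:
    """Issue a bearer-token GET and decode the JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}{path}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

print(api_get("/api/v1/dashboards"))  # hypothetical endpoint
```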

2. Ingester

  • Purpose: Receives and processes telemetry data from agents
  • Key Features:
    • Accepts various data formats, converts them to a common format, and forwards the result to storage
    • Data validation and enrichment
    • Batching and compression
  • Resource Requirements: Scales with data volume. Typically, around 1 CPU core and 1GB RAM will ingest 20MB/s of uncompressed telemetry data (see the sizing sketch below). Can be scaled horizontally or vertically.
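
To turn that figure into a replica count, a back-of-the-envelope calculation like the following works. The 20MB/s-per-replica constant comes from this guide; the 30% headroom factor is our own assumption and should be tuned to your traffic pattern.

```python
# Rough ingester sizing from the throughput figure quoted above:
# ~1 CPU core and 1GB RAM per 20MB/s of uncompressed telemetry.
import math

MBPS_PER_REPLICA = 20.0  # uncompressed MB/s per 1-core/1GB replica (from this guide)
HEADROOM = 1.3           # assumed safety margin for traffic spikes

def ingester_replicas(peak_uncompressed_mbps: float) -> int:
    """Number of 1-core/1GB ingester replicas for a given peak ingest rate."""
    return max(1, math.ceil(peak_uncompressed_mbps * HEADROOM / MBPS_PER_REPLICA))

# Example: a fleet emitting ~75MB/s of uncompressed telemetry at peak.
print(ingester_replicas(75))  # -> 5
```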

3. ClickHouse (Telemetry Database)

  • Purpose: Stores all telemetry data (metrics, traces, logs, profiling data, Kubernetes objects, etc.)
  • Key Features:
    • Columnar storage optimized for analytics
    • Real-time data ingestion
    • Efficient compression; we typically see around 20x compression on raw telemetry data
  • Resource Requirements: Scales with data ingestion rate and required query performance (see the storage sizing sketch below). It is best to scale vertically, though horizontal scaling with sharding is also supported.
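
The ~20x compression ratio makes disk sizing straightforward to approximate, as in the sketch below. The ingest rate and retention period are illustrative inputs, and the 20% free-space margin for background merges is an assumption rather than a Metoro recommendation.

```python
# Back-of-the-envelope ClickHouse disk sizing using the ~20x compression
# ratio quoted above.
COMPRESSION_RATIO = 20.0  # raw telemetry : on-disk (from this guide)
MERGE_MARGIN = 1.2        # assumed extra space for background merges

def disk_gb(avg_uncompressed_mbps: float, retention_days: int) -> float:
    """Approximate disk (GB) for the given ingest rate and retention."""
    seconds = retention_days * 24 * 60 * 60
    raw_gb = avg_uncompressed_mbps * seconds / 1024
    return raw_gb / COMPRESSION_RATIO * MERGE_MARGIN

# Example: 20MB/s average ingest retained for 30 days -> ~3 TB of disk.
print(f"{disk_gb(20, 30):.0f} GB")  # -> 3038 GB
```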

4. PostgreSQL (Metadata Database)

  • Purpose: Stores configuration, user data, and metadata
  • Key Features:
    • User accounts and permissions
    • Dashboard configurations
    • Alert rules and integrations
    • Temporal workflow data
  • Resource Requirements: 2 CPU cores, 4GB RAM, and 20GB of storage are enough for most installations. Scales with the number of alerts, users, and workflows.

5. Temporal (Workflow Engine)

  • Purpose: Manages background jobs and scheduled tasks
  • Key Features:
    • Alert evaluation and notification
    • Data retention and cleanup
    • Scheduled reports
    • AI-powered analysis workflows
  • Resource Requirements: 1 CPU core and 1GB RAM across all components is enough for most installations. Scales with the number of workflows and tasks (alerts, reports, etc.); an illustrative workflow sketch follows below.
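
To make the role of Temporal concrete, here is an illustrative workflow in the open-source temporalio Python SDK, shaped like the alert-evaluation job listed above. Metoro's actual workflow code is internal, so every name here is hypothetical; the sketch only shows the durable, automatically retried execution pattern Temporal provides.

```python
# Illustrative Temporal workflow (temporalio Python SDK). All names are
# hypothetical; Metoro's real workflows are internal to the hub.
from datetime import timedelta

from temporalio import activity, workflow

@activity.defn
async def evaluate_alert(alert_id: str) -> bool:
    """Hypothetical activity: run one alert query, return whether it fired."""
    # Would query the telemetry store and compare against the alert threshold.
    return False

@workflow.defn
class AlertEvaluationWorkflow:
    @workflow.run
    async def run(self, alert_id: str) -> None:
        # Temporal persists each step, so the evaluation survives worker
        # restarts and is retried automatically on failure.
        fired = await workflow.execute_activity(
            evaluate_alert,
            alert_id,
            start_to_close_timeout=timedelta(seconds=30),
        )
        if fired:
            # A real workflow would execute a notification activity here.
            pass
```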