MetoroQL (mQL for short) is Metoro’s query language for observability data. It’s designed to be familiar to users of PromQL but with several important enhancements that make it more powerful for querying across different types of observability data.

Overview

MetoroQL has a PromQL-like syntax but provides unified access to different types of data:

  • Metrics (both standard and custom)
  • Logs
  • Traces
  • Kubernetes resources

This allows you to correlate and analyze data from different sources using a consistent query language.

Key Differences from PromQL

MetoroQL is largely a subset of PromQL, with a few notable differences:

  1. Counters return the delta of consecutive values by default
  2. Queries can be over resource types other than metrics
  3. Timeseries queries must have an aggregate applied to them

Counter Handling

In PromQL, counter metrics require explicit functions like rate() or increase() to calculate the rate of change. In MetoroQL, counter values are automatically presented as the difference between consecutive data points. This means:

  • Values represent changes between points rather than cumulative values
  • You do not need to apply rate() or increase() functions to chart the change over time

For example, consider an HTTP request counter metric with one-minute buckets:

# In PromQL, a query to get the number of requests 
# sent in a given minute might look something like this
sum(increase(http_requests_total{service="api"}[5m])) / 5
# or
sum(rate(http_requests_total{service="api"}[5m])) * 60

# In MetoroQL, you directly query the counter
sum(http_requests_total{service="api"})

The MetoroQL query will directly show the change in request count between data points, while in PromQL, the raw http_requests_total would show monotonically increasing cumulative values that generally aren’t immediately useful without applying rate() or increase().

Multi-domain Queries

One of the most powerful features of MetoroQL is the ability to query across different observability domains using special metric names:

  • logs - Log data
  • traces - Distributed tracing data
  • kubernetes_resources - Kubernetes resource information
  • Any other metric identifier is treated as a regular metric

Each of these domains has specific functions and aggregations that can be applied to them.
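
For example, the same PromQL-style syntax works in every domain. The queries below are a quick sketch that reuses the metric and label names appearing in the examples later on this page:

# Metrics: CPU usage per service
sum(container_resources_cpu_usage_seconds_total) by (service_name)

# Logs: error log count per service
count(logs{log_level="error"}) by (service.name)

# Traces: requests served by the currency services
count(traces{server.service.name=~".*currency.*"})

# Kubernetes resources: pods per namespace
count(kubernetes_resources{Kind="Pod"}) by (Namespace)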

Forced aggregations

In mQL, all timeseries queries require an explicit aggregation function, such as:

  • sum
  • avg
  • min
  • max
  • count
  • histogram_quantile

For example:

# The following is a valid query in PromQL; it will return 
# all individual timeseries (unique combinations of labels and values).
container_resources_cpu_usage_seconds_total

# In mQL you must specify an aggregate
sum(container_resources_cpu_usage_seconds_total)

This behaviour differs from the default PromQL behaviour.
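
If you still need per-series detail, one option is to approximate PromQL's ungrouped output by grouping on the labels you care about. A minimal sketch, assuming pod_name and container are labels on this metric:

# One series per pod / container combination
sum(container_resources_cpu_usage_seconds_total) by (pod_name, container)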

Basic Query Syntax

A simple MetoroQL timeseries query has the following structure:

aggregation(metric_name{label1="value1", label2="value2"}) by (grouplabel1, grouplabel2) 

For example, to get the CPU usage of all services running in the default namespace:

max(container_resources_cpu_usage_seconds_total{namespace="default"}) by (service_name)

You can also perform arithmetic on a single timeseries or between multiple timeseries.

For example, to get the percentage of allocated disk actually used by each service:

(sum(container_resources_disk_used_bytes) by (service_name) / 
sum(container_resources_disk_size_bytes) by (service_name)) * 100

Special Data Types

In addition to metrics, you can write mQL queries over logs, traces and kubernetes_resources. Each of these resources has its own rules for how it can be queried.

Log Queries

  • Log queries support only the count aggregate.
  • They support all filtering and group by operations.
  • Structured JSON log attributes are parsed into filterable fields.
# Count of error logs
count(logs{log_level="error"})

# Number of logs with a message matching a regex
count(logs{message=~".*failed.*"})

# Error logs by service
count(logs{log_level="error"}) by (service.name)

# Count each of the individual values of the 
# custom "caller" field for all logs that have the field.
count(logs{caller=~".+"}) by (caller)

Trace Queries

  • Trace queries support both the count aggregation and the trace_duration_quantile function.
  • They support all filtering and group by operations.
  • All custom attributes are queryable for filtering and group bys.
# Count of all requests being served by currency services
count(traces{server.service.name=~".*currency.*"})

# Percent of 5XX requests served by the currency services
count(traces{http.status_code=~"5..", server.service.name=~".*currency.*"}) * 100 
/ count(traces{server.service.name=~".*currency.*"})

# P95 for the convert endpoint
trace_duration_quantile(0.95, traces{http.path="/convert",
                        server.service.name=~".*currency.*"})
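
Group bys work for traces in the same way as for metrics and logs. For example, to break 5XX responses down by the serving service, reusing the label names from the examples above:

# 5XX count per serving service
count(traces{http.status_code=~"5.."}) by (server.service.name)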

Kubernetes Resources Queries

  • Kubernetes resource queries support the count aggregation, as well as all other aggregations once the json_path function is applied.
  • They support all filtering and group by operations.
# Count of pods by namespace
count(kubernetes_resources{Kind="Pod"}) by (Namespace)

# Total number of replicas specified by deployments per service
sum(json_path("spec.replicas", kubernetes_resources)) by (ServiceName)

With json_path, you can:

  • Extract and analyze specific fields from Kubernetes resources
  • Use sum, avg, min, or max aggregations with the extracted values
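
As a sketch of the other aggregations, assuming deployments are stored with Kind="Deployment" and expose spec.replicas as in the example above:

# Largest replica count requested by any single deployment, per namespace
max(json_path("spec.replicas", kubernetes_resources{Kind="Deployment"})) by (Namespace)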

Advanced Features

Filtering

MetoroQL supports several filtering operators:

# Exact match
metric{label="value"}

# Negation
metric{label!="value"}

# Regex matching
metric{label=~"pattern"}

# Negated regex
metric{label!~"pattern"}

Binary Operations

You can create complex queries using arithmetic operations. Supported operations are:

  • + addition
  • - subtraction
  • * multiplication
  • / division
  • % modulo
  • ^ exponentiation
  • == equal
  • != not equal
  • <= less than or equal
  • >= greater than or equal
# Calculate logs error rate percentage
100 * (count(logs{log_level="error"}) / count(logs))

# Compare CPU usage across environments
sum(container_resources_cpu_usage_seconds_total{environment="production"}) - 
sum(container_resources_cpu_usage_seconds_total{environment="staging"})
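
The comparison operators can be applied on top of arithmetic expressions. A minimal sketch, assuming they behave like their PromQL counterparts and keep only the points where the condition holds:

# Services whose allocated disk is at least 90% full
(sum(container_resources_disk_used_bytes) by (service_name) /
sum(container_resources_disk_size_bytes) by (service_name)) >= 0.9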

Grouping

Group data by specific labels:

# Group by container and namespace
sum(container_memory_working_set_bytes) by (container, namespace)

# Top 5 pods by CPU usage
topk(5, max(container_cpu_usage) by (pod_name))