Traces Cheat Sheet - Metoro Documentation

Start here if traces feel confusing. This page explains the minimum you need to answer questions like “who is calling this service?”, “which endpoints are busiest?”, and “what is failing?”.

The 30-second version

A trace is one request or one network call.
client.service.name is the service that made the call.
server.service.name is the service that received the call.
http.path is the endpoint or route.
count(traces) means “how many calls happened?”
by (...) means “split that total into buckets.”

First: switch to MetoroQL mode

Most of the examples below are written in MetoroQL first, because it is the clearest way to express trace questions.

In Trace Search, click the <> icon at the right end of the search bar. The tooltip says Switch to MetoroQL mode.
In Metric Explorer, open the mode selector at the top-right of the query row and switch from Standard mode to MetoroQL mode.
To go back later, use the same controls to switch to Standard mode. Switching between Standard and MetoroQL modes converts the query automatically.

The “Standard Mode” examples later on refer to Metric Explorer with trace data selected. Trace Search standard mode is for browsing individual traces, not for grouped time-series questions.

One request, pictured simply

Keep this picture in your head for almost every traces question:

Incoming vs outgoing, in plain English

If service-a calls service-b on /api/dev:

From service-a’s point of view, that is an outgoing call.
From service-b’s point of view, that is an incoming call.
In Metoro, the caller is the client.service.name.
In Metoro, the receiver is the server.service.name.

In tracing language:

client does not necessarily mean a browser or end user. It means “the thing that started the request.”
server does not necessarily mean a VM or physical server. It means “the thing that received the request.”

That leads to one simple rule:

Want outgoing calls from a service: filter on client.service.name
Want incoming calls to a service: filter on server.service.name
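If it helps to see the rule as code, here is a minimal Python sketch. It is a toy model, not how Metoro works internally: the trace records and the `incoming` / `outgoing` helper names are made up for illustration, and only the field names come from this page.

```python
# Made-up trace records; only the field names come from this page.
traces = [
    {"client.service.name": "service-a", "server.service.name": "service-b", "http.path": "/api/dev"},
    {"client.service.name": "service-b", "server.service.name": "service-c", "http.path": "/lookup"},
    {"client.service.name": "service-a", "server.service.name": "service-c", "http.path": "/lookup"},
]


def outgoing(records, service):
    """Calls the service made: match on client.service.name."""
    return [t for t in records if t["client.service.name"] == service]


def incoming(records, service):
    """Calls the service received: match on server.service.name."""
    return [t for t in records if t["server.service.name"] == service]


print(len(outgoing(traces, "service-a")))  # calls service-a made
print(len(incoming(traces, "service-c")))  # calls service-c received
```

The same trace shows up as outgoing for one service and incoming for another, which is why the choice of field matters more than the choice of trace.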

Which field do I filter?

What the main trace fields mean

Field	Plain English
`client.service.name`	Who made the call
`server.service.name`	Who received the call
`http.method`	GET, POST, PUT, DELETE, etc.
`http.path`	Which endpoint was hit
`http.status_code`	The exact response code
`http.status_code.bucket`	The response code family, like `2XX` or `5XX`
`metoro.is_server_span`	Whether Metoro is showing the server side of the request
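
The relationship between `http.status_code` and `http.status_code.bucket` can be pictured with a small Python sketch. This only illustrates the "response code family" idea from the table; the `status_bucket` function is hypothetical and is not a claim about how Metoro derives the field.

```python
def status_bucket(status_code: int) -> str:
    """Map an exact HTTP status code to its family, e.g. 503 -> '5XX'."""
    return f"{status_code // 100}XX"


for code in (200, 404, 503):
    print(code, status_bucket(code))
```

Filtering on the bucket is usually the easier way to ask "what is failing?", because one value covers every code in that family.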

Other useful trace attributes

Field	Plain English
`client.namespace`	Kubernetes namespace that made the call
`server.namespace`	Kubernetes namespace that received the call
`client.container.id`	Exact caller pod/container
`server.container.id`	Exact receiver pod/container
`server.net.host.name`	Destination host or IP, especially useful for external calls
`server.external`	Whether the destination is outside your Kubernetes cluster
`environment`	Metoro environment
`traceId`	Unique ID for one trace
`client.host.availability_zone`	Availability zone the request came from
`server.host.availability_zone`	Availability zone that handled the request

Which screen to use

If you want to…	Use…
See who talks to who	Service Map
Inspect individual requests	The Traces page
Count requests, split by endpoint, or chart traffic	Metric Explorer in Standard Mode, or a MetoroQL query using `count(traces)`

Group by, without the jargon

group by just means:

Do not give me one big total. Split the total into smaller buckets.

No group by: one total

count(traces{server.service.name="service-b"})

Group by `http.path`: split by endpoint

count(traces{server.service.name="service-b"}) by (http.path)

In the platform UI, putting http.path in the Group by control does the same thing as writing by (http.path) in MetoroQL. Add more fields when you need narrower buckets. For example, by (http.method, http.path) splits GET /api/dev and POST /api/dev into separate rows. Examples:

# One total number of incoming calls
count(traces{server.service.name="<service>"})

# Split incoming calls by endpoint
count(traces{server.service.name="<service>"}) by (http.path)

# Split incoming calls by method and endpoint
count(traces{server.service.name="<service>"}) by (http.method, http.path)

If multiple services expose the same endpoint path, add server.service.name to the by (...) clause too, so each service gets its own rows.
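
For readers who think in code, grouping can be sketched in a few lines of Python. This is a toy model under stated assumptions: the call records are invented, `count_by` is a hypothetical helper, and nothing here reflects Metoro's actual implementation. It only shows what "one total" versus "split into buckets" means.

```python
from collections import Counter

# Made-up incoming calls to one service; only the field names come from this page.
calls = [
    {"http.method": "GET", "http.path": "/api/dev"},
    {"http.method": "POST", "http.path": "/api/dev"},
    {"http.method": "GET", "http.path": "/api/dev"},
    {"http.method": "GET", "http.path": "/health"},
]


def count_by(records, *fields):
    """Count records per distinct combination of the given fields."""
    return Counter(tuple(r[f] for f in fields) for r in records)


total = len(calls)                                            # no group by: one total
by_path = count_by(calls, "http.path")                        # like by (http.path)
by_method_path = count_by(calls, "http.method", "http.path")  # like by (http.method, http.path)

print(total)
print(dict(by_path))
print(dict(by_method_path))
```

Adding a field never changes the grand total. It only splits the existing buckets into narrower ones.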

Copy-paste questions and answers

Each example below has:

a MetoroQL version you can paste directly
a Standard Mode version you can build in Metric Explorer without writing the query yourself

Number of incoming calls to a service

count(traces{server.service.name="<service>"})

Standard Mode

Timeseries data: trace
Stat: request count
Filters: server.service.name=<service>
Group by: leave empty

Number of incoming calls to a service by endpoint

count(traces{server.service.name="<service>"}) by (http.path)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: server.service.name=<service>
Group by: http.path

Number of incoming calls to a service by method and endpoint

count(traces{server.service.name="<service>"}) by (http.method, http.path)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: server.service.name=<service>
Group by: http.method, http.path

Number of outgoing calls from a service

count(traces{client.service.name="<service>"})

Standard Mode

Timeseries data: trace
Stat: request count
Filters: client.service.name=<service>
Group by: leave empty

Number of outgoing calls from a service by destination service

count(traces{client.service.name="<service>"}) by (server.service.name)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: client.service.name=<service>
Group by: server.service.name

Number of outgoing calls from a service by endpoint

count(traces{client.service.name="<service>"}) by (server.service.name, http.path)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: client.service.name=<service>
Group by: server.service.name, http.path

Which services are calling my service?

count(traces{server.service.name="<service>"}) by (client.service.name)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: server.service.name=<service>
Group by: client.service.name

Which namespaces are calling my service?

count(traces{server.service.name="<service>"}) by (client.namespace)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: server.service.name=<service>
Group by: client.namespace

All endpoint call counts in a namespace

count(traces{server.namespace="<namespace>"}) by (server.service.name, http.path)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: server.namespace=<namespace>
Group by: server.service.name, http.path

Which endpoints on my service are failing?

count(traces{server.service.name="<service>", http.status_code.bucket="5XX"}) by (http.path)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: server.service.name=<service>, http.status_code.bucket=5XX
Group by: http.path

Which endpoints on my service are slow?

trace_duration_quantile(0.95, traces{server.service.name="<service>"}) by (http.path)

Standard Mode

Timeseries data: trace
Stat: p95 latency
Filters: server.service.name=<service>
Group by: http.path

Which containers in my service are handling traffic?

count(traces{server.service.name="<service>"}) by (server.container.id)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: server.service.name=<service>
Group by: server.container.id

Which external APIs is my service calling?

count(traces{client.service.name="<service>", server.namespace="External Service"}) by (server.service.name, http.path)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: client.service.name=<service>, server.namespace=External Service
Group by: server.service.name, http.path

Which external hosts is my service calling?

count(traces{client.service.name="<service>", server.namespace="External Service"}) by (server.net.host.name, http.path)

Standard Mode

Timeseries data: trace
Stat: request count
Filters: client.service.name=<service>, server.namespace=External Service
Group by: server.net.host.name, http.path

A real mental model

If you are looking at your API service:

server.service.name="<your-api>" means “requests coming into my API”
client.service.name="<your-api>" means “requests my API made to something else”

If you only remember one thing, remember that.

If you do not know the exact service name

Service names in Metoro are often full Kubernetes-style names such as /k8s/default/checkout-service. If you are not sure of the exact value, use a regex match:

count(traces{server.service.name=~".*checkout-service.*"}) by (http.path)
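
To check what a pattern like this would catch before running it, you can try it against a few candidate names. The sketch below uses Python's `re` module as a stand-in, so treat it as an approximation: MetoroQL's regex dialect may differ in details, and every service name other than /k8s/default/checkout-service is hypothetical.

```python
import re

pattern = re.compile(r".*checkout-service.*")

# Only the first name appears on this page; the others are hypothetical.
names = [
    "/k8s/default/checkout-service",
    "/k8s/prod/checkout-service-v2",
    "/k8s/default/cart-service",
]

# fullmatch requires the whole name to match the pattern.
matches = [n for n in names if pattern.fullmatch(n)]
print(matches)
```

A loose pattern can match more services than you meant, so once you know the exact name, switching back to an exact match keeps the counts unambiguous.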

When `http.path` is not enough

http.path is best for HTTP traffic. For other protocols, use a field that fits that protocol:

span.name for a generic span name
db.operation for database calls
server.service.name to see which downstream service was hit

Beginner workflow

Start with the service you care about

Decide whether you care about traffic coming into the service or going out of the service.

Pick the right side

Use server.service.name for incoming traffic. Use client.service.name for outgoing traffic.

Add one split at a time

Start with by (http.path). If that is too broad, use by (http.method, http.path) or by (server.service.name, http.path).

Add failure or latency next

Add http.status_code.bucket="5XX" to focus on errors, or switch to trace_duration_quantile(0.95, ...) to look for slow endpoints.
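
The last step, errors and slow endpoints per path, can be pictured with a short Python sketch. It is a toy model on invented samples. The `nearest_rank_quantile` helper uses the nearest-rank definition of a quantile, which is one common choice and an assumption here; Metoro's trace_duration_quantile may estimate quantiles differently.

```python
import math
from collections import defaultdict

# Made-up samples: (http.path, http.status_code, duration in milliseconds).
samples = [
    ("/api/dev", 200, 12), ("/api/dev", 200, 15), ("/api/dev", 500, 480),
    ("/api/dev", 200, 14), ("/health", 200, 2), ("/health", 200, 3),
]


def nearest_rank_quantile(values, q):
    """Nearest-rank quantile: the value at rank ceil(q * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


errors = defaultdict(int)      # 5XX count per path
durations = defaultdict(list)  # all durations per path
for path, status, ms in samples:
    durations[path].append(ms)
    if status // 100 == 5:
        errors[path] += 1

p95 = {path: nearest_rank_quantile(ms, 0.95) for path, ms in durations.items()}
print(dict(errors))
print(p95)
```

With only a handful of samples per path, a high quantile is dominated by the single slowest request, so read p95 values for low-traffic endpoints with some caution.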

Related docs

Traces Overview

Understand how trace data is collected and shown in Metoro

Service Map

See who talks to who before drilling into specific requests

MetoroQL

Learn the query syntax behind the examples on this page
