Start here if traces feel confusing. This page explains the minimum you need to answer questions like “who is calling this service?”, “which endpoints are busiest?”, and “what is failing?”.

The 30-second version

  • A trace is one request or one network call.
  • client.service.name is the service that made the call.
  • server.service.name is the service that received the call.
  • http.path is the endpoint or route.
  • count(traces) means “how many calls happened?”
  • by (...) means “split that total into buckets.”

First: switch to MetoroQL mode

Most of the examples below are written in MetoroQL first, because it is the clearest way to express trace questions.
  1. In Trace Search, click the <> icon at the right end of the search bar. The tooltip says Switch to MetoroQL mode.
  2. In Metric Explorer, open the mode selector at the top-right of the query row and switch from Standard mode to MetoroQL mode.
  3. To go back later, use the same controls to switch to Standard mode. Switching between Standard and MetoroQL modes converts the query automatically.
The “Standard Mode” examples later on this page refer to Metric Explorer with trace data selected. Standard mode in Trace Search is for browsing individual traces, not for grouped time-series questions.

One request, pictured simply

Keep this picture in your head for almost every traces question: one service (the client) makes a call, and another service (the server) receives it.

Incoming vs outgoing, in plain English

If service-a calls service-b on /api/dev:
  • From service-a’s point of view, that is an outgoing call.
  • From service-b’s point of view, that is an incoming call.
  • In Metoro, the caller is the client.service.name.
  • In Metoro, the receiver is the server.service.name.
In tracing language:
  • client does not necessarily mean a browser or end user. It means “the thing that started the request.”
  • server does not necessarily mean a VM or physical server. It means “the thing that received the request.”
That leads to one simple rule:
  • Want outgoing calls from a service: filter on client.service.name
  • Want incoming calls to a service: filter on server.service.name
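To make the rule concrete, here is a small sketch using the service-a and service-b example above. The names are placeholders; real service names may be longer. The third query puts both filters in one selector, which assumes comma-separated filters all have to match, as in the 5XX example further down this page.

```
# Outgoing calls made by service-a
count(traces{client.service.name="service-a"})

# Incoming calls received by service-b
count(traces{server.service.name="service-b"})

# Only the calls that service-a made to service-b, split by endpoint
count(traces{client.service.name="service-a", server.service.name="service-b"}) by (http.path)
```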

What the main trace fields mean

Field | Plain English
client.service.name | Who made the call
server.service.name | Who received the call
http.method | GET, POST, PUT, DELETE, etc.
http.path | Which endpoint was hit
http.status_code | The exact response code
http.status_code.bucket | The response code family, like 2XX or 5XX
metoro.is_server_span | Whether Metoro is showing the server side of the request

Other useful trace attributes

Field | Plain English
client.namespace | Kubernetes namespace that made the call
server.namespace | Kubernetes namespace that received the call
client.container.id | Exact caller pod/container
server.container.id | Exact receiver pod/container
server.net.host.name | Destination host or IP, especially useful for external calls
server.external | Whether the destination is outside your Kubernetes cluster
environment | Metoro environment
traceId | Unique ID for one trace
client.host.availability_zone | Availability zone the request came from
server.host.availability_zone | Availability zone that handled the request
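
These attributes can be used in the same places as the main fields. As a sketch, assuming both availability-zone attributes are populated in your environment, grouping by the two zones shows how much of a service's incoming traffic crosses zones:

```
# Incoming calls to a service, split by caller zone and receiver zone
count(traces{server.service.name="<service>"}) by (client.host.availability_zone, server.host.availability_zone)
```

Rows where the two zone values differ are cross-zone calls.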

Which screen to use

If you want to… | Use…
See who talks to who | Service Map
Inspect individual requests | The Traces page
Count requests, split by endpoint, or chart traffic | Metric Explorer in Standard Mode, or a MetoroQL query using count(traces)

Group by, without the jargon

group by just means: “Do not give me one big total. Split the total into smaller buckets.”

No group by: one total

count(traces{server.service.name="service-b"})

Group by http.path: split by endpoint

count(traces{server.service.name="service-b"}) by (http.path)
In the platform UI, putting http.path in the Group by control does the same thing as writing by (http.path) in MetoroQL. Add more fields when you need narrower buckets. For example, by (http.method, http.path) splits GET /api/dev and POST /api/dev into separate rows. Examples:
# One total number of incoming calls
count(traces{server.service.name="<service>"})

# Split incoming calls by endpoint
count(traces{server.service.name="<service>"}) by (http.path)

# Split incoming calls by method and endpoint
count(traces{server.service.name="<service>"}) by (http.method, http.path)
If your filter matches more than one service and those services expose the same endpoint path, add server.service.name to the by (...) clause too, so that their counts stay in separate rows.
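
A sketch of that last point, assuming a filter that can match more than one service. It uses the regex match described later on this page, and the name fragment "checkout" is a hypothetical placeholder:

```
# Keep same-named endpoints on different services in separate rows
count(traces{server.service.name=~".*checkout.*"}) by (server.service.name, http.path)
```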

Copy-paste questions and answers

Each example below has:
  • a MetoroQL version you can paste directly
  • a Standard Mode version you can build in Metric Explorer without writing the query yourself

Number of incoming calls to a service

count(traces{server.service.name="<service>"})
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: server.service.name=<service>
  • Group by: leave empty

Number of incoming calls to a service by endpoint

count(traces{server.service.name="<service>"}) by (http.path)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: server.service.name=<service>
  • Group by: http.path

Number of incoming calls to a service by method and endpoint

count(traces{server.service.name="<service>"}) by (http.method, http.path)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: server.service.name=<service>
  • Group by: http.method, http.path

Number of outgoing calls from a service

count(traces{client.service.name="<service>"})
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: client.service.name=<service>
  • Group by: leave empty

Number of outgoing calls from a service by destination service

count(traces{client.service.name="<service>"}) by (server.service.name)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: client.service.name=<service>
  • Group by: server.service.name

Number of outgoing calls from a service by endpoint

count(traces{client.service.name="<service>"}) by (server.service.name, http.path)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: client.service.name=<service>
  • Group by: server.service.name, http.path

Which services are calling my service?

count(traces{server.service.name="<service>"}) by (client.service.name)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: server.service.name=<service>
  • Group by: client.service.name

Which namespaces are calling my service?

count(traces{server.service.name="<service>"}) by (client.namespace)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: server.service.name=<service>
  • Group by: client.namespace

All endpoint call counts in a namespace

count(traces{server.namespace="<namespace>"}) by (server.service.name, http.path)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: server.namespace=<namespace>
  • Group by: server.service.name, http.path

Which endpoints on my service are failing?

count(traces{server.service.name="<service>", http.status_code.bucket="5XX"}) by (http.path)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: server.service.name=<service>, http.status_code.bucket=5XX
  • Group by: http.path
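
If one caller is responsible for most of the errors, adding the caller to the split can help narrow things down. A sketch that combines the 5XX filter with the client.service.name grouping used earlier on this page:

```
# 5XX responses from a service, split by calling service and endpoint
count(traces{server.service.name="<service>", http.status_code.bucket="5XX"}) by (client.service.name, http.path)
```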

Which endpoints on my service are slow?

trace_duration_quantile(0.95, traces{server.service.name="<service>"}) by (http.path)
Standard Mode
  • Timeseries data: trace
  • Stat: p95 latency
  • Filters: server.service.name=<service>
  • Group by: http.path
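
The same question can be asked from the caller's side. As a sketch, assuming trace_duration_quantile accepts a client-side filter in the same way count does, this looks for slow downstream dependencies instead of slow endpoints on your own service:

```
# p95 latency of outgoing calls from a service, split by destination service
trace_duration_quantile(0.95, traces{client.service.name="<service>"}) by (server.service.name)
```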

Which containers in my service are handling traffic?

count(traces{server.service.name="<service>"}) by (server.container.id)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: server.service.name=<service>
  • Group by: server.container.id

Which external APIs is my service calling?

count(traces{client.service.name="<service>", server.namespace="External Service"}) by (server.service.name, http.path)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: client.service.name=<service>, server.namespace=External Service
  • Group by: server.service.name, http.path

Which external hosts is my service calling?

count(traces{client.service.name="<service>", server.namespace="External Service"}) by (server.net.host.name, http.path)
Standard Mode
  • Timeseries data: trace
  • Stat: request count
  • Filters: client.service.name=<service>, server.namespace=External Service
  • Group by: server.net.host.name, http.path

A real mental model

If you are looking at your API service:
  • server.service.name="<your-api>" means “requests coming into my API”
  • client.service.name="<your-api>" means “requests my API made to something else”
If you only remember one thing, remember that.

If you do not know the exact service name

Service names in Metoro are often full Kubernetes-style names such as /k8s/default/checkout-service. If you are not sure of the exact value, use a regex match:
count(traces{server.service.name=~".*checkout-service.*"}) by (http.path)
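
One way to find the exact name, sketched with the same regex filter: group by server.service.name so that each matching service appears as its own row, then copy the full value into an exact-match filter. Only services that received traffic in the selected time range will show up.

```
# List every service whose name contains "checkout-service"
count(traces{server.service.name=~".*checkout-service.*"}) by (server.service.name)

# Then use the exact name you found, for example:
count(traces{server.service.name="/k8s/default/checkout-service"}) by (http.path)
```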

When http.path is not enough

http.path is best for HTTP traffic. For other protocols, group by a field that fits the protocol:
  • span.name for a generic span name
  • db.operation for database calls
  • server.service.name to see which downstream service was hit
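
As a sketch, assuming these attributes are present on your non-HTTP traces, they go into by (...) in the same way http.path does:

```
# Outgoing calls from a service, split by destination and database operation
count(traces{client.service.name="<service>"}) by (server.service.name, db.operation)

# Incoming calls to a service, split by generic span name
count(traces{server.service.name="<service>"}) by (span.name)
```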

Beginner workflow

  1. Start with the service you care about. Decide whether you care about traffic coming into the service or going out of the service.
  2. Pick the right side. Use server.service.name for incoming traffic. Use client.service.name for outgoing traffic.
  3. Add one split at a time. Start with by (http.path). If that is too broad, use by (http.method, http.path) or by (server.service.name, http.path).
  4. Add failure or latency next. Add http.status_code.bucket="5XX" to focus on errors, or switch to trace_duration_quantile(0.95, ...) to look for slow endpoints.
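
The four steps above, written out as a progression of queries for incoming traffic. This is only a sketch built from the patterns on this page; replace <service> with your own service name.

```
# Steps 1-2: incoming traffic to the service, as one total
count(traces{server.service.name="<service>"})

# Step 3: split by endpoint, then by method and endpoint
count(traces{server.service.name="<service>"}) by (http.path)
count(traces{server.service.name="<service>"}) by (http.method, http.path)

# Step 4: focus on errors, or look for slow endpoints
count(traces{server.service.name="<service>", http.status_code.bucket="5XX"}) by (http.path)
trace_duration_quantile(0.95, traces{server.service.name="<service>"}) by (http.path)
```

For outgoing traffic, swap server.service.name for client.service.name in the filter and start with by (server.service.name, http.path).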

  • Traces Overview: Understand how trace data is collected and shown in Metoro
  • Service Map: See who talks to who before drilling into specific requests
  • MetoroQL: Learn the query syntax behind the examples on this page