Monitoring microservices with the RED method
08 August 2021
The RED method is a set of three metrics, Requests, Errors, Duration, that act as a good place to start for monitoring our microservices. When this gets first introduced into systems I frequently see it only applied in incoming HTTP traffic, but we can do better for microservices. Tom Wilkie introduced the RED method as a monitoring philosophy for any system, but we will just look at microservices for now.
Assuming our service primarily handles incoming HTTP traffic, we might start with these:
- Requests, the count of incoming HTTP requests as requests/second
- Errors, total HTTP errors as (!status_code_5xx)/total_requests
- Duration, the latency of our responses as p50, p90, or p99 percentile latency
This is good because we cover a primary source of incoming traffic for our system and general happiness indicator of our clients. However, we are missing out on the other network pipes our service is using. Your service might vary, but here are the inflows and outflows for our “example” microservice:
In addition to inbound HTTP we typically have:
- HTTP inflow, incoming requests we handle
- AMQP inflow, incoming message we consume
- HTTP outflow, the outgoing request to 3rd parties
- GRCP outflow, outgoing requests in other internal services
- AMQP outflow, outgoing messages to a queuing system
- DB outflow, outgoing requests to the database
Your service may use different protocols and have a separate set of inbound and outbound clients. For each pipe both in and out of your service, we want to apply the RED method.