Monitoring and Logging
Observability is critical for production containers. This chapter covers logging strategies, metrics collection, and monitoring solutions for containerized applications.
The Three Pillars of Observability
┌───────────────────────────────────────────────────────┐
│                     Observability                     │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │     Logs      │ │    Metrics    │ │    Traces     │ │
│ │               │ │               │ │               │ │
│ │     What      │ │   How much    │ │     Where     │ │
│ │   happened    │ │   and when    │ │    (flow)     │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
└───────────────────────────────────────────────────────┘
Observability Pillars
| Pillar | Purpose | Examples |
|---|---|---|
| Logs | Event records | Errors, requests, debug info |
| Metrics | Numerical measurements | CPU, memory, request count |
| Traces | Request flow | Distributed transaction path |
Container Logging
Docker Logging Basics
Containers should log to stdout and stderr:
# Application logs to stdout/stderr
FROM node:20-alpine
WORKDIR /app
COPY . .
# Logs go to container output
CMD ["node", "server.js"]// Application logging to stdout/stderr
console.log("Info message"); // stdout
console.error("Error message"); // stderr
// Structured logging
console.log(
JSON.stringify({
level: "info",
message: "Request received",
timestamp: new Date().toISOString(),
requestId: req.id,
}),
);
Viewing Logs
# View container logs
docker logs mycontainer
# Follow logs in real-time
docker logs -f mycontainer
# Show timestamps
docker logs -t mycontainer
# Last N lines
docker logs --tail 100 mycontainer
# Logs since specific time
docker logs --since 1h mycontainer
docker logs --since 2024-01-15T10:00:00 mycontainer
# Compose logs
docker compose logs
docker compose logs -f api
docker compose logs --tail 50 api db
Logging Drivers
Docker supports multiple logging drivers:
# Check current driver
docker info --format '{{.LoggingDriver}}'
# Run with specific driver
docker run --log-driver json-file myapp
docker run --log-driver syslog myapp
docker run --log-driver none myapp
Logging Drivers
| Driver | Description | Use Case |
|---|---|---|
| json-file | Default, local JSON files | Development, simple setups |
| syslog | System syslog | Integration with syslog infrastructure |
| journald | systemd journal | systemd-based systems |
| fluentd | Fluentd collector | Centralized logging |
| awslogs | AWS CloudWatch | AWS deployments |
| gcplogs | Google Cloud Logging | GCP deployments |
| none | No logging | When using app-level logging |
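Driver-specific settings are passed with --log-opt. As a sketch, sending a container's logs to a Fluentd collector (the address and tag values here are placeholders for your own collector setup):
# Send logs to a Fluentd collector with a custom tag
docker run --log-driver fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag=myapp \
  myapp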
Configuring json-file Driver
# docker-compose.yml
services:
app:
image: myapp
logging:
driver: json-file
options:
max-size: "10m"
max-file: "5"
compress: "true"// /etc/docker/daemon.json
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "5",
"compress": "true"
}
}
Warning
Without log rotation (max-size and max-file), container logs can consume all disk space. Always configure rotation in production.
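To see how much space a container's log file already uses with the json-file driver, you can look up its path on the host (the container name is a placeholder; reading the file may require root, since it lives under /var/lib/docker):
# Locate a container's json-file log on the host and check its size
docker inspect --format '{{.LogPath}}' mycontainer
du -h $(docker inspect --format '{{.LogPath}}' mycontainer)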
Structured Logging
JSON Logging
// Node.js with pino
const pino = require("pino");
const logger = pino({
level: process.env.LOG_LEVEL || "info",
formatters: {
level: (label) => ({ level: label }),
},
});
logger.info({ requestId: "123", userId: "456" }, "User action");
// Output:
// {"level":"info","time":1704067200000,"requestId":"123","userId":"456","msg":"User action"}# Python with structlog
import structlog
structlog.configure(
processors=[
structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.JSONRenderer()
]
)
logger = structlog.get_logger()
logger.info("user_action", request_id="123", user_id="456")
# Output:
# {"event": "user_action", "request_id": "123", "user_id": "456", "level": "info", "timestamp": "2024-01-15T10:00:00Z"}Log Levels
// Define consistent log levels and respect LOG_LEVEL
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };
// Only emit entries at or above LOG_LEVEL (default: info)
const threshold = LEVELS[process.env.LOG_LEVEL] ?? LEVELS.info;
const write = (level, stream) => (msg, meta) => {
  if (LEVELS[level] >= threshold) {
    stream(JSON.stringify({ level, msg, ...meta }));
  }
};
const logger = {
  debug: write("debug", console.log),
  info: write("info", console.log),
  warn: write("warn", console.warn),
  error: write("error", console.error),
};
Note
Use structured (JSON) logging in production. It enables filtering, searching, and analysis in log aggregation systems.
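Structured output can also be filtered locally before it ever reaches an aggregator. A quick sketch, assuming jq is installed and the container is named myapp:
# Show only error-level entries from a container's JSON logs
docker logs myapp 2>&1 | jq -cR 'fromjson? | select(.level == "error")'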
Centralized Logging
ELK Stack (Elasticsearch, Logstash, Kibana)
# docker-compose.yml
services:
elasticsearch:
image: elasticsearch:8.11.0
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
volumes:
- elasticsearch-data:/usr/share/elasticsearch/data
ports:
- "9200:9200"
logstash:
image: logstash:8.11.0
volumes:
- ./logstash/pipeline:/usr/share/logstash/pipeline
    ports:
      - "5044:5044"
      - "12201:12201/udp" # GELF input used by the app's log driver below
depends_on:
- elasticsearch
kibana:
image: kibana:8.11.0
ports:
- "5601:5601"
environment:
- ELASTICSEARCH_HOSTS=http://elasticsearch:9200
depends_on:
- elasticsearch
app:
image: myapp
logging:
driver: gelf
options:
gelf-address: "udp://localhost:12201"
volumes:
  elasticsearch-data:
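The logstash service mounts ./logstash/pipeline, but the pipeline itself is not shown above. A minimal sketch that accepts GELF from the app's log driver and forwards to Elasticsearch (the file name and index pattern are assumptions):
# logstash/pipeline/gelf.conf
input {
  gelf {
    port => 12201
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "docker-logs-%{+YYYY.MM.dd}"
  }
}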
Loki + Grafana
# docker-compose.yml
services:
loki:
image: grafana/loki:2.9.0
ports:
- "3100:3100"
command: -config.file=/etc/loki/local-config.yaml
volumes:
- loki-data:/loki
promtail:
image: grafana/promtail:2.9.0
volumes:
- /var/log:/var/log
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- ./promtail-config.yml:/etc/promtail/config.yml
command: -config.file=/etc/promtail/config.yml
grafana:
image: grafana/grafana:10.2.0
ports:
- "3000:3000"
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
volumes:
- grafana-data:/var/lib/grafana
volumes:
loki-data:
  grafana-data:
# promtail-config.yml
server:
http_listen_port: 9080
positions:
filename: /tmp/positions.yaml
clients:
- url: http://loki:3100/loki/api/v1/push
scrape_configs:
- job_name: docker
static_configs:
- targets:
- localhost
labels:
job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
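Once Promtail ships the container logs, they can be queried from Grafana's Explore view with LogQL. Two illustrative queries, using the job label defined in the config above:
# All docker logs containing "error"
{job="docker"} |= "error"
# Log lines per second, per stream, over the last 5 minutes
rate({job="docker"}[5m])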
Metrics Collection
Prometheus Metrics
# docker-compose.yml
services:
prometheus:
image: prom/prometheus:v2.47.0
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus-data:/prometheus
command:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus"
grafana:
image: grafana/grafana:10.2.0
ports:
- "3000:3000"
volumes:
- grafana-data:/var/lib/grafana
depends_on:
- prometheus
app:
image: myapp
ports:
- "3001:3000"
- "9100:9100" # Metrics endpoint
volumes:
prometheus-data:
  grafana-data:
# prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "app"
static_configs:
- targets: ["app:9100"]
- job_name: "docker"
static_configs:
- targets: ["host.docker.internal:9323"]Exposing Application Metrics
Exposing Application Metrics
// Node.js with prom-client
const client = require("prom-client");
const express = require("express");
const app = express();
// Create metrics
const httpRequestCounter = new client.Counter({
name: "http_requests_total",
help: "Total number of HTTP requests",
labelNames: ["method", "path", "status"],
});
const httpRequestDuration = new client.Histogram({
name: "http_request_duration_seconds",
help: "HTTP request duration in seconds",
labelNames: ["method", "path"],
buckets: [0.1, 0.5, 1, 2, 5],
});
// Middleware to track requests
app.use((req, res, next) => {
const end = httpRequestDuration.startTimer({
method: req.method,
path: req.path,
});
res.on("finish", () => {
httpRequestCounter.inc({
method: req.method,
path: req.path,
status: res.statusCode,
});
end();
});
next();
});
// Metrics endpoint
app.get("/metrics", async (req, res) => {
res.set("Content-Type", client.register.contentType);
res.end(await client.register.metrics());
});
# Python with prometheus-client
import time
from prometheus_client import Counter, Histogram, generate_latest
from flask import Flask, Response, request
app = Flask(__name__)
REQUEST_COUNT = Counter('http_requests_total', 'Total requests', ['method', 'path', 'status'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'Request latency', ['method', 'path'])
@app.before_request
def before_request():
request.start_time = time.time()
@app.after_request
def after_request(response):
REQUEST_COUNT.labels(request.method, request.path, response.status_code).inc()
REQUEST_LATENCY.labels(request.method, request.path).observe(time.time() - request.start_time)
return response
@app.route('/metrics')
def metrics():
    return Response(generate_latest(), mimetype='text/plain')
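Once Prometheus scrapes these endpoints, the counters and histograms can be queried. A few illustrative PromQL queries, using the metric names from the examples above:
# Requests per second by status over the last 5 minutes
sum(rate(http_requests_total[5m])) by (status)
# 95th percentile request latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# Error ratio (5xx responses / all responses)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))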
Container Metrics with cAdvisor
services:
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.47.0
ports:
- "8080:8080"
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
    privileged: true
Note
cAdvisor provides detailed container metrics including CPU, memory, network, and filesystem usage. It's essential for understanding container resource consumption.
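To pull these metrics into Prometheus, add a scrape job pointing at the cAdvisor service (the job name and target assume the Compose service defined above):
# prometheus.yml (excerpt)
scrape_configs:
  - job_name: "cadvisor"
    static_configs:
      - targets: ["cadvisor:8080"]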
Complete Monitoring Stack
Docker Compose Setup
# docker-compose.monitoring.yml
services:
# Application
app:
image: myapp
ports:
- "3000:3000"
logging:
driver: json-file
options:
max-size: "10m"
max-file: "3"
labels:
- "prometheus.scrape=true"
- "prometheus.port=9100"
# Metrics
prometheus:
image: prom/prometheus:v2.47.0
ports:
- "9090:9090"
volumes:
- ./prometheus:/etc/prometheus
- prometheus-data:/prometheus
# Container metrics
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.47.0
ports:
- "8080:8080"
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
# Log aggregation
loki:
image: grafana/loki:2.9.0
ports:
- "3100:3100"
promtail:
image: grafana/promtail:2.9.0
volumes:
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- ./promtail:/etc/promtail
# Visualization
grafana:
image: grafana/grafana:10.2.0
ports:
- "3001:3000"
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
volumes:
- grafana-data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning
# Alerting
alertmanager:
image: prom/alertmanager:v0.26.0
ports:
- "9093:9093"
volumes:
- ./alertmanager:/etc/alertmanager
volumes:
prometheus-data:
  grafana-data:
Alerting
Prometheus Alert Rules
# prometheus/alert.rules.yml
groups:
- name: container_alerts
rules:
- alert: HighMemoryUsage
expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "Container {{ $labels.name }} high memory usage"
description: "Memory usage is above 90%"
- alert: HighCPUUsage
expr: rate(container_cpu_usage_seconds_total[5m]) > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "Container {{ $labels.name }} high CPU usage"
- alert: ContainerDown
expr: up{job="containers"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Container {{ $labels.instance }} is down"Alertmanager Configuration
Alertmanager Configuration
# alertmanager/config.yml
global:
smtp_smarthost: "smtp.example.com:587"
smtp_from: "alerts@example.com"
route:
receiver: "default"
group_wait: 30s
group_interval: 5m
repeat_interval: 1h
routes:
- match:
severity: critical
receiver: "pagerduty"
receivers:
- name: "default"
email_configs:
- to: "team@example.com"
- name: "pagerduty"
pagerduty_configs:
- service_key: "your-pagerduty-key"
- name: "slack"
slack_configs:
- api_url: "https://hooks.slack.com/services/..."
channel: "#alerts"Docker Stats and Events
Built-in Monitoring
# Real-time container stats
docker stats
# Stats for specific containers
docker stats app db redis
# One-shot stats
docker stats --no-stream
# Custom format
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
# Docker events
docker events
# Filter events
docker events --filter 'type=container'
docker events --filter 'event=die'
Docker System Information
# Overall system info
docker system df
# Detailed breakdown
docker system df -v
# System information
docker info
Quick Reference
Logging Commands
| Command | Purpose |
|---|---|
| docker logs | View container logs |
| docker logs -f | Follow logs |
| docker logs --tail N | Last N lines |
| docker logs --since | Logs since time |
Monitoring Tools
| Tool | Purpose |
|---|---|
| Prometheus | Metrics collection |
| Grafana | Visualization |
| Loki | Log aggregation |
| cAdvisor | Container metrics |
| Alertmanager | Alert routing |
Essential Metrics
| Metric | Description |
|---|---|
| container_cpu_usage_seconds_total | CPU usage |
| container_memory_usage_bytes | Memory usage |
| container_network_receive_bytes_total | Network in |
| container_network_transmit_bytes_total | Network out |
| container_fs_usage_bytes | Filesystem usage |
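These cAdvisor metric names can be combined into ad-hoc PromQL queries or dashboard panels, for example:
# Memory usage as a fraction of the configured limit, per container
container_memory_usage_bytes / container_spec_memory_limit_bytes
# Network receive throughput in bytes per second
rate(container_network_receive_bytes_total[5m])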
In the next chapter, we'll explore integrating Docker into CI/CD pipelines.