
logging-monitoring

Install

mkdir -p .claude/skills/logging-monitoring && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/16629" && unzip -o skill.zip -d .claude/skills/logging-monitoring && rm skill.zip

Installs to .claude/skills/logging-monitoring

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Implement observability patterns including structured logging, log levels, correlation IDs, metrics, and distributed tracing. Use when adding structured logging, implementing correlation IDs for request tracing, configuring metrics collection, setting up distributed tracing, or designing alerting rules.

About this skill

Logging & Monitoring

Purpose: Implement observability for production systems.
Goal: Structured logs, correlation across requests, actionable metrics.
Note: For implementation, see C# Development or Python Development.


When to Use This Skill

  • Adding structured logging to applications
  • Implementing request correlation IDs
  • Configuring metrics collection
  • Setting up distributed tracing (OpenTelemetry)
  • Designing alerting rules and health checks

Prerequisites

  • Logging framework installed
  • Monitoring platform access

Decision Tree

Observability concern?
+- What to log?
|  +- Request start/end -> INFO with correlation ID
|  +- Expected errors -> WARN (validation, not-found)
|  +- Unexpected errors -> ERROR with stack trace
|  \- Debug details -> DEBUG (disabled in production)
+- What NOT to log?
|  \- PII, passwords, tokens, credit cards -> NEVER
+- Metrics needed?
|  +- RED metrics: Rate, Errors, Duration (for services)
|  \- USE metrics: Utilization, Saturation, Errors (for resources)
+- Distributed tracing?
|  \- OpenTelemetry for cross-service correlation
\- Alerting?
   +- SLO-based: alert on error budget burn rate
   \- Avoid alert fatigue: page only for actionable issues
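
The "alert on error budget burn rate" branch can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the skill itself: the function names are hypothetical, and the default threshold of 14.4 is an assumption borrowed from the commonly cited multi-window approach (it corresponds to spending 2% of a 30-day budget in one hour). A burn rate of 1.0 means the budget is being spent exactly as fast as the SLO allows.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    error_ratio: fraction of failed requests in the window (0.0 to 1.0).
    slo_target:  availability target, e.g. 0.999 for "three nines".
    """
    error_budget = 1.0 - slo_target
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / error_budget


def should_page(short_window_ratio: float, long_window_ratio: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only when both windows burn fast (threshold is an assumed default)."""
    return (burn_rate(short_window_ratio, slo_target) >= threshold
            and burn_rate(long_window_ratio, slo_target) >= threshold)


# 2% errors against a 99.9% SLO burns the budget roughly 20x too fast.
print(should_page(0.02, 0.02, 0.999))
```

Requiring both a short and a long window to exceed the threshold is one way to avoid paging on brief spikes that have already recovered.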

Structured Logging

Concept

Log structured data (key-value pairs) instead of plain text for better searchability and analysis.

[FAIL] Unstructured (hard to parse):
  "User user@example.com logged in from 192.168.1.1 at 2024-01-15 10:30:00"

[PASS] Structured (machine-readable):
  {
    "event": "user_login",
    "user_email": "user@example.com",
    "ip_address": "192.168.1.1",
    "timestamp": "2024-01-15T10:30:00Z",
    "level": "INFO"
  }

Benefits

  • Searchable: Query by any field
  • Filterable: Show only errors, specific users, etc.
  • Aggregatable: Count events, calculate averages
  • Parseable: Tools can process automatically
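
One way to emit logs in this shape is a custom formatter on top of Python's standard `logging` module. This is a minimal sketch under stated assumptions: the class name `JsonFormatter`, the helper `build_logger`, and the convention of passing key-value pairs through `extra={"fields": {...}}` are illustrative choices, not something the skill prescribes.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Assumed convention: callers pass extra={"fields": {...}} for key-value data.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes one JSON object per line to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


logger = build_logger()
logger.info("user_login", extra={"fields": {"user_id": "user_123"}})
```

Each call produces one JSON line, so a log platform can query by `event`, `level`, or any field supplied by the caller.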

Log Levels

Standard Levels

| Level | When to Use | Example |
|-------|-------------|---------|
| TRACE | Very detailed debugging | "Entering function with params: {x: 1, y: 2}" |
| DEBUG | Debugging information | "Cache hit for key: user_123" |
| INFO | Normal operations | "User logged in", "Order created" |
| WARN | Unexpected but recoverable | "Retry attempt 2 of 3", "Rate limit approaching" |
| ERROR | Failures requiring attention | "Payment failed", "Database connection lost" |
| FATAL | Application cannot continue | "Out of memory", "Configuration invalid" |

Level Configuration by Environment

Development: DEBUG or TRACE
 - See detailed information for debugging

Staging: INFO
 - Normal operations plus warnings/errors

Production: INFO (or WARN)
 - Reduce noise, focus on significant events
 - Keep ERROR/FATAL always enabled
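
The per-environment defaults above could be wired up roughly as follows. This is a sketch only: the environment variable names `APP_ENV` and `LOG_LEVEL`, the mapping, and the function `resolve_level` are assumptions for illustration. Python's standard `logging` has no TRACE level, so DEBUG stands in for it here.

```python
import logging
import os
from typing import Optional

# Default level per environment, following the guidance above.
DEFAULT_LEVELS = {
    "development": logging.DEBUG,
    "staging": logging.INFO,
    "production": logging.INFO,
}


def resolve_level(environment: str, override: Optional[str] = None) -> int:
    """Pick a log level; a valid explicit override wins over the default."""
    if override:
        level = logging.getLevelName(override.upper())
        if isinstance(level, int):  # unknown names come back as strings
            return level
    return DEFAULT_LEVELS.get(environment.lower(), logging.INFO)


logging.basicConfig(
    level=resolve_level(os.getenv("APP_ENV", "production"), os.getenv("LOG_LEVEL"))
)
```

Because ERROR and FATAL sit above every level used here, they stay enabled in all three environments. The override gives a way to turn on DEBUG temporarily without a code change.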

Core Rules

| Practice | Description |
|----------|-------------|
| Structured logging | JSON format with key-value pairs |
| Correlation IDs | Trace requests across services |
| Appropriate levels | DEBUG in dev, INFO+ in prod |
| No sensitive data | Never log passwords, tokens, PII |
| Context in errors | Include what, why, and how to fix |
| Meaningful metrics | Track rate, errors, duration |
| Health checks | Liveness + readiness endpoints |
| Actionable alerts | Include runbooks, reduce noise |
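
The correlation ID rule can be sketched with the standard library alone. Everything named here (`correlation_id`, `CorrelationIdFilter`, `start_request`, the logger name `orders`) is hypothetical, and a real service would call `start_request` from its web framework's middleware, which is not shown.

```python
import contextvars
import logging
import uuid

# Holds the correlation ID of the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def start_request(incoming_id=None) -> str:
    """Reuse the caller's ID when one was sent, otherwise mint a new one."""
    cid = incoming_id or uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


handler = logging.StreamHandler()
handler.addFilter(CorrelationIdFilter())
handler.setFormatter(logging.Formatter("%(levelname)s %(correlation_id)s %(message)s"))
logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

start_request("req-42")
logger.info("order_created")
```

A context variable is used rather than a global so that concurrent requests handled by threads or asyncio tasks each see their own ID.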

Anti-Patterns

  • Log and Forget: Writing logs but never querying or reviewing them -> Set up dashboards and alerts on ERROR/FATAL; review logs in incident postmortems
  • PII in Logs: Logging email addresses, passwords, tokens, or credit card numbers -> Scrub sensitive fields before logging; use allowlists for loggable fields
  • Unstructured Strings: Logging plain text messages that are hard to parse or search -> Use structured logging (JSON key-value pairs) for all log entries
  • Missing Correlation: Logs from different services with no shared request ID -> Propagate W3C trace context or a correlation ID header across all service calls
  • Alert Fatigue: Alerting on every warning or non-actionable metric -> Page only on SLO budget burn rate; group related alerts; include runbook links
  • Debug in Production: Running production with DEBUG or TRACE level enabled -> Use INFO or WARN in production; enable DEBUG temporarily and only on specific components
  • Metric Overload: Tracking hundreds of custom metrics with no clear purpose -> Focus on RED (Rate, Errors, Duration) for services and USE (Utilization, Saturation, Errors) for resources
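
The "use allowlists for loggable fields" advice from the PII item might look like the sketch below. The field names in `LOGGABLE_FIELDS` and the function `scrub` are hypothetical; the point is that unknown keys are redacted by default instead of relying on a blocklist that has to anticipate every sensitive field.

```python
# Hypothetical allowlist; adjust to the fields your service may safely log.
LOGGABLE_FIELDS = frozenset(
    {"event", "level", "timestamp", "order_id", "status", "duration_ms"}
)


def scrub(fields: dict) -> dict:
    """Keep allowlisted keys; replace every other value with a marker."""
    return {
        key: value if key in LOGGABLE_FIELDS else "[REDACTED]"
        for key, value in fields.items()
    }


safe = scrub({
    "event": "payment_failed",
    "order_id": "o-17",
    "card_number": "4111111111111111",
})
print(safe)
```

Keeping the redacted key in the output (rather than dropping it) shows reviewers that a field was supplied without exposing its value.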

Observability Tools

| Category | Tools |
|----------|-------|
| Logging | ELK Stack, Splunk, Datadog Logs, CloudWatch Logs |
| Metrics | Prometheus + Grafana, Datadog, New Relic, CloudWatch |
| Tracing | Jaeger, Zipkin, Datadog APM, Application Insights |
| All-in-One | Datadog, New Relic, Dynatrace, Elastic Observability |

See Also: Error Handling - C# Development - Python Development

Troubleshooting

| Issue | Solution |
|-------|----------|
| Logs not appearing in monitoring platform | Check log level configuration, verify sink/exporter endpoint |
| Correlation IDs missing across services | Propagate W3C trace context headers in all HTTP calls |
| Alert fatigue from too many notifications | Set meaningful thresholds, group related alerts, add alert suppression windows |
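
For the missing-correlation issue, the W3C `traceparent` header has the form `version-traceid-parentid-flags` (2, 32, 16, and 2 lowercase hex characters). The sketch below handles only version `00` and uses hypothetical helper names. In practice an OpenTelemetry SDK would normally do this propagation for you, so treat it as an illustration of what travels on the wire, not a replacement.

```python
import re
import secrets

# Version 00 only: 00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with random trace and span IDs."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(incoming) -> str:
    """Keep the caller's trace ID but issue a fresh span ID for the outgoing call."""
    match = TRACEPARENT_RE.match(incoming or "")
    # A missing, malformed, or all-zero trace ID starts a new trace instead.
    if match is None or int(match.group(1), 16) == 0:
        return new_traceparent()
    trace_id, _parent_span, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing_headers = {"traceparent": child_traceparent(incoming)}
```

Every outbound HTTP call would carry `outgoing_headers`, so all services log the same trace ID for one request.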
