We Didn’t Plan for Logging — Our Observability Mistakes in Azure

Author: Explain My Stack Team
Date: July 22, 2022   |   Read time: 4 min

(Image: Azure monitoring dashboard)

We built what we thought was a robust API pipeline on Azure — leveraging best practices in scalability, security, and deployment. But we overlooked one essential aspect that would later haunt us: logging. Months after our go-live, we found ourselves in a reactive spiral, struggling to diagnose performance degradation, troubleshoot issues, and understand user behaviour.

Here’s a breakdown of what we missed, what it cost us, and what we’d do differently if we had the chance to rewind.

🔍 What We Missed

  • App Insights wasn’t wired into all services: Some of our microservices had telemetry configured; others didn’t. The result? Incomplete visibility across our stack. Dependencies, traces, and requests weren’t being monitored uniformly (a minimal wiring sketch follows this list).
  • No structured log format was agreed on: Different teams logged in different formats — or not at all. This made it difficult to correlate events across services or extract meaning from the logs programmatically.
  • Diagnostic settings weren’t enabled by default: Azure provides valuable diagnostic logs for services like API Management, Application Gateway, and Azure Functions. We failed to turn these on early, missing critical events and metrics.
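For illustration, here is roughly what that wiring looks like for a Python service using the Azure Monitor OpenTelemetry distro. This is a minimal sketch, not our production setup: the connection string and the "orders-service" name are placeholders, and the same idea applies with whichever SDK your services actually use. The point is that one uniform instrumentation step per service is what keeps requests, dependencies, and traces visible everywhere.

```python
# Minimal sketch: wiring Application Insights telemetry into a Python service
# with the Azure Monitor OpenTelemetry distro (pip install azure-monitor-opentelemetry).
# The connection string and service name are placeholders.
import logging

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

# One call sets up trace, metric, and log export to Application Insights.
configure_azure_monitor(
    connection_string="InstrumentationKey=00000000-0000-0000-0000-000000000000",
)

tracer = trace.get_tracer("orders-service")   # hypothetical service name
logger = logging.getLogger("orders-service")

def handle_order(order_id: str) -> None:
    # Spans emitted here appear as requests/dependencies in App Insights,
    # so every service instrumented this way shows up in the same traces.
    with tracer.start_as_current_span("handle_order"):
        logger.info("processing order %s", order_id)

if __name__ == "__main__":
    handle_order("demo-123")
```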

⚠️ The Consequences

  • Blind spots during performance incidents: When latency spiked, we had no clear trace of whether delays originated in the application tier, the network, or downstream APIs. Troubleshooting was slow and frustrating.
  • Inability to isolate root causes: Without structured logs or telemetry correlation, every incident became a guessing game. Was it a bug? A configuration issue? A usage spike? We had no answers.
  • Stakeholder frustration: Internally, our stakeholders and leadership couldn’t get clarity on issues, eroding trust in the platform. Externally, users experienced outages we couldn’t explain or prevent from recurring.

✅ What We’d Do Differently

  • Define a logging and observability strategy early: Before writing code, align on how logs, metrics, and traces will be used. Decide what “good” looks like for operational visibility.
  • Use structured logs with correlation IDs: Logging in JSON with consistent fields (timestamp, service, operation ID, correlation ID, and so on) enables effective parsing, filtering, and root-cause tracing; see the logging sketch after this list.
  • Enable diagnostic settings by default: Whether through the Azure Portal, ARM templates, Bicep, or Terraform, ensure diagnostic logging is configured for every resource and sent to a centralized destination such as Log Analytics or a SIEM; see the diagnostic-settings sketch below.
  • Automate and visualize with dashboards: Real-time dashboards, powered by KQL in Log Analytics or built in Grafana or Azure Monitor Workbooks, help surface anomalies and track system health proactively; a sample KQL query is shown below.
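To make the structured-logging point concrete, here is a minimal, standard-library-only Python sketch of JSON logs that carry a correlation ID. The field names (service, operation, correlation_id) and the "orders-service" name are illustrative choices, not a schema we prescribe; what matters is that every service agrees on the fields and propagates the same ID.

```python
# Minimal sketch: structured JSON logging with a correlation ID, using only
# the standard library. Field names are illustrative, not a prescribed schema.
import json
import logging
import uuid

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": "orders-service",                      # hypothetical service name
            "operation": getattr(record, "operation", None),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The same correlation_id is attached to every log line for one request, so
# events can be joined across services that propagate the same ID.
correlation_id = str(uuid.uuid4())
logger.info(
    "order accepted",
    extra={"operation": "create_order", "correlation_id": correlation_id},
)
```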
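We would normally enable diagnostic settings declaratively with ARM, Bicep, or Terraform; purely as an illustration, the sketch below does the same thing imperatively with the azure-mgmt-monitor Python SDK. The subscription ID, resource ID, workspace ID, and log category are placeholders, and category names vary by resource type.

```python
# Sketch: enabling a diagnostic setting on a resource and routing it to a
# Log Analytics workspace with the azure-mgmt-monitor SDK. All IDs below are
# placeholders; an IaC template (ARM/Bicep/Terraform) achieves the same result.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

SUBSCRIPTION_ID = "<subscription-id>"                    # placeholder
RESOURCE_ID = "<resource-id-of-apim-or-app-gateway>"     # placeholder
WORKSPACE_ID = "<log-analytics-workspace-resource-id>"   # placeholder

client = MonitorManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

client.diagnostic_settings.create_or_update(
    resource_uri=RESOURCE_ID,
    name="send-to-log-analytics",
    parameters={
        "workspace_id": WORKSPACE_ID,
        # Category names differ per resource type; "GatewayLogs" is an
        # example category for API Management.
        "logs": [{"category": "GatewayLogs", "enabled": True}],
        "metrics": [{"category": "AllMetrics", "enabled": True}],
    },
)
```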
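And here is the kind of KQL that might back a latency tile, run here through the azure-monitor-query package so it can be scripted or tested. The AppRequests table, DurationMs column, and AppRoleName field are the workspace-based Application Insights names; adjust the query to whatever your schema actually is.

```python
# Sketch: a latency query of the kind that powers a dashboard tile, executed
# with the azure-monitor-query package. Table/column names assume
# workspace-based Application Insights; the workspace ID is a placeholder.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-guid>"  # placeholder

QUERY = """
AppRequests
| summarize p95_ms = percentile(DurationMs, 95), requests = count()
    by bin(TimeGenerated, 5m), AppRoleName
| order by TimeGenerated asc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    workspace_id=WORKSPACE_ID,
    query=QUERY,
    timespan=timedelta(hours=24),
)

# Print each 5-minute bucket of p95 latency per service role.
for table in response.tables:
    for row in table.rows:
        print(row)
```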

📌 Key Takeaway

You can’t fix what you can’t see. Observability is not a nice-to-have — it’s a first-class citizen in any cloud-native system. Treat it with the same priority as your deployment pipeline or security posture.

Logging isn’t just about debugging; it’s about enabling insight, accountability, and confidence in the systems you build. Don’t let it be an afterthought — or you’ll learn the hard way, like we did.