
Datadog’s Toto: Open-Source Observability Model
Datadog unveils Toto, a 151M-parameter time-series foundation model and BOOM benchmark, sparking a new open-source, AI-driven observability wave.
Observability is crucial for LLM apps to ensure performance, reliability, and user trust. It involves monitoring metrics and logging prompts, outputs, and user interactions.
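As a minimal illustration of the "log prompts and outputs" part, here is a hedged sketch using only the JDK logger; the field names and the LlmCall record are made up for the example and are not from the post:

```java
import java.util.logging.Logger;

public class LlmCallLogger {
    private static final Logger log = Logger.getLogger("llm.calls");

    // Hypothetical record holding what we want to observe about one LLM call.
    record LlmCall(String model, String prompt, String output,
                   long latencyMs, int promptTokens, int completionTokens) {}

    static void logCall(LlmCall call) {
        // Emit a structured line that a log pipeline can parse and aggregate
        // into metrics such as latency percentiles and token usage per user.
        log.info(String.format(
                "llm_call model=%s latency_ms=%d prompt_tokens=%d completion_tokens=%d prompt=%s output=%s",
                call.model(), call.latencyMs(), call.promptTokens(),
                call.completionTokens(), call.prompt(), call.output()));
    }

    public static void main(String[] args) {
        logCall(new LlmCall("example-model", "Summarize our SLA", "The SLA guarantees...",
                820, 42, 95));
    }
}
```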
In-house observability with data lake: unify metrics, logs & traces on AWS S3 + Iceberg to slash cost, dodge vendor lock-in & boost analytics.
In Part 3, we explored building scalable telemetry pipelines with agents, batching, Kafka buffering, and backpressure control for resilient observability. Now let's bring it home with this last part of our blog series by addressing how to make the entire pipeline horizontally scalable and highly available, and exploring cost…
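A minimal sketch of the batching-and-backpressure idea from Part 3, assuming a Kafka cluster at kafka:9092 and a telemetry.logs topic (both hypothetical names); the sizes and timeouts are illustrative, not recommendations from the series:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class TelemetryForwarder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // hypothetical broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Batching: accumulate up to 64 KB per partition or wait 50 ms before sending.
        props.put("batch.size", 65536);
        props.put("linger.ms", 50);
        // Backpressure: bound the in-memory buffer; send() blocks for up to 2 s when the
        // buffer is full instead of letting the agent's memory grow without limit.
        props.put("buffer.memory", 33_554_432);
        props.put("max.block.ms", 2000);
        props.put("compression.type", "snappy");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("telemetry.logs", "service-a",
                    "{\"level\":\"info\",\"msg\":\"request handled\"}"));
        }
    }
}
```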
In Part 2, we saw that scaling observability pipelines involves specialized strategies for each telemetry signal type. For metrics, scalable architectures use distributed storage, aggregation, downsampling, etc. to handle high volumes. Trace pipelines employ sampling strategies like head-based, tail-based, and remote sampling to manage trace volume, while for logs, it…
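As a concrete illustration of the head-based strategy (one of the three mentioned above), here is a minimal sketch using the OpenTelemetry Java SDK; the 10% ratio is an arbitrary example value:

```java
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.samplers.Sampler;

public class SamplingConfig {
    public static SdkTracerProvider tracerProvider() {
        // Head-based sampling: the keep/drop decision is made at the root span,
        // here keeping roughly 10% of traces and honoring the parent's decision
        // for spans created in downstream services.
        return SdkTracerProvider.builder()
                .setSampler(Sampler.parentBased(Sampler.traceIdRatioBased(0.10)))
                .build();
    }
}
```

Tail-based sampling, by contrast, is typically applied later in the pipeline (for example in a collector), once all spans of a trace have arrived and its outcome is known.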
In a significant shake-up within the observability space, Israeli startup Groundcover recently announced a $35 million Series B funding round, led by Zeev Ventures, bringing their total funding to $60 million. This latest funding underscores the industry's strong appetite for modern, streamlined observability solutions designed for cloud-native ecosystems.
In Part 1, we saw that a scalable pipeline architecture consists of data collection, processing, storage, and querying stages, with key design principles including horizontal scaling, stateless processing, and backpressure management. Now, let's examine specialized scaling strategies for each telemetry signal type…
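To make the horizontal-scaling and stateless-processing principles concrete, here is a hedged sketch of a Kubernetes HorizontalPodAutoscaler for a stateless collector Deployment; the otel-collector name, replica counts, and 70% CPU target are illustrative assumptions, not values from the series:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: otel-collector            # hypothetical Deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: otel-collector
  minReplicas: 3                  # keep a baseline for availability
  maxReplicas: 20                 # scale out under telemetry bursts
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # add replicas when average CPU exceeds 70%
```

Because the collectors hold no per-tenant state, any replica can process any batch, which is what makes this kind of scale-out safe.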
Building observability in large-scale, cloud-native systems requires collecting telemetry data (metrics, traces, and logs) at extremely high volumes. Modern platforms like Kubernetes can generate millions of metrics, traces, and log events per second, and enterprises often must handle this flood of telemetry across hybrid environments (on-premises and cloud)…
Goal: Spin up Prometheus+Alertmanager, Loki+Promtail, Jaeger, and Grafana with a single docker-compose.yml, then watch a tiny Java HTTP service emit metrics, logs, and traces—all in less than an hour. Why this post? You keep hearing that “observability ≠ monitoring” and that you need metrics, logs, and traces…
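For orientation, here is an abridged sketch of what such a docker-compose.yml can look like; it omits Alertmanager, Promtail, and the config files each service mounts, and the image tags and ports are common defaults rather than values taken from the post:

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"              # Prometheus UI and API
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"              # Loki push/query API
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      COLLECTOR_OTLP_ENABLED: "true"
    ports:
      - "16686:16686"            # Jaeger UI
      - "4317:4317"              # OTLP gRPC ingest
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"              # Grafana UI
    depends_on:
      - prometheus
      - loki
      - jaeger
```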
OpenTelemetry’s Collector is a vendor-neutral service that sits between your applications and observability backends. It can receive telemetry data (traces, metrics, logs), process or transform it, and export it to one or multiple destinations. In a production environment, the Collector becomes essential for building a flexible and resilient observability pipeline…
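To show that receive → process → export flow in practice, here is a minimal Collector configuration sketch; the endpoints, limits, and batch sizes are placeholder assumptions:

```yaml
receivers:
  otlp:
    protocols:
      grpc:                      # applications send OTLP over gRPC (4317)
      http:                      # or over HTTP (4318)

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512               # protect the Collector from unbounded memory use
  batch:
    send_batch_size: 1024
    timeout: 5s                  # flush partial batches after 5 s

exporters:
  otlp:
    endpoint: backend:4317       # hypothetical tracing backend address
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889       # expose metrics for Prometheus to scrape

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```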
OpenTelemetry (OTel) has quickly become a cornerstone of modern observability. If you’re a developer or engineer looking to instrument your applications for better insight, this beginner’s guide is for you. I’ll explain what OpenTelemetry is, why it matters, and walk through a step-by-step tutorial to instrument a sample application…
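As a taste of what that instrumentation looks like, here is a minimal manual-tracing sketch with the OpenTelemetry Java API; the tracer name, span name, and attribute are arbitrary, and it assumes an SDK (or the OTel Java agent) has already been configured to export the spans:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class CheckoutHandler {
    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("demo-app"); // arbitrary instrumentation name

    public void handle(String orderId) {
        // Start a span for this unit of work and make it current so child spans
        // and logs emitted inside pick up the same trace context.
        Span span = tracer.spanBuilder("checkout").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            span.setAttribute("order.id", orderId);
            // ... business logic goes here ...
        } catch (Exception e) {
            span.recordException(e);
            throw e;
        } finally {
            span.end(); // always end the span, even on failure
        }
    }
}
```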
Welcome to the world of observability! If you’re new to this field, all the jargon and acronyms can feel overwhelming. But fear not—this beginner’s cheat sheet will walk you through the essential observability terms in plain language. Use it as a reference whenever you encounter an unfamiliar term…
Getting Started
Imagine you’re a detective for software systems. Late one night, an alert goes off: something is wrong with your application. But what is wrong? In a complex microservices environment, finding the culprit can feel like searching for a needle in a haystack. This is where observability comes in…