Why you need OpenTelemetry based Observability for your AI apps

Karthik Kalyanaraman

· 5 min read
Langtrace - OpenTelemetry x LLM Observability

Introduction

With the advent of LLMs, modern software development is going through an important shift - from mostly deterministic systems that can be reasoned about with logic to non-deterministic inference endpoints. While LLMs are enabling developers to build innovative applications that were not possible before, they are also creating some interesting challenges when it comes to software quality, testing and debugging.

New Challenges

One of the biggest challenges with LLMs is their non-deterministic nature. In simple terms, an LLM's behaviour is governed by variables outside the caller's control, like model weights and training data, so the response is not guaranteed to be predictable. A traditional GET API endpoint reading a database is guaranteed to return a JSON response every single time, or fail. But asking an LLM to return a structured JSON response carries no such guarantee, and identifying or understanding why it failed that one time during the day is close to impossible, as the failure depends on variables you do not control.
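A common way to cope with this in application code is to validate the model's output and retry on failure. A minimal sketch of the pattern - here `call_llm` is a hypothetical stand-in for a real model call:

```python
import json

def get_structured_response(call_llm, prompt, max_retries=3):
    """Call an LLM and retry until the response parses as valid JSON."""
    last_error = None
    for attempt in range(max_retries):
        raw = call_llm(prompt)
        try:
            return json.loads(raw)  # success: the model returned valid JSON
        except json.JSONDecodeError as e:
            last_error = e  # non-deterministic failure: try again
    raise ValueError(f"No valid JSON after {max_retries} attempts") from last_error

# Simulate a flaky model: fails once, then returns valid JSON.
responses = iter(['not json', '{"status": "ok"}'])
result = get_structured_response(lambda p: next(responses), "Return JSON")
```

Retries mask individual failures but do not explain them - which is exactly why capturing telemetry around each attempt matters.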

This has big consequences for software testing, debugging and quality. When you cannot deterministically reproduce a bug, how can you fix it? One way engineering teams are approaching this problem is by putting effective guard rails in place, using a combination of model iteration, fine-tuning and prompt engineering to control the behaviour of these systems. In reality, this will be an ongoing process of continuous improvement on the way to the dream land of 100% model accuracy and quality. This means defining metrics, having tools to gain visibility, and diligently measuring and tracking those metrics will be of utmost importance to get the most out of these systems.

OpenTelemetry - an important standard

In order to effectively measure these new metrics for applications built using LLMs, we need the ability to generate and capture telemetry data from the LLM interaction layer, which is typically made up of frameworks like Langchain and LlamaIndex, vector DBs like Pinecone and PgVector, and LLM providers like OpenAI, Cohere and Anthropic.

Luckily, we do not need to reinvent the wheel here: significant strides have been made in the last few years toward a standard data model for telemetry spans and traces, primarily driven by OpenTelemetry, a CNCF-incubated project with wide adoption across the industry.

OpenTelemetry defines a standard data model for spans and traces, the building blocks of any observability system, which keeps instrumentation minimally intrusive while preserving high-cardinality data for effective debugging, visibility and incident response. OpenTelemetry also prevents vendor lock-in: because the data model is fixed and widely adopted by libraries, frameworks and the industry at large, teams can switch between observability tools freely.
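To make the data model concrete, here is a simplified sketch of what a span looks like. This is an illustration of the concepts, not the actual OpenTelemetry SDK, which adds timestamps, status codes, context propagation and much more:

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """A single timed operation within a trace (simplified)."""
    name: str
    trace_id: str                    # shared by every span in one request
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None  # links a child span to its parent
    attributes: dict = field(default_factory=dict)  # high-cardinality metadata

# One trace covering a single LLM request, with a nested child span
# for the vector DB lookup it performed.
trace_id = uuid.uuid4().hex
root = Span("chat_request", trace_id, attributes={"llm.model": "gpt-4"})
child = Span("vector_db.query", trace_id, parent_id=root.span_id,
             attributes={"db.system": "pinecone"})
```

The shared `trace_id` is what lets a backend stitch all the spans of one request back together, and the free-form `attributes` dict is where the high-cardinality detail (model name, prompt ID, token counts) lives.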

Langtrace SDK - the first step

The first step towards this goal is to equip teams with OpenTelemetry (OTel) tracing capabilities for the LLM software layer. With the Langtrace SDK, we are building open-source SDKs for popular languages to capture OTel-standard traces from LLM frameworks, vector DBs and LLM providers. These SDKs are fully compatible with any of the available OTel exporters, which can send these traces to any storage system or observability backend.
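Under the hood, SDKs of this kind typically work by wrapping the client library's calls so that a span is recorded around each one. A simplified, hypothetical sketch of that pattern (not the actual Langtrace implementation - `fake_completion` stands in for a real provider call):

```python
import functools
import time

captured_spans = []  # stand-in for an exporter shipping spans to a backend

def traced(span_name):
    """Decorator that records a timing span around a function call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            captured_spans.append({
                "name": span_name,
                "duration_s": time.time() - start,
            })
            return result
        return inner
    return wrap

# Hypothetical LLM client call, instrumented with tracing.
@traced("openai.chat.completions")
def fake_completion(prompt):
    return f"echo: {prompt}"

reply = fake_completion("hello")
```

Because the wrapper only observes the call, application code is untouched - the "minimal intrusion" property the OTel data model is designed around.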

Langtrace Cloud - the observability layer you need

New metrics demand new capabilities on the observability client. These include, but are not limited to:

  • Running tests and manual/automated evaluations.
  • Uploading reference datasets and downloading captured and annotated datasets.
  • Prompt management and versioning.
  • Token usage tracking.
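Taking the last item as an example, token usage tracking at its core is just per-model aggregation of the counts each traced call reports. A toy sketch (in practice the token counts come from the provider's API response):

```python
from collections import defaultdict

class TokenTracker:
    """Aggregate prompt/completion token counts per model."""

    def __init__(self):
        self.usage = defaultdict(lambda: {"prompt": 0, "completion": 0})

    def record(self, model, prompt_tokens, completion_tokens):
        self.usage[model]["prompt"] += prompt_tokens
        self.usage[model]["completion"] += completion_tokens

    def total(self, model):
        u = self.usage[model]
        return u["prompt"] + u["completion"]

tracker = TokenTracker()
tracker.record("gpt-4", prompt_tokens=120, completion_tokens=45)
tracker.record("gpt-4", prompt_tokens=80, completion_tokens=30)
```

Keeping prompt and completion counts separate matters because most providers price them differently.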

With Langtrace Cloud, we are building a lightweight client that is hyper-optimized for the above needs while serving as an additional observability layer alongside your existing observability solution.

Conclusion

LLMs are a fascinating technology that is supercharging a leap in computing. It is important to pick the right set of tools early in the adoption journey to gain the control, visibility and confidence to build and ship high-quality software with new and innovative capabilities.

About Karthik Kalyanaraman

Cofounder and CTO - Langtrace AI, Scale3 Labs