Product Updates

Integrating Langtrace with Inspect AI

Karthik Kalyanaraman

LLM Application Observability & Evaluation Feedback Loop


Thrilled to announce that Langtrace now has first-party support for Inspect AI, an open-source evaluations framework that is highly flexible to use.

What does this mean?

Users can now run automated evals on their datasets using model-graded techniques like "LLM as a judge," and automatically report the results back to Langtrace to track the performance of their applications.
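The core of a model-graded eval is simple: for each sample, a grader model is prompted with the question, a reference answer, and the candidate answer, and asked to return a grade (Inspect AI ships scorers like `model_graded_fact` that work this way). Here is a minimal, self-contained sketch of that loop; `stub_grader` is a hypothetical stand-in for a real grader LLM call:

```python
def judge(question, answer, reference, grade_fn):
    """Ask a grader to decide whether `answer` matches `reference`.

    `grade_fn` stands in for a call to a grader LLM (in practice this
    would hit your model provider) and must return a grade string,
    either "CORRECT" or "INCORRECT".
    """
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {answer}\n"
        "Reply with CORRECT or INCORRECT."
    )
    return grade_fn(prompt) == "CORRECT"


def stub_grader(prompt):
    """Toy grader for illustration only: marks an answer correct when
    the reference string appears inside the candidate answer."""
    reference = prompt.split("Reference answer: ")[1].split("\n")[0]
    answer = prompt.split("Candidate answer: ")[1].split("\n")[0]
    return "CORRECT" if reference.lower() in answer.lower() else "INCORRECT"


print(judge("Capital of France?", "It is Paris.", "Paris", stub_grader))  # True
```

In a real pipeline, the grades produced by the judge are what get reported back to Langtrace as eval results for each trace.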

In effect, with Langtrace you can establish a feedback loop from day one for continuously monitoring and improving your LLM-powered applications.

Why is this important?

Today, building a product or feature with an LLM is easy to get started with, and it feels magical because it unlocks capabilities that were not possible before. But the number one challenge developers face is making sure the outputs coming out of their LLM stack are accurate and that their users have a good experience.

This is why observability, coupled with human-in-the-loop annotations and a combination of manual and automated evaluations, is key tooling to have as part of your stack, and this has been our number one vision for Langtrace.

So how can Langtrace help?

Langtrace is open source and built on OpenTelemetry standards with auto-instrumentation capabilities. This means developers can integrate Langtrace into their stack in just two lines of code, without spending much engineering time. Once integrated successfully, Langtrace can:
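As a sketch, the two-line setup with the Langtrace Python SDK looks like the following (check the Langtrace docs for the current package name and `init` signature; the API key placeholder is yours to fill in):

```python
from langtrace_python_sdk import langtrace

langtrace.init(api_key="<your-langtrace-api-key>")
```

Because the SDK auto-instruments supported LLM and VectorDB clients, no further code changes are needed to start capturing traces.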

- Automatically trace all LLM, VectorDB, and framework-level requests.

- Group LLM requests and present them to you for manual evaluation and golden-dataset curation, so you can assess initial performance.

- Use the curated datasets to set up and run evaluations at various integration points, such as your CI/CD tooling (GitHub Actions, Jenkins, etc.).

- Version and manage prompts.

- Compare and contrast different models in the prompt playground.
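The CI/CD integration point above usually takes the form of an eval gate: run the evals against the curated dataset, then fail the build if the score drops below a threshold. A minimal, self-contained sketch (the threshold and scores are illustrative; in practice the scores would come from an eval run such as Inspect AI's):

```python
import sys


def eval_gate(scores, threshold=0.8):
    """Return True when mean eval accuracy meets the threshold.

    `scores` is a list of per-sample grades (1 = correct, 0 = incorrect),
    as produced by an automated eval run.
    """
    accuracy = sum(scores) / len(scores)
    print(f"accuracy={accuracy:.2f} threshold={threshold:.2f}")
    return accuracy >= threshold


if __name__ == "__main__":
    # Illustrative grades; a real run would load these from eval output.
    scores = [1, 1, 0, 1, 1]
    # Non-zero exit status fails the CI job (GitHub Actions, Jenkins, etc.).
    sys.exit(0 if eval_gate(scores) else 1)
```

Wiring this up as a step in a GitHub Actions workflow means a regression in eval accuracy blocks the merge rather than reaching users.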

This enhances the developer experience of building and scaling LLM-powered applications by providing rich data and objective metrics, so you can ship with confidence.

Who uses Langtrace?
Langtrace is used by developers, startups, and companies ranging from small teams to large enterprises like Elastic.

How do I get started?
You have two options:
- Use the managed, hosted version by signing up on our website.
- Self-host and run it locally by following the instructions on our GitHub.



About Karthik Kalyanaraman

Cofounder and CTO - Langtrace AI, Scale3 Labs