Introducing Langtrace Evaluations
Karthik Kalyanaraman, Cofounder & CTO
Apr 10, 2024
Introduction
We are excited to show our new and improved Evaluations Dashboard. We learned from our early users that their number 1 priority is improving RAG/model accuracy and gaining confidence in deploying their LLM-based apps to production.
🚢 Ship with confidence
What gets measured, gets managed
To address this, we have built two things:
1. Create tests with different scoring scales and automatically capture LLM requests into these tests using Langtrace's SDK.
2. Evaluate the requests by scoring the responses provided by the LLM, and measure the overall average score for each test.
In effect, teams can define release criteria and then evaluate, measure, and understand how their products are performing over time.
For example, a team can set up release criteria such as "Factual Accuracy > 99%, Response Quality > 95%, Response Bias > 85%, Context Recall > 90%" and measure its product's performance against those criteria with Langtrace.
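To make the idea concrete, here is a minimal sketch of how release criteria like the ones above could be checked against averaged evaluation scores. This is not Langtrace's implementation or API: the function names (`average_scores`, `check_release`), the tuple layout for an evaluation, and the normalization of each score to a percentage of its scale are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Release criteria taken from the example above: test name -> minimum average (%).
RELEASE_CRITERIA = {
    "Factual Accuracy": 99.0,
    "Response Quality": 95.0,
    "Response Bias": 85.0,
    "Context Recall": 90.0,
}


def average_scores(evaluations):
    """Return the average score per test, as a percentage of each test's scale.

    `evaluations` is a hypothetical list of tuples:
    (test_name, score, scale_min, scale_max). Normalizing to a percentage
    lets tests with different scoring scales be compared to one threshold.
    """
    by_test = defaultdict(list)
    for test, score, scale_min, scale_max in evaluations:
        percent = 100.0 * (score - scale_min) / (scale_max - scale_min)
        by_test[test].append(percent)
    return {test: mean(values) for test, values in by_test.items()}


def check_release(averages, criteria=RELEASE_CRITERIA):
    """Return {test_name: passed} for every criterion.

    A test passes only if its average is strictly greater than the threshold.
    A test with no evaluations is treated as failing.
    """
    return {
        test: averages.get(test, 0.0) > threshold
        for test, threshold in criteria.items()
    }


# Example: two tests scored on different scales (0-1 and 1-5).
sample = [
    ("Factual Accuracy", 1, 0, 1),
    ("Factual Accuracy", 1, 0, 1),
    ("Response Quality", 4, 1, 5),
    ("Response Quality", 5, 1, 5),
]
results = check_release(average_scores(sample))
ready_to_ship = all(results.values())
```

In this sketch a release is considered ready only when every criterion passes, so a test that has not been evaluated yet blocks the release rather than silently passing.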
Hotkeys (🔥 ↕ ↩ ◀)
We also realized that the user experience is extremely important for fast, effective evaluations. As a result, the evaluation flow is fully optimized for hotkeys: as an evaluator, you can breeze through a series of evaluations with just the arrow keys, Enter, and Backspace, without having to click multiple times for each request.
Finally, all of this can be set up with just 2 lines of code, and Langtrace's Evaluations dashboard will start capturing requests in the appropriate test automatically 🍃 🤖
If you are using LLMs in production, why ship in the dark? Don't forget to check out Langtrace and star the repository on GitHub.
Website - https://langtrace.ai/
⭐ us on GitHub - https://github.com/Scale3-Labs/langtrace