Run your LLM Apps locally using Ollama and Debug with Langtrace

Yemi Adejumobi

⸱

Platform Engineer

Jul 3, 2024

Introduction

In the rapidly evolving landscape of Artificial Intelligence, large language models (LLMs) have become increasingly powerful and ubiquitous. However, the costs and complexities associated with running these models in cloud environments can be prohibitive, especially for developers and small teams looking to experiment and innovate.

Enter Ollama, a game-changing tool that brings the power of LLMs to your local machine. This blog post will explore how Ollama can simplify your development process, allowing you to run LLM applications easily and efficiently while adding Langtrace. This open-source observability tool complements Ollama perfectly, providing crucial insights into your LLM application's performance and behavior. Let's dive in.

What is Ollama?

Ollama is an innovative tool that enables running large language models (LLMs) locally, providing a cost-effective solution for testing and development. By running LLMs locally, you can experiment and refine your ideas without incurring significant production costs.

By running LLMs locally, you can:

Reduce cloud costs: Save on cloud computing expenses by running LLMs on your local machine.
Faster experimentation: Quickly test and iterate on your ideas without relying on remote servers.
Improved data privacy: Keep your data local and secure, reducing the risk of data breaches.

Setting up Ollama and running LLMs locally

For this step, we will be using Meta’s latest open source model, Llama3. For most optimal performance with Ollama ensure your laptop has at least 16GB of RAM. If you do then follow these steps:

Download and install Ollama https://ollama.com/download/Ollama-darwin.zip
Download the desired LLM model (e.g., Llama3 or other open-source models). In a terminal window run the following to run llama3 locally for example

ollama run llama3

This is similar to docker commands, it will pull and run the llama3

Once it is done pulling, you should have a terminal prompt you can start chatting from.

For further customization and to use Modefile to create your own custom system prompt, refer to Ollama documentation here.

Instrumenting Ollama with Langtrace

Now that you have a local LLM, let’s say you are building a customer service bot and you would like to view detailed traces on the LLM requests, this is where Langtrace shines. Langtrace provides a Python SDK that enables observability for Ollama, allowing you to trace LLM calls and gain valuable insights into your application's performance. To instrument Ollama with Langtrace:

Generate an API key from langtrace.ai - you can also self-host.
Install the Langtrace Python or Typescript SDK.
Import the SDK and initialize the SDK.
Start tracing!

Example code snippet:

from langtrace_python_sdk import langtrace, with_langtrace_root_span
import ollama
from dotenv import load_dotenv

load_dotenv()

# langtrace.init(write_spans_to_console=False)
langtrace.init(api_key = 'YOUR_API_KEY', write_spans_to_console=False)

@with_langtrace_root_span()
def give_recs():
  response = ollama.chat(model='llama3', messages=[
    {
      'role': 'user',
      'content': 'You are an AI assistant with expertise in mens clothing. Help me pick clothing for a black tie dinner at work.',
    },
  ])
  print(response['message']['content'])


if __name__ == "__main__":
  print("Running fashionista bot...")
  give_recs()

Here is what the trace looks like in Langtrace UI

Here is a link to a reference cookbook for Ollama integration with Langtrace.

Tracing LLM call

With Langtrace, you can now trace LLM calls and capture essential metadata, such as:

Input, Output and Total tokens
Latency
Error rates

This data provides valuable insights into your application's performance, helping you optimize and improve it over time.

In the next blog in this series, we will cover how to use Langtrace to perform evaluations on your application’s accuracy and optimize its behavior.

Quick Update

I added a UI option to this bot. Feel free to check out the code here. I use Streamlit for the UI but you can swap it out for Gradio or any other library.

To see this in action, install Streamlit

pip install streamlit

Then run the code using

streamlit run ollama-fashionistav2.py

Next steps

In conclusion, combining Ollama's local LLM capabilities with Langtrace's observability features unlocks a powerful toolset for building and optimizing LLM applications. By following the steps outlined in this post, you can leverage the benefits of running LLMs locally with Ollama, including reduced cloud costs, accelerated experimentation, and improved data privacy.

With Langtrace, you can gain valuable insights into your application's performance, identify bottlenecks, and optimize its behavior. By integrating Ollama and Langtrace, you can build more efficient, effective, and innovative LLM applications. Try out Ollama and Langtrace today and discover the advantages of local LLM development and open-source observability for yourself!

Ready to try Langtrace?

Try out the Langtrace SDK with just 2 lines of code.

Get Started

Star on Github

433

Ready to deploy?

Try out the Langtrace SDK with just 2 lines of code.

Get Started

Want to learn more?

Check out our documentation to learn more about how langtrace works

Documentation

Join the Community