Building Observable RAG Applications with Graphlit and Langtrace

Obinna Okafor

Software Engineer

Feb 5, 2025

Retrieval-Augmented Generation (RAG) has become a cornerstone technique in modern LLM applications, enabling more accurate and context-aware AI responses. While tools like LangChain and LlamaIndex are popular choices for implementing RAG, today we'll explore Graphlit, a cloud-native alternative that offers some unique advantages. We'll build a practical RAG application and show how Langtrace can help us understand what's happening under the hood.

Understanding Different RAG Approaches

Before diving into Graphlit, let's understand how RAG implementations typically differ across popular frameworks:


LangChain Approach

LangChain takes a modular approach, giving developers fine-grained control over each component:

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# 1. Load and process document
loader = WebBaseLoader("https://example.com/article")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, 
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# 2. Create embeddings and store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(splits, embeddings)

# 3. Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

# 4. Query the system
response = qa_chain.invoke(
    {"query": "What are the key features discussed in the article?"}
)

Key characteristics:
  • Explicit control over text splitting and embedding

  • Multiple vector store options

  • Flexible chain composition

  • Component-level customization (see the sketch below)

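Because each component is explicit, swapping pieces in and out is straightforward. Here's a minimal sketch of that flexibility, reusing the splits from the example above and assuming the faiss-cpu package is installed; the prompt wording and retriever settings are illustrative:

from langchain_community.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Swap the vector store: FAISS instead of Chroma, same embeddings
faiss_store = FAISS.from_documents(splits, OpenAIEmbeddings())

# Customize the prompt used by the "stuff" chain
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question using only the context below.\n\n"
        "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    ),
)

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=faiss_store.as_retriever(search_kwargs={"k": 4}),
    chain_type="stuff",
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
)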

LlamaIndex Implementation

LlamaIndex takes a more document-centric approach, handling node parsing, indexing, and querying through higher-level abstractions:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SimpleNodeParser

# 1. Load and process document
documents = SimpleDirectoryReader("data").load_data()
parser = SimpleNodeParser.from_defaults()
nodes = parser.get_nodes_from_documents(documents)

# 2. Create index and query engine
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(
    response_mode="tree_summarize",
    streaming=True
)

# 3. Query the system
response = query_engine.query(
    "What are the key features discussed in the article?"
)

Key characteristics:
  • Document-centric indexing (persistence and reload sketched below)

  • Automatic node parsing

  • Built-in query engine

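Because the index is the central object, it can also be persisted and reloaded so documents don't have to be re-parsed on every run. A minimal sketch, reusing the index built above; the ./storage directory and similarity_top_k value are illustrative:

from llama_index.core import StorageContext, load_index_from_storage

# Persist the index to disk after building it
index.storage_context.persist(persist_dir="./storage")

# Later: reload the index and query it without re-ingesting the documents
storage_context = StorageContext.from_defaults(persist_dir="./storage")
reloaded_index = load_index_from_storage(storage_context)
query_engine = reloaded_index.as_query_engine(similarity_top_k=3)
response = query_engine.query(
    "What are the key features discussed in the article?"
)
print(response)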

Graphlit Implementation

Graphlit offers a cloud-native, unified approach:

import asyncio
from graphlit import Graphlit
from graphlit_api import input_types, enums
from openai import OpenAI

async def build_rag_app():
    # Initialize clients
    graphlit = Graphlit(
        organization_id="your-org-id",
        environment_id="your-env-id",
        jwt_secret="your-jwt-secret"
    )
    openai_client = OpenAI()
    
    # 1. Ingest content
    ingest_response = await graphlit.client.ingest_uri(
        uri="https://example.com/article",
        is_synchronous=True
    )
    
    # 2. Create model specification
    specification = await graphlit.client.create_specification(
        input_types.SpecificationInput(
            name="OpenAI GPT-4",
            type=enums.SpecificationTypes.COMPLETION,
            serviceType=enums.ModelServiceTypes.OPEN_AI,
            openAI=input_types.OpenAIModelPropertiesInput(
                model=enums.OpenAIModels.GPT4O_128K,
            )
        )
    )
    
    # 3. Create conversation
    conversation = await graphlit.client.create_conversation(
        input_types.ConversationInput(
            name="Product Analysis",
            specification=input_types.EntityReferenceInput(
                id=specification.create_specification.id
            )
        )
    )
    
    # 4. Format question with context
    formatted_response = await graphlit.client.format_conversation(
        "What is the main product offering?",
        conversation.create_conversation.id
    )
    
    # 5. Get completion from LLM
    completion = ""
    stream_response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": formatted_response.format_conversation.message.message}],
        temperature=0.1,
        stream=True
    )
    
    for chunk in stream_response:
        if chunk.choices[0].delta.content:
            completion += chunk.choices[0].delta.content
    
    # 6. Store completion
    await graphlit.client.complete_conversation(
        completion,
        conversation.create_conversation.id
    )

    return completion

# Run the pipeline end to end
asyncio.run(build_rag_app())

Key characteristics:
  • Unified API for all operations

  • Managed infrastructure

  • Conversation-centric design


Key Differences in Graphlit's Approach

Graphlit takes a different approach to RAG implementation:

  1. Unified API: Unlike LangChain and LlamaIndex where you explicitly manage each component (document loading, chunking, embeddings), Graphlit handles these steps internally through its ingestion pipeline.

  2. Cloud-Native Architecture: While other frameworks run locally by default, Graphlit is designed for cloud deployment, making it easier to scale and manage in production.

  3. Conversation-Centric: Graphlit organizes interactions around conversations rather than just queries, making it natural to build chatbots and interactive applications (see the sketch after this list).

  4. Managed Infrastructure: There's no need to set up and maintain vector stores or embedding services - Graphlit handles this infrastructure for you.

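Because the conversation lives on the Graphlit side, follow-up questions can reuse the same conversation ID. Here's a minimal sketch of a multi-turn helper built from the same format_conversation and complete_conversation calls shown earlier; the ask helper and the follow-up question are illustrative:

async def ask(graphlit, openai_client, conversation_id, question):
    # Format the question with retrieved context and prior turns
    formatted = await graphlit.client.format_conversation(
        question,
        conversation_id
    )

    # Get a completion from the LLM
    result = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": formatted.format_conversation.message.message}],
        temperature=0.1
    )
    answer = result.choices[0].message.content

    # Store the completion so the next turn sees the full history
    await graphlit.client.complete_conversation(answer, conversation_id)
    return answer

# Follow-up turns reuse the same conversation, e.g.:
# answer = await ask(graphlit, openai_client,
#                    conversation.create_conversation.id,
#                    "How does it compare to alternatives?")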

Adding Observability with Langtrace

Now that we understand how Graphlit implements RAG, let's add observability to see what's happening under the hood. We'll use Langtrace to trace our RAG operations:

from langtrace_python_sdk import langtrace
from langtrace_python_sdk.utils.with_root_span import with_langtrace_root_span

# Initialize Langtrace (pass api_key=... or set the LANGTRACE_API_KEY environment variable)
langtrace.init()

@with_langtrace_root_span()
async def build_rag_app():
    # Previous Graphlit code here...
    pass

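Because the Langtrace SDK is built on OpenTelemetry, you can also wrap individual RAG stages in child spans that show up under the root span. A minimal sketch, assuming langtrace.init() has already set up the global tracer provider; the span names are illustrative:

from opentelemetry import trace
from langtrace_python_sdk.utils.with_root_span import with_langtrace_root_span

tracer = trace.get_tracer(__name__)

@with_langtrace_root_span()
async def build_rag_app():
    with tracer.start_as_current_span("graphlit.ingest"):
        ...  # graphlit.client.ingest_uri(...) from the earlier example

    with tracer.start_as_current_span("graphlit.format_conversation"):
        ...  # graphlit.client.format_conversation(...) from the earlier example

    with tracer.start_as_current_span("llm.completion"):
        ...  # the OpenAI call (also auto-instrumented by Langtrace)
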
Here's a trace visualization of a complete RAG operation:



Conclusion

Graphlit offers a unique approach to building RAG applications, with a focus on cloud-native deployment and managed infrastructure. When combined with Langtrace's observability capabilities, we get deep insights into our RAG operations, helping us build more reliable and performant applications.

While tools like LangChain and LlamaIndex offer more flexibility in terms of local development and component customization, Graphlit's integrated approach can significantly reduce the complexity of building and deploying RAG applications at scale.

Remember that the choice of framework depends on your specific needs. Regardless of your choice, adding observability through Langtrace will help you understand and optimize your RAG applications better.


Additional Resources

Ready to try Langtrace?

Try out the Langtrace SDK with just 2 lines of code.

Want to learn more?

Check out our documentation to learn more about how Langtrace works.

Join the Community

Check out our Discord community to ask questions and connect with other users.