Grokking MIPROv2 - the new optimizer from DSPy

Karthik Kalyanaraman

Cofounder and CTO

Oct 24, 2024

Introduction

MIPROv2 is the new state-of-the-art optimizer from DSPy. If you are new to DSPy, it is a Python library that implements the ideas of the Stanford DSP paper, which identified a few key techniques for finding the best possible prompt for the task at hand. DSP (Demonstrate - Search - Predict) proposes a fundamental idea: "prompt optimization", as opposed to "prompt engineering", can yield better results when working with language models. As part of that proposal, DSPy exposes constructs that let you "program" your task and identify the prompt best suited to it. The techniques DSPy uses for this are called optimizers, and one such optimizer is MIPROv2, or Multiprompt Instruction PRoposal Optimizer Version 2.

In this post, we will go over the high-level procedure of how MIPROv2 works and why it yields better results than previous techniques.

MIPROv2

First we start with a goal: given a dataset of good input-output pairs and an evaluation function that checks whether the output generated for a given input is correct, find the prompt that is most likely to produce a correct output for a randomly selected input. To solve this problem, the MIPROv2 optimizer goes through 3 steps.
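To make the two ingredients concrete, here is a minimal sketch of a labeled dataset and an evaluation function in plain Python. Everything in it (the arithmetic examples, `is_correct`, `toy_program`, `average_score`) is made up for illustration and is not part of DSPy. In a real setup, `toy_program` would be a prompt plus an LLM call.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, expected output)

# Toy labeled dataset, made up for illustration.
dataset: List[Example] = [("2 + 2", "4"), ("3 * 5", "15"), ("10 - 7", "3")]


def is_correct(expected: str, predicted: str) -> bool:
    """Evaluation function: does the generated output match the label?"""
    return expected.strip() == predicted.strip()


def toy_program(question: str) -> str:
    """Hypothetical stand-in for 'prompt + LLM'. It cannot subtract."""
    left, op, right = question.split()
    a, b = int(left), int(right)
    if op == "+":
        return str(a + b)
    if op == "*":
        return str(a * b)
    return "unknown"


def average_score(program: Callable[[str], str], examples: List[Example]) -> float:
    """Fraction of examples for which the program's output is correct."""
    hits = sum(is_correct(expected, program(question)) for question, expected in examples)
    return hits / len(examples)


# Two of the three toy examples pass, because subtraction is unsupported.
print(average_score(toy_program, dataset))
```

The optimizer's job is to push a score like this one as high as possible by changing the prompt rather than the program.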

Step 1

The goal of this step is to generate a good set of demos that show what ideal input-output pairs look like. Using the provided dataset (a labeled training set), the optimizer generates 'n' sets of 'x' demos each that can be used with the prompt. To build each set, the optimizer combines labeled demos (picked directly from the dataset) with bootstrapped demos: it picks random inputs from the dataset, generates an output for each using the LLM, and keeps only the outputs that pass the evaluation function. At the end of this step, it has 'n' sets of 'x' demos.
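The step above can be sketched roughly as follows. This is a simplified illustration, not DSPy's implementation: `fake_llm`, `metric`, `build_demo_sets`, and their parameters are hypothetical names, and the "LLM" is a toy function that gets subtraction wrong so that some bootstrapped candidates are rejected.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str]  # (input, expected output)


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for an LLM call; it answers subtraction wrongly."""
    left, op, right = question.split()
    a, b = int(left), int(right)
    if op == "+":
        return str(a + b)
    if op == "*":
        return str(a * b)
    return "0"


def metric(expected: str, predicted: str) -> bool:
    """Evaluation function used to accept or reject a bootstrapped demo."""
    return expected == predicted


def build_demo_sets(trainset: List[Example], n_sets: int, max_labeled: int,
                    max_bootstrapped: int, seed: int = 0) -> List[List[Example]]:
    rng = random.Random(seed)
    demo_sets = []
    for _ in range(n_sets):
        shuffled = rng.sample(trainset, len(trainset))
        # Labeled demos: taken directly from the dataset.
        labeled = shuffled[:max_labeled]
        # Bootstrapped demos: generated by the LLM, kept only if they pass the metric.
        bootstrapped = []
        for question, expected in shuffled[max_labeled:]:
            if len(bootstrapped) == max_bootstrapped:
                break
            predicted = fake_llm(question)
            if metric(expected, predicted):
                bootstrapped.append((question, predicted))
        demo_sets.append(labeled + bootstrapped)
    return demo_sets


trainset = [("2 + 2", "4"), ("3 * 5", "15"), ("10 - 7", "3"),
            ("6 + 1", "7"), ("9 - 4", "5"), ("2 * 8", "16")]
demo_sets = build_demo_sets(trainset, n_sets=3, max_labeled=1, max_bootstrapped=2)
for demos in demo_sets:
    print(demos)
```

Here 'n' is `n_sets` and 'x' is `max_labeled + max_bootstrapped`. Labeled demos are taken as-is, so a labeled subtraction example can appear in a set even though the toy LLM could not have produced it.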

Step 2

The goal of this step is to generate a good set of instructions (or prompts) that are likely to produce the ideal input-output pairs. To do this, the optimizer uses two inputs: an LLM-generated summary of the demos generated in step 1, and an LLM-generated description of the program code written by the user, which tries to infer the intention of the program. Using these two inputs, the optimizer again generates 'n' instructions (matching the 'n' sets of demos generated in step 1).
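The data flow of this step can be sketched as below. This is only an illustration under stated assumptions: `stub_llm` and `propose_instructions` are hypothetical names, the prompt wording is invented, and the stub simply echoes the first line of whatever prompt it receives so that you can see which call produced which text. A real run would send these prompts to an actual LLM.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (input, output)


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: echoes the first line of the prompt."""
    return "[LLM reply to: " + prompt.splitlines()[0] + "]"


def propose_instructions(demo_sets: List[List[Demo]], program_source: str,
                         llm: Callable[[str], str]) -> List[str]:
    # Input 1: an LLM-generated summary of the demos from step 1.
    demo_lines = [f"{q} -> {a}" for demos in demo_sets for q, a in demos]
    demo_summary = llm("Summarize these demos\n" + "\n".join(demo_lines))

    # Input 2: an LLM-generated description of what the user's program is meant to do.
    program_description = llm("Describe this program\n" + program_source)

    # One candidate instruction per demo set, each proposed from both inputs.
    instructions = []
    for i in range(len(demo_sets)):
        proposal_prompt = (
            f"Propose instruction {i + 1}\n"
            f"Data summary: {demo_summary}\n"
            f"Program description: {program_description}"
        )
        instructions.append(llm(proposal_prompt))
    return instructions


demo_sets = [[("2 + 2", "4")], [("3 * 5", "15")]]
program_source = "def solve(question): ..."  # placeholder for the user's program code
for instruction in propose_instructions(demo_sets, program_source, stub_llm):
    print(instruction)
```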

Step 3

The goal of this step is to run a Bayesian optimization algorithm to find the best instruction-demo pair, which is then used to build the final prompt. To do this, the optimizer runs 'y' evaluation trials. In each trial, it selects a demo-instruction pair (guided by the scores of earlier trials rather than purely at random) and evaluates it against a batch of input-output pairs from the validation set. A score is calculated for each pair, and the best score seen so far is tracked at the end of each trial. Finally, the instruction-demo pair with the best evaluated score is returned and used to generate the prompt.
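The trial loop can be sketched as follows. To keep the example short, plain random sampling stands in for the Bayesian proposal step, so this is a random search and not a true Bayesian optimizer. `run_trials` and `toy_evaluate` are hypothetical names, and the toy evaluation rule is invented purely so the loop has something to score.

```python
import random
from typing import Callable, List, Tuple

Demo = Tuple[str, str]
Pair = Tuple[str, List[Demo]]  # (instruction, demo set)


def toy_evaluate(instruction: str, demos: List[Demo], question: str, expected: str) -> bool:
    """Invented rule: a prompt 'works' if it asks for steps and has at least two demos."""
    return "step by step" in instruction and len(demos) >= 2


def run_trials(instructions: List[str], demo_sets: List[List[Demo]],
               valset: List[Demo], evaluate: Callable[..., bool],
               num_trials: int, batch_size: int, seed: int = 0) -> Tuple[Pair, float]:
    rng = random.Random(seed)
    best_pair, best_score = None, -1.0
    for _ in range(num_trials):
        # Random choice here; a Bayesian optimizer would use past scores to pick.
        instruction = rng.choice(instructions)
        demos = rng.choice(demo_sets)
        # Score the pair on a batch drawn from the validation set.
        batch = rng.sample(valset, min(batch_size, len(valset)))
        score = sum(evaluate(instruction, demos, q, a) for q, a in batch) / len(batch)
        # Keep track of the best pair seen so far.
        if score > best_score:
            best_pair, best_score = (instruction, demos), score
    return best_pair, best_score


instructions = ["Answer the question.", "Think step by step, then answer."]
demo_sets = [[("2 + 2", "4")], [("2 + 2", "4"), ("3 * 5", "15")]]
valset = [("6 + 1", "7"), ("2 * 8", "16"), ("4 + 4", "8")]

best_pair, best_score = run_trials(instructions, demo_sets, valset, toy_evaluate,
                                   num_trials=20, batch_size=2)
print(best_pair, best_score)
```

Scoring on a small batch instead of the full validation set keeps each trial cheap, at the cost of noisier scores.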

If you are curious to dive deep and understand more about this prompt optimization technique, check out the research paper here.

If you would like to start using this optimizer, check out the DSPy docs here.

Langtrace x DSPy

Langtrace natively supports tracing and monitoring of key metrics from DSPy optimizers and pipelines. This helps you understand how a chosen DSPy module or optimizer works under the hood, and gives you the visibility you need to optimize the performance of your application.

For more information, check out our previous blog post on this integration here.
