DSPy Assertions: A PR Review's POV

Manish Shetty

December 15, 2023


Prompting emerged as a way to “interact” with LMs due to the naturalness of language. We soon realized one needs to “engineer” their prompts to get desired outputs from these stochastic machines. While interactions are natural and imprecise, engineering is anything but.

Using the same medium for both interaction and engineering results in hand-crafted prompt templates: an error-prone practice that is often verbose and doesn’t always generalize.

That leads us to the question: How can we program LMs without prompting?

Language model (LM) programs are a programming paradigm that combines the precision of conventional programming with the flexibility of LMs. DSPy is a framework for creating such programs declaratively, with automatic prompt tuning through compilation. This short post explains, in a nutshell, how DSPy works. You can try DSPy in this free Colab. Let’s build a simple ubercool LM program.

PR Reviewer with DSPy

This program reviews a GitHub pull request:

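import dspy

class Reviewer(dspy.Module):
  def __init__(self):
    super().__init__()
    # One LM call: read the PR, reason step by step, then emit a review and a status.
    self.gen_review = dspy.ChainOfThought("pull_request -> review, status")

  def forward(self, pr):
    return self.gen_review(pull_request=pr)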

You could use this program as follows:

# Assumes an LM has been configured, e.g. via dspy.settings.configure(lm=...).
reviewer = Reviewer()
result = reviewer(pr="def add(a, b): return a - b")
print(result.review, result.status)

# review: ... could be improved by renaming 
# the function and adding proper error handling.
# status: Changes requested

DSPy also automatically compiles high-quality few-shot prompts for your program, so you don’t have to! You can use it to build a more robust reviewer with a few lines of code and some I/O pairs. Imagine collecting a few hundred I/O pairs from GitHub PRs and compiling a reviewer that is robust to unseen PRs. 🤩

In the code below, is_valid is the optimizer’s metric; it could be a fuzzy string match or an LLM call that checks the review’s validity.

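from dspy.teleprompt import BootstrapFewShot

# io_pairs: a list of dspy.Example objects (a PR plus its reference review and status).
teleprompter = BootstrapFewShot(metric=is_valid)
reviewer = teleprompter.compile(Reviewer(), trainset=io_pairs)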
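For illustration, here is one way is_valid could be written as a fuzzy string match using Python’s standard difflib. DSPy optimizers call a metric with a gold example, a prediction, and an optional trace. Everything else here is an assumption: the exact-match rule on status, the hypothetical THRESHOLD of 0.6, and the SimpleNamespace objects that stand in for DSPy’s example and prediction objects.

```python
from difflib import SequenceMatcher
from types import SimpleNamespace

# Hypothetical similarity threshold; tune it on your own data.
THRESHOLD = 0.6

def is_valid(example, pred, trace=None):
    """Fuzzy-match metric with the (example, prediction, trace) shape DSPy metrics use."""
    # The predicted status must match the reference status (case-insensitive).
    if pred.status.strip().lower() != example.status.strip().lower():
        return False
    # The review only needs to be "close enough" to the reference review.
    ratio = SequenceMatcher(None, pred.review.lower(), example.review.lower()).ratio()
    return ratio >= THRESHOLD

# Stand-ins for a gold example and two predictions.
gold = SimpleNamespace(review="Bug: add returns a - b instead of a + b.",
                       status="Changes requested")
good = SimpleNamespace(review="Bug: add returns a - b instead of a + b",
                       status="changes requested")
bad = SimpleNamespace(review="LGTM", status="Approved")
print(is_valid(gold, good), is_valid(gold, bad))  # True False
```

An LLM-based judge would replace the SequenceMatcher line with a model call that rates the review. The fuzzy version is cheaper, but it only rewards reviews that resemble the references.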

Reflection: Is this a good reviewer? 🤔

It is relatively easy to write a valid review for a PR.

But a good reviewer writes concise, constructive, and informative reviews. The question is: how do we capture and enforce these properties in a program?

“Sketching” the Solution (pun intended)

In traditional program synthesis, particularly in sketching [Solar-Lezama, 2008], developers provide a high-level outline of a program (a sketch) along with a set of assertions that specify the desired behavior. The synthesizer then fills in the details, turning the sketch into a fully fledged program that adheres to the assertions.

Assertions in a sketch enable you to express intuitive insights without overthinking the implementation details.

💡 Wait! By now, we know that DSPy is, at its core, a sophisticated program synthesizer. It takes a program specification (Reviewer) and a set of I/O pairs (io_pairs) and returns a tuned program (prompt) that satisfies the spec.

Why not guide DSPy with assertions?

Introducing DSPy Assertions

We introduce LM Assertions to DSPy. Written as simple one-liners, they are assertion-style constraints on LM outputs. We distinguish two types of constraints: Assert (hard) and Suggest (soft):

dspy.Assert(constraint: bool, message: str)
dspy.Suggest(constraint: bool, message: str)

Unlike regular assertions, LM assertions are more than just monitors. When a constraint is violated, execution pauses and the LM program attempts to recover by backtracking to the failing module. During recovery, the construct feeds reflective information (such as the failing output and the assertion message) back to the module so that it can self-correct and continue execution. Asserts fail the program if the violation is irrecoverable, while Suggests let execution continue. Our paper describes how this works in more detail, and this tweet is also a great start.
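To make the recover-or-fail behavior concrete, here is a toy, pure-Python model of that control flow. It is not DSPy’s implementation: run_with_constraints, fake_lm, and the retry budget max_backtracks=2 are all hypothetical. The “LM” is a plain function that receives past outputs and violation messages as feedback.

```python
class AssertionFailed(Exception):
    """Raised when a hard constraint still fails after every retry."""


def run_with_constraints(generate, constraints, max_backtracks=2):
    """Toy model of Assert/Suggest control flow (not DSPy's implementation).

    generate(feedback) -> output, where feedback lists (past_output, messages).
    constraints is a list of (check, message, hard) tuples.
    """
    feedback = []
    for _ in range(max_backtracks + 1):
        output = generate(feedback)
        violated = [(msg, hard) for check, msg, hard in constraints if not check(output)]
        if not violated:
            return output
        # Reflection: hand the failing output and violated messages back to the generator.
        feedback.append((output, [msg for msg, _hard in violated]))
    # Out of retries: a hard constraint (Assert) fails the program,
    # while soft ones (Suggest) let execution continue with the last output.
    hard_failures = [msg for msg, hard in violated if hard]
    if hard_failures:
        raise AssertionFailed(hard_failures[0])
    return output


def fake_lm(feedback):
    # Stand-in for an LM: a lazy first draft, then a real review once it gets feedback.
    return "lgtm" if not feedback else "add() subtracts; it should return a + b."


constraints = [
    (lambda r: len(r) < 2000, "review must be concise", True),  # like an Assert
    (lambda r: not r.lower().startswith("lgtm"), "review must be constructive", False),  # like a Suggest
]
print(run_with_constraints(fake_lm, constraints))  # add() subtracts; it should return a + b.
```

If the generator ignored feedback and kept answering "lgtm", the soft constraint would eventually be waived and "lgtm" returned. An overly long review would instead raise AssertionFailed once the retries ran out.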

Here’s how we can use a few LM assertions to build our “good” reviewer:

import dspy

class Reviewer(dspy.Module):
  def __init__(self):
    super().__init__()
    self.gen_review = dspy.ChainOfThought("pull_request -> review, status")

  def forward(self, pr):
    result = self.gen_review(pull_request=pr)

    # Assert (hard): the review must be concise.
    dspy.Assert(len(result.review) < 2000,
        "review must be concise")

    # Suggest (soft): the review should be constructive, not a bare "LGTM".
    dspy.Suggest(not result.review.lower().startswith("lgtm"),
        "review must be constructive")

    return result

That’s it! We can now use DSPy to compile a reviewer that satisfies these assertions and suggestions. Our paper evaluates these new constructs and finds that the resulting programs are more robust and performant!

Conclusion

LM assertions are simple yet powerful constructs for guiding LMs toward desired outputs. There are natural connections between traditional program synthesis and self-refining LM programs with assertions. We are excited to explore these connections further and build a next-generation programming paradigm.

Read more in our paper and check out our GitHub.


Idempotence and Recovery

Manish Shetty

October 30, 2023


Recovery capability is an important aspect of modern systems. In most cases, recovery is used to repair the state of the program in the rare event that an execution failure occurs.

Checkpoints provide a simple way to implement recovery. However, they pose two challenges:

  1. Checkpointing the entire program state is expensive.
  2. Checkpoints have limited visibility into the application, so they often end up being overly aggressive, saving more state than necessary.
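Here is a minimal sketch of checkpoint-based recovery. It assumes all program state lives in a single Python dict, which is a simplification: real systems checkpoint memory and registers. run_with_checkpoint and flaky_step are hypothetical. The snapshot copies big_cache even though the failing step never touches it, which is exactly the over-saving problem in point 2.

```python
import copy

def run_with_checkpoint(step, state):
    """Checkpoint-based recovery: snapshot all state, roll back on failure, retry once."""
    checkpoint = copy.deepcopy(state)  # copies *every* entry, needed or not
    try:
        step(state)
    except RuntimeError:
        # Restore the snapshot in place, then re-run the step from a known-good state.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
        step(state)
    return state

calls = {"n": 0}

def flaky_step(state):
    # Partially updates state, then fails on its first call only.
    state["counter"] += 1
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient fault")
    state["log"].append("done")

state = {"counter": 0, "log": [], "big_cache": list(range(100_000))}
run_with_checkpoint(flaky_step, state)
print(state["counter"])  # 1: the half-finished first attempt was rolled back
```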

To address these challenges, we can study the core property that enables simple recovery via re-execution: idempotence. Other approaches, such as transactional semantics, can also implement efficient recovery, but they require the programming language to support a specific programming model.

Idempotence is the property of a program whereby re-executing it is free of side-effects, i.e., executing it multiple times has the same effect as executing it once.

In contrast to explicit checkpointing, idempotence allows the architectural state at the beginning of a code region to serve as an implicit checkpoint (one that is never saved). On an execution failure, idempotence allows the state to be repaired by simply re-executing the region.
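Here is a toy sketch of recovery by re-execution. It assumes a region that reads its inputs and writes only to fresh outputs, so the region is idempotent and its inputs act as the implicit checkpoint. reexecute_on_failure, scale, and the faults schedule, which injects a single transient failure, are all hypothetical.

```python
def reexecute_on_failure(region, inputs, max_attempts=3):
    """Recovery by re-execution: no snapshot is saved. This is only safe
    if `region` is idempotent, e.g. it never overwrites `inputs`."""
    for _ in range(max_attempts):
        try:
            return region(inputs)
        except RuntimeError:
            continue  # the inputs are intact, so just run the region again
    raise RuntimeError("region failed on every attempt")

faults = iter([True, False])  # hypothetical fault schedule: fail once, then succeed

def scale(inputs):
    # Reads its inputs and writes only to a fresh output list.
    out = [2 * v for v in inputs["values"]]
    if next(faults):
        raise RuntimeError("transient fault")
    return out

data = {"values": [1, 2, 3]}
result = reexecute_on_failure(scale, data)
print(result)  # [2, 4, 6]
```

Unlike the checkpoint approach, nothing is copied up front. The cost is that the region must not overwrite its inputs.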

The definition of idempotence looks simple but is not easy to apply in practice. The main challenge is that most real-world programs are not completely idempotent; they have some side-effects. Let’s define side-effects more formally.

Side-effects

A side-effect is a change a function makes to program state outside its own scope, i.e., an effect not captured by the value it returns to the caller. For instance, consider the following code snippet:

def foo(x, y):
    return x + y

The function foo has no side-effects, since it only reads the arguments passed to it and returns a new value. In contrast, consider the following code snippet:

total = 0

def bar(x, y):
    global total
    total += x + y
    return total

Here, bar adds to total, a global variable defined outside the function. This creates a side-effect, since the function now affects something outside its immediate scope.
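Calling each function twice with the same arguments makes the difference concrete. The snippet repeats foo and bar from above so that it runs on its own.

```python
total = 0

def foo(x, y):
    return x + y

def bar(x, y):
    global total
    total += x + y
    return total

# Re-executing foo with the same arguments repeats the same result and leaves no trace.
print(foo(1, 2), foo(1, 2))  # 3 3
# Re-executing bar does not: the first call's update to `total` feeds into the second.
print(bar(1, 2), bar(1, 2))  # 3 6
```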

Real-world programs look more like bar: they have side-effects beyond the values they return. Does this mean that we cannot use idempotence to implement recovery in real-world programs? Not necessarily.

Idempotence modulo side-effects

To make idempotence practical for recovery in programs with side-effects, we can weaken it in two ways:

  1. allow idempotent side-effects.
  2. allow partial idempotence.

1. Idempotent Side-Effects

Note that idempotence is a property of a program’s re-execution. This means we can rely on idempotence if we can ensure that re-executing the program is side-effect free. More generally, we can weaken this definition as follows:

If re-executing a program has the same side-effects as the original execution (i.e., it leaves the system in the same state), then the program is idempotent.

That is to say, if the side-effects of a program are idempotent, then the program is idempotent. For instance, consider the following code snippet:

x = 0

def setx(n):
    global x
    x = n

setx has a side-effect because it modifies the global variable x. However, it is idempotent, because executing it multiple times has the same effect on the system state as executing it once.
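We can test this property empirically for a given input. The sketch assumes global variables are modelled as an explicit state dict, so a run can be repeated on a copy. is_idempotent and add_to_total are hypothetical helpers, and a passing check only covers the inputs tried; it is not a proof of idempotence.

```python
import copy

def is_idempotent(fn, state, *args):
    """Does running fn twice leave the same state as running it once (for these args)?"""
    once = copy.deepcopy(state)
    fn(once, *args)
    twice = copy.deepcopy(state)
    fn(twice, *args)
    fn(twice, *args)
    return once == twice

def setx(state, n):
    # setx from above, with the global x modelled as state["x"].
    state["x"] = n  # the new value never depends on the old one

def add_to_total(state, n):
    # Like bar: reads total, then overwrites it.
    state["total"] += n

print(is_idempotent(setx, {"x": 0}, 3))              # True
print(is_idempotent(add_to_total, {"total": 0}, 3))  # False
```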

2. Partial Idempotence

The second way to weaken idempotence is to allow partial idempotence. That is, if we are looking to use idempotence for recovery, we only need certain regions of the program to be idempotent, rather than the whole program. For instance, consider the following code snippet:

b = []

def list_push(a, n):
    o = len(a) == 10
    if o: a = b
    a.append(n)

The semantics of the function clearly preclude idempotence: even if there is no overflow, re-executing the function appends the element to an already modified list (after the element pushed during the original execution).
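Re-running list_push on the same list shows the problem. The snippet repeats the function from above, and the failure between the two calls is hypothetical.

```python
b = []

def list_push(a, n):
    o = len(a) == 10
    if o:
        a = b
    a.append(n)

lst = [1, 2, 3]
list_push(lst, 4)  # original execution
list_push(lst, 4)  # re-execution after a (hypothetical) failure
print(lst)         # [1, 2, 3, 4, 4]: the element was pushed twice
```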

However, there are regions of the program that are idempotent. For instance, we could split the function into 3 regions as in the control flow graph below:

flowchart TD
    a["`o = len(a) == 10
    if o:`"] -->|o is true| b["`a = b`"]
    a -->|o is false| c["`a.append(n)`"]
    b --> c

Idempotent Code Regions

We can define idempotent code regions as follows:

A region of code (assume linear instruction sequence for now) is idempotent if the effect of executing the region multiple times is identical to executing it once.

An intuitive way to think about this: an idempotent region is one that does not overwrite its inputs. A region that overwrites its inputs will read the overwritten values on re-execution, changing its behavior. But what are the inputs of a region?
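To see why overwriting inputs matters, the sketch below simulates a failure partway through a two-statement region. It then recovers by re-executing the region from its first statement. Regions are modelled as lists of functions over a state dict; run_region, its fail_after parameter, and recover are hypothetical.

```python
def run_region(stmts, state, fail_after=None):
    """Execute statements in order; optionally 'crash' after fail_after statements."""
    for i, stmt in enumerate(stmts):
        if i == fail_after:
            return False  # simulated failure: the remaining statements never run
        stmt(state)
    return True

def recover(stmts, state):
    # Partial execution, crash, then recovery by re-executing the whole region.
    run_region(stmts, state, fail_after=1)
    run_region(stmts, state)
    return state

# Region A reads its input x but never overwrites it: y = x + 1; z = y * 2
region_a = [lambda s: s.update(y=s["x"] + 1), lambda s: s.update(z=s["y"] * 2)]
# Region B overwrites its input x after reading it: x = x + 1; z = x * 2
region_b = [lambda s: s.update(x=s["x"] + 1), lambda s: s.update(z=s["x"] * 2)]

print(recover(region_a, {"x": 1})["z"])  # 4, same as a clean run
print(recover(region_b, {"x": 1})["z"])  # 6, but a clean run gives 4
```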

Region Inputs and Dependencies

Region inputs are variables that are LIVE-IN to the region, i.e., live at the entry of the region. Such a variable has a definition that reaches the region’s entry and a use within the region that is not preceded by a redefinition inside the region.
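For a straight-line region, the region inputs can be computed in a single forward pass. In this minimal sketch, each statement is reduced to a pair (defs, uses) of the variables it writes and reads. That representation and the live_in helper are hypothetical, not a full liveness analysis over a control flow graph.

```python
def live_in(region):
    """Region inputs (live-in variables) of a straight-line region of (defs, uses) pairs.

    A variable is live-in if some statement reads it before any statement
    in the region has written it.
    """
    defined, inputs = set(), set()
    for defs, uses in region:
        # A statement's reads happen before its own writes take effect.
        inputs |= set(uses) - defined
        defined |= set(defs)
    return inputs

# x = a + b ; y = x * c ; a = y
region = [({"x"}, {"a", "b"}), ({"y"}, {"x", "c"}), ({"a"}, {"y"})]
print(sorted(live_in(region)))  # ['a', 'b', 'c']
```

Here x is not an input because it is defined in the region before it is read, while a is both an input and overwritten by the last statement.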

By definition of liveness, region inputs have a RAW dependency that spans the region’s entry. (A Read-After-Write (RAW) dependency, a.k.a. flow dependency, is between a definition and a following use of a variable.) However, because the variable must be defined before entry, its definition lies outside the region, and hence no definition inside the region precedes the first use of the variable.

Observation: A region input has no RAW dependency before the first use of that variable in the region.

Since region inputs are live at the entry of a region, and have no RAW dependency before the first use, any overwriting of the variable will have to occur after the first use of the variable.

In other words, overwriting a region input creates a WAR dependency that is not preceded by a RAW dependency on that variable. (A Write-After-Read (WAR) dependency, a.k.a. anti-dependency, is between a use and a following (re-)definition of a variable.)

Observation: Overwriting a region input creates a WAR dependency that is not preceded by a RAW dependency.

The table below summarizes the relationship between dependencies and idempotence.

Dependency   Example               Idempotent?
RAW          x = 1; y = x          Yes
RAW -> WAR   x = 1; y = x; x = 2   Yes
WAR          y = x; x = 2          No

From the above, we can see that idempotent code regions can be identified as regions that contain no WAR dependency on a variable without a preceding RAW dependency, i.e., regions that never overwrite their inputs.
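Here is a minimal sketch of that rule for straight-line code. It assumes each statement is reduced to the sets of variables it writes and reads. is_idempotent_region is a hypothetical helper, not the static analysis deferred to a future post. The three examples are the rows of the table above.

```python
def is_idempotent_region(region):
    """Straight-line region of (defs, uses) statements.

    The region is NOT idempotent if some variable is read with no earlier
    write in the region (no RAW dependency before its first use, so it is a
    region input) and is written afterwards (a WAR dependency on that input).
    """
    defined = set()  # variables written so far in the region
    exposed = set()  # region inputs: read with no preceding write
    for defs, uses in region:
        exposed |= set(uses) - defined
        if set(defs) & exposed:
            return False  # WAR on a region input: re-execution would read the new value
        defined |= set(defs)
    return True

raw = [({"x"}, set()), ({"y"}, {"x"})]                      # x = 1; y = x
raw_war = [({"x"}, set()), ({"y"}, {"x"}), ({"x"}, set())]  # x = 1; y = x; x = 2
war = [({"y"}, {"x"}), ({"x"}, set())]                      # y = x; x = 2
print(is_idempotent_region(raw), is_idempotent_region(raw_war), is_idempotent_region(war))
# True True False
```

A statement such as x = x + 1, i.e. ({"x"}, {"x"}), is flagged on its own, because it reads the input x and then overwrites it.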

Conclusion

Idempotence is a powerful property that can be used to implement recovery. However, it is not easy to apply in practice due to the presence of side-effects in real-world programs. In this post, we have explored the definition of idempotence and how it can be applied to real-world programs. In particular, we have seen that idempotence can be applied to regions of code that do not overwrite their inputs. We have also explored a simple way to identify such regions, but leave implementing it as a static analysis for a future post.