Note Created: 2024-04-14
Recovery capability is an important aspect of modern systems. In most cases, recovery is used to repair the state of the program in the rare event that an execution failure occurs.
Checkpoints provide a simple way to implement recovery. However, they have several challenges:
To address these challenges, we could study the core property that enables simple recovery via re-execution: idempotence. There are approaches like transactional semantics that can be used to implement efficient recovery. However, they require the programming language to support a specific programming model.
Idempotence is the property of a program that re-execution is free of side-effects.
In contrast, to explicit checkpointing, idempotence allows the architecture state at the beginning of a code region to be used as an implicit checkpoint (that is never saved). On execution failure, idempotence allows repairing the state via a simple re-execution.
The definition of idempotence looks simple, but is not easy to apply in practice. Particularly challenging is the fact that most real-world programs are not completely idempotent; they have some side-effects. Let’s define side-effects more formally.
A side-effect is a change in the state of the program that is not visible to the caller. For instance, consider the following code snippet:
def foo(x, y):
return x + y
The function foo
has no side-effects, since it manipulates only the arguments passed to it. In contrast, consider the following code snippet:
total = 0
def bar(x, y):
global total
total += x + y
return total
In this version of the function bar
we have added a global variable total
outside of the function that is modified. This creates a side effect, since the function now affects something outside of its immeditate scope.
Real-world programs look more like bar
. They have side-effects that are not visible to the caller. Does this mean that we cannot use idempotence to implement recovery in real-world programs? Not necessarily.
To make idempotence practical for recovery, we can weaken it in the following ways to be applicable in programs with side-effects:
1. Idempotent Side-Effects
Note, that idempotence is a property of a program’s re-execution. This means that we can use idempotence, if we can ensure that the re-execution of a program is side-effect free. More generally, we can weaken this definition to state the following:
If the re-execution of a program has the same side-effects as the original execution then the program is idempotent.
That is to say, if the side-effects of a program are idempotent, then the program is idempotent. For instance, consider the following code snippet:
x = 0
def setx(n):
global x
x = 3
setx
has a side-effect because it modifies a global variable x
. However, it is idempotent because multiple executions has the same effect on the system state as the first application.
2. Partial Idempotence
The second way to weaken idempotence is to allow partial idempotence. That is to say, if we are looking to use idempotence for recovery, we can allow regions of the program to be idempotent. For instance, consider the following code snippet:
b = []
def list_push(a, n):
o = len(a) == 10
if o: a = b
a.append(n)
The semantics of the function clearly preclude idempotence: even if there is no overflow, re-executing the function will put the element to the end of an already modified list (after the element that was pushed during the original execution).
However, there are regions of the program that are idempotent. For instance, we could split the function into 3 regions as in the control flow graph below:
flowchart TD a["`o = len(a) == 10 if o: a = b`"] --> b["`a = b`"] a["`o = len(a) == 10 if o: a = b`"] --> c["`a.append(n)`"] b["`a = b`"] --> c["`a.append(n)`"]
We can define idempotent code regions as follows:
A region of code (assume linear instruction sequence for now) is idempotent if the effect of executing the region multiple times is identical to executing it once.
An intuitive way to think about this is a region that does not overwrite its inputs. A region that overwrites its inputs will read the overwritten values on re-execution, changing its behavior. But what are the inputs of a region?
classDiagram Observation : A region input has no RAW dependency\n before the first use of that variable in the region\n*Since region inputs are live at the entry of a region, and have no RAW dependency before the first use, any overwriting of the variable will have to occur after the first use of the variable. In other words, an overwriting of region inputs has a **WAR** Write-After-Read (WAR) dependency is between a use and a following (re-)definition of a variable. A.K.A _anti-dependency_. dependency after the absence of a RAW dependency.
classDiagram Observation : Overwriting a region input has a WAR dependency\n after the absence of a RAW dependency\n*
The table below summarizes the relationship between dependencies and idempotence.
Dependency | RAW | RAW->WAR | WAR |
---|---|---|---|
Example | x = 1 |
x = 1 |
y = x |
Idempotent? | Yes | Yes | No |
From the above, we can see that one can identify a idempotent code regions by looking for regions that have no WAR dependencies after the absence of a RAW dependency.
Idempotence is a powerful property that can be used to implement recovery. However, it is not easy to apply in practice due to the presence of side-effects in real-world programs. In this post, we have explored the definition of idempotence and how it can be applied to real-world programs. Particularly, we have seen that idempotence can be applied to regions of code that do not overwrite their inputs. We have also explored a simple way to identify such regions but leave implementing it as a static analysis for a future post.