Sometimes we are not only interested in fuzzing as many diverse program inputs as possible, but in deriving specific test inputs that achieve some objective, such as reaching specific statements in a program. When we have an idea of what we are looking for, then we can search for it. Search algorithms are at the core of computer science, but applying classic search algorithms like breadth-first or depth-first search to search for tests is unrealistic, because these algorithms potentially require us to look at all possible inputs. However, domain knowledge can be used to overcome this problem. For example, if we can estimate which of several program inputs is closer to the one we are looking for, then this information can guide us to reach the target quicker – such an estimate is known as a heuristic. The way heuristics are applied systematically is captured in meta-heuristic search algorithms. The "meta" denotes that these algorithms are generic and can be instantiated for different problems. Meta-heuristics often take inspiration from processes observed in nature; for example, there are algorithms mimicking evolutionary processes, swarm intelligence, or chemical reactions. In general, they are much more efficient than exhaustive search approaches, so much so that they can be applied to vast search spaces – search spaces as vast as the domain of program inputs are no problem for them.
If we want to apply a meta-heuristic search algorithm to generate test data for a program, then we have to make several choices. First, we need to decide what exactly our search space is. The search space is defined by how we represent what we are looking for. Are we looking for single integer values? Tuples of values? Objects? XML documents?
The representation is highly dependent on the particular testing problem we are solving --- we know which program we are testing, so the representation needs to encode whatever an input to our target program is. Let's consider the example function test_me() as our function under test:
def test_me(x, y):
    if x == 2 * (y + 1):
        return True
    else:
        return False
The test_me() function has two input parameters, and returns True or False depending on how the two relate to each other. A test input to test_me() consists of a pair of values, one for x and one for y. For example:
test_me(0, 0)
False
test_me(4, 2)
False
test_me(22, 10)
True
Our search space is only concerned with inputs; thus, a simple representation for test data would be input tuples (x, y). Each point in this input space has eight neighbors:
x-1, y-1
x-1, y
x-1, y+1
x, y+1
x+1, y+1
x+1, y
x+1, y-1
x, y-1
To keep things simple, let's restrict the size of our search space to start with (we will change this later). For example, let's assume we only want values in the range of -1000 to 1000:
MAX = 1000
MIN = -MAX
To retrieve the neighbors of any point in our search space, we define the function neighbors(), which implements a basic Moore neighborhood. That is, we look at all 8 immediate neighbors, while respecting the boundaries we just defined with MAX and MIN:
def neighbors(x, y):
    return [(x + dx, y + dy) for dx in [-1, 0, 1]
            for dy in [-1, 0, 1]
            if (dx != 0 or dy != 0)
            and ((MIN <= x + dx <= MAX)
                 and (MIN <= y + dy <= MAX))]
print(neighbors(10, 10))
[(9, 9), (9, 10), (9, 11), (10, 9), (10, 11), (11, 9), (11, 10), (11, 11)]
This fully defines our search space: We have a representation, and we know how individuals are related to each other through their neighborhood. Now we just need to find an algorithm to explore this neighborhood, and a heuristic that guides the algorithm.
All meta-heuristics are based on the use of a heuristic function that estimates how good a given candidate solution is; this "goodness" is typically called the fitness of an individual, and the heuristic that estimates the fitness is the fitness function. The fitness function is a function that maps any point in the search space to a numerical value, the fitness value. The closer a candidate solution in the search space is to an optimal solution, the better its fitness value. Thus, if you plot each point in the search space with its fitness value as the height, you get a landscape with the optimal solution represented as the highest peak.
The fitness function depends on the objective one would like to achieve by generating the test data. Suppose that we are interested in covering the true branch of the if-condition in the test_me() function, i.e. x == 2 * (y + 1).
How close is a given input tuple for this function to reaching the target branch? Let's consider an arbitrary point in the search space, e.g. (274, 153). The if-condition compares the following values:
x = 274
y = 153
x, 2 * (y + 1)
(274, 308)
In order to make the branch true, both values need to be the same. The more they differ, the further we are away from making the comparison true; the less they differ, the closer we are to making it true. We can thus quantify "how false" the comparison is by the difference between x and 2 * (y + 1), and calculate this distance as abs(x - 2 * (y + 1)):
def calculate_distance(x, y):
    return abs(x - 2 * (y + 1))
calculate_distance(274, 153)
34
We can use this distance value as our fitness function, since we can nicely measure how close we are to an optimal solution. Note, however, that "better" doesn't mean "bigger" in this case; the smaller the distance the better. This is not a problem, since any algorithm that can maximize a value can also be made to minimize it instead.
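If we ever wanted to feed this problem to a search routine that maximizes rather than minimizes, a minimal sketch (this adapter is illustrative only and not part of the original code) would be to negate the distance:

# Illustrative adapter: maximizing the negated distance is equivalent
# to minimizing the distance itself.
def negated_fitness(x, y):
    return -calculate_distance(x, y)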
For each value in the search space of integer tuples, this distance value defines the elevation in our search landscape. Since our example search space is two-dimensional, the search landscape is three-dimensional and we can plot it to see what it looks like:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

xx = np.outer(np.linspace(-10, 10, 30), np.ones(30))
yy = xx.copy().T
zz = calculate_distance(xx, yy)

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_surface(xx, yy, zz, cmap=plt.cm.jet, rstride=1, cstride=1, linewidth=0);
The optimal values, i.e. those that make the if-condition true, have fitness value 0 and can be clearly seen at the bottom of the plot. The further away from these optimal values, the higher the elevation of the points in the search space.
The fitness function should calculate the distance value for concrete test executions. That is, we want to run the program, and then learn the distance value of this execution. However, the branching condition is hidden within the source code of the target function, and its value may in principle be the result of various calculations along the execution path reaching it. Even though in our example the condition is an equation using the input values of the function directly, this may not be the case in general; the compared values might just as well be derived. Thus, the values we need to calculate our distance metric have to be observed directly at the conditional statement.
This is typically done with instrumentation: We add new code immediately before or after the branching condition to keep track of the values observed and calculate the distance using these values. The following is an instrumented version of our program under test, which prints out the distance values as it is executed:
def test_me_instrumented(x, y):
    print("Instrumentation: Input = (%d, %d), distance = %d" %
          (x, y, calculate_distance(x, y)))
    if x == 2 * (y + 1):
        return True
    else:
        return False
Let's try this out for a couple of example values:
test_me_instrumented(0, 0)
Instrumentation: Input = (0, 0), distance = 2
False
test_me_instrumented(5, 2)
Instrumentation: Input = (5, 2), distance = 1
False
test_me_instrumented(22, 10)
Instrumentation: Input = (22, 10), distance = 0
True
When calculating a fitness value, we will execute the instrumented program version, but we need some means to access the distance value calculated during that execution. As a simple first solution to this problem, we can just add a global variable and store the value of the distance calculation there.
distance = 0

def test_me_instrumented(x, y):  # type: ignore
    global distance
    distance = calculate_distance(x, y)
    if x == 2 * (y + 1):
        return True
    else:
        return False
Using this instrumented version of test_me(), we can now finally define our fitness function, which simply runs the instrumented test_me_instrumented() function, and then retrieves the value of the global distance variable:
def get_fitness(x, y):
    global distance
    test_me_instrumented(x, y)
    fitness = distance
    return fitness
Let's try this on some example inputs:
get_fitness(0, 0)
2
get_fitness(1, 2)
5
get_fitness(22, 10)
0
Having decided on a representation (2-tuples of integers) and a fitness function (distance to the target branch), we can now finally go ahead and implement our search algorithm. Let's explore this search space using the simplest possible meta-heuristic algorithm: hillclimbing. The metaphor aptly captures what is happening: the algorithm tries to climb a hill in the search space defined by our representation. Except that in our search landscape the best values are not those high up but down low, so technically we are descending into valleys.
The hillclimbing algorithm itself is very simple: take a random starting point, evaluate the fitness of its neighbors, move to a neighbor with better fitness, and repeat until an optimal solution has been found.
The hillclimber starts with a random test input, i.e., random values for x and y. For any pair of random integers, the chances of them satisfying the condition x == 2 * (y + 1) are rather slim. Suppose the random values are (274, 153). The right-hand side of the equation, 2 * (y + 1), evaluates to 308, so the condition is clearly false. Where should the hillclimber go now? Let's look at the fitness values of this test input and its neighbors:
x, y = 274, 153
print("Origin %d, %d has fitness %d" % (x, y, get_fitness(x, y)))
for nx, ny in neighbors(x, y):
    print("neighbor %d, %d has fitness %d" % (nx, ny, get_fitness(nx, ny)))
Origin 274, 153 has fitness 34
neighbor 273, 152 has fitness 33
neighbor 273, 153 has fitness 35
neighbor 273, 154 has fitness 37
neighbor 274, 152 has fitness 32
neighbor 274, 154 has fitness 36
neighbor 275, 152 has fitness 31
neighbor 275, 153 has fitness 33
neighbor 275, 154 has fitness 35
Increasing y by one increases the value of the right-hand side of the equation to 310. Thus, the value on the left-hand side of the equation differs even more from the value on the right-hand side than it did before the increase! So, increasing y does not seem like a good idea. On the other hand, increasing x by one improves things: the left-hand side and the right-hand side of the equation become more similar; they are "less unequal". Thus, out of the eight possible neighbors of (274, 153), the neighbor that increases x and decreases y ((275, 152)) intuitively seems best --- the outcome of the condition is still false, but it is "less so" than for the original value.
Let's now implement the hillclimbing algorithm.
import random

LOG_VALUES = 20  # Number of values to log

def hillclimber():
    # Create and evaluate starting point
    x, y = random.randint(MIN, MAX), random.randint(MIN, MAX)
    fitness = get_fitness(x, y)
    print("Initial value: %d, %d at fitness %.4f" % (x, y, fitness))
    iterations = 0
    logs = 0

    # Stop once we have found an optimal solution
    while fitness > 0:
        iterations += 1
        # Move to first neighbor with a better fitness
        for (nextx, nexty) in neighbors(x, y):
            new_fitness = get_fitness(nextx, nexty)

            # Smaller fitness values are better
            if new_fitness < fitness:
                x, y = nextx, nexty
                fitness = new_fitness
                if logs < LOG_VALUES:
                    print("New value: %d, %d at fitness %.4f" % (x, y, fitness))
                elif logs == LOG_VALUES:
                    print("...")
                logs += 1
                break

    print("Found optimum after %d iterations at %d, %d" % (iterations, x, y))
hillclimber()
Initial value: 201, -956 at fitness 2111.0000
New value: 200, -956 at fitness 2110.0000
New value: 199, -956 at fitness 2109.0000
New value: 198, -956 at fitness 2108.0000
New value: 197, -956 at fitness 2107.0000
New value: 196, -956 at fitness 2106.0000
New value: 195, -956 at fitness 2105.0000
New value: 194, -956 at fitness 2104.0000
New value: 193, -956 at fitness 2103.0000
New value: 192, -956 at fitness 2102.0000
New value: 191, -956 at fitness 2101.0000
New value: 190, -956 at fitness 2100.0000
New value: 189, -956 at fitness 2099.0000
New value: 188, -956 at fitness 2098.0000
New value: 187, -956 at fitness 2097.0000
New value: 186, -956 at fitness 2096.0000
New value: 185, -956 at fitness 2095.0000
New value: 184, -956 at fitness 2094.0000
New value: 183, -956 at fitness 2093.0000
New value: 182, -956 at fitness 2092.0000
New value: 181, -956 at fitness 2091.0000
...
Found optimum after 1656 iterations at -1000, -501
The hillclimber starts by choosing random values for x and y. We use values in the small range of -1000 to 1000 (which we defined MIN and MAX to be earlier) to reduce the time the search takes when playing with the example. Then, we determine the fitness value of this starting point by calling get_fitness(). Recall that we are trying to find the smallest possible fitness value; therefore, we now loop until we have found a fitness value of 0 (i.e., an optimal value). In this loop, we iterate over all neighbors (neighbors()) and evaluate the fitness value of each of them. As soon as we have found a neighbor with a better (smaller) fitness, the hillclimber exits the loop and uses this as the new starting point. An alternative variant of this simple hillclimbing algorithm would be to remove the break statement: by doing so, all neighbors would be evaluated, and the best neighbor would be chosen. This is known as steepest ascent hillclimbing. You will see that the number of iterations necessary to reach the optimum is lower, although each iteration executes more tests.
def steepest_ascent_hillclimber():
    # Create and evaluate starting point
    x, y = random.randint(MIN, MAX), random.randint(MIN, MAX)
    fitness = get_fitness(x, y)
    print("Initial value: %d, %d at fitness %.4f" % (x, y, fitness))
    iterations = 0
    logs = 0

    # Stop once we have found an optimal solution
    while fitness > 0:
        iterations += 1
        # Move to the neighbor with the best fitness
        for (nextx, nexty) in neighbors(x, y):
            new_fitness = get_fitness(nextx, nexty)
            if new_fitness < fitness:
                x, y = nextx, nexty
                fitness = new_fitness
                if logs < LOG_VALUES:
                    print("New value: %d, %d at fitness %.4f" % (x, y, fitness))
                elif logs == LOG_VALUES:
                    print("...")
                logs += 1

    print("Found optimum after %d iterations at %d, %d" % (iterations, x, y))
steepest_ascent_hillclimber()
Initial value: -258, 645 at fitness 1550.0000
New value: -259, 644 at fitness 1549.0000
New value: -258, 644 at fitness 1548.0000
New value: -257, 644 at fitness 1547.0000
New value: -258, 643 at fitness 1546.0000
New value: -257, 643 at fitness 1545.0000
New value: -256, 643 at fitness 1544.0000
New value: -257, 642 at fitness 1543.0000
New value: -256, 642 at fitness 1542.0000
New value: -255, 642 at fitness 1541.0000
New value: -256, 641 at fitness 1540.0000
New value: -255, 641 at fitness 1539.0000
New value: -254, 641 at fitness 1538.0000
New value: -255, 640 at fitness 1537.0000
New value: -254, 640 at fitness 1536.0000
New value: -253, 640 at fitness 1535.0000
New value: -254, 639 at fitness 1534.0000
New value: -253, 639 at fitness 1533.0000
New value: -252, 639 at fitness 1532.0000
New value: -253, 638 at fitness 1531.0000
New value: -252, 638 at fitness 1530.0000
...
Found optimum after 517 iterations at 258, 128
Our example program has a very nice fitness landscape – there is a perfect gradient, and the hillclimber will always find a solution. We can see this nice gradient if we plot the fitness values observed over time:
def plotting_hillclimber(fitness_function):
    data = []

    # Create and evaluate starting point
    x, y = random.randint(MIN, MAX), random.randint(MIN, MAX)
    fitness = fitness_function(x, y)
    data += [fitness]
    iterations = 0

    # Stop once we have found an optimal solution
    while fitness > 0:
        iterations += 1
        # Move to first neighbor with a better fitness
        for (nextx, nexty) in neighbors(x, y):
            new_fitness = fitness_function(nextx, nexty)
            if new_fitness < fitness:
                x, y = nextx, nexty
                fitness = new_fitness
                data += [fitness]
                break

    print("Found optimum after %d iterations at %d, %d" % (iterations, x, y))
    return data
data = plotting_hillclimber(get_fitness)
Found optimum after 429 iterations at -1000, -501
fig = plt.figure()
ax = plt.axes()
xs = range(len(data))
ax.plot(xs, data);
This gradient is the result of an ideal fitness landscape. In practice, we won't always have such a nice gradient. In particular, our hillclimber only works well as long as there is at least one neighbor that has a better fitness value. What if we reach a point where none of the neighbors actually has a better fitness value? Consider the following function test_me2:
def test_me2(x, y):
    if x * x == y * y * (x % 20):
        return True
    else:
        return False
If we want to cover the true branch of the if-condition in test_me2 again, then we can calculate the distance in the same way as previously, i.e., by calculating the difference between the two sides of the comparison. Let's instrument the test_me2() function:
def test_me2_instrumented(x, y):
    global distance
    distance = abs(x * x - y * y * (x % 20))
    if x * x == y * y * (x % 20):
        return True
    else:
        return False
With this instrumented version, we just need a fitness function that calls the instrumented version and reads out the global distance variable.
def bad_fitness(x, y):
    global distance
    test_me2_instrumented(x, y)
    fitness = distance
    return fitness
Before we run the hillclimber on this example, let's have a look at the search landscape again by plotting it:
xx = np.outer(np.linspace(-10, 10, 30), np.ones(30))
yy = xx.copy().T
zz = abs(xx * xx - yy * yy * (xx % 20)) # type: ignore
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_surface(xx, yy, zz, cmap=plt.cm.jet, rstride=1, cstride=1, linewidth=0); # type: ignore
At this point it would be nice to run the hillclimber using the new fitness function, but there is a problem: it might never terminate. Suppose we have reached a point where all neighbors have the same or a worse fitness value. The hillclimber can move nowhere and is stuck there forever! Such a point in the search landscape is called a local optimum. If one reaches such a point, the easiest way out is to give up and restart from a new random point. This is what we will do in our hillclimber with random restarts.
def restarting_hillclimber(fitness_function):
    data = []

    # Create and evaluate starting point
    x, y = random.randint(MIN, MAX), random.randint(MIN, MAX)
    fitness = fitness_function(x, y)
    data += [fitness]
    print("Initial value: %d, %d at fitness %.4f" % (x, y, fitness))
    iterations = 0

    # Stop once we have found an optimal solution
    while fitness > 0:
        changed = False
        iterations += 1
        # Move to first neighbor with a better fitness
        for (nextx, nexty) in neighbors(x, y):
            new_fitness = fitness_function(nextx, nexty)
            if new_fitness < fitness:
                x, y = nextx, nexty
                fitness = new_fitness
                data += [fitness]
                changed = True
                break

        if not changed:
            x, y = random.randint(MIN, MAX), random.randint(MIN, MAX)
            fitness = fitness_function(x, y)
            data += [fitness]

    print("Found optimum after %d iterations at %d, %d" % (iterations, x, y))
    return data
The change is trivial: we simply keep track of whether any movement has occurred with a boolean flag, and if we did not move to any of the neighbors, we pick a new random position to start over from. For convenience, we also made the hillclimber parameterizable with a fitness function. Let's try it out with our bad_fitness and plot the resulting fitness values we observe:
MAX = 1000
MIN = -MAX
data = restarting_hillclimber(bad_fitness)
Initial value: 333, 231 at fitness 582804.0000
Found optimum after 165 iterations at 521, 521
fig = plt.figure()
ax = plt.axes()
xs = range(len(data))
ax.plot(xs, data);
Run the example a couple of times. Sometimes we are lucky, and there is a gradient that takes the search straight to an optimal solution. But sometimes you will see restarts throughout the search before an optimal value is reached.
We restricted the initial values of x and y to the rather small range [MIN, MAX]. This is a common trick in test generation: in most cases solutions tend to consist of small values, and using small values to start the search makes the search quicker in many cases. However, what if the solution we need lies at a completely different place in our search space? Our bias towards smaller solutions would mean that the hillclimber takes very long to find the solution, and given a fixed search budget it would thus be less likely to actually find it. To see what effect this has, we can simply replace the 1000 with, say, 1000000 or more, and play around with the range to see the performance we get for our simple search problems.
MAX = 100000
MIN = -MAX

with Timer() as t:
    restarting_hillclimber(get_fitness)
print("Search time: %.2fs" % t.elapsed_time())
Initial value: 64543, -55357 at fitness 175255.0000
Found optimum after 169899 iterations at -100000, -50001
Search time: 0.40s
In most cases the search will now take much longer until a solution is found --- likely longer than we are prepared to wait for such a simple example function! (Although sometimes we get lucky and randomly hit a good starting position.) How is this ever going to work on "real" examples? Imagine if there were even more parameters and a bigger neighborhood!
Let's turn to a slightly more complex program: The CGI decoder you already know from the Coverage chapter.
def cgi_decode(s):
    """Decode the CGI-encoded string `s`:
       * replace "+" by " "
       * replace "%xx" by the character with hex number xx.
       Return the decoded string. Raise `ValueError` for invalid inputs."""

    # Mapping of hex digits to their integer values
    hex_values = {
        '0': 0, '1': 1, '2': 2, '3': 3, '4': 4,
        '5': 5, '6': 6, '7': 7, '8': 8, '9': 9,
        'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14, 'f': 15,
        'A': 10, 'B': 11, 'C': 12, 'D': 13, 'E': 14, 'F': 15,
    }

    t = ""
    i = 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            digit_high, digit_low = s[i + 1], s[i + 2]
            i += 2
            if digit_high in hex_values and digit_low in hex_values:
                v = hex_values[digit_high] * 16 + hex_values[digit_low]
                t += chr(v)
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t
The cgi_decode() function has one input of type string, and one possible way to define the neighborhood of a string is the set of all strings that have an edit distance of 1. For example, the string test would have two neighbors for each of its four characters:
uest
tfst
tett
tesu
sest
tdst
tert
tess
In addition, prepending any character or appending any character would also have an edit distance of 1 and could be considered neighbors. To keep things simple, let's keep the length of our input strings fixed to a reasonable value (e.g. 10). In this case, each individual has 20 neighbors (i.e., each character has two neighbors).
Let's implement a new neighbor_strings() function that produces these neighboring strings:
def neighbor_strings(x):
    n = []
    for pos in range(len(x)):
        c = ord(x[pos])
        if c < 126:
            n += [x[:pos] + chr(c + 1) + x[pos + 1:]]
        if c > 32:
            n += [x[:pos] + chr(c - 1) + x[pos + 1:]]
    return n
The neighbor_strings() function gets the numerical value of each character in the input string, and creates a new string with the character replaced by the preceding and succeeding characters in the alphabet. To start with, we only consider printable ASCII characters, which are in the range 32--126.
print(neighbor_strings("Hello"))
['Iello', 'Gello', 'Hfllo', 'Hdllo', 'Hemlo', 'Heklo', 'Helmo', 'Helko', 'Hellp', 'Helln']
Thus, we have defined the search space for the cgi_decode() function. The next ingredient to searching for individuals in this search space is a fitness function.
The test_me() function consisted of a single if-condition, in which two integer numbers were compared. In the cgi_decode() function we have three if-conditions and one while loop, and they all compare characters. Fortunately, as we have already seen, we can treat characters like numbers, so we can use the same distance estimate we used in the test_me() example. However, there are also two conditions which check whether a character is contained in a set, e.g. digit_high in hex_values. How close is a value to being contained in the set? An obvious solution is to take the distance to the closest value in the set as the estimate.
import sys

def distance_character(target, values):
    # Initialize with very large value so that any comparison is better
    minimum = sys.maxsize
    for elem in values:
        distance = abs(target - elem)
        if distance < minimum:
            minimum = distance
    return minimum
distance_character(10, [1, 5, 12, 100])
2
distance_character(10, [0, 50, 80, 200])
10
A further simplification we have made so far is to assume that we always want conditions to evaluate to true. In practice, we might want an if-condition to evaluate to false just as well. Thus, each if-condition actually has two distance estimates: one estimating how close it is to being true, and one estimating how close it is to being false. If the condition is true, then the true distance is 0; if the condition is false, then the false distance is 0. That is, in a comparison a == b, if a is smaller than b, then the false distance is 0 by definition.
What is the distance of a == b being false when a equals b? Any change to either a or b would make the condition evaluate to false, so we can simply define the distance as 1 in this case.
More generally, there can be other types of comparisons, for example using relational operators. Consider the loop condition in cgi_decode(): i < len(s) uses a less-than comparison operator. It is quite straightforward to extend our notion of branch distance to cover different types of comparisons, and to calculate true and false distances. The following table shows how to calculate the distance for different types of comparisons:
Condition | Distance True | Distance False
---|---|---
a == b | abs(a - b) | 1
a != b | 1 | abs(a - b)
a < b | a - b + 1 | b - a
a <= b | a - b | b - a + 1
a > b | b - a + 1 | a - b
Note that several of the calculations add a constant 1. The reason for this is quite simple: suppose we want to have a < b evaluate to true, and let a = 27 and b = 27. The condition is not true, but simply taking the difference would give us a result of 0. To avoid this, we have to add a constant value. It is not important whether this value is 1 -- any positive constant works.
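As an illustration of how this table translates into code, here is a small sketch (the helper branch_distances() and its name are made up for this example; the instrumentation developed below uses evaluate_condition() instead) that returns both distances for a comparison, with the outcome that already holds getting distance 0:

def branch_distances(op, a, b):
    # Illustrative sketch only: distances follow the table above
    distance_true = distance_false = 0
    if op == "==":
        if a == b:
            distance_false = 1
        else:
            distance_true = abs(a - b)
    elif op == "!=":
        if a != b:
            distance_false = abs(a - b)
        else:
            distance_true = 1
    elif op == "<":
        if a < b:
            distance_false = b - a
        else:
            distance_true = a - b + 1
    elif op == "<=":
        if a <= b:
            distance_false = b - a + 1
        else:
            distance_true = a - b
    elif op == ">":
        if a > b:
            distance_false = a - b
        else:
            distance_true = b - a + 1
    return distance_true, distance_false

branch_distances("<", 27, 27)   # (1, 0): one step away from being true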
In the cgi_decode() function, we can also find a somewhat more complex predicate, which consists of two conditions joined by a logical and:
if digit_high in hex_values and digit_low in hex_values:
In principle, the branch distance is defined such that the distance to make a conjunction A and B true equals the sum of the branch distances for A and B, as both of the two conditions would need to be true. Similarly, the branch distance to make A or B true would be the minimum of the two branch distances of A and B, as it suffices if one of the two conditions is true to make the entire expression true.
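As a minimal sketch of these two rules (purely illustrative; d_A and d_B are assumed to be the true-distances of the individual conditions):

def and_distance(d_A, d_B):
    return d_A + d_B       # both conditions must become true

def or_distance(d_A, d_B):
    return min(d_A, d_B)   # making one condition true suffices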
However, it is not as easy as that in practice: predicates can consist of nested conditions and negations, and one would need to convert the expression to canonical form before being able to apply this calculation. Furthermore, most modern programming languages use short-circuit evaluation: if there is a condition A or B, and A is true, then B is never evaluated. If B is an expression with side effects, then by calculating the branch distance of B even though short-circuit evaluation would avoid its execution, we would potentially be changing the program behavior (by invoking a side effect that would normally not be executed), and that is not acceptable.
Furthermore, what if the branching condition itself has side effects? For example, suppose that the branching condition were x == 2 * foo(y), where foo() is a function that takes an integer as input. Naively instrumenting would lead to the following code:
distance = abs(x - 2 * foo(y))
if x == 2 * foo(y):
    ...
Thus, the instrumentation would lead to foo() being executed twice. Suppose foo() changes the state of the system (e.g., by printing something, accessing the file system, changing some state variables, etc.); then clearly invoking foo() a second time is a bad idea. One way to overcome this problem is to transform the conditions, rather than adding tracing calls. For example, one can create temporary variables that hold the values necessary for the distance calculation, and then use these in the branching condition:
tmp1 = x
tmp2 = 2 * foo(y)
distance = abs(tmp1 - tmp2)
if tmp1 == tmp2:
    ...
Besides these issues, the approach of adding a global variable and method call to the program seems rather clumsy --- surely we cannot think about every branch in our program on its own and instrument the program we want to test manually, in particular if programs have multiple branches like the cgi_decode() function. Rather, we should be looking at how to automatically instrument programs to contain the necessary added statements such that we can calculate fitness values.
An alternative approach to using the global and temporary variables is to replace the actual comparison with a call to a helper function, in which the original expressions are evaluated as arguments, and the operator is an additional argument. Assume we have a function evaluate_condition() which takes four parameters:
num is a unique id that identifies the condition;
op is the operator of the comparison;
lhs and rhs are the operands.
The function calculates two distances for the condition: the distance to the condition evaluating to true, and the distance to the condition evaluating to false. One of the two outcomes will always be true, and thus one of them will always have distance 0. Since the function replaces the original comparison, it returns true or false, depending on which distance is 0. That means the example expression
if x == 2 * foo(y)
would be replaced by
if evaluate_condition(0, "Eq", x, 2 * foo(y))
such that the arguments are only evaluated once, and side effects are thus handled correctly. Here is what the evaluate_condition() function looks like:
def evaluate_condition(num, op, lhs, rhs):
    distance_true = 0
    distance_false = 0
    if op == "Eq":
        if lhs == rhs:
            distance_false = 1
        else:
            distance_true = abs(lhs - rhs)

    # ... code for other types of conditions

    if distance_true == 0:
        return True
    else:
        return False
Note that we are initializing distance_true and distance_false with 0. Thus, if lhs equals rhs, then the variable distance_true remains 0, and vice versa.
evaluate_condition(1, "Eq", 10, 20)
False
evaluate_condition(2, "Eq", 20, 20)
True
What the evaluate_condition() function does not yet do is store the distances observed. Obviously, we will need to store the values somewhere so that we can access them from our fitness function. Since the cgi_decode() program consists of several conditions, and for each condition we might be interested in the true and the false distance, we simply use two global dictionaries, distances_true and distances_false, and define a helper function that stores the distance values observed in these dictionaries:
def update_maps(condition_num, d_true, d_false):
    global distances_true, distances_false

    if condition_num in distances_true.keys():
        distances_true[condition_num] = min(
            distances_true[condition_num], d_true)
    else:
        distances_true[condition_num] = d_true

    if condition_num in distances_false.keys():
        distances_false[condition_num] = min(
            distances_false[condition_num], d_false)
    else:
        distances_false[condition_num] = d_false
The variable condition_num is the unique ID of the condition we have just evaluated. If this is the first time that we have executed this particular condition, then the true and false distances are simply stored in the corresponding dictionaries. However, it is possible that the same test executes a condition multiple times. For example, the loop condition i < len(s) in the cgi_decode() function is evaluated before every single loop iteration. In the end, however, we want to have a single fitness value for a test. As covering a branch just requires that at least one of the executions reaches the branch, we consider only the closest one. Therefore, if the distances_true and distances_false dictionaries already contain a distance from a previous execution, we only replace that value if the new execution was closer to reaching the branch; this is implemented using the min() function.
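A quick illustration of this behavior (assuming the two dictionaries start out empty, as they will be initialized below; condition id 7 is made up for this example): if the same condition is reported several times, only the smallest distances survive:

distances_true = {}
distances_false = {}

update_maps(7, 10, 0)   # first execution of (made-up) condition 7
update_maps(7, 4, 0)    # a closer execution overwrites the stored distance
update_maps(7, 8, 0)    # a farther execution is ignored

print(distances_true[7], distances_false[7])   # 4 0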
We now need to call this function from within evaluate_condition(). Let's also add the calculation of distance for the in operator and the < comparison, since we need both of them for the cgi_decode() example. Furthermore, cgi_decode() actually compares characters and numbers. To make sure we use the correct types, we first have to convert the characters to numbers when calculating distances. This is done using Python's ord() function.
def evaluate_condition(num, op, lhs, rhs):  # type: ignore
    distance_true = 0
    distance_false = 0

    # Make sure the distance can be calculated on number and character
    # comparisons
    if isinstance(lhs, str):
        lhs = ord(lhs)
    if isinstance(rhs, str):
        rhs = ord(rhs)

    if op == "Eq":
        if lhs == rhs:
            distance_false = 1
        else:
            distance_true = abs(lhs - rhs)

    elif op == "Lt":
        if lhs < rhs:
            distance_false = rhs - lhs
        else:
            distance_true = lhs - rhs + 1

    # ...
    # handle other comparison operators
    # ...

    elif op == "In":
        minimum = sys.maxsize
        for elem in rhs.keys():
            distance = abs(lhs - ord(elem))
            if distance < minimum:
                minimum = distance
        distance_true = minimum
        if distance_true == 0:
            distance_false = 1

    update_maps(num, distance_true, distance_false)

    if distance_true == 0:
        return True
    else:
        return False
The following shows the instrumentation of the conjunction from cgi_decode() to make use of the evaluate_condition() function. There are two calls to evaluate_condition(), one for each of the two conditions, and the operator and with which they are conjoined ensures that the original short-circuiting behavior is preserved:
if (evaluate_condition(4, 'In', digit_high, hex_values) and evaluate_condition(5, 'In', digit_low, hex_values))
Of course we would like to automatically produce this instrumented version.
Replacing comparisons automatically is actually quite easy in Python, using the abstract syntax tree (AST) of the program. In the AST, a comparison will typically be a tree node with an operator attribute and two children for the left-hand and right-hand operands. To replace such comparisons with a call to evaluate_condition(), one simply needs to replace the comparison node in the AST with a function call node, and this is what the BranchTransformer class does, using a NodeTransformer from Python's ast module:
import ast

class BranchTransformer(ast.NodeTransformer):
    branch_num = 0

    def visit_FunctionDef(self, node):
        node.name = node.name + "_instrumented"
        return self.generic_visit(node)

    def visit_Compare(self, node):
        if node.ops[0] in [ast.Is, ast.IsNot, ast.In, ast.NotIn]:
            return node

        self.branch_num += 1
        return ast.Call(func=ast.Name("evaluate_condition", ast.Load()),
                        args=[ast.Num(self.branch_num),
                              ast.Str(node.ops[0].__class__.__name__),
                              node.left,
                              node.comparators[0]],
                        keywords=[],
                        starargs=None,
                        kwargs=None)
The BranchTransformer parses a target Python program using the built-in parser ast.parse(), which returns the AST. Python provides an API to traverse and modify this AST. To replace the comparison with a function call we use an ast.NodeTransformer, which uses the visitor pattern where there is one visit_* function for each type of node in the AST. As we are interested in replacing comparisons, we override visit_Compare, where instead of the original comparison node we return a new node of type ast.Call, which is a function call node. The first parameter of this node is the name of the function evaluate_condition(), and the arguments are the four arguments that our evaluate_condition() function expects: the number of the condition (branch_num), the name of the comparison operator, and the left-hand and right-hand operands of the comparison.
Note that Python allows comparisons of multiple expressions (e.g. 1 < x < 10); to keep the code simple we only deal with individual comparisons here, but it would be straightforward to extend the code by treating each comparison with an individual call to evaluate_condition(). You will notice that we also override visit_FunctionDef; this is just to change the name of the function by appending _instrumented, so that we can continue to use the original function together with the instrumented one.
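To sketch what such an extension would mean (purely illustrative; this is not produced by the BranchTransformer above, and the condition ids 10 and 11 are made up), a chained comparison such as
if 1 < x < 10
is equivalent to 1 < x and x < 10, so each part could be replaced by its own call:
if evaluate_condition(10, "Lt", 1, x) and evaluate_condition(11, "Lt", x, 10)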
The following code parses the source code of the cgi_decode() function to an AST, then transforms it, and prints it out again (using the unparse() function from the ast library):
import inspect

source = inspect.getsource(cgi_decode)
node = ast.parse(source)
BranchTransformer().visit(node)

# Make sure the line numbers are ok before printing
node = ast.fix_missing_locations(node)
print_content(ast.unparse(node), '.py')
def cgi_decode_instrumented(s):
    """Decode the CGI-encoded string `s`:
       * replace "+" by " "
       * replace "%xx" by the character with hex number xx.
       Return the decoded string. Raise `ValueError` for invalid inputs."""
    hex_values = {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9, 'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14, 'f': 15, 'A': 10, 'B': 11, 'C': 12, 'D': 13, 'E': 14, 'F': 15}
    t = ''
    i = 0
    while evaluate_condition(1, 'Lt', i, len(s)):
        c = s[i]
        if evaluate_condition(2, 'Eq', c, '+'):
            t += ' '
        elif evaluate_condition(3, 'Eq', c, '%'):
            (digit_high, digit_low) = (s[i + 1], s[i + 2])
            i += 2
            if evaluate_condition(4, 'In', digit_high, hex_values) and evaluate_condition(5, 'In', digit_low, hex_values):
                v = hex_values[digit_high] * 16 + hex_values[digit_low]
                t += chr(v)
            else:
                raise ValueError('Invalid encoding')
        else:
            t += c
        i += 1
    return t
To calculate a fitness value with the instrumented version, we need to compile the instrumented AST again, which is done using Python's compile() function. We then need to make the compiled function accessible, for which we first retrieve the current module from sys.modules, and then add the compiled code of the instrumented function to the list of functions of the current module using exec. After this, the cgi_decode_instrumented() function can be accessed.
from typing import Dict, cast

def create_instrumented_function(f):
    source = inspect.getsource(f)
    node = ast.parse(source)
    node = BranchTransformer().visit(node)

    # Make sure the line numbers are ok so that it compiles
    node = ast.fix_missing_locations(node)

    # Compile and add the instrumented function to the current module
    current_module = sys.modules[__name__]
    code = compile(cast(ast.Module, node), filename="<ast>", mode="exec")
    exec(code, current_module.__dict__)

# Set up the global maps
distances_true: Dict[int, int] = {}
distances_false: Dict[int, int] = {}

# ignore
def cgi_decode_instrumented(s: str) -> str:
    return ""  # make mypy happy

# Create instrumented function
# cgi_decode_instrumented =
create_instrumented_function(cgi_decode)
assert cgi_decode("Hello+Reader") == cgi_decode_instrumented("Hello+Reader")
cgi_decode_instrumented("Hello+Reader")
'Hello Reader'
distances_true
{1: 0, 2: 0, 3: 35}
distances_false
{1: 0, 2: 0, 3: 0}
As we can see from the distances_true and distances_false maps, conditions 1 and 2 have evaluated both to true and to false, whereas condition 3 has only evaluated to false during this execution of cgi_decode_instrumented(). This is as expected, since the while loop was entered and exited, and there was one whitespace character but no %-character in the input string.
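For illustration (this follow-up is not part of the original text), an input containing a valid %-sequence should also drive condition 3 to true and execute the 'In'-conditions 4 and 5; we can reset the maps and inspect the recorded distances:

# Illustrative follow-up: reset the maps and run an input with a
# valid %-sequence, so that condition 3 also evaluates to true and
# conditions 4 and 5 are executed as well.
distances_true = {}
distances_false = {}
cgi_decode_instrumented("%41")

print(distances_true)
print(distances_false)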
As an example, let's take as our objective to test the part of cgi_decode() that decodes valid hexadecimal codes. This means that we want to make condition 1 true, condition 2 false, condition 3 true, and condition 4 true. To represent such a path, we can simply sum up the branch distances for exactly these branches. However, there is a potential issue with simply summing up branch distances: if the distance for one condition depends on a comparison of very large values, while the distance calculation for another condition depends on small values, then an improvement of the large values would very likely lead to a larger fitness improvement, and thus bias the search. To avoid this, we need to normalize branch distances before adding them up.
A normalization function for a range [a, b] takes a number as input and returns a value that is >= a and <= b. The important property of such a function is that for any two numbers x and y the ordering is preserved by the normalization; that is, if x < y then it must also hold that normalize(x) < normalize(y). There are many different functions that achieve this; a simple one is normalize(x) = x / (x + k): it is computationally cheap, and it normalizes any positive value into the range [0, 1] (to change this to [0, b] one would just need to multiply by b). If we use this normalization function, we also know the maximum value: it is 1.0. The function assumes that the value to be normalized is positive. The value of the factor k defines the steepness of the curve. For example, for k = 1 (which is a good default value) the curve is very steep, with values quickly approaching, but never reaching, 1.
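As a sketch of how this comes together (hedged: the helper names normalize() and hex_path_fitness() are chosen for this example and are not necessarily the ones used later; we assume k = 1 and the objective of making conditions 1, 3, and 4 true and condition 2 false as described above):

def normalize(x):
    # Normalize a non-negative branch distance into the range [0, 1)
    return x / (1.0 + x)

def hex_path_fitness(s):
    # Reset any distance values from previous executions
    global distances_true, distances_false
    distances_true = {}
    distances_false = {}

    # Run the instrumented function; invalid inputs may raise exceptions
    try:
        cgi_decode_instrumented(s)
    except Exception:
        pass

    # Sum up normalized branch distances along the target path:
    # conditions 1, 3, and 4 should become true, condition 2 false
    fitness = 0.0
    for condition in [1, 3, 4]:
        if condition in distances_true:
            fitness += normalize(distances_true[condition])
        else:
            fitness += 1.0  # condition was never executed

    if 2 in distances_false:
        fitness += normalize(distances_false[2])
    else:
        fitness += 1.0

    return fitness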