Producing inputs from grammars gives all possible expansions of a rule the same likelihood. For producing a comprehensive test suite, however, it makes more sense to maximize *variety* – for instance, by not repeating the same expansions over and over again. In this chapter, we explore how to systematically *cover* elements of a grammar such that we maximize variety and do not miss out individual elements.

The aim of test generation is to cover all functionality of a program – hopefully including the failing functionality, of course. This functionality, however, is tied to the structure of the input: If we fail to produce certain input elements, then the associated code and functionality will not be triggered either, nixing our chances to find a bug in there.

As an example, consider our expression grammar `EXPR_GRAMMAR`

from the chapter on grammars.. If we do not produce negative numbers, then negative numbers will not be tested. If we do not produce floating-point numbers, then floating-point numbers will not be tested. Our aim must thus be to *cover all possible expansions*.

*track* the expansions that occur during grammar production: If we already have seen some expansion, we can prefer other possible expansion candidates out of the set of possible expansions. Consider the following rule in our expression grammar:

In [3]:

```
EXPR_GRAMMAR["<factor>"]
```

Out[3]:

['+<factor>', '-<factor>', '(<expr>)', '<integer>.<integer>', '<integer>']

`<integer>`

in the first expansion of `<factor>`

. As it comes to expand the next factor, we would mark the `<integer>`

expansion as already covered, and choose one of the yet uncovered alternatives such as `-<factor>`

(a negative number) or `<integer>.<integer>`

(a floating-point number). Only when we have covered all alternatives would we go back and reconsider expansions covered before.

This concept of *grammar coverage* is very easy to implement. We introduce a class `GrammarCoverageFuzzer`

that keeps track of the current grammar coverage achieved:

In [6]:

```
class TrackingGrammarCoverageFuzzer(GrammarFuzzer):
def __init__(self, *args, **kwargs):
# invoke superclass __init__(), passing all arguments
super().__init__(*args, **kwargs)
self.reset_coverage()
def reset_coverage(self):
self.covered_expansions = set()
def expansion_coverage(self):
return self.covered_expansions
```

In this set `covered_expansions`

, we store individual expansions seen as pairs of (*symbol*, *expansion*), using the function `expansion_key()`

to generate a string representation for the pair.

In [7]:

```
def expansion_key(symbol, expansion):
"""Convert (symbol, children) into a key. `children` can be an expansion string or a derivation tree."""
if isinstance(expansion, tuple):
expansion = expansion[0]
if not isinstance(expansion, str):
children = expansion
expansion = all_terminals((symbol, children))
return symbol + " -> " + expansion
```

In [8]:

```
expansion_key(START_SYMBOL, EXPR_GRAMMAR[START_SYMBOL][0])
```

Out[8]:

'<start> -> <expr>'

Instead of *expansion*, we can also pass a list of children as argument, which will then automatically be converted into a string.

In [9]:

```
children = [("<expr>", None), (" + ", []), ("<term>", None)]
expansion_key("<expr>", children)
```

Out[9]:

'<expr> -> <expr> + <term>'

We can compute the set of possible expansions in a grammar by enumerating all expansions. The method `max_expansion_coverage()`

traverses the grammar recursively starting from the given symbol (by default: the grammar start symbol) and accumulates all expansions in the set `expansions`

. With the `max_depth`

parameter (default: $\infty$), we can control how deep the grammar exploration should go; we will need this later in the chapter.

In [10]:

```
class TrackingGrammarCoverageFuzzer(TrackingGrammarCoverageFuzzer):
def _max_expansion_coverage(self, symbol, max_depth):
if max_depth <= 0:
return set()
self._symbols_seen.add(symbol)
expansions = set()
for expansion in self.grammar[symbol]:
expansions.add(expansion_key(symbol, expansion))
for nonterminal in nonterminals(expansion):
if nonterminal not in self._symbols_seen:
expansions |= self._max_expansion_coverage(
nonterminal, max_depth - 1)
return expansions
def max_expansion_coverage(self, symbol=None, max_depth=float('inf')):
"""Return set of all expansions in a grammar starting with `symbol`"""
if symbol is None:
symbol = self.start_symbol
self._symbols_seen = set()
cov = self._max_expansion_coverage(symbol, max_depth)
if symbol == START_SYMBOL:
assert len(self._symbols_seen) == len(self.grammar)
return cov
```

We can use `max_expansion_coverage()`

to compute all the expansions within the expression grammar:

In [11]:

```
expr_fuzzer = TrackingGrammarCoverageFuzzer(EXPR_GRAMMAR)
expr_fuzzer.max_expansion_coverage()
```

Out[11]:

{'<digit> -> 0', '<digit> -> 1', '<digit> -> 2', '<digit> -> 3', '<digit> -> 4', '<digit> -> 5', '<digit> -> 6', '<digit> -> 7', '<digit> -> 8', '<digit> -> 9', '<expr> -> <term>', '<expr> -> <term> + <expr>', '<expr> -> <term> - <expr>', '<factor> -> (<expr>)', '<factor> -> +<factor>', '<factor> -> -<factor>', '<factor> -> <integer>', '<factor> -> <integer>.<integer>', '<integer> -> <digit>', '<integer> -> <digit><integer>', '<start> -> <expr>', '<term> -> <factor>', '<term> -> <factor> * <term>', '<term> -> <factor> / <term>'}

During expansion, we can keep track of expansions seen. To do so, we hook into the method `choose_node_expansion()`

, expanding a single node in our Grammar fuzzer.

In [12]:

```
class TrackingGrammarCoverageFuzzer(TrackingGrammarCoverageFuzzer):
def add_coverage(self, symbol, new_children):
key = expansion_key(symbol, new_children)
if self.log and key not in self.covered_expansions:
print("Now covered:", key)
self.covered_expansions.add(key)
def choose_node_expansion(self, node, possible_children):
(symbol, children) = node
index = super().choose_node_expansion(node, possible_children)
self.add_coverage(symbol, possible_children[index])
return index
```

`missing_expansion_coverage()`

is a helper method that returns the expansions that still have to be covered:

In [13]:

```
class TrackingGrammarCoverageFuzzer(TrackingGrammarCoverageFuzzer):
def missing_expansion_coverage(self):
return self.max_expansion_coverage() - self.expansion_coverage()
```

Let us show how tracking works. To keep things simple, let us focus on `<digit>`

expansions only.

In [14]:

```
digit_fuzzer = TrackingGrammarCoverageFuzzer(
EXPR_GRAMMAR, start_symbol="<digit>", log=True)
digit_fuzzer.fuzz()
```

Tree: <digit> Expanding <digit> randomly Now covered: <digit> -> 9 Tree: 9 '9'

Out[14]:

'9'

In [15]:

```
digit_fuzzer.fuzz()
```

Tree: <digit> Expanding <digit> randomly Now covered: <digit> -> 0 Tree: 0 '0'

Out[15]:

'0'

In [16]:

```
digit_fuzzer.fuzz()
```

Tree: <digit> Expanding <digit> randomly Now covered: <digit> -> 5 Tree: 5 '5'

Out[16]:

'5'

Here's the set of covered expansions so far:

In [17]:

```
digit_fuzzer.expansion_coverage()
```

Out[17]:

{'<digit> -> 0', '<digit> -> 5', '<digit> -> 9'}

This is the set of all expansions we can cover:

In [18]:

```
digit_fuzzer.max_expansion_coverage()
```

Out[18]:

{'<digit> -> 0', '<digit> -> 1', '<digit> -> 2', '<digit> -> 3', '<digit> -> 4', '<digit> -> 5', '<digit> -> 6', '<digit> -> 7', '<digit> -> 8', '<digit> -> 9'}

This is the missing coverage:

In [19]:

```
digit_fuzzer.missing_expansion_coverage()
```

Out[19]:

{'<digit> -> 1', '<digit> -> 2', '<digit> -> 3', '<digit> -> 4', '<digit> -> 6', '<digit> -> 7', '<digit> -> 8'}

On average, how many characters do we have to produce until all expansions are covered?

In [20]:

```
def average_length_until_full_coverage(fuzzer):
trials = 50
sum = 0
for trial in range(trials):
# print(trial, end=" ")
fuzzer.reset_coverage()
while len(fuzzer.missing_expansion_coverage()) > 0:
s = fuzzer.fuzz()
sum += len(s)
return sum / trials
```

In [21]:

```
digit_fuzzer.log = False
average_length_until_full_coverage(digit_fuzzer)
```

Out[21]:

28.4

For full expressions, this takes a bit longer:

In [22]:

```
expr_fuzzer = TrackingGrammarCoverageFuzzer(EXPR_GRAMMAR)
average_length_until_full_coverage(expr_fuzzer)
```

Out[22]:

138.12

Let us now not only track coverage, but actually *produce* coverage. The idea is as follows:

- We determine children yet uncovered (in
`uncovered_children`

) - If all children are covered, we fall back to the original method (i.e., choosing one expansion randomly)
- Otherwise, we select a child from the uncovered children and mark it as covered.

To this end, we introduce a new fuzzer `SimpleGrammarCoverageFuzzer`

that implements this strategy in the `choose_node_expansion()`

method.

In [23]:

```
class SimpleGrammarCoverageFuzzer(TrackingGrammarCoverageFuzzer):
def choose_node_expansion(self, node, possible_children):
# Prefer uncovered expansions
(symbol, children) = node
uncovered_children = [c for (i, c) in enumerate(possible_children)
if expansion_key(symbol, c) not in self.covered_expansions]
index_map = [i for (i, c) in enumerate(possible_children)
if c in uncovered_children]
if len(uncovered_children) == 0:
# All expansions covered - use superclass method
return self.choose_covered_node_expansion(node, possible_children)
# Select from uncovered nodes
index = self.choose_uncovered_node_expansion(node, uncovered_children)
return index_map[index]
```

The two methods `choose_covered_node_expansion()`

and `choose_uncovered_node_expansion()`

are provided for subclasses to hook in:

In [24]:

```
class SimpleGrammarCoverageFuzzer(SimpleGrammarCoverageFuzzer):
def choose_uncovered_node_expansion(self, node, possible_children):
return TrackingGrammarCoverageFuzzer.choose_node_expansion(
self, node, possible_children)
def choose_covered_node_expansion(self, node, possible_children):
return TrackingGrammarCoverageFuzzer.choose_node_expansion(
self, node, possible_children)
```

By returning the set of expansions covered so far, we can invoke the fuzzer multiple times, each time adding to the grammar coverage. Using the `EXPR_GRAMMAR`

grammar to produce digits, for instance, the fuzzer produces one digit after the other:

In [25]:

```
f = SimpleGrammarCoverageFuzzer(EXPR_GRAMMAR, start_symbol="<digit>")
f.fuzz()
```

Out[25]:

'5'

In [26]:

```
f.fuzz()
```

Out[26]:

'2'

In [27]:

```
f.fuzz()
```

Out[27]:

'1'

Here's the set of covered expansions so far:

In [28]:

```
f.expansion_coverage()
```

Out[28]:

{'<digit> -> 1', '<digit> -> 2', '<digit> -> 5'}

Let us fuzz some more. We see that with each iteration, we cover another expansion:

In [29]:

```
for i in range(7):
print(f.fuzz(), end=" ")
```

0 9 7 4 8 3 6

At the end, all expansions are covered:

In [30]:

```
f.missing_expansion_coverage()
```

Out[30]:

set()

Let us apply this on a more complex grammar – e.g., the full expression grammar. We see that after a few iterations, we cover each and every digit, operator, and expansion:

In [31]:

```
f = SimpleGrammarCoverageFuzzer(EXPR_GRAMMAR)
for i in range(10):
print(f.fuzz())
```

Again, all expansions are covered:

In [32]:

```
f.missing_expansion_coverage()
```

Out[32]:

set()

We see that our strategy is much more effective in achieving coverage than the random approach:

In [33]:

```
average_length_until_full_coverage(SimpleGrammarCoverageFuzzer(EXPR_GRAMMAR))
```

Out[33]:

52.28

Selecting expansions for individual rules is a good start; however, it is not sufficient, as the following example shows. We apply our coverage fuzzer on the CGI grammar from the chapter on coverage:

In [34]:

```
CGI_GRAMMAR
```

Out[34]:

{'<start>': ['<string>'], '<string>': ['<letter>', '<letter><string>'], '<letter>': ['<plus>', '<percent>', '<other>'], '<plus>': ['+'], '<percent>': ['%<hexdigit><hexdigit>'], '<hexdigit>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'], '<other>': ['0', '1', '2', '3', '4', '5', 'a', 'b', 'c', 'd', 'e', '-', '_']}

In [35]:

```
f = SimpleGrammarCoverageFuzzer(CGI_GRAMMAR)
for i in range(10):
print(f.fuzz())
```

c +%a6++ +- + ++ %18%b7 +e _ d2+%e3 %d0

After 10 iterations, we still have a number of expansions uncovered:

In [36]:

```
f.missing_expansion_coverage()
```

Out[36]:

{'<hexdigit> -> 2', '<hexdigit> -> 4', '<hexdigit> -> 5', '<hexdigit> -> 9', '<hexdigit> -> c', '<hexdigit> -> f', '<other> -> 0', '<other> -> 1', '<other> -> 3', '<other> -> 4', '<other> -> 5', '<other> -> a', '<other> -> b'}

Why is that so? The problem is that in the CGI grammar, the largest number of variations to be covered occurs in the `hexdigit`

rule. However, we first need to *reach* this expansion. When expanding a `<letter>`

symbol, we have the choice between three possible expansions:

In [37]:

```
CGI_GRAMMAR["<letter>"]
```

Out[37]:

['<plus>', '<percent>', '<other>']

`choose_node_expansion()`

above will choose one randomly – even if there may be more expansions to cover when choosing `<percent>`

.

What we need is a better strategy that will pick `<percent>`

if there are more uncovered expansions following – even if `<percent>`

is covered. Such a strategy was first discussed by W. Burkhardt \cite{Burkhardt1967} under the name of "Shortest Path Selection":

This version selects, from several alternatives for development, that syntactic unit under which there is still an unused unit available, starting with the shortest path.

This is what we will implement in the next steps.

To address this problem, we introduce a new class `GrammarCoverageFuzzer`

that builds on `SimpleGrammarCoverageFuzzer`

, but with a better strategy. First, we need to compute the *maximum set of expansions* that can be reached from a particular symbol, as we already have implemented in `max_expansion_coverage()`

. The idea is to later compute the *intersection* of this set and the expansions already covered, such that we can favor those expansions with a non-empty intersection.

`symbol`

parameter to `max_expansion_coverage()`

, we can compute the possible expansions for every symbol:

In [38]:

```
f = SimpleGrammarCoverageFuzzer(EXPR_GRAMMAR)
f.max_expansion_coverage('<integer>')
```

Out[38]:

{'<digit> -> 0', '<digit> -> 1', '<digit> -> 2', '<digit> -> 3', '<digit> -> 4', '<digit> -> 5', '<digit> -> 6', '<digit> -> 7', '<digit> -> 8', '<digit> -> 9', '<integer> -> <digit>', '<integer> -> <digit><integer>'}

In [39]:

```
f.max_expansion_coverage('<digit>')
```

Out[39]:

We can now start to implement `GrammarCoverageFuzzer`

. Computing `max_expansion_coverage()`

allows us to determine the *missing coverage* for each child. To this end, we *subtract* the coverage already seen (`expansion_coverage()`

) from the coverage that could be obtained.

In [40]:

```
class GrammarCoverageFuzzer(SimpleGrammarCoverageFuzzer):
def _new_child_coverage(self, children, max_depth):
new_cov = set()
for (c_symbol, _) in children:
if c_symbol in self.grammar:
new_cov |= self.max_expansion_coverage(
c_symbol, max_depth)
return new_cov
def new_child_coverage(self, symbol, children, max_depth=float('inf')):
"""Return new coverage that would be obtained by expanding (symbol, children)"""
new_cov = self._new_child_coverage(children, max_depth)
new_cov.add(expansion_key(symbol, children))
new_cov -= self.expansion_coverage() # -= is set subtraction
return new_cov
```

Let us illustrate `new_child_coverage()`

. We again start fuzzing, choosing expansions randomly.

In [41]:

```
f = GrammarCoverageFuzzer(EXPR_GRAMMAR, start_symbol="<digit>", log=True)
f.fuzz()
```

Tree: <digit> Expanding <digit> randomly Now covered: <digit> -> 2 Tree: 2 '2'

Out[41]:

'2'

This is our current coverage:

In [42]:

```
f.expansion_coverage()
```

Out[42]:

{'<digit> -> 2'}

When we go through the individual expansion possibilities for `<digit>`

, we see that all expansions offer additional coverage, *except* for the one we have just seen.

In [43]:

```
for expansion in EXPR_GRAMMAR["<digit>"]:
children = f.expansion_to_children(expansion)
print(expansion, f.new_child_coverage("<digit>", children))
```

`new_child_coverage()`

and choose among the expansions that offer the greatest new (unseen) coverage.

When choosing a child, we do not look out for the maximum overall coverage to be obtained, as this would have expansions with many uncovered possibilities totally dominate other expansions. Instead, we aim for a *breadth-first* strategy, first covering all expansions up to a given depth, and only then looking for a greater depth. The method `new_coverages()`

is at the heart of this strategy: Starting with a maximum depth (`max_depth`

) of zero, it increases the depth until it finds at least one uncovered expansion.

In [44]:

```
class GrammarCoverageFuzzer(GrammarCoverageFuzzer):
def new_coverages(self, node, possible_children):
"""Return coverage to be obtained for each child at minimum depth"""
(symbol, children) = node
for max_depth in range(len(self.grammar)):
new_coverages = [
self.new_child_coverage(
symbol, c, max_depth) for c in possible_children]
max_new_coverage = max(len(new_coverage)
for new_coverage in new_coverages)
if max_new_coverage > 0:
# Uncovered node found
return new_coverages
# All covered
return None
```

We can now define `choose_node_expansion()`

to make use of this strategy: First, we determine the possible coverages to be obtained (using `new_coverages()`

); then we (randomly) select among the children which sport the maximum coverage.

In [45]:

```
class GrammarCoverageFuzzer(GrammarCoverageFuzzer):
def choose_node_expansion(self, node, possible_children):
(symbol, children) = node
new_coverages = self.new_coverages(node, possible_children)
if new_coverages is None:
# All expansions covered - use superclass method
return self.choose_covered_node_expansion(node, possible_children)
max_new_coverage = max(len(cov) for cov in new_coverages)
children_with_max_new_coverage = [c for (i, c) in enumerate(possible_children)
if len(new_coverages[i]) == max_new_coverage]
index_map = [i for (i, c) in enumerate(possible_children)
if len(new_coverages[i]) == max_new_coverage]
# Select a random expansion
new_children_index = self.choose_uncovered_node_expansion(
node, children_with_max_new_coverage)
new_children = children_with_max_new_coverage[new_children_index]
# Save the expansion as covered
key = expansion_key(symbol, new_children)
if self.log:
print("Now covered:", key)
self.covered_expansions.add(key)
return index_map[new_children_index]
```

Our fuzzer is now complete. Let us apply it on a series of examples. On expressions, it quickly covers all digits and operators:

In [46]:

```
f = GrammarCoverageFuzzer(EXPR_GRAMMAR, min_nonterminals=3)
f.fuzz()
```

Out[46]:

'-4.02 / (1) * +3 + 5.9 / 7 * 8 - 6'

In [47]:

```
f.max_expansion_coverage() - f.expansion_coverage()
```

Out[47]:

set()

On average, it is again faster than the simple strategy:

In [48]:

```
average_length_until_full_coverage(GrammarCoverageFuzzer(EXPR_GRAMMAR))
```

Out[48]:

50.74

On the CGI grammar, it takes but a few iterations to cover all letters and digits:

In [49]:

```
f = GrammarCoverageFuzzer(CGI_GRAMMAR, min_nonterminals=5)
while len(f.max_expansion_coverage() - f.expansion_coverage()) > 0:
print(f.fuzz())
```

%18%d03 %c3%94%7f+cd %a6%b5%e2%5e%4c-54e01a2 %5eb%7cb_ec%a0+

This improvement can also be seen in comparing the random, expansion-only, and deep foresight strategies on the CGI grammar:

In [50]:

```
average_length_until_full_coverage(TrackingGrammarCoverageFuzzer(CGI_GRAMMAR))
```

Out[50]:

211.34

In [51]:

```
average_length_until_full_coverage(SimpleGrammarCoverageFuzzer(CGI_GRAMMAR))
```

Out[51]:

68.64

In [52]:

```
average_length_until_full_coverage(GrammarCoverageFuzzer(CGI_GRAMMAR))
```

Out[52]:

40.38

Sometimes, grammar elements are used in more than just one place. In our expression grammar, for instance, the `<integer>`

symbol is used for integer numbers as well as for floating point numbers:

In [53]:

```
EXPR_GRAMMAR["<factor>"]
```

Out[53]:

['+<factor>', '-<factor>', '(<expr>)', '<integer>.<integer>', '<integer>']

`<integer>`

expansions (i.e., all `<digit>`

expansions) are covered. However, the individual digits would be *distributed* across all occurrences of `<integer>`

in the grammar. If our coverage-based fuzzer produces, say, `1234.56`

and `7890`

, we would have full coverage of all digit expansions. However, `<integer>.<integer>`

and `<integer>`

in the `<factor>`

expansions above would individually cover only a fraction of the digits. If floating-point numbers and whole numbers have different functions that read them in, we would like each of these functions to be tested with all digits; maybe we would also like the whole and fractional part of a floating-point number to be tested with all digits each.

Ignoring the context in which a symbol is used (in our case, the various uses of `<integer>`

and `<digit>`

in the `<factor>`

context) can be useful if we can assume that all occurrences of this symbol are treated alike anyway. If not, though, one way to ensure that an occurrence of a symbol is systematically covered independently of other occurrences is to assign the occurrence to a new symbol which is a *duplicate* of the old symbol. We will first show how to *manually* create such duplicates, and then a dedicated function which does it automatically.

As stated above, one simple way to achieve coverage in context is by *duplicating* symbols as well as the rules they reference to. For instance, we could replace `<integer>.<integer>`

by `<integer-1>.<integer-2>`

and give `<integer-1>`

and `<integer-2>`

the same definitions as the original `<integer>`

. This would mean that not only all expansions of `<integer>`

, but also all expansions of `<integer-1>`

and `<integer-2>`

would be covered.

Let us illustrate this with actual code:

In [54]:

```
dup_expr_grammar = extend_grammar(EXPR_GRAMMAR,
{
"<factor>": ["+<factor>", "-<factor>", "(<expr>)", "<integer-1>.<integer-2>", "<integer>"],
"<integer-1>": ["<digit-1><integer-1>", "<digit-1>"],
"<integer-2>": ["<digit-2><integer-2>", "<digit-2>"],
"<digit-1>":
["0", "1", "2", "3", "4",
"5", "6", "7", "8", "9"],
"<digit-2>":
["0", "1", "2", "3", "4",
"5", "6", "7", "8", "9"]
}
)
```

If we now run our coverage-based fuzzer on the extended grammar, we will cover all digits both of regular integers, as well as all digits in the whole and fraction part of floating-point numbers:

In [56]:

```
f = GrammarCoverageFuzzer(dup_expr_grammar, start_symbol="<factor>")
for i in range(10):
print(f.fuzz())
```

If we want to enhance coverage in context, manually adapting our grammars may not be the perfect choice, since any change to the grammar will have to be replicated in all duplicates. Instead, we introduce a function that will do the duplication for us.

The function `duplicate_context()`

takes a grammar, a symbol in the grammar, and an expansion of this symbol (`None`

or not given: all expansions of symbol), and it changes the expansion to refer to a duplicate of all originally referenced rules. The idea is that we invoke it as

```
dup_expr_grammar = extend_grammar(EXPR_GRAMMAR)
duplicate_context(dup_expr_grammar, "<factor>", "<integer>.<integer>")
```

and get a similar result as with our manual changes, above.

Here is the code:

In [58]:

```
def duplicate_context(grammar, symbol, expansion=None, depth=float('inf')):
"""Duplicate an expansion within a grammar.
In the given grammar, take the given expansion of the given symbol
(if expansion is omitted: all symbols), and replace it with a
new expansion referring to a duplicate of all originally referenced rules.
If depth is given, limit duplication to `depth` references (default: unlimited)
"""
orig_grammar = extend_grammar(grammar)
_duplicate_context(grammar, orig_grammar, symbol,
expansion, depth, seen={})
# After duplication, we may have unreachable rules; delete them
for nonterminal in unreachable_nonterminals(grammar):
del grammar[nonterminal]
```

The bulk of the work takes place in this helper function. The additional parameter `seen`

keeps track of symbols already expanded and avoids infinite recursion.

In [60]:

```
def _duplicate_context(grammar, orig_grammar, symbol, expansion, depth, seen):
for i in range(len(grammar[symbol])):
if expansion is None or grammar[symbol][i] == expansion:
new_expansion = ""
for (s, c) in expansion_to_children(grammar[symbol][i]):
if s in seen: # Duplicated already
new_expansion += seen[s]
elif c == [] or depth == 0: # Terminal symbol or end of recursion
new_expansion += s
else: # Nonterminal symbol - duplicate
# Add new symbol with copy of rule
new_s = new_symbol(grammar, s)
grammar[new_s] = copy.deepcopy(orig_grammar[s])
# Duplicate its expansions recursively
# {**seen, **{s: new_s}} is seen + {s: new_s}
_duplicate_context(grammar, orig_grammar, new_s, expansion=None,
depth=depth - 1, seen={**seen, **{s: new_s}})
new_expansion += new_s
grammar[symbol][i] = new_expansion
```

Here's our above example of how `duplicate_context()`

works, now with results. We let it duplicate the `<integer>.<integer>`

expansion in our expression grammar, and obtain a new grammar with an `<integer-1>.<integer-2>`

expansion where both `<integer-1>`

and `<integer-2>`

refer to copies of the original rules:

In [61]:

```
dup_expr_grammar = extend_grammar(EXPR_GRAMMAR)
duplicate_context(dup_expr_grammar, "<factor>", "<integer>.<integer>")
dup_expr_grammar
```

Out[61]:

{'<start>': ['<expr>'], '<expr>': ['<term> + <expr>', '<term> - <expr>', '<term>'], '<term>': ['<factor> * <term>', '<factor> / <term>', '<factor>'], '<factor>': ['+<factor>', '-<factor>', '(<expr>)', '<integer-1>.<integer-2>', '<integer>'], '<integer>': ['<digit><integer>', '<digit>'], '<digit>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], '<integer-1>': ['<digit-1><integer-1>', '<digit-2>'], '<digit-1>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], '<digit-2>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], '<integer-2>': ['<digit-3><integer-2>', '<digit-4>'], '<digit-3>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], '<digit-4>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']}

Just like above, using such a grammar for coverage fuzzing will now cover digits in a number of contexts. To be precise, there are five contexts: Regular integers, as well as single-digit and multi-digit whole and fractional parts of floating-point numbers.

In [62]:

```
f = GrammarCoverageFuzzer(dup_expr_grammar, start_symbol="<factor>")
for i in range(10):
print(f.fuzz())
```

(57.5) 2 +-(1 / 3 + 6 / 0 - 7 * 59 * 3 + 8 * 4) 374.88 5.709 0.93 01.1 892.27 219.50 6.636

The `depth`

parameter controls how deep the duplication should go. Setting `depth`

to 1 will duplicate only the next rule:

In [63]:

```
dup_expr_grammar = extend_grammar(EXPR_GRAMMAR)
duplicate_context(dup_expr_grammar, "<factor>", "<integer>.<integer>", depth=1)
dup_expr_grammar
```

Out[63]:

{'<start>': ['<expr>'], '<expr>': ['<term> + <expr>', '<term> - <expr>', '<term>'], '<term>': ['<factor> * <term>', '<factor> / <term>', '<factor>'], '<factor>': ['+<factor>', '-<factor>', '(<expr>)', '<integer-1>.<integer-2>', '<integer>'], '<integer>': ['<digit><integer>', '<digit>'], '<digit>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], '<integer-1>': ['<digit><integer-1>', '<digit>'], '<integer-2>': ['<digit><integer-2>', '<digit>']}

By default, `depth`

is set to $\infty$, indicating unlimited duplication. True unbounded duplication could lead to problems for a recursive grammar such as `EXPR_GRAMMAR`

, so `duplicate_context()`

is set to no longer duplicate symbols once duplicated. Still, if we apply it to duplicate *all* `<expr>`

expansions, we obtain a grammar with no less than 296 rules:

In [65]:

```
dup_expr_grammar = extend_grammar(EXPR_GRAMMAR)
duplicate_context(dup_expr_grammar, "<expr>")
```

In [66]:

```
assert is_valid_grammar(dup_expr_grammar)
len(dup_expr_grammar)
```

Out[66]:

292

This gives us almost 2000 expansions to cover:

In [67]:

```
f = GrammarCoverageFuzzer(dup_expr_grammar)
len(f.max_expansion_coverage())
```

Out[67]:

1981

Duplicating one more time keeps on both growing the grammar and the coverage requirements:

In [68]:

```
dup_expr_grammar = extend_grammar(EXPR_GRAMMAR)
duplicate_context(dup_expr_grammar, "<expr>")
duplicate_context(dup_expr_grammar, "<expr-1>")
len(dup_expr_grammar)
```

Out[68]:

594

In [69]:

```
f = GrammarCoverageFuzzer(dup_expr_grammar)
len(f.max_expansion_coverage())
```

Out[69]:

3994

At this point, plenty of contexts can be covered individually – for instance, multiplications of elements within additions:

In [70]:

```
dup_expr_grammar["<expr>"]
```

Out[70]:

['<term-1> + <expr-4>', '<term-5> - <expr-8>', '<term-9>']

In [71]:

```
dup_expr_grammar["<term-1-1>"]
```

Out[71]:

['<factor-1-1> * <term-1-1>', '<factor-2-1> / <term-1-1>', '<factor-3-1>']

In [72]:

```
dup_expr_grammar["<factor-1-1>"]
```

Out[72]:

['+<factor-1-1>', '-<factor-1-1>', '(<expr-1-1>)', '<integer-1-1>.<integer-2-1>', '<integer-3-1>']

The resulting grammars may no longer be useful for human maintenance; but running a coverage-driven fuzzer such as `GrammarCoverageFuzzer()`

will then go and cover all these expansions in all contexts. If you want to cover elements in a large number of contexts, then `duplicate_context()`

followed by a coverage-driven fuzzer is your friend.

With or without context: By systematically covering all input elements, we get a larger variety in our inputs – but does this translate into a wider variety of program behaviors? After all, these behaviors are what we want to cover, including the unexpected behaviors.

`+`

, subtraction triggered by the presence of `-`

, and floating-point arithmetic triggered by the presence of floating-point numbers in the input.

*correlation between grammar coverage and code coverage*. In other words: If we can achieve a high grammar coverage, this also leads to a high code coverage.

Let us explore this relationship on one of our grammars – say, the CGI decoder from the chapter on coverage. We compute a mapping `coverages`

where in `coverages[x]`

= `{y_1, y_2, ...}`

, `x`

is the grammar coverage obtained, and `y_n`

is the code coverage obtained for the `n`

-th run.

We first compute the maximum coverage, as in the the chapter on coverage:

In [74]:

```
with Coverage() as cov_max:
cgi_decode('+')
cgi_decode('%20')
cgi_decode('abc')
try:
cgi_decode('%?a')
except:
pass
```

Now, we run our experiment:

In [75]:

```
f = GrammarCoverageFuzzer(CGI_GRAMMAR, max_nonterminals=2)
coverages = {}
trials = 100
for trial in range(trials):
f.reset_coverage()
overall_cov = set()
max_cov = 30
for i in range(10):
s = f.fuzz()
with Coverage() as cov:
cgi_decode(s)
overall_cov |= cov.coverage()
x = len(f.expansion_coverage()) * 100 / len(f.max_expansion_coverage())
y = len(overall_cov) * 100 / len(cov_max.coverage())
if x not in coverages:
coverages[x] = []
coverages[x].append(y)
```

We compute the averages for the `y`

-values:

In [76]:

```
xs = list(coverages.keys())
ys = [sum(coverages[x]) / len(coverages[x]) for x in coverages]
```

and create a scatter plot:

In [79]:

```
ax = plt.axes(label="coverage")
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
ax.xaxis.set_major_formatter(mtick.PercentFormatter())
plt.xlim(0, max(xs))
plt.ylim(0, max(ys))
plt.title('Coverage of cgi_decode() vs. grammar coverage')
plt.xlabel('grammar coverage (expansions)')
plt.ylabel('code coverage (lines)')
plt.scatter(xs, ys);
```

We see that the higher the grammar coverage, the higher the code coverage.

This also translates into a correlation coefficient of about 0.9, indicating a strong correlation:

In [81]:

```
np.corrcoef(xs, ys)
```

Out[81]:

array([[1. , 0.81663071], [0.81663071, 1. ]])

This is also confirmed by the Spearman rank correlation:

In [83]:

```
spearmanr(xs, ys)
```

Out[83]:

SpearmanrResult(correlation=0.937547293248041, pvalue=1.0928720949027369e-09)

Let us repeat this experiment on URL grammars. We use the same code as above, except for exchanging the grammars and the function in place:

In [85]:

```
with Coverage() as cov_max:
urlparse("http://foo.bar/path")
urlparse("https://foo.bar#fragment")
urlparse("ftp://user:password@foo.bar?query=value")
urlparse("ftps://127.0.0.1/?x=1&y=2")
```

Here comes the actual experiment:

In [86]:

```
f = GrammarCoverageFuzzer(URL_GRAMMAR, max_nonterminals=2)
coverages = {}
trials = 100
for trial in range(trials):
f.reset_coverage()
overall_cov = set()
for i in range(20):
s = f.fuzz()
with Coverage() as cov:
urlparse(s)
overall_cov |= cov.coverage()
x = len(f.expansion_coverage()) * 100 / len(f.max_expansion_coverage())
y = len(overall_cov) * 100 / len(cov_max.coverage())
if x not in coverages:
coverages[x] = []
coverages[x].append(y)
```

In [87]:

```
xs = list(coverages.keys())
ys = [sum(coverages[x]) / len(coverages[x]) for x in coverages]
```

In [88]:

```
ax = plt.axes(label="coverage")
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
ax.xaxis.set_major_formatter(mtick.PercentFormatter())
plt.xlim(0, max(xs))
plt.ylim(0, max(ys))
plt.title('Coverage of urlparse() vs. grammar coverage')
plt.xlabel('grammar coverage (expansions)')
plt.ylabel('code coverage (lines)')
plt.scatter(xs, ys);
```

Here, we have an even stronger correlation of more than .95:

In [89]:

```
np.corrcoef(xs, ys)
```

Out[89]:

array([[1. , 0.94110679], [0.94110679, 1. ]])

This is also confirmed by the Spearman rank correlation:

In [90]:

```
spearmanr(xs, ys)
```

Out[90]:

SpearmanrResult(correlation=1.0, pvalue=0.0)

The correlation observed for the CGI and URL examples will not hold for every program and every structure.

First, some grammar elements are treated uniformly by a program even though the grammar sees them as different symbols. In the host name of a URL, for instance, we can have many different characters, although a URL-handling program treats them all the same. Likewise, individual digits, once composed into a number, make less of a difference than the value of the number itself. Hence, achieving variety in digits or characters will not necessarily yield a large difference in functionality.

*differentiating elements dependent on their context*, and covering alternatives for each context, as discussed above. The key is to identify the contexts in which variety is required, and those where it is not.

Second, the way the data is processed can make a large difference. Consider the input to a *media player*, consisting of compressed media data. While processing the media data, the media player will show differences in behavior (notably in its output), but these differences cannot be directly triggered through individual elements of the media data. Likewise, a *machine learner* that is trained on a large set of inputs typically will not have its behavior controlled by a single syntactic element of the input. (Well, it could, but then, we would not need a machine learner.) In these cases of "deep" data processing, achieving structural coverage in the grammar will not necessarily induce code coverage.

*syntactic*, but actually *semantic* variety. In the chapter on fuzzing with constraints, we will see how to specifically generate and filter input values, especially numerical values. Such generators can also be applied in context, such that each and every facet of the input can be controlled individually.
Also, in the above examples, *some* parts of the input can still be covered structurally: *Metadata* (such as author name or composer for the media player) or *configuration data* (such as settings for the machine learner) can and should be covered systematically; we will see how this is done in the chapter on "Configuration fuzzing".

- Achieving
*grammar coverage*quickly results in a large variety of inputs. - Duplicating grammar rules allows to cover elements in specific
*contexts*. - Achieving grammar coverage can help in obtaining
*code coverage*.

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}. The relation between grammar coverage and code coverage was discovered by Nikolas Havrikov, who explores it in his PhD thesis.

Consider the Unix `ls`

program, used to list the contents of a directory. Create a grammar for invoking `ls`

:

In [91]:

```
LS_EBNF_GRAMMAR = {
'<start>': ['-<options>'],
'<options>': ['<option>*'],
'<option>': ['1', 'A', '@',
# many more
]
}
```

Use `GrammarCoverageFuzzer`

to test all options. Be sure to invoke `ls`

with each option set.

The value of `max_expansion_coverage()`

depends on the grammar only. Change the implementation such that the values are precomputed for each symbol and depth upon initialization (`__init__()`

); this way, `max_expansion_coverage()`

can simply lookup the value in the table.