# Fuzzing with Constraints¶

In previous chapters, we have seen how Grammar-Based Fuzzing allows us to efficiently generate myriads of syntactically valid inputs.
However, there are *semantic* input features that cannot be expressed in a context-free grammar, such as

- "$X$ is the length of $Y$";
- "$X$ is an identifier previously declared"; or
- "$X$ should be longer than 4,096 bytes".

In this chapter, we show how the ISLa framework allows us to express such features as *constraints* added to a grammar.
By having ISLa solve these constraints automatically, we produce inputs that are not only *syntactically* valid, but actually *semantically* valid.
Furthermore, such constraints allow us to very precisely *shape* the inputs we want for testing.

```
from bookutils import YouTubeVideo
YouTubeVideo("dgaGuwn-1OU")
```

**Prerequisites**

- You should have read the chapter on grammars.
- The chapter on generators and filters addresses a similar problem, but with program code instead of constraints.

```
import bookutils
```

## Synopsis¶

To use the code provided in this chapter, write

```
>>> from fuzzingbook.FuzzingWithConstraints import <identifier>
```

and then make use of the following features.

This chapter introduces the ISLa framework, consisting of

- the
*ISLa specification language*, allowing to add*constraints*to a grammar - the
*ISLa solver*, solving these constraints to produce semantically (and syntactically) valid inputs - the
*ISLa checker*, checking given inputs for whether they satisfy these constraints.

A typical usage of the ISLa solver is as follows. First, install ISLa, using

```
$ pip install isla-solver
```

Then, you can import the solver as

```
>>> from isla.solver import ISLaSolver
```

The ISLa solver needs two things. First, a *grammar* - say, US phone numbers.

```
>>> from Grammars import US_PHONE_GRAMMAR
```

Second, you need *constraints* – a string expressing a condition over one or more grammar elements.
Common functions include

`str.len()`

, returning the length of a string`str.to.int()`

, converting a string to an integer

Here, we instantiate the ISLa solver with a constraint stating that the area code should be above 900:

```
>>> solver = ISLaSolver(US_PHONE_GRAMMAR,
>>> """
>>> str.to.int(<area>) > 900
>>> """)
```

With that, invoking `solver.solve()`

returns a *solution* for the constraints.

```
>>> str(solver.solve())
'(901)638-5426'
```

`solve()`

returns a derivation tree, which typically is converted into a string using `str()`

as above. The `print()`

function does this implicitly.

Subsequent calls of `solve()`

return more solutions:

```
>>> for _ in range(10):
>>> print(solver.solve())
(901)317-9701
(901)797-2646
(901)249-8254
(901)845-5106
(901)547-7322
(901)416-7444
(901)971-8098
(901)434-9728
(901)750-3441
(904)598-0755
```

We see that the solver produces a number of inputs that all satisfy the constraint - the area code is always more than 900.

The `ISLaSolver()`

constructor provides several additional parameters to configure the solver, as documented below.
Additional `ISLaSolver`

methods allow to check inputs against constraints, and provide additional functionality.

The ISLa functionality is also available on the command line:

```
>>> !isla --help
usage: isla [-h] [-v] {solve,fuzz,check,parse,repair,mutate,create,config} ...
The ISLa command line interface.
options:
-h, --help show this help message and exit
-v, --version Print the ISLa version number
Commands:
{solve,fuzz,check,parse,repair,mutate,create,config}
solve create solutions to ISLa constraints or check their
unsatisfiability
fuzz pass solutions to an ISLa constraint to a test subject
check check whether an input satisfies an ISLa constraint
parse parse an input into a derivation tree if it satisfies
an ISLa constraint
repair try to repair an existing input such that it satisfies
an ISLa constraint
mutate mutate an input such that the result satisfies an ISLa
constraint
create create grammar and constraint stubs
config dumps the default configuration file
```

## Semantic Input Properties¶

In this book, we have frequently used grammars to systematically generate inputs that cover input structure and more.
But while it is relatively easy to express the *syntax* of an input using a grammar, there are input properties that *cannot* be expressed using a grammar.
Such input properties are called *semantic* properties.

Let us illustrate semantic properties using a simple example.
We want to test some system that is configured by two settings, a *page size* and a *buffer size*.
Both these come as integer numbers as part of a human-readable configuration file.
The *syntax* of this file is given by the following grammar:

```
from Grammars import Grammar, is_valid_grammar, syntax_diagram, crange
```

```
import string
```

```
CONFIG_GRAMMAR: Grammar = {
"<start>": ["<config>"],
"<config>": [
"pagesize=<pagesize>\n"
"bufsize=<bufsize>"
],
"<pagesize>": ["<int>"],
"<bufsize>": ["<int>"],
"<int>": ["<leaddigit><digits>"],
"<digits>": ["", "<digit><digits>"],
"<digit>": list("0123456789"),
"<leaddigit>": list("123456789")
}
```

```
assert is_valid_grammar(CONFIG_GRAMMAR)
```

Here's a visualization of this grammar as a railroad diagram, showing its structure:

Using this grammar, we can now use any of our grammar-based fuzzers to generate valid inputs. For instance:

```
from GrammarFuzzer import GrammarFuzzer, DerivationTree
```

```
fuzzer = GrammarFuzzer(CONFIG_GRAMMAR)
```

```
for i in range(10):
print(i)
print(fuzzer.fuzz())
```

So far, so good - and indeed, these random values will help us test our (hypothetical) system.
But what if we want to *control* these values further, putting our system to the test?

A grammar gives us *some* control. If we want to ensure a page size of at least 100,000, for instance, a rule like

```
"<bufsize>": ["<leaddigit><digit><digit><digit><digit><digit>"]
```

would do the job. We could also express that the page size should be an odd number, by having it end in an odd digit. But if we want to state that the page size should be, say, a multiple of 8, or larger or less than the buffer size, we are out of luck.

In the chapter on fuzzing with generators, we have seen how to attach *program code* to individual rules - program code that would either *generate* individual elements right away or *filter* only these that satisfy specific conditions.

Attaching code makes things very flexible, but also has several disadvantages:

- First, it is pretty hard to generate inputs that satisfy multiple constraints at once.
In essence, you have to code your own
*strategy*for generating inputs, which at some point negates the advantage of having an abstract representation such as a grammar. - Second, your code is not portable. While a grammar can be easily adapted to
*any*grammar-based fuzzer, adding, say, Python code, ties you to the Python environment forever. - Third, program code can only be used for
*producing*inputs or*checking*inputs, but not both. This, again, is a downside compared to a pure grammar representation.

Hence, we are looking to a more *general* way to express semantic properties - and also a more *declarative* way to express semantic properties.

## Unrestricted Grammars

One very general solution to this problem would be to use *unrestricted* grammars rather than the *context-free* grammars we have used so far.
In an unrestricted grammar, one can have multiple symbols also on the left hand side of an expansion rule, making them very flexible.
In fact, unrestricted grammars are *Turing-universal*, meaning that they can express any feature that could also be expressed in program code; and they could thus check and produce arbitrary strings with arbitrary features. (If they finish, that is – unrestricted grammars also suffer from the halting problem.)
The downside is that there is literally no programming support for unrestricted grammars – we'd have to implement all of arithmetics, strings, and other functionality from scratch in a grammar, which is - well - not fun.

## Specifying Constraints¶

In recent work, Dominic Steinhöfel and Andreas Zeller (one of the authors of this book) have presented an infrastructure that allows to produce inputs with *arbitrary properties*, but without having to go through the trouble of implementing producers or checkers.
Instead, they suggest a dedicated *language* for specifiying inputs, named ISLa (for input specification language).
*ISLa* combines a standard context-free *grammar* with *constraints* that express *semantic* properties of the inputs and their elements.
ISLa can be used as a *fuzzer* (producing inputs that satisfy the constraints) as well as a *checker* (checking inputs whether they satisfy the given constraints).

Let us illustrate ISLa by example. ISLa comes as a Python package named `isla-solver`

that can be easily installed using `pip`

:

```
$ pip install isla-solver
```

This also installs all dependent packages.

The core of ISLa is the *ISLa Solver* – the component that actually *solves* constraints to produce satisfying inputs.

```
import isla
```

```
from isla.solver import ISLaSolver
```

The constructor of an `ISLASolver`

takes two mandatory arguments.

- The
*grammar*is the grammar the solver should produce inputs from. - The
*constraint*is the constraint the produced inputs should satisfy.

To express a constraint, we have a variety of *functions* and *predicates* at our disposition.
These can be applied to individual elements of the grammar, notably their nonterminals.
The function `str.len()`

, for instance, returns the length of a string.
If we want to have inputs in which the page size has at least 6 digits, we can write:

```
solver = ISLaSolver(CONFIG_GRAMMAR, 'str.len(<pagesize>) >= 6')
```

The method `solve()`

returns the next produced string from the ISLa solver, as a *derivation tree* (seen in the Chapter on fuzzing with grammars).
To convert these into a string, we can use the `str()`

converter:

```
str(solver.solve())
```

The `print()`

method converts its arguments to strings implicityly.
To get, say, the next 10 solutions, we can thus write

```
for _ in range(10):
print(solver.solve())
```

... and we see that, indeed, each page size has exactly six digits.

## Derivation Trees

If you inspect a derivation tree as returned from `solve()`

directly, you will get quite a structure:

```
solution = solver.solve()
solution
```

We can easily visualize the tree, revealing its structure:

```
display_tree(solution)
```

By converting the derivation tree into a string, we get the represented string:

```
str(solution)
```

`print()`

does this implicitly, so `print`

ing the solution gives us the string:

```
print(solution)
```

Unless you want to inspect the derivation tree or access its elements, converting it into a string makes it more manageable.

To express a minimum numeric value, we can use a more elegant way.
The function `str.to_int()`

, for instance, converts a string into an integer.
To obtain a page size that of at least 100000, we can thus also write

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'str.to.int(<pagesize>) >= 100000')
```

```
print(solver.solve())
```

If we want the page size to be in the range of 100 to 200, we can state this as a logical conjunction (using `and`

)

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<pagesize>) >= 100 and
str.to.int(<pagesize>) <= 200
''')
print(solver.solve())
```

And if we want the page size to be a multiple of seven, we can write

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<pagesize>) mod 7 = 0
''')
print(solver.solve())
```

```
from bookutils import quiz
```

Indeed, ISLa constraints can also involve multiple elements. Expressing equality between two elements is easy, and uses a single equal siqn. (There's no assignment in ISLa the `=`

could be confused with.)

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<pagesize> = <bufsize>
''')
print(solver.solve())
```

We can also use numerical constraints, stating that the buffer size should always be exactly one more than the page size:

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<pagesize>) > 1024 and
str.to.int(<bufsize>) = str.to.int(<pagesize>) + 1
''')
print(solver.solve())
```

All the above functions (like `str.to.int()`

) actually stem from the *SMT-LIB* library for *satisfiability modulo theories* (SMT), a standard for expressing constraints for constraint solvers (like ISLa).
The list of all theories defined in SMT-LIB lists dozens of functions and predicates that can be used in ISLa constraints.

## Using SMT-LIB Syntax

Instead of the above "infix" syntax which is familiar to programmers, ISLa also supports full SMT-LIB syntax. Instead of writing $f(x_1, x_2, x_3, \dots)$ for a function $f$ and its arguments $x_1 \dots x_n$, SMT-LIB uses a "prefix" LISP-like syntax in which all functions and operators are written as $(f x_1 x_2 x_3 \dots)$. The above predicate would thus be written as

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
(and
(> (str.to.int <pagesize>) 1024)
(= (str.to.int <bufsize>)
(+ (str.to.int <pagesize>) 1)))
''')
print(solver.solve())
```

Note that for boolean operators such as `and`

, we still use the ISLa infix syntax; having ISLa handle these operators is more efficient than passing them on to the constraint solver.

## ISLa on the Command Line¶

When you install `isla-solver`

, you also get an `isla`

command-line tool.
This allows you to create inputs from the command line or shell scripts.

Let us first create a *grammar file* suitable for `isla`

. `isla`

accepts grammars in Fuzzingbook format; they need to define a variable named `grammar`

.

```
with open('grammar.py', 'w') as f:
f.write('grammar = ')
f.write(str(CONFIG_GRAMMAR))
```

```
from bookutils import print_file
```

```
print_file('grammar.py')
```

With this, we can use `isla`

as a grammar fuzzer, plain and simple
By default, `isla solve`

produces one single matching output:

```
!isla solve grammar.py
```

The true power of `isla`

, however, comes to be as we (again) add *constraints* to be solved - either in separate *constraint files* or (easier) directly on the command line:

```
!isla solve grammar.py --constraint '<pagesize> = <bufsize>'
```

```
!isla solve grammar.py \
--constraint '<pagesize> = <bufsize> and str.to.int(<pagesize>) > 10000'
```

The `isla`

executable provides several options and commands, and is a great alternative on the command line.

```
!isla --help
```

## Accessing Elements¶

So far, we have accessed nonterminals simply by referencing their name, as in `<bufsize>`

or `<pagesize>`

.
However, in some cases, this simple method is not sufficient.
In our configuration grammar, for instance, we may want to access (or constrain) `<int>`

elements.
However, we do not want to constrain *all* integers at once, but only those in a particular *context* – say, those that occur as a part of a `<pagesize>`

element, or only those that occur as part of a `<config>`

element.

To this end, ISLa allows to reference *parts* of a given element, using two special operators.

The expression `<a>.<b>`

refers to the *immediate* subpart `<b>`

of some element `<a>`

.
That is, `<b>`

has to appear in one of the expansion rules of `<a>`

.
For instance, `<pagesize>.<int>`

refers to the `<int>`

element of a page size.
Here is an example using the dot operator:

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<pagesize>.<int> = <bufsize>.<int>
''')
print(solver.solve())
```

The expression `<a>..<b>`

, however, refers to *any* subpart `<b>`

of some element `<a>`

. That is, `<b>`

can appear in the expansion of `<a>`

, but also in the expansion of any subelement (and any subelement thereof).
Here is an example using the double dot operator, enforcing *every* digit in a `<config>`

element to be `7`

:

```
from ExpectError import ExpectError
```

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<config>..<digit> = "7" and <config>..<leaddigit> = "7"
''')
print(solver.solve())
```

To reason about dots and double dots, it helps visualizing the string in question as a *derivation tree* discussed in the chapter on grammar-based fuzzing.
The derivation tree of the input

```
pagesize=12
bufsize=34
```

for instance, looks like this:

In this tree, the `.`

syntax refers to *immediate* children. `<bufsize>.<int>`

is the one `<int>`

node that is the immediate descendant of `<bufsize>`

(but not any other `<int>`

node).
In contrast, `<config>..<digit>`

refers to *all* `<digit>`

descendants of the `<config>`

node.

If an element has multiple *immediate* children of the same type, one can use the special `<a>[$n$]`

syntax to access the $n$-th child of type `<a>`

. To access the first child, $n$ is equal to one, not zero, as in the XPath abbreviated syntax. In our configuration grammar, there is no expansion including the same nonterminal more than once, so we do not need this feature.

For a demonstration of indexed dots, consider the following grammar, which produces lines of three "a" or "b" characters:

```
LINES_OF_THREE_AS_OR_BS_GRAMMAR: Grammar = {
'<start>': ['<A>'],
'<A>': ['<B><B><B>', '<B><B><B>\n<A>'],
'<B>': ['a', 'b']
}
```

```
fuzzer = GrammarFuzzer(LINES_OF_THREE_AS_OR_BS_GRAMMAR)
for i in range(5):
print(i)
print(fuzzer.fuzz())
```

We can force, say, the second character in a line to always be a "b:"

```
solver = ISLaSolver(LINES_OF_THREE_AS_OR_BS_GRAMMAR,
'''
<A>.<B>[2] = "b"
''')
for i in range(5):
print(i)
print(solver.solve())
```

## Quantifiers¶

By default, all nonterminals in ISLa constraints are *universally* quantified - that is, any constraint applying to, say, some `<int>`

element applies to *all* `<int>`

elements in the resulting string.
If you only want to constrain *one* element, though, you have to (and can) specify this in ISLa, using an *existential quantifier*.

To use an existential quantifier in ISLa, use the construct

```
exists TYPE VARIABLE in CONTEXT:
(CONSTRAINT)
```

where `VARIABLE`

is some identifier, `TYPE`

is its type (as a nonterminal), and `CONTEXT`

is the context (again a nonterminal) in which the constraint should hold.
`CONSTRAINT`

is again a constraint expression, in which you now can make use of `VARIABLE`

as the element whose existence you assume.

Let us illustrate existential quantification again using a simple example. We want to make sure that at least one integer in our generated string has a value of more than 1000. So we write

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
exists <int> i in <start>:
str.to.int(i) > 1000
''')
for _ in range(10):
print(solver.solve())
```

We note that all generated inputs satisfy the constraint of having at least one integer that satisfies the constraint.

Specifying a variable name is optional; if you omit it, you can use the quantified nonterminal instead. The above constraint can thus also be expressed in a more compact fashion:

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
exists <int> in <start>:
str.to.int(<int>) > 1000
''')
print(solver.solve())
```

Besides existential quantification, there also is *universal* quantification, using the `forall`

keyword instead of `exists`

.
If we want *all* elements in some context to satisfy a constraint, this comes in handy.

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int> in <start>:
str.to.int(<int>) > 1000
''')
for _ in range(10):
print(solver.solve())
```

We see that all `<int>`

elements satisfy the constraint.

By default, all nonterminals that re used directly in constraints are universally quantified within the `<start>`

symbol, so the above can actually be simplified to

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<int>) > 1000
''')
for _ in range(10):
print(solver.solve())
```

... and you realize that in all our initial constraints, we always had an implicit universal quantification.

## Picking Expansions¶

Sometimes, we'd like a quantifier to apply only for a specific expansion alternative of a nonterminal. The form

```
forall TYPE VARIABLE=PATTERN in CONTEXT:
(CONSTRAINT)
```

means that the CONSTRAINT only applies to a VARIABLE that actually matches the expansion given in PATTERN.
(Again, we can replace `forall`

with `exists`

, and make this an existential quantification rather than a universal quantification.)

Here's an example of using `forall`

:

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int>="<leaddigit><digits>" in <start>:
(<leaddigit> = "7")
''')
```

This ensures that when `<int>`

is expanded to a lead digit followed by more digits, the lead digit becomes `7`

.
The effect is that all `<int>`

values now start with a `7`

digit:

```
str(solver.solve())
```

Likewise, we can constrain `<int>`

as a whole, and thus ensure that all numbers are greater than 100:

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int> in <start>:
(str.to.int(<int>) > 100)
''')
str(solver.solve())
```

By default, all variables are universally quantified in `<start>`

, so the above can also be expressed as

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<int>) > 100
''')
str(solver.solve())
```

## Matching Expansion Elements¶

In a quantification pattern, we can also *name* individual nonterminal elements and use them in our constraints.
This is done by replacing the nonterminal `<ID>`

with the special form `{<ID> VARIABLE}`

(in curly braces) which then makes the variable VARIABLE a placeholder for the value matched by ID; VARIABLE can then be used in constraints.

Here is an example. In the expansion `<leaddigit><int>`

, we want to ensure that the `<leaddigit>`

is always `9`

.
Using the special brace form, we make `lead`

a variable holding the value of `<leaddigit>`

, and can then use it in a constraint:

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int> i="{<leaddigit> lead}<digits>" in <start>:
(lead = "9")
''')
```

This (again) ensures that all lead digits should be `9`

:

```
for _ in range(10):
print(solver.solve())
```

Could we express the above in a simpler fashion? Yes! For one, we can refer to `<leaddigit>`

directly rather than introducing variables like `i`

and `lead`

:

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int>="<leaddigit><digits>" in <start>:
(<leaddigit> = "9")
''')
print(solver.solve())
```

Furthermore, using implicit universal quantification and the dot notation introduced earlier, we could write, for instance

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<int>.<leaddigit> = "9"
''')
print(solver.solve())
```

or just

```
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<leaddigit> = "9"
''')
print(solver.solve())
```

and obtain the same result (not necessarily the exact same values, though, due to randomness):

But while universal quantification and dot notation are sufficient for many cases, the pattern matching notation is more general and more flexible – even if it may be harder to read.

## Checking Strings¶

Using an `ISLaSolver`

, we can also check if an string satisfies the constraints.
This can be applied to inputs, but also to *outputs*; ISLa constraints can thus server as *oracles* – that is, *predicates* that check a test result.

Let us check if in a given string, `<pagesize>`

and `<bufsize>`

are the same.

```
constraint = '<pagesize> = <bufsize>'
solver = ISLaSolver(CONFIG_GRAMMAR, constraint)
```

To check the tree, we can pass it into the `evaluate()`

method of the `solver`

– and find that the given input does *not* satisfy our constraint.

```
solver.check('pagesize=12\nbufsize=34')
```

If we repeat the above, however, with an input that satisfies the constraint, we obtain a `True`

result.

```
solver.check('pagesize=27\nbufsize=27')
```

Checking constraints is much more efficient than solving them, as ISLa does not have to search for possible solutions.

## Case Studies¶

Let us further illustrate ISLa using a few case studies.

### Matching Identifiers in XML¶

The Extensible Markup Language (XML) is a typical example of an input language that cannot be fully expressed using a context-free grammar.
The problem is not so much expressing the *syntax* of XML – the basics are fairly easy:

```
XML_GRAMMAR: Grammar = {
"<start>": ["<xml-tree>"],
"<xml-tree>": ["<open-tag><xml-content><close-tag>"],
"<open-tag>": ["<<id>>"],
"<close-tag>": ["</<id>>"],
"<xml-content>": ["Text", "<xml-tree>"],
"<id>": ["<letter>", "<id><letter>"],
"<letter>": crange('a', 'z')
}
```

```
assert is_valid_grammar(XML_GRAMMAR)
```

The problem becomes evident when we produce inputs from the grammar: The `<id>`

elements in `<open-tag>`

and `<close-tag>`

do not match.

```
fuzzer = GrammarFuzzer(XML_GRAMMAR)
fuzzer.fuzz()
```

If we want the tag IDs to match, we need to come up with a *finite* set of tags (as in, say, HTML); then we can extend the grammar with one rule for each tag - `<body>...</body>`

, `<p>...</p>`

, `<strong>...</strong>`

, and so on.
For an *infinite* set of tags, though, as in our grammar, expressing that the two tag IDs must match is not possible in a context-free grammar.

With ISLa, however, constraining the grammar is easy.
All we need is the rule that constrains the `<xml-tree>`

:

```
solver = ISLaSolver(XML_GRAMMAR,
'''
<xml-tree>.<open-tag>.<id> = <xml-tree>.<close-tag>.<id>
''', max_number_smt_instantiations=1)
for _ in range(3):
print(solver.solve())
```

and we see that the `<id>`

tags now indeed match each other.

## Solver Configuration Parameters

The configuration parameter `max_number_smt_instantiations`

we passed to the `ISLaSolver`

object above limits the number of calls to ISLa's underlying SMT solver. Generally, higher numbers lead to more inputs generated per time. Many of those will look structurally similar, though. If we aim for structurally diverse inputs and do not care about, e.g., the names of tags, it can make sense to choose a lower value for this parameter. This is what happens with `max_number_smt_instantiations=10`

, which is the current default:

```
solver = ISLaSolver(XML_GRAMMAR,
'''
<xml-tree>.<open-tag>.<id> = <xml-tree>.<close-tag>.<id>
''', max_number_smt_instantiations=10)
for _ in range(3):
print(solver.solve())
```

The parameter `max_number_free_instantiations`

serves a similar purpose: ISla randomly instantiates nonterminal symbols whose values are not restricted by a constraint. It chooses—surprise!—at most `max_number_free_instantiations`

such random instantiations.

Other configuration parameters of interest are `structural_predicates`

and `semantic_predicates`

, which let you extend the ISLa language by passing custom structural and semantic predicates to the solver. You can use all the predicates in these sets inside the ISLa constraint to solve. Per default, the semantic predicate `count(in_tree, NEEDLE, NUM)`

and the following structural predicates are available:

`after(node_1, node_2)`

`before(node_1, node_2)`

`consecutive(node_1, node_2)`

`count(in_tree, NEEDLE, NUM)`

`different_position(node_1, node_2)`

`direct_child(node_1, node_2)`

`inside(node_1, node_2)`

`level(PRED, NONTERMINAL, node_1, node_2)`

`nth(N, node_1, node_2)`

`same_position(node_1, node_2)`

In contrast to the "input generator" solution in the chapter on generators, our constraint-based solution is purely declarative - and can also be used to parse and check inputs. Plus, of course, we can easily add more constraints:

```
solver = ISLaSolver(XML_GRAMMAR,
'''
<xml-tree>.<open-tag>.<id> = <xml-tree>.<close-tag>.<id>
and
str.len(<id>) > 10
''', max_number_smt_instantiations=1)
for _ in range(3):
print(solver.solve())
```

### Definitions and Usages in Programming Languages¶

When testing compilers with generated program code, one often encounters the problem that before *using* an identifier, one has to *declare* it first - specifying its type, some initial value, and more.

This problem is easily illustrated in the following grammar, which produces *sequences of assignments*.
Variable names consist of a single lowercase letter; values can only be digits; assignments are separated by semicolons.

```
LANG_GRAMMAR: Grammar = {
"<start>":
["<stmt>"],
"<stmt>":
["<assgn>", "<assgn>; <stmt>"],
"<assgn>":
["<lhs> := <rhs>"],
"<lhs>":
["<var>"],
"<rhs>":
["<var>", "<digit>"],
"<var>": list(string.ascii_lowercase),
"<digit>": list(string.digits)
}
```

```
assert is_valid_grammar(LANG_GRAMMAR)
```

```
syntax_diagram(LANG_GRAMMAR)
```

Here are some assignment sequences produced by the grammar:

```
fuzzer = GrammarFuzzer(LANG_GRAMMAR)
```

```
for _ in range(10):
print(fuzzer.fuzz())
```

We see that the assignment *syntax* is similar to what we have in common programming languages.
The *semantics*, however, are, well, questionable, as we commonly access variables whose values have not been previously defined.
Again, this is a *semantic* property that cannot be expressed in a context-free grammar alone.

What we need here is a constraint specifying that on the right-hand side of an assignment, we can only have variable names that occur on the left-hand side. In ISLa, we achive this thorugh the following constraint:

```
solver = ISLaSolver(LANG_GRAMMAR,
'''
forall <rhs> in <assgn>:
exists <assgn> declaration:
<rhs>.<var> = declaration.<lhs>.<var>
''',
max_number_smt_instantiations=1,
max_number_free_instantiations=1)
```

```
for _ in range(10):
print(solver.solve())
```

This is much better already, but not perfect yet - we might still have assignments like `a := a`

or `a := b; b := 5`

.
That is because our constraints do not yet take care of *ordering* – in a `<rhs>`

element, we can only use variables that are defined earlier.

For this purpose, ISLa provides a `before()`

predicate: `before(A, B)`

expresses that the element `A`

must occur before the element `B`

.
With `before()`

, we can rewrite our constraint as

```
solver = ISLaSolver(LANG_GRAMMAR,
'''
forall <rhs> in <assgn>:
exists <assgn> declaration:
(before(declaration, <assgn>) and
<rhs>.<var> = declaration.<lhs>.<var>)
''',
max_number_free_instantiations=1,
max_number_smt_instantiations=1)
```

... and thus ensure that on the right-hand-side of assignments, we only use identifiers defined earlier.

```
for _ in range(10):
print(solver.solve())
```

In case you find that the assignment sequences are too short, you can use the ISLa `count()`

predicate.
`count(VARIABLE, NONTERMINAL, N)`

ensures that the number of NONTERMINALs in VARIABLE is exactly N.
To have statements with exactly 5 assignments, write

```
solver = ISLaSolver(LANG_GRAMMAR,
'''
forall <rhs> in <assgn>:
exists <assgn> declaration:
(before(declaration, <assgn>) and
<rhs>.<var> = declaration.<lhs>.<var>)
and
count(start, "<assgn>", "5")
''',
max_number_smt_instantiations=1,
max_number_free_instantiations=1)
```

```
for _ in range(10):
print(solver.solve())
```

## Lessons Learned¶

- Using ISLa, we can add and solve
*constraints*to grammars, allowing to express*semantic properties*of our test inputs - Declaring constraints (and have a solver solve them) is much more versatile than adding generator code, and language-independent, too
- Using ISLa is fun :-)

## Next Steps¶

In the next chapters, we will continue to focus on semantics. Among others, we will learn how to

- mine grammars from existing inputs
- use symbolic fuzzing - that is, using constraint solvers to reach particular locations
- use concolic fuzzing - that is, combining symbolic fuzzing with concrete runs for higher efficiency

## Background¶

- ISLa is presented in the paper "Input Invariants" at ESEC/FSE 2022.
- The ISLa project contains the full source code and a complete reference.

## Exercises¶

### Exercise 1: String Encodings¶

A common way of representing strings is *length-prefixed strings*, a representation made popular by the Pascal programming language.
A length-prefixed string starts with a few bytes that encode the length $L$ of the string, followed by the $L$ actual characters.
For instance, assuming that two bytes are used to encode the length, the string `"Hello"`

could be represented as the sequence

`0x00 0x05 'H' 'e' 'l' 'l' 'o'`

#### Part 1: Syntax¶

Write a grammar that defines the syntax of length-prefixed strings.

#### Part 2: Semantics¶

Use ISLa to produce valid length-prefixed strings. Make use of the SMT-LIB string library to find appropriate conversion functions.

The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The source code that is part of the content, as well as the source code used to format and display that content is licensed under the MIT License. Last change: 2022-10-31 11:20:51+01:00 • Cite • Imprint