The Fuzzing Book

Sitemap

While the chapters of this book can be read one after the other, there are many possible paths through the book. In this graph, an arrow A → B means that chapter A is a prerequisite for chapter B. You can pick arbitrary paths in this graph to get to the topics that interest you most:

[Chapter dependency graph: edges lead from Introduction to Software Testing via Fuzzing: Breaking Things with Random Inputs, Code Coverage, Fuzzing with Grammars, and Efficient Grammar Fuzzing to the more specialized chapters.]

Table of Contents

Part I: Whetting Your Appetite

In this part, we introduce the topics of the book.

Tours through the Book

This book is massive. With more than 20,000 lines of code and 150,000 words of text, a printed version would cover more than 1,200 pages. Obviously, we do not assume that everybody wants to read everything.

Introduction to Software Testing

Before we get to the central parts of the book, let us introduce essential concepts of software testing. Why is it necessary to test software at all? How does one test software? How can one tell whether a test has been successful? How does one know if one has tested enough? In this chapter, let us recall the most important concepts, and at the same time get acquainted with Python and interactive notebooks.

Part II: Lexical Fuzzing

This part introduces test generation at the lexical level, that is, composing sequences of characters.

Fuzzing: Breaking Things with Random Inputs

In this chapter, we'll start with one of the simplest test generation techniques. The key idea of random text generation, also known as fuzzing, is to feed a string of random characters into a program in the hope of uncovering failures.
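
As a taste of what this looks like in practice, here is a minimal random fuzzer in Python. This is a sketch in the spirit of the chapter; the exact parameters are illustrative.

    import random

    def fuzzer(max_length=100, char_start=32, char_range=94):
        """Return a string of up to max_length random printable characters."""
        length = random.randrange(0, max_length + 1)
        return ''.join(chr(random.randrange(char_start, char_start + char_range))
                       for _ in range(length))

    print(fuzzer())  # e.g. '5<U%Yo"qak...'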

Code Coverage

In the previous chapter, we introduced basic fuzzing – that is, generating random inputs to test programs. How do we measure the effectiveness of these tests? One way would be to check the number (and seriousness) of bugs found; but if bugs are scarce, we need a proxy for the likelihood of a test to uncover a bug. In this chapter, we introduce the concept of code coverage, measuring which parts of a program are actually executed during a test run. Measuring such coverage is also crucial for test generators that attempt to cover as much code as possible.
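
A bare-bones way to obtain line coverage in Python is to hook the interpreter's tracing facility. This is a minimal sketch; the traced function cgi_decode_demo is made up for illustration.

    import sys

    def trace_coverage(func, *args):
        """Run func(*args) and return the set of (function, line) pairs executed."""
        coverage = set()
        def tracer(frame, event, arg):
            if event == "line":
                coverage.add((frame.f_code.co_name, frame.f_lineno))
            return tracer
        sys.settrace(tracer)
        try:
            func(*args)
        finally:
            sys.settrace(None)
        return coverage

    def cgi_decode_demo(s):          # a made-up function to trace
        return s.replace('+', ' ')

    print(trace_coverage(cgi_decode_demo, "a+b"))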

Mutation-Based Fuzzing

Most randomly generated inputs are syntactically invalid and thus are quickly rejected by the processing program. To exercise functionality beyond input processing, we must increase the chances of obtaining valid inputs. One such way is so-called mutational fuzzing – that is, introducing small changes to existing inputs that may still keep the input valid, yet exercise new behavior. We show how to create such mutations, and how to guide them towards yet uncovered code, applying central concepts from the popular AFL fuzzer.
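
The core operation is a small random change to an existing input. Here is a minimal sketch of such a mutator; the AFL-style strategies developed in the chapter are richer.

    import random

    def mutate(s):
        """Apply one random mutation: delete, insert, or flip a bit in a character."""
        mutation = random.choice(["delete", "insert", "flip"]) if s else "insert"
        pos = random.randrange(len(s) + (mutation == "insert"))
        if mutation == "delete":
            return s[:pos] + s[pos + 1:]
        if mutation == "insert":
            return s[:pos] + chr(random.randrange(32, 127)) + s[pos:]
        flipped = chr(ord(s[pos]) ^ (1 << random.randrange(7)))
        return s[:pos] + flipped + s[pos + 1:]

    print(mutate("http://www.example.com/"))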

Greybox Fuzzing

In the previous chapter, we have introduced mutation-based fuzzing, a technique that generates fuzz inputs by applying small mutations to given inputs. In this chapter, we show how to guide these mutations towards specific goals such as coverage. The algorithms in this chapter stem from the popular American Fuzzy Lop (AFL) fuzzer, in particular from its AFLFast and AFLGo flavors. We will explore the greybox fuzzing algorithm behind AFL and how we can exploit it to solve various problems for automated vulnerability detection.
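
The heart of a greybox fuzzer is a loop that keeps mutants yielding new coverage. The following is a minimal sketch, with get_coverage() stubbed out; a real fuzzer would obtain coverage by instrumenting the program.

    import random

    def get_coverage(inp):
        # Stub: pretend each distinct first character exercises a distinct branch.
        return frozenset(inp[:1])

    population = ["seed"]
    seen_coverage = set(get_coverage(population[0]))
    for _ in range(100):
        candidate = random.choice(population)
        pos = random.randrange(len(candidate))
        mutant = candidate[:pos] + chr(random.randrange(32, 127)) + candidate[pos + 1:]
        if not get_coverage(mutant) <= seen_coverage:   # new coverage: keep mutant
            population.append(mutant)
            seen_coverage |= get_coverage(mutant)
    print(len(population), "seeds retained")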

Search-Based Fuzzing

Sometimes we are not only interested in fuzzing as many diverse program inputs as possible, but in deriving specific test inputs that achieve some objective, such as reaching specific statements in a program. When we have an idea of what we are looking for, then we can search for it. Search algorithms are at the core of computer science, but applying classic search algorithms like breadth-first or depth-first search to search for tests is unrealistic, because these algorithms potentially require us to look at all possible inputs. However, domain knowledge can be used to overcome this problem. For example, if we can estimate which of several program inputs is closest to the one we are looking for, then this information can guide us to reach the target quicker – this information is known as a heuristic. The way heuristics are applied systematically is captured in meta-heuristic search algorithms. The "meta" denotes that these algorithms are generic and can be instantiated for different problems. Meta-heuristics often take inspiration from processes observed in nature; for example, there are algorithms mimicking evolutionary processes, swarm intelligence, or chemical reactions. In general, they are much more efficient than exhaustive search approaches, so much so that search spaces as vast as the domain of program inputs are no problem for them.
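
As a minimal illustration, here is a hill-climbing search that uses the distance to a target branch as its heuristic; the function test_me() and its target value are made up.

    import random

    def test_me(x):
        return x == 400                # the branch we want to reach (made up)

    def distance(x):
        return abs(x - 400)            # heuristic: how far is x from taking the branch?

    x = random.randrange(-1000, 1000)
    while not test_me(x):
        x = min([x - 1, x + 1], key=distance)   # move to the better neighbor
    print(x)   # 400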

Mutation Analysis

In the chapter on coverage, we showed how one can identify which parts of a program are executed by a test, and hence get a sense of the effectiveness of a set of test cases in covering the program structure. However, coverage alone may not be the best measure for the effectiveness of a test, as one can have great coverage without ever checking a result for correctness. In this chapter, we introduce another means for assessing the effectiveness of a test suite: After injecting mutations – artificial faults – into the code, we check whether a test suite can detect these artificial faults. The idea is that if it fails to detect such mutations, it will also miss real bugs.
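
A minimal sketch of the idea: inject an artificial fault into a function's source and check whether a test notices. The function and its test are made up.

    source = "def triangle_area(w, h):\n    return w * h / 2\n"
    mutant = source.replace("*", "+")        # the injected artificial fault

    def tests_pass(code):
        env = {}
        exec(code, env)
        try:
            assert env["triangle_area"](3, 4) == 6.0
            return True
        except AssertionError:
            return False

    print(tests_pass(source))   # True: the original passes
    print(tests_pass(mutant))   # False: the test suite kills this mutant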

Part III: Syntactic Fuzzing

This part introduces test generation at the syntactical level, that is, composing inputs from language structures.

Fuzzing with Grammars

In the chapter on "Mutation-Based Fuzzing", we have seen how to use extra hints – such as sample input files – to speed up test generation. In this chapter, we take this idea one step further, by providing a specification of the legal inputs to a program. Specifying inputs via a grammar allows for very systematic and efficient test generation, in particular for complex input formats. Grammars also serve as the base for configuration fuzzing, API fuzzing, GUI fuzzing, and many more.

Efficient Grammar Fuzzing

In the chapter on grammars, we have seen how to use grammars for very effective and efficient testing. In this chapter, we refine the previous string-based algorithm into a tree-based algorithm, which is much faster and allows for much more control over the production of fuzz inputs.

Grammar Coverage

Producing inputs from grammars gives all possible expansions of a rule the same likelihood. For producing a comprehensive test suite, however, it makes more sense to maximize variety – for instance, by not repeating the same expansions over and over again. In this chapter, we explore how to systematically cover elements of a grammar such that we maximize variety and do not miss out on individual elements.
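
The key trick can be sketched in a few lines: remember which expansions have been chosen, and prefer those not yet covered. Names below are illustrative.

    import random

    covered = set()

    def choose_expansion(symbol, expansions):
        uncovered = [e for e in expansions if (symbol, e) not in covered]
        choice = random.choice(uncovered if uncovered else expansions)
        covered.add((symbol, choice))
        return choice

    digits = [str(d) for d in range(10)]
    # Ten draws cover all ten digits exactly once, instead of repeating some:
    print(sorted(choose_expansion("<digit>", digits) for _ in range(10)))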

Parsing Inputs

In the chapter on Grammars, we discussed how grammars can be used to represent various languages. We also saw how grammars can be used to generate strings of the corresponding language. Grammars can also perform the reverse: given a string, one can decompose it into the constituent parts that correspond to the parts of the grammar used to generate it – the derivation tree of that string. These parts (and parts from other similar strings) can later be recombined using the same grammar to produce new strings.

Probabilistic Grammar Fuzzing

Let us give grammars even more power by assigning probabilities to individual expansions. This allows us to control how many of each element should be produced, and thus allows us to target our generated tests towards specific functionality. We also show how to learn such probabilities from given sample inputs, and specifically direct our tests towards input features that are uncommon in these samples.
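
In a Python implementation, weighting expansions comes almost for free via random.choices(); the expansions and probabilities below are illustrative.

    import random

    expansions = ["<digit>", "<digit><number>"]
    weights    = [0.9, 0.1]          # strongly prefer short numbers
    print(random.choices(expansions, weights=weights, k=5))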

Fuzzing with Generators

In this chapter, we show how to extend grammars with functions – pieces of code that get executed during grammar expansion, and that can generate, check, or change elements produced. Adding functions to a grammar allows for very versatile test generation, bringing together the best of grammar generation and programming.
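
One simple way to realize this is to allow expansions to be functions: if an expansion is callable, invoke it to produce the string. The scheme below is an illustrative sketch, not the chapter's exact API.

    import random, re

    def random_percentage():
        return str(random.randrange(0, 101)) + "%"

    grammar = {
        "<start>":   ["Load is <percent>"],
        "<percent>": [random_percentage],    # a generator function, not a string
    }

    def expand(term):
        while (match := re.search(r"<[^<>]+>", term)):
            choice = random.choice(grammar[match.group(0)])
            expansion = choice() if callable(choice) else choice
            term = term[:match.start()] + expansion + term[match.end():]
        return term

    print(expand("<start>"))   # e.g. 'Load is 73%'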

Greybox Fuzzing with Grammars

In this chapter, we introduce important extensions to our syntactic fuzzing techniques, all leveraging syntactic parts of existing inputs.

Reducing Failure-Inducing Inputs

By construction, fuzzers create inputs that may be hard to read. This causes issues during debugging, when a human has to analyze the exact cause of the failure. In this chapter, we present techniques that automatically reduce and simplify failure-inducing inputs to a minimum in order to ease debugging.
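
A simplified delta-debugging-style reducer illustrates the idea: repeatedly drop chunks of the input as long as the failure persists. Here, test() stands in for "does the program still fail?", with a made-up failure condition.

    def reduce_input(inp, test):
        """Shrink inp while test(inp) still signals the failure."""
        chunk = len(inp) // 2
        while chunk >= 1:
            reduced = False
            i = 0
            while i < len(inp):
                candidate = inp[:i] + inp[i + chunk:]
                if test(candidate):
                    inp = candidate          # chunk was irrelevant: drop it
                    reduced = True
                else:
                    i += chunk
            if not reduced:
                chunk //= 2
        return inp

    failing = 'foo bar <SELECT name="x"> baz'
    print(reduce_input(failing, lambda s: "<SELECT" in s))   # '<SELECT'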

Part IV: Semantic Fuzzing

This part introduces test generation techniques that take the semantics of the input into account, notably the behavior of the program that processes the input.

Fuzzing with Constraints

In previous chapters, we have seen how Grammar-Based Fuzzing allows us to efficiently generate myriads of syntactically valid inputs. However, there are semantic input features that cannot be expressed in a context-free grammar, such as

Mining Input Grammars

So far, the grammars we have seen have been mostly specified manually – that is, you (or the person knowing the input format) had to design and write a grammar in the first place. While the grammars we have seen so far have been rather simple, creating a grammar for complex inputs can involve quite some effort. In this chapter, we therefore introduce techniques that automatically mine grammars from programs – by executing the programs and observing how they process which parts of the input. In conjunction with a grammar fuzzer, this allows us to

  1. take a program,
  2. extract its input grammar, and
  3. fuzz it with high efficiency and effectiveness, using the concepts in this book.

Tracking Information Flow

We have explored how one could generate better inputs that can penetrate deeper into the program in question. While doing so, we have relied on program crashes to tell us that we have succeeded in finding problems in the program. However, that is rather simplistic. What if the behavior of the program is simply incorrect, but does not lead to a crash? Can one do better?
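
One answer the chapter develops is dynamic taint tracking. Here is a minimal sketch of the idea – a string that carries a taint label along; the chapter's tstr class also propagates taints through string operations, which this sketch omits.

    class tstr(str):
        """A string that remembers where it came from."""
        def __new__(cls, value, taint=None):
            obj = super().__new__(cls, value)
            obj.taint = taint
            return obj

    user_input = tstr("'; DROP TABLE students; --", taint="LOW_INTEGRITY")
    query = "SELECT * FROM students WHERE name = '" + user_input + "'"
    print(user_input.taint)   # 'LOW_INTEGRITY' - a red flag if it reaches the DB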

Concolic Fuzzing

In the chapter on information flow, we have seen how one can use dynamic taints to produce more intelligent test cases than simply looking for program crashes. We have also seen how one can use the taints to update the grammar, and hence focus more on the dangerous methods.

Symbolic Fuzzing

One of the problems with traditional methods of fuzzing is that they fail to exercise all the possible behaviors that a system can have, especially when the input space is large. Quite often, a specific execution branch may be taken only with very specific inputs, which can represent a minuscule fraction of the input space. Traditional fuzzing methods rely on chance to produce the inputs they need. However, relying on randomness to generate values that we want is a bad idea when the space to be explored is huge. For example, a function that accepts a string, even if one only considers the first $10$ characters, already has $2^{80}$ possible inputs. If one is looking for a specific string, random generation of values could take a few thousand years, even on a supercomputer.
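
The arithmetic is easy to verify: with $256$ possible values per character, $10$ characters already yield $256^{10} = 2^{80}$ inputs.

    assert 256 ** 10 == 2 ** 80
    print(2 ** 80)   # 1208925819614629174706176 possible 10-character inputs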

Mining Function Specifications

When testing a program, one not only needs to cover its several behaviors; one also needs to check whether the result is as expected. In this chapter, we introduce a technique that allows us to mine function specifications from a set of given executions, resulting in abstract and formal descriptions of what the function expects and what it delivers.
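
A first step toward such mining can be sketched with a decorator that records argument and return types across executions; the decorator and function names are illustrative.

    from collections import defaultdict

    observed = defaultdict(set)

    def track_types(func):
        def wrapper(*args):
            result = func(*args)
            arg_types = tuple(type(a).__name__ for a in args)
            observed[func.__name__].add((arg_types, type(result).__name__))
            return result
        return wrapper

    @track_types
    def my_sqrt(x):
        return x ** 0.5

    my_sqrt(4)
    my_sqrt(2.0)
    print(observed["my_sqrt"])   # e.g. {(('int',), 'float'), (('float',), 'float')}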

Part V: Domain-Specific Fuzzing

This part discusses test generation for a number of specific domains. For all these domains, we introduce fuzzers that generate inputs as well as miners that analyze the input structure.

Testing Configurations

The behavior of a program is not only governed by its data. The configuration of a program – that is, the settings that govern the execution of a program on its (regular) input data, as set by options or configuration files – just as well influences behavior, and thus can and should be tested. In this chapter, we explore how to systematically test and cover software configurations. By automatically inferring configuration options, we can apply these techniques out of the box, with no need for writing a grammar. Finally, we show how to systematically cover combinations of configuration options, quickly detecting unwanted interferences.
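
Enumerating combinations of configuration settings is straightforward in Python; the options below are made up.

    from itertools import product

    settings = {
        "--mode":  ["fast", "safe"],
        "--cache": ["on", "off"],
        "--log":   ["quiet", "verbose"],
    }
    for config in product(*settings.values()):
        args = (f"{opt}={val}" for opt, val in zip(settings, config))
        print(" ".join(args))    # 2 * 2 * 2 == 8 configurations in total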

Fuzzing APIs

So far, we have always generated system input, i.e. data that the program as a whole obtains via its input channels. However, we can also generate inputs that go directly into individual functions, gaining flexibility and speed in the process. In this chapter, we explore the use of grammars to synthesize code for function calls, which allows you to generate program code that very efficiently invokes functions directly.
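
A minimal sketch of the approach: synthesize Python calls as strings and execute them directly. The called function urlparse() is real; the generation scheme is a toy stand-in for a grammar.

    import random
    from urllib.parse import urlparse

    def synthesize_call():
        scheme = random.choice(["http", "https", "ftp"])
        host = "".join(random.choice("abcd.") for _ in range(8))
        return f'urlparse("{scheme}://{host}/")'

    for _ in range(3):
        call = synthesize_call()
        print(call, "->", eval(call))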

Carving Unit Tests

So far, we have always generated system input, i.e. data that the program as a whole obtains via its input channels. If we are interested in testing only a small set of functions, having to go through the system can be very inefficient. This chapter introduces a technique known as carving, which, given a system test, automatically extracts a set of unit tests that replicate the calls seen during the system test. The key idea is to record such calls such that we can replay them later – as a whole or selectively. On top, we also explore how to synthesize API grammars from carved unit tests; this means that we can synthesize API tests without having to write a grammar at all.
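
The recording part of carving can be sketched with a decorator that logs every call for later replay; names are illustrative.

    carved_calls = []

    def carve(func):
        def wrapper(*args, **kwargs):
            carved_calls.append((func, args, kwargs))
            return func(*args, **kwargs)
        return wrapper

    @carve
    def fetch(url):                       # stand-in for a function deep in the system
        return f"contents of {url}"

    fetch("http://www.example.com/")      # the "system test"

    for func, args, kwargs in carved_calls:   # replay each call as a unit test
        print(f"replaying {func.__name__}{args} ->", func(*args, **kwargs))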

Testing Compilers

In this chapter, we will make use of grammars and grammar-based testing to systematically generate program code – for instance, to test a compiler or an interpreter. Not very surprisingly, we use Python and the Python interpreter as our domain.

Testing Web Applications

In this chapter, we explore how to generate tests for Graphical User Interfaces (GUIs), notably on Web interfaces. We set up a (vulnerable) Web server and demonstrate how to systematically explore its behavior – first with handwritten grammars, then with grammars automatically inferred from the user interface. We also show how to conduct systematic attacks on these servers, notably with code and SQL injection.

Testing Graphical User Interfaces

In this chapter, we explore how to generate tests for Graphical User Interfaces (GUIs), abstracting from our previous examples on Web testing. Building on general means to extract user interface elements and activate them, our techniques generalize to arbitrary graphical user interfaces, from rich Web applications to mobile apps, and systematically explore user interfaces through forms and navigation elements.

Part VI: Managing Fuzzing

This part discusses how to manage fuzzing in the large.

Fuzzing in the Large

In the past chapters, we have always looked at fuzzing taking place on one machine for a few seconds only. In the real world, however, fuzzers are run on dozens or even thousands of machines; for hours, days and weeks; for one program or dozens of programs. In such contexts, one needs an infrastructure to collect failure data from the individual fuzzer runs, and to aggregate such data in a central repository. In this chapter, we will examine such an infrastructure, the FuzzManager framework from Mozilla.

When To Stop Fuzzing

In the past chapters, we have discussed several fuzzing techniques. Knowing what to do is important, but it is also important to know when to stop doing things. In this chapter, we will learn when to stop fuzzing – using a prominent example: the Enigma machine, used by the German Navy in the Second World War to encrypt communications, and how Alan Turing and I.J. Good used fuzzing techniques to crack the ciphers of the Naval Enigma machine.

Appendices

This part holds notebooks and modules that support other notebooks.

Academic Prototyping

This is the manuscript of Andreas Zeller's tutorial "Academic Prototyping" at the ESEC/FSE 2022 conference.

Prototyping with Python

This is the manuscript of Andreas Zeller's keynote "Coding Effective Testing Tools Within Minutes" at the TAIC PART 2020 conference.

Error Handling

The code in this notebook helps with handling errors. Normally, an error in notebook code causes the execution of the code to stop; while an infinite loop in notebook code causes the notebook to run without end. This notebook provides two classes to help address these concerns.
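
Here is a minimal sketch of such an error-absorbing helper, in the spirit of the classes this notebook provides; the real implementation does more, such as limiting output.

    import traceback

    class ExpectError:
        """Print any exception raised in the body, but keep the notebook running."""
        def __enter__(self):
            return self
        def __exit__(self, exc_type, exc_value, tb):
            if exc_type is not None:
                traceback.print_exception(exc_type, exc_value, tb)
            return True                   # swallow the exception

    with ExpectError():
        x = 1 / 0                         # traceback is printed...
    print("still running")                # ...and execution continues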

Timer

The code in this notebook helps with measuring time.

Timeout

The code in this notebook helps in interrupting execution after a given time.

Class Diagrams

This is a simple viewer for class diagrams. Customized towards the book.

Railroad Diagrams

The code in this notebook helps with drawing syntax diagrams. It is a (slightly customized) copy of the excellent library from Tab Atkins Jr., which unfortunately is not available as a Python package.

Control Flow Graph

The code in this notebook helps with obtaining the control flow graph of Python functions.

The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The source code that is part of the content, as well as the source code used to format and display that content, is licensed under the MIT License.