Before we get to the central parts of the book, let us introduce essential concepts of software testing. Why is it necessary to test software at all? How does one test software? How can one tell whether a test has been successful? How does one know if one has tested enough? In this chapter, let us recall the most important concepts, and at the same time get acquainted with Python and interactive notebooks.

Let us start with a simple example. Your co-worker has been asked to implement a square root function $\sqrt{x}$. (Let's assume for a moment that the environment does not already have one.) After studying the Newton–Raphson method, she comes up with the following Python code, claiming that, in fact, this `my_sqrt()` function computes square roots.

In [2]:

```
def my_sqrt(x):
    """Computes the square root of x, using the Newton-Raphson method"""
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx
```

Your job is now to find out whether this function actually does what it claims to do.

To find out whether `my_sqrt()` works correctly, we can *test* it with a few values. For `x = 4`, for instance, it produces the correct value:

In [3]:

```
my_sqrt(4)
```

Out[3]:

2.0

The upper part above, `my_sqrt(4)` (a so-called *cell*), is an input to the Python interpreter, which by default *evaluates* it. The lower part (`2.0`) is its output. We can see that `my_sqrt(4)` produces the correct value.

The same apparently holds for `x = 2.0`, too:

In [4]:

```
my_sqrt(2)
```

Out[4]:

1.414213562373095

If you are reading this in the interactive notebook, you can try out `my_sqrt()` with other values as well. Click on one of the above cells with invocations of `my_sqrt()` and change the value – say, to `my_sqrt(1)`. Press `Shift+Enter` (or click on the play symbol) to execute it and see the result. If you get an error message, go to the above cell with the definition of `my_sqrt()` and execute this first. You can also run *all* cells at once; see the Notebook menu for details. (You can actually also change the text by clicking on it, and corect mistaks such as in this sentence.)

In [6]:

```
quiz("What does `my_sqrt(16)` produce?",
     [
         "4",
         "4.0",
         "3.99999",
         "None"
     ], "len('four') - len('two') + 1")
```

Out[6]:

What does `my_sqrt(16)` produce?

Try it out for yourself by uncommenting and executing the following line:

In [7]:

```
# my_sqrt(16)
```

If a cell depends on definitions you have not executed yet, select `Run all cells above` from the menu to ensure all definitions are set.

Occasionally, it also helps to *restart the kernel* (i.e. start the Python interpreter from scratch) to get rid of older, superfluous definitions.

To see how `my_sqrt()` operates, a simple strategy is to insert `print()` statements in critical places. You can, for instance, log the value of `approx`, to see how each loop iteration gets closer to the actual value:

In [8]:

```
def my_sqrt_with_log(x):
    """Computes the square root of x, using the Newton–Raphson method"""
    approx = None
    guess = x / 2
    while approx != guess:
        print("approx =", approx)  # <-- New
        approx = guess
        guess = (approx + x / approx) / 2
    return approx
```

In [9]:

```
my_sqrt_with_log(9)
```

approx = None
approx = 4.5
approx = 3.25
approx = 3.0096153846153846
approx = 3.000015360039322
approx = 3.0000000000393214

Out[9]:

3.0

Interactive notebooks also allow you to launch an interactive *debugger*: insert a "magic line" `%%debug` at the top of a cell and see what happens. Unfortunately, interactive debuggers interfere with our dynamic analysis techniques, so we mostly use logging and assertions for debugging.

But is the above value of `my_sqrt(2)` actually correct? We can easily verify this by exploiting that $\sqrt{x}$ squared again has to be $x$, or in other words $\sqrt{x} \times \sqrt{x} = x$. Let's take a look:

In [10]:

```
my_sqrt(2) * my_sqrt(2)
```

Out[10]:

1.9999999999999996

Okay, we do have some rounding error, but otherwise, this seems just fine.
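Such rounding errors are not specific to `my_sqrt()`; they stem from how binary floating-point numbers approximate decimal fractions. As a minimal illustration, using only built-in arithmetic (the variable name `total` is ours):

```python
# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly,
# so the computed sum differs from the literal 0.3 in its last bits.
total = 0.1 + 0.2

print(total == 0.3)              # exact comparison fails
print(abs(total - 0.3) < 1e-9)   # but the difference is tiny
```

The first comparison yields `False`, the second `True`, which is why exact equality is a poor check for floating-point results.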

We have now *tested* the above program: we have *executed* it on a given input and *checked* whether its result is correct. Such a test is the bare minimum of quality assurance before a program goes into production.

So far, we have tested the above program *manually*, that is, running it by hand and checking its results by hand. This is a very flexible way of testing, but in the long run, it is rather inefficient:

- Manually, you can only check a very limited number of executions and their results
- After any change to the program, you have to repeat the testing process

This is why it is very useful to *automate* tests. One simple way of doing so is to let the computer first do the computation, and then have it check the results.

For instance, this piece of code automatically tests whether $\sqrt{4} = 2$ holds:

In [11]:

```
result = my_sqrt(4)
expected_result = 2.0
if result == expected_result:
    print("Test passed")
else:
    print("Test failed")
```

Test passed

The nice thing about this test is that we can run it again and again, thus ensuring that at least the square root of 4 is computed correctly. There are still a number of issues, though:

- We need *five lines of code* for a single test
- We do not care for rounding errors
- We only check a single input (and a single result)

Let us address these issues one by one. First, let's make the test a bit more compact. Almost all programming languages have a means to automatically check whether a condition holds, and to stop execution if it does not. This is called an *assertion*, and it is immensely useful for testing.

In Python, the `assert` statement takes a condition, and if the condition is true, nothing happens. (If everything works as it should, you should not be bothered.) If the condition evaluates to false, though, `assert` raises an exception, indicating that a test just failed.

In our example, we can use `assert` to easily check whether `my_sqrt()` yields the expected result as above:
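To make this behavior concrete, here is a small sketch. The helper `require_positive()` and its message are hypothetical and serve only to show both outcomes of an `assert`:

```python
def require_positive(n):
    # Hypothetical helper, for illustration only
    assert n > 0, f"expected a positive number, got {n}"
    return n

require_positive(3)           # condition holds: nothing happens

message = None
try:
    require_positive(-1)      # condition fails: AssertionError is raised
except AssertionError as exc:
    message = str(exc)
    print("Assertion failed:", message)
```

The optional second operand of `assert` becomes the message of the raised `AssertionError`, which helps when diagnosing a failing test.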

In [12]:

```
assert my_sqrt(4) == 2
```

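Remember, though, that floating-point computations may induce rounding errors. Rather than comparing two floating-point values for exact equality, we check that the absolute difference between them stays below a small threshold, typically denoted `epsilon`. This is how we can do it: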

In [13]:

```
EPSILON = 1e-8
```

In [14]:

```
assert abs(my_sqrt(4) - 2) < EPSILON
```

In [15]:

```
def assertEquals(x, y, epsilon=1e-8):
    assert abs(x - y) < epsilon
```

In [16]:

```
assertEquals(my_sqrt(4), 2)
assertEquals(my_sqrt(9), 3)
assertEquals(my_sqrt(100), 10)
```

(Hint: a true Python programmer would use the function `math.isclose()` instead.)
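As a sketch of what this hint could look like in practice, the following block repeats `my_sqrt()` from above so that it runs on its own and replaces `assertEquals()` with `math.isclose()` from the standard library:

```python
import math

def my_sqrt(x):
    """Newton-Raphson square root, repeated from above"""
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx

# math.isclose() compares with a relative tolerance (rel_tol, default 1e-09);
# the optional abs_tol adds an absolute tolerance, which matters near zero.
assert math.isclose(my_sqrt(4), 2)
assert math.isclose(my_sqrt(9), 3)
assert math.isclose(my_sqrt(100), 10, rel_tol=1e-8)
```

Note the difference in design: `assertEquals()` uses an absolute threshold, whereas `math.isclose()` scales its tolerance with the magnitude of the values by default.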

Remember that the property $\sqrt{x} \times \sqrt{x} = x$ universally holds? We can also explicitly test this with a few values:

In [17]:

```
assertEquals(my_sqrt(2) * my_sqrt(2), 2)
assertEquals(my_sqrt(3) * my_sqrt(3), 3)
assertEquals(my_sqrt(42.11) * my_sqrt(42.11), 42.11)
```

In [18]:

```
for n in range(1, 1000):
    assertEquals(my_sqrt(n) * my_sqrt(n), n)
```

How much time does it take to test `my_sqrt()` with 10,000 values? Let's see.

We use our own `Timer` module to measure elapsed time. To be able to use `Timer`, we first import our own utility module, which allows us to import other notebooks.

In [20]:

```
from Timer import Timer
```

In [21]:

```
with Timer() as t:
    for n in range(1, 10000):
        assertEquals(my_sqrt(n) * my_sqrt(n), n)
print(t.elapsed_time())
```

0.01560891700501088

Testing about 10,000 values takes roughly a hundredth of a second, so a single execution of `my_sqrt()` takes about 1/1000000 second, or about 1 microsecond.
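The book's `Timer` class is not shown in this chapter. As a rough idea of how such a context manager could be built, here is a minimal sketch based on `time.perf_counter()`; the class name `SimpleTimer` is hypothetical and this is not the book's actual implementation:

```python
import time

class SimpleTimer:
    """Hypothetical stand-in for the book's Timer (an assumption of this sketch)"""

    def __enter__(self):
        self.start = time.perf_counter()
        self.end = None
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.end = time.perf_counter()
        return False  # do not swallow exceptions raised in the block

    def elapsed_time(self):
        """Seconds elapsed so far, or in total once the block has ended"""
        end = self.end if self.end is not None else time.perf_counter()
        return end - self.start

with SimpleTimer() as t:
    total = sum(n * n for n in range(10000))
print(t.elapsed_time())
```

The measured time depends on the machine and its current load, so results will differ from run to run.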

Let us repeat the same with values picked at random. The Python `random.random()` function returns a random value between 0.0 and 1.0:

In [22]:

```
import random
```

In [23]:

```
with Timer() as t:
    for i in range(10000):
        x = 1 + random.random() * 1000000
        assertEquals(my_sqrt(x) * my_sqrt(x), x)
print(t.elapsed_time())
```

0.01828445799765177

We can repeat this test with every single change to `my_sqrt()`, each time reinforcing our confidence that `my_sqrt()` works as it should. Note, though, that while a random function is *unbiased* in producing random values, it is unlikely to generate special values that drastically alter program behavior. We will discuss this later below.
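One practical detail when repeating random tests: if a run ever fails, it helps to be able to reproduce the exact same inputs. A possible sketch, assuming a hypothetical helper `run_random_tests()` that draws its inputs from a seeded `random.Random` generator (`my_sqrt()` and `assertEquals()` are repeated from above so the block runs on its own):

```python
import random

def my_sqrt(x):
    """Newton-Raphson square root, repeated from above"""
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx

def assertEquals(x, y, epsilon=1e-8):
    assert abs(x - y) < epsilon

def run_random_tests(seed, trials=1000):
    """Hypothetical helper: the same seed always yields the same inputs"""
    rng = random.Random(seed)
    for _ in range(trials):
        x = 1 + rng.random() * 1000000
        assertEquals(my_sqrt(x) * my_sqrt(x), x)
    return trials

print(run_random_tests(seed=42), "random tests passed")
```

Logging the seed along with the test result is enough to replay a failing run later.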

Instead of writing and running tests for `my_sqrt()`, we can also go and *integrate the check right into the implementation.* This way, *each and every* invocation of `my_sqrt()` will be automatically checked.

Such an *automatic run-time check* is very easy to implement:

In [24]:

```
def my_sqrt_checked(x):
    root = my_sqrt(x)
    assertEquals(root * root, x)
    return root
```

Now, whenever we compute a root with `my_sqrt_checked()` $\dots$

In [25]:

```
my_sqrt_checked(2.0)
```

Out[25]:

1.414213562373095

... we already know that the result is correct, and it will be so for every new successful computation.

Automatic run-time checks, as above, assume two things, though:

- One has to be able to *formulate* such run-time checks. Having concrete values to check against should always be possible, but formulating desired properties in an abstract fashion can be very complex. In practice, you need to decide which properties are most crucial, and design appropriate checks for them. Plus, run-time checks may depend not only on local properties, but on several properties of the program state, which all have to be identified.
- One has to be able to *afford* such run-time checks. In the case of `my_sqrt()`, the check is not very expensive; but if we have to check, say, a large data structure even after a simple operation, the cost of the check may soon be prohibitive. In practice, run-time checks will typically be disabled during production, trading reliability for efficiency. On the other hand, a comprehensive suite of run-time checks is a great way to find errors and quickly debug them; you need to decide how many such capabilities you would still want during production.
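In Python, one common way to disable such checks in production is built into the language: when the interpreter is started with the `-O` option, `assert` statements are skipped and the built-in constant `__debug__` is `False`. A minimal sketch, with `my_sqrt()` repeated from above and the check written inline:

```python
def my_sqrt(x):
    """Newton-Raphson square root, repeated from above"""
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx

def my_sqrt_checked(x, epsilon=1e-8):
    root = my_sqrt(x)
    # Under "python -O", this assert is skipped entirely,
    # so the check costs nothing in production.
    assert abs(root * root - x) < epsilon
    return root

print("Checks enabled:", __debug__)
print(my_sqrt_checked(2.0))
```

The flip side is that checks one relies on in production, such as input validation, should not be written as `assert` statements.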

In [26]:

```
quiz("Does run-time checking give a guarantee "
     "that there will always be a correct result?",
     [
         "Yes",
         "No",
     ], "1 ** 1 + 1 ** 1")
```

Out[26]:

Does run-time checking give a guarantee that there will always be a correct result?

Note that run-time checks ensure correctness *only if there is a result* to be checked; that is, they do *not* guarantee that there always will be one. This is an important limitation compared to *symbolic verification techniques* and program proofs, which can also guarantee that there is a result – at a much higher (often manual) effort, though.

We may now make `my_sqrt()` available to other programmers, who may then embed it in their code. At some point, it will have to process input that comes from *third parties*, i.e. input that is not under the programmer's control.

Let us simulate this *system input* by assuming a *program* `sqrt_program()` whose input is a string under third-party control:

In [27]:

```
def sqrt_program(arg: str) -> None:  # type: ignore
    x = int(arg)
    print('The root of', x, 'is', my_sqrt(x))
```

We assume that `sqrt_program` is a program which accepts system input from the command line, as in

```
$ sqrt_program 4
2
```
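How such a command-line program might be wired up is not shown here. One possible sketch: a hypothetical `main()` function takes the argument list (in a real script, this would be `sys.argv`) and passes its first argument on. `my_sqrt()` is repeated from above so the block runs on its own:

```python
def my_sqrt(x):
    """Newton-Raphson square root, repeated from above"""
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx

def main(argv):
    """Hypothetical entry point: argv[0] is the program name,
    argv[1] the number given on the command line."""
    x = int(argv[1])
    print('The root of', x, 'is', my_sqrt(x))
    return 0  # exit status

# In a real script, one would call main(sys.argv); here we pass the
# argument list explicitly so the sketch also runs inside a notebook.
main(["sqrt_program", "4"])
```

Passing the argument list as a parameter, rather than reading `sys.argv` inside the function, keeps the program easy to test.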

We can easily invoke `sqrt_program()` with some system input:

In [28]:

```
sqrt_program("4")
```

The root of 4 is 2.0

The problem is that we do not check external inputs for validity. Try invoking `sqrt_program("-1")`, for instance. What happens?

It turns out that when you invoke `my_sqrt()` with a negative number, it enters an infinite loop. For technical reasons, we cannot have infinite loops in this chapter (unless we'd want the code to run forever); so we use a special `with ExpectTimeout(1)` construct to interrupt execution after one second.

In [29]:

```
from ExpectError import ExpectTimeout
```

In [30]:

```
with ExpectTimeout(1):
    sqrt_program("-1")
```

The above message is an *error message*, indicating that something went wrong. It lists the *call stack* of functions and lines that were active at the time of the error. The line at the very bottom is the line last executed; the lines above represent function invocations – in our case, up to `my_sqrt(x)`.
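A timeout is one way to stop a runaway computation. Another option, sketched here under our own assumptions, is to cap the number of iterations inside the function itself. The name `my_sqrt_bounded()` and the limit of 100 iterations are hypothetical choices; very large inputs may need more steps to converge.

```python
def my_sqrt_bounded(x, max_iterations=100):
    """Newton-Raphson square root that gives up after max_iterations steps.
    Name and default limit are assumptions made for this sketch."""
    approx = None
    guess = x / 2
    for _ in range(max_iterations):
        if approx == guess:
            return approx
        approx = guess
        guess = (approx + x / approx) / 2
    raise RuntimeError(f"no convergence after {max_iterations} iterations")

print(my_sqrt_bounded(4))

try:
    my_sqrt_bounded(-1)
except RuntimeError as exc:
    print("Gave up:", exc)
```

This turns a hang into an exception, but it does not replace input validation: the caller still has to decide what to do with a negative number.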

We don't want our code terminating with an exception. Consequently, when accepting external input, we must ensure that it is properly validated. We may write, for instance:

In [31]:

```
def sqrt_program(arg: str) -> None:  # type: ignore
    x = int(arg)
    if x < 0:
        print("Illegal Input")
    else:
        print('The root of', x, 'is', my_sqrt(x))
```

and then we can be sure that `my_sqrt()` is only invoked according to its specification.

In [32]:

```
sqrt_program("-1")
```

Illegal Input

But wait! What happens if `sqrt_program()` is not invoked with a number?

In [33]:

```
quiz("What is the result of `sqrt_program('xyzzy')`?",
     [
         "0",
         "0.0",
         "`None`",
         "An exception"
     ], "16 ** 0.5")
```

Out[33]:

What is the result of `sqrt_program('xyzzy')`?

In [34]:

```
from ExpectError import ExpectError
```

In [35]:

```
with ExpectError():
    sqrt_program("xyzzy")
```

Here's a version which also checks for bad inputs:

In [36]:

```
def sqrt_program(arg: str) -> None:  # type: ignore
    try:
        x = float(arg)
    except ValueError:
        print("Illegal Input")
    else:
        if x < 0:
            print("Illegal Number")
        else:
            print('The root of', x, 'is', my_sqrt(x))
```

In [37]:

```
sqrt_program("4")
```

The root of 4.0 is 2.0

In [38]:

```
sqrt_program("-1")
```

Illegal Number

In [39]:

```
sqrt_program("xyzzy")
```

Illegal Input

Making a program robust against arbitrary input is a burden for programmers. This burden, however, becomes a *benefit* when generating software tests: If a program can handle any kind of input (possibly with well-defined error messages), we can also *send it any kind of input*. When calling a function with generated values, though, we have to *know* its precise preconditions.
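To illustrate what "sending any kind of input" could look like, here is a small sketch that feeds random printable strings to the robust `sqrt_program()` and counts unexpected exceptions. Two assumptions of this sketch: `math.sqrt()` stands in for `my_sqrt()` so that every call is sure to terminate, and the helpers `random_string()` and `fuzz_sqrt_program()` are hypothetical names.

```python
import contextlib
import io
import math
import random
import string

def sqrt_program(arg: str) -> None:
    """Robust variant from above; math.sqrt() replaces my_sqrt()
    (an assumption of this sketch) so that every call terminates."""
    try:
        x = float(arg)
    except ValueError:
        print("Illegal Input")
    else:
        if x < 0:
            print("Illegal Number")
        else:
            print('The root of', x, 'is', math.sqrt(x))

def random_string(rng, max_length=10):
    """Hypothetical helper: a random string of printable characters"""
    length = rng.randint(0, max_length)
    return "".join(rng.choice(string.printable) for _ in range(length))

def fuzz_sqrt_program(trials=1000, seed=0):
    """Call sqrt_program() with random strings; count unexpected exceptions"""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        arg = random_string(rng)
        try:
            # Suppress the program's output; we only care about crashes
            with contextlib.redirect_stdout(io.StringIO()):
                sqrt_program(arg)
        except Exception:
            failures += 1
    return failures

print("Unexpected exceptions:", fuzz_sqrt_program())
```

Most random strings are not numbers and end up as `Illegal Input`; reaching the interesting numeric cases more often requires smarter input generation.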

Despite our best efforts in testing, keep in mind that you are always checking functionality for a *finite* set of inputs. Thus, there may always be *untested* inputs for which the function may still fail.

In the case of `my_sqrt()`, for instance, computing $\sqrt{0}$ results in a division by zero:

In [40]:

```
with ExpectError():
    root = my_sqrt(0)
```

We can fix the function accordingly, checking the accepted values for `x` and handling the special case `x = 0`:

In [41]:

```
def my_sqrt_fixed(x):
    assert 0 <= x
    if x == 0:
        return 0
    return my_sqrt(x)
```

With this, we can now correctly compute $\sqrt{0} = 0$:

In [42]:

```
assert my_sqrt_fixed(0) == 0
```

Illegal values now result in an exception:

In [43]:

```
with ExpectError():
    root = my_sqrt_fixed(-1)
```

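Still, extensive testing does not guarantee that all future executions will be correct. Even a run-time check can only guarantee that *if* the function produces a result, the result will be correct; but there is no guarantee that future executions will not lead to a failing check. As I am writing this, I *believe* that `my_sqrt_fixed(x)` is a correct implementation of $\sqrt{x}$ for all finite numbers $x$, but I cannot be certain.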

With the Newton-Raphson method, we may still have a good chance of actually *proving* that the implementation is correct: The implementation is simple, the math is well-understood. Alas, this is only the case for few domains. If we do not want to go into full-fledged correctness proofs, our best chance with testing is to

- Test the program on several, well-chosen inputs; and
- Check results extensively and automatically.

This is what we do in the remainder of this course: Devise techniques that help us to thoroughly test a program, as well as techniques that help us check its state for correctness. Enjoy!

- The aim of testing is to execute a program such that we find bugs.
- Test execution, test generation, and checking test results can be automated.
- Testing is *incomplete*; it provides no 100% guarantee that the code is free of errors.

There is a large number of works on software testing and analysis.

An all-new modern, comprehensive, and online textbook on testing is "Effective Software Testing: A Developer's Guide" \cite{Aniche2022}. Much recommended!

For this book, we are also happy to recommend "Software Testing and Analysis" \cite{Pezze2008} as an introduction to the field; its strong technical focus very well fits our methodology.

Other important must-reads with a comprehensive approach to software testing, including psychology and organization, include "The Art of Software Testing" \cite{Myers2004} as well as "Software Testing Techniques" \cite{Beizer1990}.

Your first exercise in this book is to get acquainted with notebooks and Python, such that you can run the code examples in the book – and try out your own. Here are a few tasks to get you started.

The easiest way to get access to the code is to run it in your browser.

- From the Web Page, check out the menu at the top. Select `Resources` $\rightarrow$ `Edit as Notebook`.
- After a short waiting time, this will open a Jupyter Notebook right within your browser, containing the current chapter as a notebook.
- You can again scroll through the material, but you can now click on any code example to edit and run its code (by entering `Shift`+`Return`). You can edit the examples as you please.
- Note that code examples typically depend on earlier code, so be sure to run the preceding code first.
- Any changes you make will not be saved (unless you save your notebook to disk).

For help on Jupyter Notebooks, from the Web Page, check out the `Help` menu.

This is useful if you want to make greater changes, but do not want to work with Jupyter.

- From the Web Page, check out the menu at the top. Select `Resources` $\rightarrow$ `Download Code`.
- This will download the Python code of the chapter as a single Python .py file, which you can save to your computer.
- You can then open the file, edit it, and run it in your favorite Python environment to re-run the examples.
- Most importantly, you can import it into your own code and reuse functions, classes, and other resources.

For help on Python, from the Web Page, check out the `Help` menu.

This is useful if you want to work with Jupyter on your machine. This will allow you to also run more complex examples, such as those with graphical output.

- From the Web Page, check out the menu at the top. Select `Resources` $\rightarrow$ `All Notebooks`.
- This will download all Jupyter Notebooks as a collection of `.ipynb` files, which you can save to your computer.
- You can then open the notebooks in Jupyter Notebook or Jupyter Lab, edit them, and run them. To navigate across notebooks, open the notebook `00_Table_of_Contents.ipynb`.
- You can also download individual notebooks using `Resources` $\rightarrow$ `Download Notebook`. Running these, however, will require that you have the other notebooks downloaded already.

For help on Jupyter Notebooks, from the Web Page, check out the `Help` menu.

This is useful if you want to contribute to the book with patches or other material. It also gives you access to the very latest version of the book.

- From the Web Page, check out the menu at the top. Select `Resources` $\rightarrow$ `GitHub Repo`.
- This will get you to the GitHub repository which contains all sources of the book, including the latest notebooks.
- You can then *clone* this repository to your disk, such that you get the latest and greatest.
- You can report issues and suggest pull requests on the GitHub page.
- Updating the repository with `git pull` will get you updated.

If you want to contribute code or text, check out the Guide for Authors.

In [44]:

```
def shellsort(elems):
    sorted_elems = elems.copy()
    gaps = [701, 301, 132, 57, 23, 10, 4, 1]
    for gap in gaps:
        for i in range(gap, len(sorted_elems)):
            temp = sorted_elems[i]
            j = i
            while j >= gap and sorted_elems[j - gap] > temp:
                sorted_elems[j] = sorted_elems[j - gap]
                j -= gap
            sorted_elems[j] = temp
    return sorted_elems
```

A first test indicates that `shellsort()` might actually work:

In [45]:

```
shellsort([3, 2, 1])
```

Out[45]:

[1, 2, 3]

The implementation uses a *list* as argument `elems` (which it copies into `sorted_elems`) as well as for the fixed list `gaps`. Lists work like *arrays* in other languages:

In [46]:

```
a = [5, 6, 99, 7]
print("First element:", a[0], "length:", len(a))
```

First element: 5 length: 4

The `range()` function returns an iterable sequence of elements. It is often used in conjunction with `for` loops, as in the above implementation.

In [47]:

```
for x in range(1, 5):
    print(x)
```

1
2
3
4
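Note that in Python 3, `range()` does not build a list in memory; it returns a `range` object that produces its elements on demand. A short sketch using only built-ins (the variable name `r` is ours):

```python
r = range(1, 5)

print(r)                        # the range object itself, not a list
print(list(r))                  # materialize the elements: [1, 2, 3, 4]
print(len(r), 3 in r, r[0])     # ranges support len(), "in", and indexing
```

The upper bound is exclusive, which is why `range(1, 5)` stops at 4.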

Your job is now to thoroughly test `shellsort()` with a variety of inputs.

First, set up `assert` statements with a number of manually written test cases. Select your test cases such that extreme cases are covered. Use `==` to compare two lists.

Second, create random lists as arguments to `shellsort()`. Make use of the following helper predicates to check whether the result is (a) sorted, and (b) a permutation of the original.

In [51]:

```
def is_sorted(elems):
    return all(elems[i] <= elems[i + 1] for i in range(len(elems) - 1))
```

In [52]:

```
is_sorted([3, 5, 9])
```

Out[52]:

True

In [53]:

```
def is_permutation(a, b):
    return len(a) == len(b) and all(a.count(elem) == b.count(elem) for elem in a)
```

In [54]:

```
is_permutation([3, 2, 1], [1, 3, 2])
```

Out[54]:

True

Start with a random list generator, using `[]` as the empty list and `elems.append(x)` to append an element `x` to the list `elems`. Use the above helper functions to assess the results. Generate and test 1,000 lists.

In [61]:

```
def quadratic_solver(a, b, c):
    q = b * b - 4 * a * c
    solution_1 = (-b + my_sqrt_fixed(q)) / (2 * a)
    solution_2 = (-b - my_sqrt_fixed(q)) / (2 * a)
    return (solution_1, solution_2)
```

In [62]:

```
quadratic_solver(3, 4, 1)
```

Out[62]:

(-0.3333333333333333, -1.0)

The above implementation is incomplete, though. You can

- trigger a division by zero; and
- violate the precondition of `my_sqrt_fixed()`.

How does one do that, and how can one prevent this?

For each of the two cases above, identify values for `a`, `b`, `c` that trigger the bug.

Extend the code appropriately such that the cases are handled. Return `None` for nonexistent values.

What are the chances of discovering these conditions with random inputs? Assuming one can do a billion tests per second, how long would one have to wait on average until a bug gets triggered?

Above, we claimed that `my_sqrt_fixed(x)` works for all *finite* numbers $x$: What happens if you set $x$ to $\infty$ (infinity)? Try this out!