Testing Compilers

In this chapter, we will make use of grammars and grammar-based testing to systematically generate program code – for instance, to test a compiler or an interpreter. Not very surprisingly, we use Python and the Python interpreter as our domain.

We chose Python not only because the rest of the book is also based on Python. Most importantly, Python brings lots of built-in infrastructure we can leverage, especially

  • parsers that convert Python code into an abstract syntax tree (AST) representation and
  • unparsers that take an AST and convert it back into Python code.

This allows us to leverage grammars that operate on ASTs rather than concrete syntax, greatly reducing complexity.

from bookutils import YouTubeVideo
YouTubeVideo('Nr1xbKj_WRQ')

Prerequisites

  • You must read the chapter on Fuzzing with Grammars to understand how grammars and grammar-based testing work.

Synopsis

To use the code provided in this chapter, write

>>> from fuzzingbook.PythonFuzzer import <identifier>

and then make use of the following features.

This chapter provides a PythonFuzzer class that allows producing arbitrary Python code elements:

>>> fuzzer = PythonFuzzer()
>>> print(fuzzer.fuzz())
def Hw(): # type: 
    pass

By default, PythonFuzzer produces a function definition – that is, a list of statements as above. You can pass a start_symbol argument to state which Python element you'd like to have:

>>> fuzzer = PythonFuzzer('<While>')
>>> print(fuzzer.fuzz())
while 
'' or {Z: *(set() ^ set())}:
    del

Here is a list of all possible start symbols. Their names reflect the nonterminals from the Python ast module documentation.

>>> sorted(list(PYTHON_AST_GRAMMAR.keys()))
['<Assert>',
 '<Assign>',
 '<Attribute>',
 '<AugAssign>',
 '<BinOp>',
 '<BoolOp>',
 '<Break>',
 '<Call>',
 '<Compare>',
 '<Constant>',
 '<Continue>',
 '<Delete>',
 '<Dict>',
 '<EmptySet>',
 '<Expr>',
 '<For>',
 '<FunctionDef>',
 '<If>',
 '<List>',
 '<Module>',
 '<Name>',
 '<Pass>',
 '<Return>',
 '<Set>',
 '<Slice>',
 '<Starred>',
 '<Subscript>',
 '<Tuple>',
 '<UnaryOp>',
 '<While>',
 '<With>',
 '<arg>',
 '<arg_list>',
 '<args>',
 '<arguments>',
 '<bool>',
 '<boolop>',
 '<cmpop>',
 '<cmpop_list>',
 '<cmpops>',
 '<digit>',
 '<digits>',
 '<expr>',
 '<expr_list>',
 '<exprs>',
 '<float>',
 '<func>',
 '<id>',
 '<id_continue>',
 '<id_start>',
 '<identifier>',
 '<integer>',
 '<keyword>',
 '<keyword_list>',
 '<keywords>',
 '<kwarg>',
 '<lhs_Attribute>',
 '<lhs_List>',
 '<lhs_Name>',
 '<lhs_Starred>',
 '<lhs_Subscript>',
 '<lhs_Tuple>',
 '<lhs_expr>',
 '<lhs_exprs>',
 '<literal>',
 '<mod>',
 '<none>',
 '<nonempty_expr_list>',
 '<nonempty_lhs_expr_list>',
 '<nonempty_stmt_list>',
 '<nonzerodigit>',
 '<not_double_quotes>',
 '<not_single_quotes>',
 '<operator>',
 '<returns>',
 '<start>',
 '<stmt>',
 '<stmt_list>',
 '<stmts>',
 '<string>',
 '<type_comment>',
 '<type_ignore>',
 '<type_ignore_list>',
 '<type_ignores>',
 '<unaryop>',
 '<vararg>',
 '<withitem>',
 '<withitem_list>',
 '<withitems>']

If you'd like more control over Python code generation, here is what is happening behind the scenes. The EBNF grammar PYTHON_AST_GRAMMAR can parse and produce abstract syntax trees for Python. To produce a Python module without PythonFuzzer, you would take these steps:

Step 1: Create a non-EBNF grammar suitable for ISLaSolver (or any other grammar fuzzer):

>>> python_ast_grammar = convert_ebnf_grammar(PYTHON_AST_GRAMMAR)

Step 2: Feed the resulting grammar into a grammar fuzzer such as ISLa:

>>> solver = ISLaSolver(python_ast_grammar, start_symbol='<FunctionDef>')

Step 3: Have the grammar fuzzer produce a string. This string represents an AST.

>>> ast_string = str(solver.solve())
>>> ast_string
'FunctionDef(name=\'t\', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Break()], decorator_list=[], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment=\'\')'

Step 4: Convert the AST into an actual Python AST data structure.

>>> from ast import *
>>> abstract_syntax_tree = eval(ast_string)

Step 5: Finally, convert the AST structure back into readable Python code:

>>> ast.fix_missing_locations(abstract_syntax_tree)
>>> print(ast.unparse(abstract_syntax_tree))
def t() -> set(): # type: 
    break

The chapter has many more applications, including parsing and mutating Python code, evolutionary fuzzing, and more.

Here are the details on the PythonFuzzer constructor:

PythonFuzzer(self, start_symbol: Optional[str] = None, *, grammar: Optional[Dict[str, List[Union[str, Tuple[str, Dict[str, Any]]]]]] = None, constraint: Optional[str] = None, **kw_params) -> None

Produce Python code. Parameters are:

  • start_symbol: The grammatical entity to be generated (default: <FunctionDef>)
  • grammar: The EBNF grammar to be used (default: PYTHON__AST_GRAMMAR); and
  • constraint an ISLa constraint (if any).

Additional keyword parameters are passed to the ISLaSolver superclass.

PythonFuzzer PythonFuzzer __init__() fuzz() ISLaSolver ISLaSolver __init__() PythonFuzzer->ISLaSolver Legend Legend •  public_method() •  private_method() •  overloaded_method() Hover over names to see doc

A Grammar for Concrete Code

To produce code, it is fairly easy to write a grammar with concrete syntax. If we want to produce, say, arithmetic expressions, we can easily create a concrete grammar which does precisely that.

from Grammars import Grammar
from Grammars import is_valid_grammar, convert_ebnf_grammar, extend_grammar, trim_grammar
from typing import Optional

We use the Fuzzingbook format for grammars, in which grammars are represented as dictionaries from symbols to lists of expansion alternatives.

EXPR_GRAMMAR: Grammar = {
    "<start>":
        ["<expr>"],

    "<expr>":
        ["<term> + <expr>", "<term> - <expr>", "<term>"],

    "<term>":
        ["<factor> * <term>", "<factor> / <term>", "<factor>"],

    "<factor>":
        ["+<factor>",
         "-<factor>",
         "(<expr>)",
         "<integer>.<integer>",
         "<integer>"],

    "<integer>":
        ["<digit><integer>", "<digit>"],

    "<digit>":
        ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
}
assert is_valid_grammar(EXPR_GRAMMAR)

We can use this grammar to produce syntactically valid arithmetic expressions. We use the ISLa solver as our generator, as it is the most powerful; but we could also use any other of our grammar fuzzers such as GrammarFuzzer at this point.

from isla.solver import ISLaSolver

Here are some concrete inputs produced from the grammar:

expr_solver = ISLaSolver(EXPR_GRAMMAR)
for _ in range(10):
    print(expr_solver.solve())
4.3 + 512 / -(7 / 6 - 0 / 9 * 1 * 1) * +8.3 / 7 * 4 / 6
(4 / 7 + 1) / (4) / 9 / 8 + 4 / (3 + 6 - 7)
+--(--(-9) * (4 * 7 + (4) + 4) + --(+(3)) - 6 + 0 / 7 + 7)
(2 * 6 + 0 - 5) * 4 - +1 * (2 - 2) / 8 / 6
(+-(0 - (1) * 7 / 3)) / ((1 * 3 + 8) + 9 - +1 / --0) - 5 * (-+939.491)
+2.9 * 0 / 501.19814 / --+--(6.05002)
+-8.8 / (1) * -+1 + -8 + 9 - 3 / 8 * 6 + 4 * 3 * 5
(+(8 / 9 - 1 - 7)) + ---06.30 / +4.39
8786.82 - +01.170 / 9.2 - +(7) + 1 * 9 - 0
+-6 * 0 / 5 * (-(1.7 * +(-1 / +4.9 * 5 * 1 * 2) + -4.2 + (6 + -5) / (4 * 3 + 4)))

We could extend the grammar further to also produce assignments and other statements, and piece by piece cover the entire syntax of the programming language. However, this would be a not-so-great idea. Why?

The problem is that when testing compilers, you not only want to be able to produce code, but also to parse code, such that you can mutate and manipulate it at will. And this is where our "concrete" syntax will give us problems. While we can easily parse code (or expressions) that exactly adheres to the syntax...

expr_solver.check('2 + 2')
True

... a single space will already suffice to make it fail...

expr_solver.check('2 +  2')
Error parsing "2 +  2" starting with "<start>"
False

... as does the absence of spaces:

expr_solver.check('2+2')
Error parsing "2+2" starting with "<start>"
False

Indeed, spaces are optional in most programming languages. We could update our grammar such that it can handle optional spaces at all times (introducing a <space> nonterminal). But then, there are other features like comments...

expr_solver.check('2 + 2    # should be 4')
Error parsing "2 + 2    # should be 4" starting with "<start>"
False

... or continuation lines ...

expr_solver.check('2 + \\\n2')  # An expression split over two lines
Error parsing "2 + \
2" starting with "<start>"
False

that our grammar would have to cover.

On top, there are language features that cannot be even represented properly in a context-free grammar:

  • In the C programming language, for instance, the parser needs to know whether an identifier has been defined as a type
  • In Python, indentation levels cannot be represented by a context-free grammar.

For this reason, it is often a good idea to make use of a dedicated parser (or preprocessor) to turn input into a more abstract representation - typically a tree structure. In programming languages, such a tree is called an abstract syntax tree (AST); it is the data structure that compilers operate on.

Abstract Syntax Trees

Abstract Syntax Trees (ASTs) that represent program code are among the most complex data structures in the world (if not the most complex data structures) - notably because they reflect all the complexity of the programming language and its features. The good news is that in Python, working with ASTs is particularly easy - one can work with them using standard language features.

Let us illustrate ASTs using an example. Here is a piece of code that we'd like to work with:

def main():
    print("Hello, world!")  # A simple example
main()
Hello, world!

Let us obtain the source code of this function:

import inspect
main_source = inspect.getsource(main)
print(main_source)
def main():
    print("Hello, world!")  # A simple example

We make use of the Python AST module to convert this code string to an AST and back.

import ast

With ast.parse(), we can parse the main() source into an AST:

main_tree = ast.parse(main_source)

This is what this tree looks like:

from bookutils import show_ast
show_ast(main_tree)
0 FunctionDef 1 "main" 0--1 2 arguments 0--2 3 Expr 0--3 4 Call 3--4 5 Name 4--5 8 Constant 4--8 6 "print" 5--6 7 Load 5--7 9 "Hello, world!" 8--9

We see how the function definition has become a FunctionDef node, whose third child is an Expr node, which in turn becomes a Call – of the "print" function with an argument of "Hello, world!".

Each of these AST nodes comes as a constructor – that is, we can invoke FunctionDef() to obtain a function definition node, or Call() to obtain a call node. These constructors take the AST children as arguments, but also lots of optional arguments (which we did not use so far). The dump of the AST into a string reveals all the arguments for each constructor:

print(ast.dump(main_tree, indent=4))
Module(
    body=[
        FunctionDef(
            name='main',
            args=arguments(
                posonlyargs=[],
                args=[],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                Expr(
                    value=Call(
                        func=Name(id='print', ctx=Load()),
                        args=[
                            Constant(value='Hello, world!')],
                        keywords=[]))],
            decorator_list=[])],
    type_ignores=[])

The Python ast documentation lists all these constructors, which make up the abstract syntax. There are more than 100 individual constructors! (We said that ASTs are complex, right?)

The nice thing about the above string representation is that we can take it as is and turn it into a tree again:

from ast import *
my_main_tree = Module(
    body=[
        FunctionDef(
            name='main',
            args=arguments(
                posonlyargs=[],
                args=[],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                Expr(
                    value=Call(
                        func=Name(id='print', ctx=Load()),
                        args=[
                            Constant(value='Hello, world!')],
                        keywords=[]))],
            decorator_list=[])],
    type_ignores=[])

We can take this tree and compile it into executable code:

my_main_tree = fix_missing_locations(my_main_tree)  # required for trees built from constructors
my_main_code = compile(my_main_tree, filename='<unknown>', mode='exec')
del main  # This deletes the definition of main()
exec(my_main_code)  # This defines main() again from `code`
main()
Hello, world!

We can also unparse the tree (= turn it into source code again). (Note how the comment got lost during parsing.)

print(ast.unparse(my_main_tree))
def main():
    print('Hello, world!')

Hence, we can

  1. Parse concrete code into ASTs (with ast.parse())
  2. Generate new ASTs and mutate existing ones
  3. Unparse ASTs to obtain concrete code again (with ast.unparse())

To generate and mutate ASTs (step #2, above), we need means to produce correct ASTs, invoking all constructors with the correct arguments. The plan is thus to have a grammar for ASTs, which produces (and parses) ASTs as we like.

A Grammar for ASTs

Programming language grammars are among the most complicated formal grammars around, and ASTs reflect much of this complexity. We will use the abstract AST grammar as specified in the Python documentation as base, and build a formal context-free grammar step by step.

Constants

We will start with simple constants – strings and integers. Again, we use the fuzzingbook syntax for grammars, as it allows for easier extension.

import string
ANYTHING_BUT_DOUBLE_QUOTES_AND_BACKSLASH = (string.digits + string.ascii_letters + string.punctuation + ' ').replace('"', '').replace('\\', '')
ANYTHING_BUT_SINGLE_QUOTES_AND_BACKSLASH = (string.digits + string.ascii_letters + string.punctuation + ' ').replace("'", '').replace('\\', '')
ANYTHING_BUT_DOUBLE_QUOTES_AND_BACKSLASH
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!#$%&'()*+,-./:;<=>?@[]^_`{|}~ "
ANYTHING_BUT_SINGLE_QUOTES_AND_BACKSLASH
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&()*+,-./:;<=>?@[]^_`{|}~ '
PYTHON_AST_CONSTANTS_GRAMMAR: Grammar = {
    '<start>': [ '<expr>' ],

    # Expressions
    '<expr>': [ '<Constant>', '<Expr>' ],
    '<Expr>': [ 'Expr(value=<expr>)' ],

    # Constants
    '<Constant>': [ 'Constant(value=<literal>)' ],
    '<literal>': [ '<string>', '<integer>', '<float>', '<bool>', '<none>' ],

    # Strings
    '<string>': [ '"<not_double_quotes>*"', "'<not_single_quotes>*'" ],
    '<not_double_quotes>': list(ANYTHING_BUT_DOUBLE_QUOTES_AND_BACKSLASH),
    '<not_single_quotes>': list(ANYTHING_BUT_SINGLE_QUOTES_AND_BACKSLASH),
    # FIXME: The actual rules for Python strings are also more complex:
    # https://docs.python.org/3/reference/lexical_analysis.html#numeric-literals

    # Numbers
    '<integer>': [ '<digit>', '<nonzerodigit><digits>' ],
    '<float>': [ '<integer>.<integer>' ],
    '<nonzerodigit>': ['1', '2', '3', '4', '5', '6', '7', '8', '9'],
    '<digits>': [ '<digit><digits>', '<digit>' ],
    '<digit>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
    # FIXME: There are _many_ more ways to express numbers in Python; see
    # https://docs.python.org/3/reference/lexical_analysis.html#numeric-literals

    # More
    '<bool>': [ 'True', 'False' ],
    '<none>': [ 'None' ],

    # FIXME: Not supported: bytes, format strings, regex strings...
}

Note that we use extended Backus-Naur form in our grammars (here: <string>):

  • <elem>+ stands for one or more instances of <elem>;
  • <elem>* stands for zero or more instances of <elem>;
  • <elem>? stands for one or zero instances of <elem>.

A call to is_valid_grammar() ensures our grammar is free of common mistakes. Don't write grammars without it!

assert is_valid_grammar(PYTHON_AST_CONSTANTS_GRAMMAR)
constants_grammar = convert_ebnf_grammar(PYTHON_AST_CONSTANTS_GRAMMAR)
constants_solver = ISLaSolver(constants_grammar)
constants_tree_str = str(constants_solver.solve())
print(constants_tree_str)
Expr(value=Constant(value=None))

We can create an AST from this expression and turn it into Python code (well, a literal):

constants_tree = eval(constants_tree_str)
ast.unparse(constants_tree)
'None'

Let's do this a number of times:

def test_samples(grammar: Grammar, iterations: int = 10, start_symbol = None, log: bool = True):
    g = convert_ebnf_grammar(grammar)
    solver = ISLaSolver(g, start_symbol=start_symbol, max_number_free_instantiations=iterations)
    for i in range(iterations):
        tree_str = str(solver.solve())
        tree = eval(tree_str)
        ast.fix_missing_locations(tree)
        if log:
            code = ast.unparse(tree)
            print(f'{code:40} # {tree_str}')
test_samples(PYTHON_AST_CONSTANTS_GRAMMAR)
False                                    # Expr(value=Constant(value=False))
2                                        # Constant(value=2)
None                                     # Constant(value=None)
'#'                                      # Constant(value="#")
550.81                                   # Constant(value=550.81)
True                                     # Constant(value=True)
'.'                                      # Constant(value='.')
467                                      # Constant(value=467)
7894                                     # Constant(value=7894)
263                                      # Constant(value=263)

Our grammar can also parse ASTs obtained from concrete code.

sample_constant_code = "4711"
sample_constant_ast = ast.parse(sample_constant_code).body[0]  # get the `Expr` node
sample_constant_ast_str = ast.dump(sample_constant_ast)
print(sample_constant_ast_str)
Expr(value=Constant(value=4711))
constant_solver = ISLaSolver(constants_grammar)
constant_solver.check(sample_constant_ast_str)
True

Let us now come up with a quiz question: Does our grammar support negative numbers? For this, let's first find out if the Constant() constructor also take a negative number as an argument? It turns out it can:

ast.unparse(Constant(value=-1))
'-1'

But what happens if we parse a negative number, say -1? One might assume that this simply results in a Constant(-1), right? Try it out yourself!

from bookutils import quiz

Quiz

If we parse a negative number, do we obtain



The answer is that parsing -1 yields a unary minus USub() applied to a positive value:

print(ast.dump(ast.parse('-1')))
Module(body=[Expr(value=UnaryOp(op=USub(), operand=Constant(value=1)))], type_ignores=[])

As unary operators are not part of our grammar (yet), it cannot handle negative numbers:

sample_constant_code = "-1"
sample_constant_ast = ast.parse(sample_constant_code).body[0]  # get the `Expr` node
sample_constant_ast_str = ast.dump(sample_constant_ast)
constant_solver = ISLaSolver(constants_grammar)
constant_solver.check(sample_constant_ast_str)
Error parsing "Expr(value=UnaryOp(op=USub(), operand=Constant(value=1)))" starting with "<start>"
False

In the next sections, we will gradually expand our grammar with more and more Python features, eventually covering (almost) the entire language.

Composites

Let us add composite constants – lists, dictionaries, tuples, etc. Here is how these are represented in an AST:

print(ast.dump(ast.parse("{ 'a': set() }"), indent=4))
Module(
    body=[
        Expr(
            value=Dict(
                keys=[
                    Constant(value='a')],
                values=[
                    Call(
                        func=Name(id='set', ctx=Load()),
                        args=[],
                        keywords=[])]))],
    type_ignores=[])

Let us encode these into a grammar, again using the definitions from the abstract AST grammar. All these structures also take contexts in which identifiers are used – Load() if they are used for evaluation, Store() if they appear on the left-hand side of an assignment (yes, in Python, you can have a tuple on the left-hand side of an assignment, say (x, y) = (1, 2)), and Del() if they are used as operands in a del statement. Right now, we only use Load() and Del() interchangeably.

PYTHON_AST_COMPOSITES_GRAMMAR: Grammar = extend_grammar(
    PYTHON_AST_CONSTANTS_GRAMMAR, {
    '<expr>': PYTHON_AST_CONSTANTS_GRAMMAR['<expr>'] + [
        '<Dict>', '<Set>', '<List>', '<Tuple>'
    ],

    '<Dict>': [ 'Dict(keys=<expr_list>, values=<expr_list>)' ],
    '<Set>': [ 'Set(elts=<nonempty_expr_list>)', '<EmptySet>' ],
    '<EmptySet>': [ 'Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])' ],
    '<List>': [
        'List(elts=<expr_list>, ctx=Load())',
        'List(elts=<expr_list>, ctx=Del())',
    ],
    '<Tuple>': [
        'Tuple(elts=<expr_list>, ctx=Load())',
        'Tuple(elts=<expr_list>, ctx=Del())',
    ],

    # Lists of expressions
    '<expr_list>': [ '[<exprs>?]' ],
    '<nonempty_expr_list>': [ '[<exprs>]' ],
    '<exprs>': [ '<expr>', '<exprs>, <expr>' ],
})
assert is_valid_grammar(PYTHON_AST_COMPOSITES_GRAMMAR)
for elt in [ '<Constant>', '<Dict>', '<Set>', '<List>', '<Tuple>' ]:
    print(elt)
    test_samples(PYTHON_AST_COMPOSITES_GRAMMAR, start_symbol=elt)
    print()
<Constant>
'c'                                      # Constant(value='c')
96.7                                     # Constant(value=96.7)
None                                     # Constant(value=None)
False                                    # Constant(value=False)
505                                      # Constant(value=505)
'U'                                      # Constant(value="U")
True                                     # Constant(value=True)
41398                                    # Constant(value=41398)
24                                       # Constant(value=24)
72                                       # Constant(value=72)

<Dict>
{}                                       # Dict(keys=[], values=[List(elts=[Dict(keys=[List(elts=[Constant(value=9.63)], ctx=Load())], values=[Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Set(elts=[Constant(value=True), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])], ctx=Load())]), Constant(value=2), Tuple(elts=[Constant(value=''), Constant(value=False), Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), Expr(value=List(elts=[Constant(value=None)], ctx=Load())), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Del())], ctx=Del())])
{577: ''}                                # Dict(keys=[Constant(value=577), Constant(value=34), Constant(value=286), Constant(value=7051)], values=[Constant(value="")])
{90: 14}                                 # Dict(keys=[Constant(value=90)], values=[Constant(value=14), Constant(value=88), Constant(value=435)])
{"nF}[ (^{bXBrwzf-P@geW'.]~G>;O2i&/t7Cc5:QU1jR4q_8VJ)Hsxd#o*aT3Sv!$ku?IhMpmA,EL0ZN=`9yK|<Y6lD+%I": 'Gym]A&K;70{jJLC"DV)/Y S.eNMEQq^%?i+-b!hz|gcUBvW485O#pPu~d:(F>_<a}kI2norf9H[T,lXt=w6@Z*1$xs`"R3'} # Dict(keys=[Constant(value="nF}[ (^{bXBrwzf-P@geW'.]~G>;O2i&/t7Cc5:QU1jR4q_8VJ)Hsxd#o*aT3Sv!$ku?IhMpmA,EL0ZN=`9yK|<Y6lD+%I")], values=[Constant(value='Gym]A&K;70{jJLC"DV)/Y S.eNMEQq^%?i+-b!hz|gcUBvW485O#pPu~d:(F>_<a}kI2norf9H[T,lXt=w6@Z*1$xs`"R3')])
{}                                       # Dict(keys=[], values=[])
{}                                       # Dict(keys=[], values=[])
{}                                       # Dict(keys=[], values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Constant(value=True), Constant(value=687596.53), Dict(keys=[Set(elts=[Expr(value=Set(elts=[Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), Constant(value=34.676)]))])], values=[Set(elts=[Set(elts=[List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load())], ctx=Del())])])]), List(elts=[], ctx=Load())])
{}                                       # Dict(keys=[], values=[])
{}                                       # Dict(keys=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], values=[])
{}                                       # Dict(keys=[Tuple(elts=[], ctx=Del())], values=[])

<Set>
{
[], 79.2}                              # Set(elts=[Expr(value=List(elts=[], ctx=Del())), Constant(value=79.2)])
set()                                    # Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])
{{{False: [set()], None: []}, (({20: set()},),)}} # Set(elts=[Set(elts=[Dict(keys=[Constant(value=False), Constant(value=None)], values=[List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load()), List(elts=[], ctx=Del())]), Tuple(elts=[Tuple(elts=[Dict(keys=[Constant(value=20)], values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Constant(value=True), List(elts=[], ctx=Del())])], ctx=Load())], ctx=Del())])])
{'Z'}                                    # Set(elts=[Constant(value='Z')])
{3763, ''}                               # Set(elts=[Constant(value=3763), Constant(value="")])
{475, 136, 95, 841, 58}                  # Set(elts=[Constant(value=475), Constant(value=136), Constant(value=95), Constant(value=841), Constant(value=58)])
{"F3Ye]1UZz&sPrG:D-R`k?5d+SM,/4b!uE fW;L$)@oQ'h^qI[(lXgN0wmt=~Jav86|Vp%72CcOBj_nHK<9A*#i}yTx>{."} # Set(elts=[Constant(value="F3Ye]1UZz&sPrG:D-R`k?5d+SM,/4b!uE fW;L$)@oQ'h^qI[(lXgN0wmt=~Jav86|Vp%72CcOBj_nHK<9A*#i}yTx>{.")])
{66, 7}                                  # Set(elts=[Constant(value=66), Constant(value=7)])
{set(), '', None, '_P[', 'L}w,6'}        # Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Constant(value=''), Constant(value=None), Constant(value='_P['), Constant(value='L}w,6')])
{'51I{Ef&u;kThXbRo]cV/8)Q@W>4|=J7lHge"+^y%(rv<q.DM:najxi9OUG?!KS zsd2t-Fm3NApB#0$~C`*PY'} # Set(elts=[Constant(value='51I{Ef&u;kThXbRo]cV/8)Q@W>4|=J7lHge"+^y%(rv<q.DM:najxi9OUG?!KS zsd2t-Fm3NApB#0$~C`*PY')])

<List>
[[], {831.3: (7, set(), {('1',)})}]      # List(elts=[List(elts=[], ctx=Load()), Dict(keys=[Constant(value=831.30), Constant(value=None), Expr(value=Tuple(elts=[Constant(value=""), Constant(value=True), Constant(value=False)], ctx=Del()))], values=[Tuple(elts=[Constant(value=7), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Set(elts=[Tuple(elts=[Constant(value='1')], ctx=Load())])], ctx=Load())])], ctx=Del())
[22]                                     # List(elts=[Constant(value=22)], ctx=Load())
[64]                                     # List(elts=[Constant(value=64)], ctx=Load())
[56]                                     # List(elts=[Constant(value=56)], ctx=Del())
[9589]                                   # List(elts=[Constant(value=9589)], ctx=Load())
[780]                                    # List(elts=[Constant(value=780)], ctx=Del())
[164, 47]                                # List(elts=[Constant(value=164), Constant(value=47)], ctx=Load())
["^dG@0 N26zE73qSfX,>xhPlW#j.1cQO4bF+A:LZR'CT=$i_", 'tJI`]gD_M/8yu!%<n~&H|9w*)Ur5sk(e}[vap?V-oK{BYm;eccmO'] # List(elts=[Constant(value="^dG@0 N26zE73qSfX,>xhPlW#j.1cQO4bF+A:LZR'CT=$i_"), Constant(value="tJI`]gD_M/8yu!%<n~&H|9w*)Ur5sk(e}[vap?V-oK{BYm;eccmO")], ctx=Load())
['e]@JX9LBnA:0Ha^3KVf OWuFT%*8ZGtp/x`Cw"li|Mq?_UI45$)zNh#gDcs;!-d[,(~{>bYrE<.RQ27}&moSk+vjP=6y9'] # List(elts=[Constant(value='e]@JX9LBnA:0Ha^3KVf OWuFT%*8ZGtp/x`Cw"li|Mq?_UI45$)zNh#gDcs;!-d[,(~{>bYrE<.RQ27}&moSk+vjP=6y9')], ctx=Load())
[set(), set()]                           # List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load())

<Tuple>
()                                       # Tuple(elts=[], ctx=Load())
(set(),)                                 # Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Del())
(set(), [], 
1.4, [[None], True], {set(): (False, {set()})}) # Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), List(elts=[], ctx=Del()), Expr(value=Constant(value=1.4)), List(elts=[List(elts=[Constant(value=None)], ctx=Load()), Constant(value=True)], ctx=Load()), Dict(keys=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), Expr(value=Constant(value=False))], values=[Tuple(elts=[Constant(value=False), Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])], ctx=Load())])], ctx=Del())
('',)                                    # Tuple(elts=[Constant(value="")], ctx=Load())
(93,)                                    # Tuple(elts=[Constant(value=93)], ctx=Load())
(28371613, 51, 892, 45, 10678, '')       # Tuple(elts=[Constant(value=28371613), Constant(value=51), Constant(value=892), Constant(value=45), Constant(value=10678), Constant(value='')], ctx=Del())
(72, 632)                                # Tuple(elts=[Constant(value=72), Constant(value=632)], ctx=Load())
('p[R#U', '5JRh~3', 'aAI>V+LBk60Ogp')    # Tuple(elts=[Constant(value='p[R#U'), Constant(value="5JRh~3"), Constant(value="aAI>V+LBk60Ogp")], ctx=Load())
(363,)                                   # Tuple(elts=[Constant(value=363)], ctx=Del())
('a*wyz!$CcJ.TDj?<8Q`o}|fG~3%FX/O:r@YW5dK,MqLt^l&B(PbH1_ZInkimvSV4x> u{+2gs)h"e9NA;76]=E-0;',) # Tuple(elts=[Constant(value='a*wyz!$CcJ.TDj?<8Q`o}|fG~3%FX/O:r@YW5dK,MqLt^l&B(PbH1_ZInkimvSV4x> u{+2gs)h"e9NA;76]=E-0;')], ctx=Load())

You may encounter a number of uncommon expressions here. For instance:

  1. () is an empty tuple.
  2. (1,) is a tuple with one element.
  3. {} is an empty dictionary; {1} is a set with one element.
  4. An empty set is denoted by set().

The fact that we use set() to represent empty sets is actually a feature of our PYTHON_AST_COMPOSITES_GRAMMAR grammar. If we invoke the Set() AST constructor without any elements, we obtain this beautiful expression...

print(ast.unparse(Set(elts=[])))
{*()}

... which indeed evaluates into an empty set.

{*()}
set()

Technically speaking, all of this is correct, but we'd like to stick to (somewhat) more readable code. If you want to confuse your programmer friends, always use {*()} instead of set().

Expressions

Let us extend our grammar with expressions. The Python parser already takes care of precedence rules, so we can treat all unary and binary operators in a similar fashion.

print(ast.dump(ast.parse("2 + 2 is not False"), indent=4))
Module(
    body=[
        Expr(
            value=Compare(
                left=BinOp(
                    left=Constant(value=2),
                    op=Add(),
                    right=Constant(value=2)),
                ops=[
                    IsNot()],
                comparators=[
                    Constant(value=False)]))],
    type_ignores=[])
PYTHON_AST_EXPRS_GRAMMAR: Grammar = extend_grammar(PYTHON_AST_COMPOSITES_GRAMMAR, {
    '<expr>': PYTHON_AST_COMPOSITES_GRAMMAR['<expr>'] + [
        '<BoolOp>', '<BinOp>', '<UnaryOp>', '<Compare>',
    ],

    # Booleans: and or
    '<BoolOp>': [ 'BoolOp(op=<boolop>, values=<expr_list>)' ],
    '<boolop>': [ 'And()', 'Or()' ],

    # Binary operators: + - * ...
    '<BinOp>': [ 'BinOp(left=<expr>, op=<operator>, right=<expr>)' ],
    '<operator>': [ 'Add()', 'Sub()', 'Mult()', 'MatMult()',
                   'Div()', 'Mod()', 'Pow()',
                   'LShift()', 'RShift()', 'BitOr()', 'BitXor()', 'BitAnd()',
                   'FloorDiv()' ],

    # Unary operators: not + - ...
    '<UnaryOp>': [ 'UnaryOp(op=<unaryop>, operand=<expr>)'],
    '<unaryop>': [ 'Invert()', 'Not()', 'UAdd()', 'USub()' ],

    # Comparisons: == != < <= > >= is in ...
    '<Compare>': [ 'Compare(left=<expr>, ops=<cmpop_list>, comparators=<expr_list>)'],
    '<cmpop_list>': [ '[<cmpops>?]' ],
    '<cmpops>': [ '<cmpop>', '<cmpop>, <cmpops>' ],
    '<cmpop>': [ 'Eq()', 'NotEq()', 'Lt()', 'LtE()', 'Gt()', 'GtE()',
                 'Is()', 'IsNot()', 'In()', 'NotIn()' ],

    # FIXME: There's a few more expressions: GeneratorExp, Await, YieldFrom, ...
})
assert is_valid_grammar(PYTHON_AST_EXPRS_GRAMMAR)
for elt in [ '<BoolOp>', '<BinOp>', '<UnaryOp>', '<Compare>' ]:
    print(elt)
    test_samples(PYTHON_AST_EXPRS_GRAMMAR, start_symbol=elt)
    print()
<BoolOp>
() and {-([]) / (set(), set()), {
True: set()}} # BoolOp(op=And(), values=[BoolOp(op=Or(), values=[]), Set(elts=[BinOp(left=UnaryOp(op=USub(), operand=Compare(left=List(elts=[], ctx=Del()), ops=[], comparators=[])), op=Div(), right=Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load())), Dict(keys=[Expr(value=Constant(value=True))], values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), List(elts=[], ctx=Load())])])])
(set(), set(), set() @ set() | set() + set()) and set() ** (set() ^ set()) * set() # BoolOp(op=And(), values=[Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=MatMult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=BitOr(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Add(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))], ctx=Del()), BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
set() % (set() >> set()) - (set() << set()) or set() & set() # BoolOp(op=Or(), values=[BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=Sub(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
'8' or 6                                 # BoolOp(op=Or(), values=[Constant(value='8'), Constant(value=6)])
~+123.95                                 # BoolOp(op=Or(), values=[UnaryOp(op=Invert(), operand=UnaryOp(op=UAdd(), operand=Constant(value=123.95)))])
not False // None                        # BoolOp(op=Or(), values=[UnaryOp(op=Not(), operand=BinOp(left=Constant(value=False), op=FloorDiv(), right=Constant(value=None)))])
'S' and 6180 in 397494                   # BoolOp(op=And(), values=[Constant(value="S"), Compare(left=Constant(value=6180), ops=[In()], comparators=[Constant(value=397494)])])
41                                       # BoolOp(op=And(), values=[Constant(value=41)])
214                                      # BoolOp(op=Or(), values=[Constant(value=214)])
5818 and "N1qoR6ak 2UJTWyh>!B)/#YKe0]=w{E.-Q`F[5'&^9cA~<V+M$bnLu%H8I3;g*D?rz7Xj:}pPvif_GOtx4,(ZCdmls|@YiT" and 70 and 884 # BoolOp(op=And(), values=[Constant(value=5818), Constant(value="N1qoR6ak 2UJTWyh>!B)/#YKe0]=w{E.-Q`F[5'&^9cA~<V+M$bnLu%H8I3;g*D?rz7Xj:}pPvif_GOtx4,(ZCdmls|@YiT"), Constant(value=70), Constant(value=884)])

<BinOp>
{} - 33                                  # BinOp(left=Expr(value=Dict(keys=[], values=[Tuple(elts=[UnaryOp(op=Invert(), operand=List(elts=[List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load()), Tuple(elts=[], ctx=Del()), BoolOp(op=And(), values=[Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], ctx=Load())])], ctx=Del())), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Compare(left=Tuple(elts=[], ctx=Load()), ops=[], comparators=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]))], ctx=Del())])), op=Sub(), right=Constant(value=33))
set() / (set() << set()) * (set() >> set()) // (set() @ set() & set()) # BinOp(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Div(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=Mult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=FloorDiv(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=MatMult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))
None ^ False                             # BinOp(left=Constant(value=None), op=BitXor(), right=Constant(value=False))
-'' + +7719.5                            # BinOp(left=UnaryOp(op=USub(), operand=Constant(value="")), op=Add(), right=UnaryOp(op=UAdd(), operand=Constant(value=7719.5)))
(set() or 906) >> ('F') | (not (True)) % ((set() and set()) << False) # BinOp(left=BinOp(left=BoolOp(op=Or(), values=[Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), Constant(value=906)]), op=RShift(), right=BoolOp(op=Or(), values=[Constant(value='F')])), op=BitOr(), right=BinOp(left=UnaryOp(op=Not(), operand=BoolOp(op=Or(), values=[Constant(value=True)])), op=Mod(), right=BinOp(left=BoolOp(op=And(), values=[Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[])]), op=LShift(), right=Constant(value=False))))
((set()) > None != set()) | ((set()))    # BinOp(left=Compare(left=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), ops=[Gt(), NotEq()], comparators=[Constant(value=None), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), op=BitOr(), right=Compare(left=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), ops=[], comparators=[Constant(value=True)]))
524 - 188                                # BinOp(left=Constant(value=524), op=Sub(), right=Constant(value=188))
6214 / 81                                # BinOp(left=Constant(value=6214), op=Div(), right=Constant(value=81))
26 / 43                                  # BinOp(left=Constant(value=26), op=Div(), right=Constant(value=43))
"s85;3Rw?ST!NI]_-eJ(x7'kG|z}C^&fWLnY[Z,rV*Qj.`Ed%:4<t" ^ '/$ao6 U{2cim@hHtF>b+vX)KBg1l=qyMDp~O0#A9uPa+l' # BinOp(left=Constant(value="s85;3Rw?ST!NI]_-eJ(x7'kG|z}C^&fWLnY[Z,rV*Qj.`Ed%:4<t"), op=BitXor(), right=Constant(value="/$ao6 U{2cim@hHtF>b+vX)KBg1l=qyMDp~O0#A9uPa+l"))

<UnaryOp>
+(set(), 
[])                            # UnaryOp(op=UAdd(), operand=Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Expr(value=List(elts=[], ctx=Del()))], ctx=Del()))
~(None)                                  # UnaryOp(op=Invert(), operand=BoolOp(op=Or(), values=[Constant(value=None)]))
-(((not {set(), set()})) & {set(): set(), (): set()}) # UnaryOp(op=USub(), operand=BinOp(left=Compare(left=UnaryOp(op=Not(), operand=Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])), ops=[], comparators=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), op=BitAnd(), right=Dict(keys=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Tuple(elts=[], ctx=Load())], values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load())])))
-((set() + set()) % ((set() << set()) / (set() ^ set()))) ** (set() // set() >> set() * set()) # UnaryOp(op=USub(), operand=BinOp(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Add(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Mod(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Div(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), op=Pow(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=FloorDiv(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=RShift(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))))
-True                                    # UnaryOp(op=USub(), operand=Constant(value=True))
+(4.3 @ (823 - '&') | '')                # UnaryOp(op=UAdd(), operand=BinOp(left=BinOp(left=Constant(value=4.30), op=MatMult(), right=BinOp(left=Constant(value=823), op=Sub(), right=Constant(value='&'))), op=BitOr(), right=Constant(value="")))
~(False <= 51 not in 959)                # UnaryOp(op=Invert(), operand=BoolOp(op=And(), values=[Compare(left=Constant(value=False), ops=[LtE(), NotIn()], comparators=[Constant(value=51), Constant(value=959)])]))
~17                                      # UnaryOp(op=Invert(), operand=Constant(value=17))
not 26                                   # UnaryOp(op=Not(), operand=Constant(value=26))
-68                                      # UnaryOp(op=USub(), operand=Constant(value=68))

<Compare>
()                                       # Compare(left=BoolOp(op=Or(), values=[]), ops=[], comparators=[Expr(value=Constant(value=8)), Tuple(elts=[List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Del()), Dict(keys=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], values=[]), Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Tuple(elts=[], ctx=Del())]), UnaryOp(op=UAdd(), operand=BinOp(left=Compare(left=Tuple(elts=[], ctx=Load()), ops=[], comparators=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), op=Add(), right=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load())))], ctx=Del())])
(set() & set()) / (set() @ (set() - set())) not in set() ^ set() # Compare(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Div(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=MatMult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), ops=[NotIn()], comparators=[BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
set() % set() // None << (set() - set() >> set() ** set()) <= set() | set() > set() + set() # Compare(left=BinOp(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=FloorDiv(), right=Constant(value=None)), op=LShift(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=RShift(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), ops=[LtE(), Gt()], comparators=[BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Add(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Constant(value=True)])
-632.86 != (not ~(not ('')))             # Compare(left=UnaryOp(op=USub(), operand=Constant(value=632.860)), ops=[NotEq()], comparators=[UnaryOp(op=Not(), operand=UnaryOp(op=Invert(), operand=UnaryOp(op=Not(), operand=BoolOp(op=And(), values=[Constant(value='')]))))])
'' >= 717 is not False                   # Compare(left=Constant(value=""), ops=[GtE(), IsNot(), Eq()], comparators=[Constant(value=717), Constant(value=False)])
15 < 39                                  # Compare(left=Constant(value=15), ops=[Lt(), Is(), Gt(), In()], comparators=[Constant(value=39)])
548 != 934688                            # Compare(left=Constant(value=548), ops=[NotEq(), LtE()], comparators=[Constant(value=934688)])
"w-xSGA8TI{%pRcq6e!_E:}P]9LM/&b1+7*lBDnvu)[o`3dY|Oj~JU<#Z'rH;g,f>@Q0tKk4N$iVaFhzW52y=(C.? sXm^{ " in 425 # Compare(left=Constant(value="w-xSGA8TI{%pRcq6e!_E:}P]9LM/&b1+7*lBDnvu)[o`3dY|Oj~JU<#Z'rH;g,f>@Q0tKk4N$iVaFhzW52y=(C.? sXm^{ "), ops=[In()], comparators=[Constant(value=425), Constant(value=21270)])
'H]3Ky.2p-:#6F%9V{X^8)lMD[;7Otk/hgImvcJf& E`uG}w?PY:' >= 'nCALds|1zjq4BZ$"ab*@_(e<!rT=iUW~05+,Q>oNSxRVpF' # Compare(left=Constant(value='H]3Ky.2p-:#6F%9V{X^8)lMD[;7Otk/hgImvcJf& E`uG}w?PY:'), ops=[GtE(), Gt(), NotIn(), IsNot(), GtE()], comparators=[Constant(value='nCALds|1zjq4BZ$"ab*@_(e<!rT=iUW~05+,Q>oNSxRVpF')])
6.3                                      # Compare(left=Constant(value=6.3), ops=[], comparators=[])

Not all of these expressions are type-correct. For instance, set() * set() raises a type error at runtime. They can be properly parsed, though.

How good is our grammar at this point? Let us create 20 expressions and check how many of these

  1. parse without SyntaxError
  2. evaluate without TypeError.
expr_iterations = 20
bad_syntax = 0
bad_type = 0
ast_exprs_grammar = convert_ebnf_grammar(PYTHON_AST_EXPRS_GRAMMAR)
expr_solver = ISLaSolver(ast_exprs_grammar, max_number_free_instantiations=expr_iterations)
for i in range(expr_iterations):
    expr_tree = eval(str(expr_solver.solve()))
    expr_tree = fix_missing_locations(expr_tree)
    expr_str = ast.unparse(expr_tree)
    print(i, expr_str)
    try:
        ...  # insert parsing code here
    except SyntaxError:
        bad_syntax += 1
    except TypeError:
        bad_type += 1

    try:
        ...  # <-- insert evaluation code here
    except TypeError:
        bad_type += 1
    except SyntaxError:
        bad_syntax += 1

print(f"Bad syntax: {bad_syntax}/{expr_iterations}")
print(f"Bad type: {bad_type}/{expr_iterations}")
0 set()
1 
2 [
~(False,) >> {635: (set() @ set() & set(),), 99.1 not in set() ** set(): {[set() ^ set()]}}]
3 (set() * set() - (set() + set())) / ((set() ^ set()) % set() | set() << set() % set())
4 not None
5 -+(True and '#' and 'x')
6 (8876 > 46 in 36 != 50)
7 24
8 "LfDW-kSM|tpB&+V*RgQ7U]3xq)zh~n^`wTdie5jvPN: A2K?$ZGJ(X;%@sr9mcIu!}OC/1><b=y'0H8o_.4lFYa{6[,>E?"
9 'o,awXihgeM[581Bln"RA60^k2N_L=d$C`7U~f)(&ZG]#m+DqF|PjpIQ<.4ur@ T!-W}Vs:Y{*zOEJb3StHK>?y%c/;iv9'
10 ((set()) < set() is set()) >= ((set()))
11 ((set())) == ((set()) <= set())
12 
13 
14 set()
15 set()
16 []
17 
18 ((() or 'k5'))
19 () | []
Bad syntax: 0/20
Bad type: 0/20

We're not doing too bad here. It is possible, in principle, to use ISLa constraints such that the resulting code will be properly typed - but this would take hundreds to thousands of rules. We will leave this exercise to the reader.

Note that you should not repeat this experiment once identifiers come into play. There is a remote chance that the fuzzer synthesizes a call like os.remove("/") – and away goes your file system!

Names and Function Calls

Let us add some identifiers such that we can call functions.

ID_START = string.ascii_letters + '_'
ID_CONTINUE = ID_START + string.digits
ID_CONTINUE
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_0123456789'
print(ast.dump(ast.parse("xyzzy(a, b=c)"), indent=4))
Module(
    body=[
        Expr(
            value=Call(
                func=Name(id='xyzzy', ctx=Load()),
                args=[
                    Name(id='a', ctx=Load())],
                keywords=[
                    keyword(
                        arg='b',
                        value=Name(id='c', ctx=Load()))]))],
    type_ignores=[])
PYTHON_AST_IDS_GRAMMAR: Grammar = extend_grammar(PYTHON_AST_EXPRS_GRAMMAR, {
    '<expr>': PYTHON_AST_EXPRS_GRAMMAR['<expr>'] + [
        '<Name>', '<Call>'
    ],

    # Identifiers
    '<Name>': [
        'Name(id=<identifier>, ctx=Load())',
        'Name(id=<identifier>, ctx=Del())'
    ],
    '<identifier>': [ "'<id>'" ],
    '<id>': [ '<id_start><id_continue>*' ],
    '<id_start>': list(ID_START),
    '<id_continue>': list(ID_CONTINUE),
    # FIXME: Actual rules are a bit more complex; see
    # https://docs.python.org/3/reference/lexical_analysis.html#identifiers

    # Function Calls
   '<Call>': [ 'Call(func=<func>, args=<expr_list>, keywords=<keyword_list>)' ],
   '<func>': [ '<expr>' ],  # Actually <Expr>, but this is more readable and parses 90%
    '<keyword_list>': [ '[<keywords>?]' ],
    '<keywords>': [ '<keyword>', '<keyword>, <keywords>' ],
    '<keyword>': [ 'keyword(arg=<identifier>, value=<expr>)' ]
})
assert is_valid_grammar(PYTHON_AST_IDS_GRAMMAR)
for elt in [ '<Name>', '<Call>' ]:
    print(elt)
    test_samples(PYTHON_AST_IDS_GRAMMAR, start_symbol=elt)
    print()
<Name>
n                                        # Name(id='n', ctx=Load())
vmGtKyT3Oq1gBC_srAIRaeQw6Dh8V5oLdj9FcvHfb4MpPZiNuEJ27WYU0lnkSxX9Lz # Name(id='vmGtKyT3Oq1gBC_srAIRaeQw6Dh8V5oLdj9FcvHfb4MpPZiNuEJ27WYU0lnkSxX9Lz', ctx=Del())
h                                        # Name(id='h', ctx=Load())
L                                        # Name(id='L', ctx=Del())
M                                        # Name(id='M', ctx=Load())
g                                        # Name(id='g', ctx=Del())
P                                        # Name(id='P', ctx=Del())
It                                       # Name(id='It', ctx=Del())
jGn7g                                    # Name(id='jGn7g', ctx=Load())
psj                                      # Name(id='psj', ctx=Del())

<Call>
{{}()}(set(), set(), W=set(), a=set())   # Call(func=Set(elts=[Call(func=Dict(keys=[], values=[]), args=[], keywords=[])]), args=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[])], keywords=[keyword(arg='W', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='a', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
((set(), set()) % ())(~set(), Icz=[])    # Call(func=BinOp(left=Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Del()), op=Mod(), right=BoolOp(op=Or(), values=[])), args=[UnaryOp(op=Invert(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], keywords=[keyword(arg='Icz', value=List(elts=[], ctx=Del()))])
Ywn(set(), [], 358, J1eY=N)              # Call(func=Expr(value=Name(id='Ywn', ctx=Del())), args=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), List(elts=[], ctx=Load()), Constant(value=358)], keywords=[keyword(arg='J1eY', value=Name(id='N', ctx=Load()))])
(set() / (set() | set()))(set() @ set(), (set(),), Z=set(), K=set()) # Call(func=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Div(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), args=[BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=MatMult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load())], keywords=[keyword(arg='Z', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='K', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
(set() * set() // (set() - (set() & set())))(set() ^ set(), U=set(), o=set()) # Call(func=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=FloorDiv(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), args=[BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], keywords=[keyword(arg='U', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='o', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
(set() << set() + set())(None, P=set(), R=set()) # Call(func=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Add(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), args=[Constant(value=None)], keywords=[keyword(arg='P', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='R', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
(+False)(set() >> set(), not 6.1, H=--set(), M=True) # Call(func=UnaryOp(op=UAdd(), operand=Constant(value=False)), args=[BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), UnaryOp(op=Not(), operand=Constant(value=6.1))], keywords=[keyword(arg='H', value=UnaryOp(op=USub(), operand=UnaryOp(op=USub(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), keyword(arg='M', value=Constant(value=True))])
(set() and set() and set())(None, '"', m=True, X=set(), E=set()) # Call(func=BoolOp(op=And(), values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), args=[Constant(value=None), Constant(value='"')], keywords=[keyword(arg='m', value=Constant(value=True)), keyword(arg='X', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='E', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
(set() ** (set() % set()))(set(), t=set() << set(), p=set(), f=set()) # Call(func=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), args=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], keywords=[keyword(arg='t', value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), keyword(arg='p', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='f', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
'2'(set(), set(), set(), set(), c=(set())) # Call(func=Constant(value="2"), args=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[])], keywords=[keyword(arg='c', value=Compare(left=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), ops=[], comparators=[]))])
ast_ids_grammar = convert_ebnf_grammar(PYTHON_AST_IDS_GRAMMAR)
id_solver = ISLaSolver(ast_ids_grammar, start_symbol='<id>')
assert id_solver.check('open')
name_solver = ISLaSolver(ast_ids_grammar)
assert name_solver.check("Name(id='open', ctx=Load())")
call_solver = ISLaSolver(ast_ids_grammar, start_symbol='<keyword_list>')
assert call_solver.check('[]')
call_str = ast.dump(ast.parse('open()').body[0].value)
call_solver = ISLaSolver(ast_ids_grammar)
assert call_solver.check(call_str)
Attributes and Subscripts

Let us add attributes and subscripts.

print(ast.dump(ast.parse("a[b].c"), indent=4))
Module(
    body=[
        Expr(
            value=Attribute(
                value=Subscript(
                    value=Name(id='a', ctx=Load()),
                    slice=Name(id='b', ctx=Load()),
                    ctx=Load()),
                attr='c',
                ctx=Load()))],
    type_ignores=[])
PYTHON_AST_ATTRS_GRAMMAR: Grammar = extend_grammar(PYTHON_AST_IDS_GRAMMAR, {
    '<expr>': PYTHON_AST_IDS_GRAMMAR['<expr>'] + [
        '<Attribute>', '<Subscript>', '<Starred>',
    ],

    # Attributes
    '<Attribute>': [
        'Attribute(value=<expr>, attr=<identifier>, ctx=Load())',
        'Attribute(value=<expr>, attr=<identifier>, ctx=Del())',
    ],

    # Subscripts
    '<Subscript>': [
        'Subscript(value=<expr>, slice=<Slice>, ctx=Load())',
        'Subscript(value=<expr>, slice=<Slice>, ctx=Del())',
    ],
    '<Slice>': [
        'Slice()',
        'Slice(<expr>)',
        'Slice(<expr>, <expr>)',
        'Slice(<expr>, <expr>, <expr>)',
    ],

    # Starred
    '<Starred>': [
        'Starred(value=<expr>, ctx=Load())',
        'Starred(value=<expr>, ctx=Del())',
    ],

    # We're extending the set of callers a bit
    '<func>': [ '<Name>', '<Attribute>', '<Subscript>' ],
})
assert is_valid_grammar(PYTHON_AST_ATTRS_GRAMMAR)
for elt in [ '<Attribute>', '<Subscript>', '<Starred>' ]:
    print(elt)
    test_samples(PYTHON_AST_ATTRS_GRAMMAR, start_symbol=elt)
    print()
<Attribute>
(set()).sB                               # Attribute(value=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[Is()], comparators=[]), attr='sB', ctx=Del())
FREed3NqLMUugcArznkP9yhK2D1bfmsCoD0.aQHvYZ_VatlOGFxjS87ipTw0WJ6XI45C1 # Attribute(value=Name(id='FREed3NqLMUugcArznkP9yhK2D1bfmsCoD0', ctx=Del()), attr='aQHvYZ_VatlOGFxjS87ipTw0WJ6XI45C1', ctx=Load())
*True .V[() % {}:(set(), set(), set()):[set()[:](), ~set()]].Z0rA # Attribute(value=Expr(value=Starred(value=Subscript(value=Attribute(value=Constant(value=True), attr='V', ctx=Load()), slice=Slice(BinOp(left=BoolOp(op=And(), values=[]), op=Mod(), right=Dict(keys=[], values=[])), Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load()), List(elts=[Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[]), UnaryOp(op=Invert(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], ctx=Del())), ctx=Load()), ctx=Load())), attr='Z0rA', ctx=Del())
*[{(set(), set(), set() * set(), (set() | set()) // (set() >> set())), la_}].f # Attribute(value=Starred(value=List(elts=[Set(elts=[Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=FloorDiv(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))], ctx=Del()), Name(id='la_', ctx=Load())])], ctx=Load()), ctx=Del()), attr='f', ctx=Load())
((set() + set()) ** (set() @ set() << set()) ^ (set() - set()) / (set() & set())).t # Attribute(value=BinOp(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Add(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Pow(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=MatMult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=BitXor(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Div(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), attr='t', ctx=Del())
0.567.C                                  # Attribute(value=Constant(value=0.567), attr='C', ctx=Load())
(+'').BD                                 # Attribute(value=UnaryOp(op=UAdd(), operand=Constant(value="")), attr='BD', ctx=Load())
K((-11)[None:not set()], r=set(), j=set()).X # Attribute(value=Call(func=Name(id='K', ctx=Del()), args=[Subscript(value=UnaryOp(op=USub(), operand=Constant(value=11)), slice=Slice(Constant(value=None), UnaryOp(op=Not(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Del())], keywords=[keyword(arg='r', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='j', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]), attr='X', ctx=Load())
y(U=set(), u=set())[set().p(, n=set()):].W # Attribute(value=Subscript(value=Call(func=Name(id='y', ctx=Del()), args=[], keywords=[keyword(arg='U', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='u', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]), slice=Slice(Call(func=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='p', ctx=Del()), args=[BoolOp(op=Or(), values=[])], keywords=[keyword(arg='n', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])), ctx=Del()), attr='W', ctx=Load())
''.Q                                     # Attribute(value=Constant(value=''), attr='Q', ctx=Load())

<Subscript>
(set() ^ set()).D(f=set(), d=set().E)[{set(), set()}:g] # Subscript(value=Call(func=Attribute(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), attr='D', ctx=Del()), args=[], keywords=[keyword(arg='f', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='d', value=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='E', ctx=Load()))]), slice=Slice(Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), Name(id='g', ctx=Del())), ctx=Load())
(*
[set(), not {set(): set(), set(): set()}, (set() or set())][True:],)[:] # Subscript(value=Tuple(elts=[Starred(value=Expr(value=Subscript(value=List(elts=[BoolOp(op=And(), values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), UnaryOp(op=Not(), operand=Dict(keys=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Name(id='Y', ctx=Load())])), Compare(left=BoolOp(op=Or(), values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), ops=[Gt()], comparators=[])], ctx=Del()), slice=Slice(Constant(value=True)), ctx=Del())), ctx=Load())], ctx=Del()), slice=Slice(), ctx=Del())
[*set() % set()][(set(), set() ** set()):(set(), set(), set()):[set()]] # Subscript(value=List(elts=[Starred(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Del())], ctx=Del()), slice=Slice(Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], ctx=Load()), Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load()), List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load())), ctx=Load())
(set() * set() + (set() >> set()))[(set() | set()) @ set():set() - (set() << set())] # Subscript(value=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Add(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), slice=Slice(BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=MatMult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), ctx=Load())
'T'[885:]                                # Subscript(value=Constant(value='T'), slice=Slice(Constant(value=885)), ctx=Del())
((not ~set()) & -set())[-set() // +set()[:]():] # Subscript(value=BinOp(left=UnaryOp(op=Not(), operand=UnaryOp(op=Invert(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=BitAnd(), right=UnaryOp(op=USub(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), slice=Slice(BinOp(left=UnaryOp(op=USub(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=FloorDiv(), right=UnaryOp(op=UAdd(), operand=Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[])))), ctx=Load())
51.3[None:0]                             # Subscript(value=Constant(value=51.3), slice=Slice(Constant(value=None), Constant(value=0)), ctx=Load())
A(J=set())[i():set() / set()]            # Subscript(value=Call(func=Name(id='A', ctx=Del()), args=[], keywords=[keyword(arg='J', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]), slice=Slice(Call(func=Name(id='i', ctx=Load()), args=[], keywords=[]), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Div(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Del())
False[62:14:'A']                         # Subscript(value=Constant(value=False), slice=Slice(Constant(value=62), Constant(value=14), Constant(value="A")), ctx=Del())
(set())[set():set():set()]               # Subscript(value=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), slice=Slice(Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Load())

<Starred>
*T(dTxZ7oJd2IgEAnCjHPfGRsyeQ36hUcbW=qL4arDOYMlS_itF1v5NVKwBk0W9zmXq8upKKMofbL) # Starred(value=Call(func=Name(id='T', ctx=Del()), args=[], keywords=[keyword(arg='dTxZ7oJd2IgEAnCjHPfGRsyeQ36hUcbW', value=Name(id='qL4arDOYMlS_itF1v5NVKwBk0W9zmXq8upKKMofbL', ctx=Load()))]), ctx=Load())
*[{set()[:], set(), {set(): set()}, '' << 
(set(), *set().A)}] # Starred(value=List(elts=[Set(elts=[Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), BoolOp(op=And(), values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), Dict(keys=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), BinOp(left=Constant(value=''), op=LShift(), right=Expr(value=Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Starred(value=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='A', ctx=Del()), ctx=Del())], ctx=Load())))])], ctx=Del()), ctx=Del())
*~([].s[set() / (set() | set()):] == (set() ** set(), [], [])) # Starred(value=UnaryOp(op=Invert(), operand=Compare(left=Subscript(value=Attribute(value=List(elts=[], ctx=Load()), attr='s', ctx=Load()), slice=Slice(BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Div(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), ctx=Load()), ops=[Eq(), GtE()], comparators=[Tuple(elts=[BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), List(elts=[], ctx=Load()), List(elts=[], ctx=Del())], ctx=Del())])), ctx=Del())
*set() // set() % set() - set() @ set() * (set() >> set()) # Starred(value=BinOp(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=FloorDiv(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Sub(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=MatMult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Mult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), ctx=Del())
*None                                    # Starred(value=Constant(value=None), ctx=Load())
*991                                     # Starred(value=Constant(value=991), ctx=Load())
*+-(set() & set())[set() % J():set() ^ set()][set() - set():set() | set():set()[:]()] + (not -10.7) # Starred(value=BinOp(left=UnaryOp(op=UAdd(), operand=UnaryOp(op=USub(), operand=Subscript(value=Subscript(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), slice=Slice(BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Call(func=Name(id='J', ctx=Del()), args=[], keywords=[])), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Load()), slice=Slice(BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[])), ctx=Load()))), op=Add(), right=UnaryOp(op=Not(), operand=UnaryOp(op=USub(), operand=Constant(value=10.7)))), ctx=Load())
*(False)                                 # Starred(value=BoolOp(op=Or(), values=[Constant(value=False)]), ctx=Load())
*True .G5('', D=set(), W=set())          # Starred(value=Call(func=Attribute(value=Constant(value=True), attr='G5', ctx=Del()), args=[Constant(value="")], keywords=[keyword(arg='D', value=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[])), keyword(arg='W', value=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]))]), ctx=Del())
*v                                       # Starred(value=Name(id='v', ctx=Load()), ctx=Load())
Variable Assignments

Now for variable assignments. These make things more complex, as we have a restricted set of expressions on the left hand side of an assignment.

PYTHON_AST_ASSIGNMENTS_GRAMMAR: Grammar = extend_grammar(PYTHON_AST_ATTRS_GRAMMAR, {
    '<start>': [ '<stmt>' ],

    '<stmt>': [
        '<Assign>', '<AugAssign>',
        '<Expr>'
    ],

    # Assignments
    '<Assign>': [
        'Assign(targets=<nonempty_lhs_expr_list>, value=<expr><type_comment>?)',
    ],
    '<type_comment>': [ ', type_comment=<string>' ],
    '<AugAssign>': [
        'AugAssign(target=<lhs_expr>, op=<operator>, value=<expr>)',
    ],

    # Lists of left-hand side expressions
    # '<lhs_expr_list>': [ '[<lhs_exprs>?]' ],
    '<nonempty_lhs_expr_list>': [ '[<lhs_exprs>]' ],
    '<lhs_exprs>': [ '<lhs_expr>', '<lhs_exprs>, <lhs_expr>' ],

    # On the left-hand side of assignments, we allow a number of structures
    '<lhs_expr>': [
        '<lhs_Name>',  # Most common
        '<lhs_List>', '<lhs_Tuple>',
        '<lhs_Attribute>',
        '<lhs_Subscript>',
        '<lhs_Starred>',
    ],

    '<lhs_Name>': [ 'Name(id=<identifier>, ctx=Store())', ],

    '<lhs_List>': [
        'List(elts=<nonempty_lhs_expr_list>, ctx=Store())',
    ],
    '<lhs_Tuple>': [
        'Tuple(elts=<nonempty_lhs_expr_list>, ctx=Store())',
    ],
    '<lhs_Attribute>': [
        'Attribute(value=<lhs_expr>, attr=<identifier>, ctx=Store())',
    ],
    '<lhs_Subscript>': [
        'Subscript(value=<lhs_expr>, slice=<Slice>, ctx=Store())',
    ],
    '<lhs_Starred>': [
        'Starred(value=<lhs_expr>, ctx=Store())',
    ],
})
assert is_valid_grammar(PYTHON_AST_ASSIGNMENTS_GRAMMAR)
for elt in ['<Assign>', '<AugAssign>']:
    print(elt)
    test_samples(PYTHON_AST_ASSIGNMENTS_GRAMMAR, start_symbol=elt)
    print()
<Assign>
([U, S, z[:][:]], *o.FQA) = I = (3,)[:]  # Assign(targets=[Tuple(elts=[List(elts=[Name(id='U', ctx=Store()), Name(id='S', ctx=Store()), Subscript(value=Subscript(value=Name(id='z', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(), ctx=Store())], ctx=Store()), Attribute(value=Starred(value=Name(id='o', ctx=Store()), ctx=Store()), attr='FQA', ctx=Store())], ctx=Store()), Name(id='I', ctx=Store())], value=Subscript(value=Tuple(elts=[Constant(value=3)], ctx=Del()), slice=Slice(), ctx=Del()))
b[set():] = L[:] = i[:][{}:set()] = *
.q0DVnL # type: K # Assign(targets=[Subscript(value=Name(id='b', ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), Subscript(value=Name(id='L', ctx=Store()), slice=Slice(), ctx=Store()), Subscript(value=Subscript(value=Name(id='i', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(Dict(keys=[], values=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store())], value=Starred(value=Attribute(value=Expr(value=BoolOp(op=Or(), values=[])), attr='q0DVnL', ctx=Load()), ctx=Del()), type_comment='K')
P[:][set():set():set()] = m[:][:] = N_oX5 # type:  # Assign(targets=[Subscript(value=Subscript(value=Name(id='P', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), Subscript(value=Subscript(value=Name(id='m', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(), ctx=Store())], value=Name(id='N_oX5', ctx=Load()), type_comment="")
p[:][set():][not set():set() - set()] = [set(), set()] # type: #1`OUQ- # Assign(targets=[Subscript(value=Subscript(value=Subscript(value=Name(id='p', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), slice=Slice(UnaryOp(op=Not(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Store())], value=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[])], ctx=Del()), type_comment="#1`OUQ-")
YcpPj1BY93lr = jMtI(*set()) # type: iM.=!X{(4u# # Assign(targets=[Name(id='YcpPj1BY93lr', ctx=Store())], value=Call(func=Name(id='jMtI', ctx=Del()), args=[Starred(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ctx=Load())], keywords=[]), type_comment='iM.=!X{(4u#')
C = u = V = h = Zdk = [] # type: b>S     # Assign(targets=[Name(id='C', ctx=Store()), Name(id='u', ctx=Store()), Name(id='V', ctx=Store()), Name(id='h', ctx=Store()), Name(id='Zdk', ctx=Store())], value=List(elts=[], ctx=Load()), type_comment="b>S")
H = _ = t = J = ({set()},).K # type: 5s$wu # Assign(targets=[Name(id='H', ctx=Store()), Name(id='_', ctx=Store()), Name(id='t', ctx=Store()), Name(id='J', ctx=Store())], value=Attribute(value=Tuple(elts=[Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])], ctx=Load()), attr='K', ctx=Del()), type_comment="5s$wu")
B = d = T = (set() / set())[set():] # type: c0^ # Assign(targets=[Name(id='B', ctx=Store()), Name(id='d', ctx=Store()), Name(id='T', ctx=Store())], value=Subscript(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Div(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Load()), type_comment='c0^')
s = w = y = R = set() | set() >> set() # type: !'GaPe # Assign(targets=[Name(id='s', ctx=Store()), Name(id='w', ctx=Store()), Name(id='y', ctx=Store()), Name(id='R', ctx=Store())], value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), type_comment="!'GaPe")
l = (set() ^ set() & set()) % (set() * set() ** set()) # type: @zG # Assign(targets=[Name(id='l', ctx=Store())], value=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=Mod(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), type_comment='@zG')

<AugAssign>
[C[(set() / set()[:]).n:], *(L[set():set():set()],)].S %=  # AugAssign(target=Attribute(value=List(elts=[Subscript(value=Name(id='C', ctx=Store()), slice=Slice(Attribute(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Div(), right=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Load())), attr='n', ctx=Del())), ctx=Store()), Starred(value=Tuple(elts=[Subscript(value=Name(id='L', ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store())], ctx=Store()), ctx=Store())], ctx=Store()), attr='S', ctx=Store()), op=Mod(), value=BoolOp(op=Or(), values=[]))
h[:][:][set():set()] <<= *[set(), set()[:]()] # AugAssign(target=Subscript(value=Subscript(value=Subscript(value=Name(id='h', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[])), ctx=Store()), op=LShift(), value=Starred(value=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[])], ctx=Del()), ctx=Load()))
j[set():][set():set()][None:
set():~set()] >>= (gFgO,) # AugAssign(target=Subscript(value=Subscript(value=Subscript(value=Name(id='j', ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), slice=Slice(Constant(value=None), Expr(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), UnaryOp(op=Invert(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Store()), op=RShift(), value=Tuple(elts=[Name(id='gFgO', ctx=Del())], ctx=Load()))
uE5s3hoYmSlJPAkZ @= {Q7p: {*().p}}       # AugAssign(target=Name(id='uE5s3hoYmSlJPAkZ', ctx=Store()), op=MatMult(), value=Dict(keys=[Name(id='Q7p', ctx=Load())], values=[Set(elts=[Attribute(value=Starred(value=Tuple(elts=[], ctx=Del()), ctx=Del()), attr='p', ctx=Load())])]))
O4Ka1 |= [(set() ^ set()) & set() ** set() // (set() * set())] # AugAssign(target=Name(id='O4Ka1', ctx=Store()), op=BitOr(), value=List(elts=[BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=BitAnd(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=FloorDiv(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))))], ctx=Load()))
_Gze_CMwfTBiRHvqQDVuL9jXy8Nnx2Ib0crU6tWdfbjH += 9.446 # AugAssign(target=Name(id='_Gze_CMwfTBiRHvqQDVuL9jXy8Nnx2Ib0crU6tWdfbjH', ctx=Store()), op=Add(), value=Constant(value=9.446))
r -= True                                # AugAssign(target=Name(id='r', ctx=Store()), op=Sub(), value=Constant(value=True))
k *= -(not +60)                          # AugAssign(target=Name(id='k', ctx=Store()), op=Mult(), value=UnaryOp(op=USub(), operand=UnaryOp(op=Not(), operand=UnaryOp(op=UAdd(), operand=Constant(value=60)))))
D >>= e().J(w(I=set()), G=set(), Y=set()) # AugAssign(target=Name(id='D', ctx=Store()), op=RShift(), value=Call(func=Attribute(value=Call(func=Name(id='e', ctx=Del()), args=[], keywords=[]), attr='J', ctx=Load()), args=[Call(func=Name(id='w', ctx=Load()), args=[], keywords=[keyword(arg='I', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])], keywords=[keyword(arg='G', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='Y', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]))
a += ''                                  # AugAssign(target=Name(id='a', ctx=Store()), op=Add(), value=Constant(value=''))
Statements

Now for statements. There's quite a lot of these.

PYTHON_AST_STMTS_GRAMMAR: Grammar = extend_grammar(PYTHON_AST_ASSIGNMENTS_GRAMMAR, {
    '<start>': [ '<stmt>' ],

    '<stmt>': PYTHON_AST_ASSIGNMENTS_GRAMMAR['<stmt>'] + [
        '<For>', '<While>', '<If>',
        '<Return>', '<Delete>', '<Assert>',
        '<Pass>', '<Break>', '<Continue>',
        '<With>'
    ],

    # Control structures
    '<For>': [
        'For(target=<lhs_expr>, iter=<expr>, body=<nonempty_stmt_list>, orelse=<stmt_list><type_comment>)'
    ],
    '<stmt_list>': [ '[<stmts>?]' ],
    '<nonempty_stmt_list>': [ '[<stmts>]' ],
    '<stmts>': [ '<stmt>', '<stmt>, <stmts>' ],

    '<While>': [
        'While(test=<expr>, body=<nonempty_stmt_list>, orelse=<stmt_list>)'
    ],

    '<If>': [
        'If(test=<expr>, body=<nonempty_stmt_list>, orelse=<stmt_list>)'
    ],

    '<With>': [
        'With(items=<withitem_list>, body=<nonempty_stmt_list><type_comment>?)'
    ],
    '<withitem_list>': [ '[<withitems>?]' ],
    '<withitems>': [ '<withitem>', '<withitems>, <withitem>' ],
    '<withitem>': [
        'withitem(context_expr=<expr>)',
        'withitem(context_expr=<expr>, optional_vars=<lhs_expr>)',
    ],

    # Other statements
    '<Return>': [
        'Return()',
        'Return(value=<expr>)'
    ],
    '<Delete>': [
        'Delete(targets=<expr_list>)'
    ],
    '<Assert>': [
        'Assert(test=<expr>)',
        'Assert(test=<expr>, msg=<expr>)'
    ],
    '<Pass>': [ 'Pass()'],
    '<Break>': [ 'Break()' ],
    '<Continue>': [ 'Continue()']

    # FIXME: A few more: AsyncFor, AsyncWith, Match, Try, TryStar, With
    # Import, ImportFrom, Global, Nonlocal...
})
assert is_valid_grammar(PYTHON_AST_STMTS_GRAMMAR)
for elt in PYTHON_AST_STMTS_GRAMMAR['<stmt>']:
    print(elt)
    test_samples(PYTHON_AST_STMTS_GRAMMAR, start_symbol=elt)
    print()
<Assign>
[V, L][:].pzMx7 = {}                     # Assign(targets=[Attribute(value=Subscript(value=List(elts=[Name(id='V', ctx=Store()), Name(id='L', ctx=Store())], ctx=Store()), slice=Slice(), ctx=Store()), attr='pzMx7', ctx=Store())], value=Dict(keys=[], values=[Attribute(value=Compare(left=Name(id='QnD', ctx=Del()), ops=[Is()], comparators=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), attr='J', ctx=Del())]))
*X[:] = (I[:], _, x) = ()[set():set():set()](set(), set()) # type: ] # Assign(targets=[Starred(value=Subscript(value=Name(id='X', ctx=Store()), slice=Slice(), ctx=Store()), ctx=Store()), Tuple(elts=[Subscript(value=Name(id='I', ctx=Store()), slice=Slice(), ctx=Store()), Name(id='_', ctx=Store()), Name(id='x', ctx=Store())], ctx=Store())], value=Call(func=Subscript(value=Tuple(elts=[], ctx=Del()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Load()), args=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], keywords=[]), type_comment=']')
H[:][set():set():set()][+set():] = b[set():set()][set():set()[:]] = 3.32 # type: ] # Assign(targets=[Subscript(value=Subscript(value=Subscript(value=Name(id='H', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), slice=Slice(UnaryOp(op=UAdd(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Store()), Subscript(value=Subscript(value=Name(id='b', ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del())), ctx=Store())], value=Constant(value=3.32), type_comment="]")
u[set():set():set()][set():set() / set():[]] = 
*() # type: { # Assign(targets=[Subscript(value=Subscript(value=Name(id='u', ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Div(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), List(elts=[], ctx=Del())), ctx=Store())], value=Expr(value=Starred(value=BoolOp(op=Or(), values=[]), ctx=Load())), type_comment='{')
j = d = q = o = GkRtG = Rbp2Z # type: M+2% # Assign(targets=[Name(id='j', ctx=Store()), Name(id='d', ctx=Store()), Name(id='q', ctx=Store()), Name(id='o', ctx=Store()), Name(id='GkRtG', ctx=Store())], value=Name(id='Rbp2Z', ctx=Load()), type_comment="M+2%")
aS = kd = *[(), {set(), set()}] # type: IF&!p # Assign(targets=[Name(id='aS', ctx=Store()), Name(id='kd', ctx=Store())], value=Starred(value=List(elts=[Tuple(elts=[], ctx=Load()), Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])], ctx=Load()), ctx=Del()), type_comment='IF&!p')
e = ((set() @ set()) ** (set() - set())).Wf # type: )$ # Assign(targets=[Name(id='e', ctx=Store())], value=Attribute(value=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=MatMult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Pow(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), attr='Wf', ctx=Load()), type_comment=")$")
i = n = r = g = Z = (set() << set()) // (set() >> set()) # type: c, # Assign(targets=[Name(id='i', ctx=Store()), Name(id='n', ctx=Store()), Name(id='r', ctx=Store()), Name(id='g', ctx=Store()), Name(id='Z', ctx=Store())], value=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=FloorDiv(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), type_comment="c,")
Ml_8 = (set() + (set() ^ set())) * (set() & set()) # type: _Ag # Assign(targets=[Name(id='Ml_8', ctx=Store())], value=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Add(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=Mult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), type_comment='_Ag')
h = v = K = sL5 = None # type: u8?srW    # Assign(targets=[Name(id='h', ctx=Store()), Name(id='v', ctx=Store()), Name(id='K', ctx=Store()), Name(id='sL5', ctx=Store())], value=Constant(value=None), type_comment="u8?srW")

<AugAssign>
[(*c[:][set():d][():].lQ8DVvkj,), Qy] += 
False # AugAssign(target=List(elts=[Tuple(elts=[Starred(value=Attribute(value=Subscript(value=Subscript(value=Subscript(value=Name(id='c', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Name(id='d', ctx=Load())), ctx=Store()), slice=Slice(Compare(left=BoolOp(op=And(), values=[]), ops=[Gt()], comparators=[])), ctx=Store()), attr='lQ8DVvkj', ctx=Store()), ctx=Store())], ctx=Store()), Name(id='Qy', ctx=Store())], ctx=Store()), op=Add(), value=Expr(value=Constant(value=False)))
k[:][set():set()][~():*set():{}] >>= set()[:]()[set().O:] / [set()] # AugAssign(target=Subscript(value=Subscript(value=Subscript(value=Name(id='k', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), slice=Slice(UnaryOp(op=Invert(), operand=Tuple(elts=[], ctx=Load())), Starred(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ctx=Del()), Dict(keys=[], values=[])), ctx=Store()), op=RShift(), value=BinOp(left=Subscript(value=Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[]), slice=Slice(Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='O', ctx=Del())), ctx=Load()), op=Div(), right=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load())))
gBq7H @= uCP3rJUW                        # AugAssign(target=Name(id='gBq7H', ctx=Store()), op=MatMult(), value=Name(id='uCP3rJUW', ctx=Del()))
H4X6IEiGm *= {*((set() | set()) % set(),), [(set() ** set())._w]} # AugAssign(target=Name(id='H4X6IEiGm', ctx=Store()), op=Mult(), value=Set(elts=[Starred(value=Tuple(elts=[BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], ctx=Del()), ctx=Load()), List(elts=[Attribute(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), attr='_w', ctx=Load())], ctx=Del())]))
vuZtohx0TazRYSNAdc9lnfp2OK_sb5eFgLMma1nI &= None # AugAssign(target=Name(id='vuZtohx0TazRYSNAdc9lnfp2OK_sb5eFgLMma1nI', ctx=Store()), op=BitAnd(), value=Constant(value=None))
Jo ^= 2.506 // +-(35 - (not Y()).R8('/', B=, C=set())) # AugAssign(target=Name(id='Jo', ctx=Store()), op=BitXor(), value=BinOp(left=Constant(value=2.506), op=FloorDiv(), right=UnaryOp(op=UAdd(), operand=UnaryOp(op=USub(), operand=BinOp(left=Constant(value=35), op=Sub(), right=Call(func=Attribute(value=UnaryOp(op=Not(), operand=Call(func=Name(id='Y', ctx=Load()), args=[], keywords=[])), attr='R8', ctx=Load()), args=[Constant(value='/')], keywords=[keyword(arg='B', value=BoolOp(op=Or(), values=[])), keyword(arg='C', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]))))))
K <<= True << '{'                        # AugAssign(target=Name(id='K', ctx=Store()), op=LShift(), value=BinOp(left=Constant(value=True), op=LShift(), right=Constant(value="{")))
m ^= (Z >= set() <= set())               # AugAssign(target=Name(id='m', ctx=Store()), op=BitXor(), value=Compare(left=Compare(left=Name(id='Z', ctx=Load()), ops=[GtE(), LtE(), NotIn()], comparators=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), ops=[], comparators=[Name(id='A', ctx=Load()), Name(id='F8', ctx=Del())]))
MhvX_Zo &= xG                            # AugAssign(target=Name(id='MhvX_Zo', ctx=Store()), op=BitAnd(), value=Name(id='xG', ctx=Del()))
EcHz *= U                                # AugAssign(target=Name(id='EcHz', ctx=Store()), op=Mult(), value=Name(id='U', ctx=Load()))

<Expr>
                                         # Expr(value=BoolOp(op=Or(), values=[]))

*set()[:](set(), j=set(), d=set())[{}:set() ** set()].Z_ # Expr(value=Compare(left=Attribute(value=Expr(value=Starred(value=Subscript(value=Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Load()), args=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], keywords=[keyword(arg='j', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='d', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]), slice=Slice(Dict(keys=[], values=[]), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Del()), ctx=Del())), attr='Z_', ctx=Del()), ops=[], comparators=[Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), Name(id='XH', ctx=Del())], ctx=Load())]))
2.51                                     # Expr(value=Constant(value=2.51))
[not (*((set() + set()) / set() | set()).xZh1, [mtdnNaMG])] # Expr(value=List(elts=[UnaryOp(op=Not(), operand=Tuple(elts=[Starred(value=Attribute(value=BinOp(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Add(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Div(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), attr='xZh1', ctx=Load()), ctx=Load()), List(elts=[Name(id='mtdnNaMG', ctx=Load())], ctx=Load())], ctx=Del()))], ctx=Del()))
(set() << set()) % set() @ (set() & set()) * (set() // (set() >> set())) # Expr(value=BinOp(left=BinOp(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=MatMult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=Mult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=FloorDiv(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))))
False                                    # Expr(value=Constant(value=False))
309                                      # Expr(value=Constant(value=309))
+~('' - A03WRPr(None[set()[:]():(-set())[:]:set() ^ set()])) # Expr(value=UnaryOp(op=UAdd(), operand=UnaryOp(op=Invert(), operand=BinOp(left=Constant(value=""), op=Sub(), right=Call(func=Name(id='A03WRPr', ctx=Load()), args=[Subscript(value=Constant(value=None), slice=Slice(Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[]), Subscript(value=UnaryOp(op=USub(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), slice=Slice(), ctx=Load()), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Del())], keywords=[])))))
set().v(set()[:](U=set()), n=set(), E=set(), C=set()) # Expr(value=Call(func=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='v', ctx=Load()), args=[Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Load()), args=[], keywords=[keyword(arg='U', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])], keywords=[keyword(arg='n', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='E', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='C', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]))
True['4':] and 48[767:]                  # Expr(value=BoolOp(op=And(), values=[Subscript(value=Constant(value=True), slice=Slice(Constant(value='4')), ctx=Del()), Subscript(value=Constant(value=48), slice=Slice(Constant(value=767)), ctx=Load())]))

<For>
for [a.M, N] in (): # type: 
    for b in set(): # type: 
        return # For(target=List(elts=[Attribute(value=Name(id='a', ctx=Store()), attr='M', ctx=Store()), Name(id='N', ctx=Store())], ctx=Store()), iter=Tuple(elts=[], ctx=Load()), body=[For(target=Name(id='b', ctx=Store()), iter=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[Return()], orelse=[], type_comment="")], orelse=[], type_comment='')
for f in : # type: DU
    pass
    continue
    break
else:
    return
    continue # For(target=Name(id='f', ctx=Store()), iter=BoolOp(op=And(), values=[]), body=[Pass(), Continue(), Break()], orelse=[Return(), Continue()], type_comment='DU')
for (R,) in p: # type: 
    return
    return
else:
    while set():
        return
    return
    return # For(target=Tuple(elts=[Name(id='R', ctx=Store())], ctx=Store()), iter=Name(id='p', ctx=Del()), body=[Return(), Return()], orelse=[While(test=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[Return()], orelse=[]), Return(), Return()], type_comment="")
for *D in B(): # type: %W
    return
else:
    return
    return
    return # For(target=Starred(value=Name(id='D', ctx=Store()), ctx=Store()), iter=Call(func=Name(id='B', ctx=Load()), args=[], keywords=[]), body=[Return()], orelse=[Return(), Return(), Return()], type_comment="%W")
for g[set():set()] in set(): # type: Y
    return
else:
    return # For(target=Subscript(value=Name(id='g', ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), iter=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), body=[Return()], orelse=[Return()], type_comment='Y')
for x[:] in set(): # type: Sk'
    if set():
        return
else:
    del  # For(target=Subscript(value=Name(id='x', ctx=Store()), slice=Slice(), ctx=Store()), iter=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[If(test=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[Return()], orelse=[])], orelse=[Delete(targets=[])], type_comment="Sk'")
for U in []: # type: :6n
    return
    return
else:
    return
    return # For(target=Name(id='U', ctx=Store()), iter=List(elts=[], ctx=Del()), body=[Return(), Return()], orelse=[Return(), Return()], type_comment=':6n')
for T[set():set():set()] in {}: # type: 
    assert set() # For(target=Subscript(value=Name(id='T', ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), iter=Dict(keys=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], values=[]), body=[Assert(test=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], orelse=[], type_comment="")
for V[set():] in set() ** set(): # type: 6B
    return # For(target=Subscript(value=Name(id='V', ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Store()), iter=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), body=[Return()], orelse=[], type_comment="6B")
for H[:][:] in (~set())[set():]: # type: C
    return # For(target=Subscript(value=Subscript(value=Name(id='H', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(), ctx=Store()), iter=Subscript(value=UnaryOp(op=Invert(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Load()), body=[Return()], orelse=[], type_comment='C')

<While>
while {}:
    while False:
        break
        return
        pass
    else:
        continue
    with : # type: 
        a = set()
        del  # While(test=Dict(keys=[Name(id='d4', ctx=Del())], values=[]), body=[While(test=Constant(value=False), body=[Break(), Return(), Pass()], orelse=[Continue()]), With(items=[], body=[Assign(targets=[Name(id='a', ctx=Store())], value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Delete(targets=[])], type_comment='')], orelse=[])
while set() % set():
    if set():
        return
else:
    Y |= set()
    return # While(test=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), body=[If(test=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[Return()], orelse=[])], orelse=[AugAssign(target=Name(id='Y', ctx=Store()), op=BitOr(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return()])
while not i():
    assert set().n
else:
    for g in set(): # type: 
        return
    return
    set()
    return # While(test=UnaryOp(op=Not(), operand=Call(func=Name(id='i', ctx=Load()), args=[], keywords=[])), body=[Assert(test=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='n', ctx=Del()))], orelse=[For(target=Name(id='g', ctx=Store()), iter=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[Return()], orelse=[], type_comment=""), Return(), Expr(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return()])
while 
[set(), set()]:
    assert (())[[]:], *()
else:
    return set() or set() # While(test=Expr(value=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load())), body=[Assert(test=Subscript(value=Compare(left=Tuple(elts=[], ctx=Del()), ops=[], comparators=[]), slice=Slice(List(elts=[], ctx=Del())), ctx=Load()), msg=Starred(value=Tuple(elts=[], ctx=Load()), ctx=Load()))], orelse=[Return(value=BoolOp(op=Or(), values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]))])
while {set()[:]}:
    F >>= set()
    U += set()
    return
else:
    R.p //= set().b # While(test=Set(elts=[Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del())]), body=[AugAssign(target=Name(id='F', ctx=Store()), op=RShift(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), AugAssign(target=Name(id='U', ctx=Store()), op=Add(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return()], orelse=[AugAssign(target=Attribute(value=Name(id='R', ctx=Store()), attr='p', ctx=Store()), op=FloorDiv(), value=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='b', ctx=Load()))])
while *(set() ^ set()) ** set() & set() - set():
    m *= set()
    return
    return
else:
    V @= set() # While(test=Starred(value=BinOp(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=BitAnd(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Del()), body=[AugAssign(target=Name(id='m', ctx=Store()), op=Mult(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return(), Return()], orelse=[AugAssign(target=Name(id='V', ctx=Store()), op=MatMult(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
while None:
    [N] <<= set()
    E += set()
    for G in set(): # type: 
        return
else:
    *(q, B)[set():-set()] /= +5 # While(test=Constant(value=None), body=[AugAssign(target=List(elts=[Name(id='N', ctx=Store())], ctx=Store()), op=LShift(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), AugAssign(target=Name(id='E', ctx=Store()), op=Add(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), For(target=Name(id='G', ctx=Store()), iter=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[Return()], orelse=[], type_comment="")], orelse=[AugAssign(target=Subscript(value=Starred(value=Tuple(elts=[Name(id='q', ctx=Store()), Name(id='B', ctx=Store())], ctx=Store()), ctx=Store()), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), UnaryOp(op=USub(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Store()), op=Div(), value=UnaryOp(op=UAdd(), operand=Constant(value=5)))])
while 97.1:
    _ = set()
    with :
        return
    return # While(test=Constant(value=97.1), body=[Assign(targets=[Name(id='_', ctx=Store())], value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), With(items=[], body=[Return()]), Return()], orelse=[])
while set()[:]():
    return
    return
else:
    Q = set()
    return
    return # While(test=Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[]), body=[Return(), Return()], orelse=[Assign(targets=[Name(id='Q', ctx=Store())], value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return(), Return()])
while (~set())[set():v():set()]:
    w = set()
else:
    t = f() # While(test=Subscript(value=BoolOp(op=And(), values=[UnaryOp(op=Invert(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id='v', ctx=Del()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Del()), body=[Assign(targets=[Name(id='w', ctx=Store())], value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], orelse=[Assign(targets=[Name(id='t', ctx=Store())], value=Call(func=Name(id='f', ctx=Del()), args=[], keywords=[]))])

<If>
if ()[set():(set()).B:
set()]:
    S = Q = {} # If(test=Subscript(value=BoolOp(op=Or(), values=[]), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Attribute(value=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), attr='B', ctx=Del()), Expr(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Load()), body=[Assign(targets=[Name(id='S', ctx=Store()), Name(id='Q', ctx=Store())], value=Dict(keys=[], values=[]))], orelse=[])
if C:
    T[:] /= *set()
    pass
    continue
else:
    return
    with :
        return
    return
    break # If(test=Name(id='C', ctx=Del()), body=[AugAssign(target=Subscript(value=Name(id='T', ctx=Store()), slice=Slice(), ctx=Store()), op=Div(), value=Starred(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ctx=Del())), Pass(), Continue()], orelse=[Return(), With(items=[], body=[Return()]), Return(), Break()])
if set() % ():
    set()[:]()
    return
    return
elif [set(), set()]:
    return
else:
    return
    return # If(test=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Tuple(elts=[], ctx=Load())), body=[Expr(value=Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[])), Return(), Return()], orelse=[If(test=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Del()), body=[Return()], orelse=[Return(), Return()])])
if not 189:
    del mBbQXt0
else:
    while ():
        return
        return
    else:
        return
        return
    assert {[set(), *set()]} # If(test=UnaryOp(op=Not(), operand=Constant(value=189)), body=[Delete(targets=[Name(id='mBbQXt0', ctx=Load())])], orelse=[While(test=Tuple(elts=[], ctx=Del()), body=[Return(), Return()], orelse=[Return(), Return()]), Assert(test=Set(elts=[List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Starred(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ctx=Load())], ctx=Load())]))])
if (set() >> set()).n:
    for M in set(): # type: 
        return
    return
    return
else:
    return
    return
    return # If(test=Attribute(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), attr='n', ctx=Load()), body=[For(target=Name(id='M', ctx=Store()), iter=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[Return()], orelse=[], type_comment=""), Return(), Return()], orelse=[Return(), Return(), Return()])
if (set() ^ set()) ** ((set() - set()) // (set() & set())):
    return
    return
else:
    assert set() @ set(), set() # If(test=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Pow(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Sub(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=FloorDiv(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), body=[Return(), Return()], orelse=[Assert(test=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=MatMult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), msg=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
if 3.54:
    D += set()
    return
    return
else:
    return set() * (set() << set()) # If(test=Constant(value=3.540), body=[AugAssign(target=Name(id='D', ctx=Store()), op=Add(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return(), Return()], orelse=[Return(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))))])
if True:
    p.E += set()
    z ^= set()
    return
else:
    *[c] |= +set() # If(test=Constant(value=True), body=[AugAssign(target=Attribute(value=Name(id='p', ctx=Store()), attr='E', ctx=Store()), op=Add(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), AugAssign(target=Name(id='z', ctx=Store()), op=BitXor(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return()], orelse=[AugAssign(target=Starred(value=List(elts=[Name(id='c', ctx=Store())], ctx=Store()), ctx=Store()), op=BitOr(), value=UnaryOp(op=UAdd(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))])
if xd():
    with :
        return
    return
else:
    W = set() # type:  # If(test=Call(func=Name(id='xd', ctx=Del()), args=[], keywords=[]), body=[With(items=[], body=[Return()]), Return()], orelse=[Assign(targets=[Name(id='W', ctx=Store())], value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment='')])
if -~'WK':
    with : # type: 
        return
else:
    with :
        return
    return # If(test=UnaryOp(op=USub(), operand=UnaryOp(op=Invert(), operand=Constant(value="WK"))), body=[With(items=[], body=[Return()], type_comment='')], orelse=[With(items=[], body=[Return()]), Return()])

<Return>
return                                   # Return()
return *
not {}[None is not set():](j={set(), [], set()[:]}) # Return(value=Call(func=Subscript(value=Starred(value=Expr(value=UnaryOp(op=Not(), operand=Dict(keys=[], values=[]))), ctx=Del()), slice=Slice(Compare(left=Constant(value=None), ops=[IsNot()], comparators=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])), ctx=Del()), args=[], keywords=[keyword(arg='j', value=BoolOp(op=Or(), values=[Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), List(elts=[], ctx=Del()), Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Load())])]))]))
return xv0d8WfnL4cOlRJbAHmUhPzruj3VoZIsx7SK6w2Yt9DEXkQN5_pyTiGBaFMCgq1e6 # Return(value=Name(id='xv0d8WfnL4cOlRJbAHmUhPzruj3VoZIsx7SK6w2Yt9DEXkQN5_pyTiGBaFMCgq1e6', ctx=Del()))
return ([set() | set(), set() * set(), *set() & (set() ^ set())].Y << (ZU,),).r # Return(value=Attribute(value=Tuple(elts=[BinOp(left=Attribute(value=List(elts=[BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Starred(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Load())], ctx=Load()), attr='Y', ctx=Load()), op=LShift(), right=Tuple(elts=[Name(id='ZU', ctx=Load())], ctx=Load()))], ctx=Del()), attr='r', ctx=Del()))
return (True + False) @ (set() // set()) / (set() % set() - (set() ** set() >> (set() >> set()))) # Return(value=BinOp(left=BinOp(left=BinOp(left=Constant(value=True), op=Add(), right=Constant(value=False)), op=MatMult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=FloorDiv(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), op=Div(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Sub(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=RShift(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))))))
return 45                                # Return(value=Constant(value=45))
return +-None[K():s()][False:h():set()]  # Return(value=UnaryOp(op=UAdd(), operand=UnaryOp(op=USub(), operand=Subscript(value=Subscript(value=Constant(value=None), slice=Slice(Call(func=Name(id='K', ctx=Load()), args=[], keywords=[]), Call(func=Name(id='s', ctx=Del()), args=[], keywords=[])), ctx=Load()), slice=Slice(Constant(value=False), Call(func=Name(id='h', ctx=Del()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Load()))))
return 6073.8                            # Return(value=Constant(value=6073.8))
return ''                                # Return(value=Constant(value=''))
return ~(set()).f((set()), set(), O=set(), C=set(), W=set()) # Return(value=BoolOp(op=And(), values=[UnaryOp(op=Invert(), operand=Call(func=Attribute(value=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), attr='f', ctx=Del()), args=[Compare(left=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), ops=[], comparators=[]), Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[])], keywords=[keyword(arg='O', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='C', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='W', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]))]))

<Delete>
del (not k) - 
set().eH1zDsLajRSO4ZKkTvgqi9PblMJ_pectBUY8xd7nwmF5VEurAGfQ06yoNhWCX2I3O92 # Delete(targets=[BinOp(left=UnaryOp(op=Not(), operand=Name(id='k', ctx=Del())), op=Sub(), right=Expr(value=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='eH1zDsLajRSO4ZKkTvgqi9PblMJ_pectBUY8xd7nwmF5VEurAGfQ06yoNhWCX2I3O92', ctx=Del())))])
del                                      # Delete(targets=[])
del *([set(), set(), ()] > set()[:]), {set()[:](set()): G}, (870,) # Delete(targets=[Starred(value=Compare(left=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Tuple(elts=[], ctx=Load())], ctx=Del()), ops=[Gt()], comparators=[Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Load()), Subscript(value=Name(id='R', ctx=Load()), slice=Slice(), ctx=Del())]), ctx=Load()), Dict(keys=[Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], keywords=[])], values=[Name(id='G', ctx=Load())]), Tuple(elts=[BoolOp(op=Or(), values=[Constant(value=870)])], ctx=Del())])
del *set() & set(), {set()}, [set(), set(), set() + set(), *set()].lg # Delete(targets=[Starred(value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Del()), Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), Attribute(value=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Add(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Starred(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ctx=Load())], ctx=Load()), attr='lg', ctx=Load())])
del set() * set() // (set() % set() ** set()) << (set() | set()) @ (set() ^ set()) # Delete(targets=[BinOp(left=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=FloorDiv(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))), op=LShift(), right=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=MatMult(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitXor(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))))])
del None, (~(set() / set()[:]()))[set():S()][+-(set().M() >> set()[:]):], '', False # Delete(targets=[Constant(value=None), Subscript(value=Subscript(value=UnaryOp(op=Invert(), operand=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Div(), right=Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[]))), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id='S', ctx=Del()), args=[], keywords=[])), ctx=Load()), slice=Slice(UnaryOp(op=UAdd(), operand=UnaryOp(op=USub(), operand=BinOp(left=Call(func=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='M', ctx=Load()), args=[], keywords=[]), op=RShift(), right=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Load()))))), ctx=Del()), Constant(value=''), Constant(value=False)])
del 6652.9                               # Delete(targets=[Constant(value=6652.9)])
del True and 'XG2|!d'['':]['fnD':None:N()] # Delete(targets=[BoolOp(op=And(), values=[Constant(value=True), Subscript(value=Subscript(value=Constant(value="XG2|!d"), slice=Slice(Constant(value="")), ctx=Load()), slice=Slice(Constant(value="fnD"), Constant(value=None), Compare(left=Call(func=Name(id='N', ctx=Load()), args=[], keywords=[]), ops=[], comparators=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])), ctx=Load())])])
del Q(u=set()), v(set(), V=set(), W=set(), y=set()) # Delete(targets=[Call(func=Name(id='Q', ctx=Load()), args=[], keywords=[keyword(arg='u', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]), Call(func=Name(id='v', ctx=Del()), args=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], keywords=[keyword(arg='V', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='W', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), keyword(arg='y', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])])
del X, x                                 # Delete(targets=[Name(id='X', ctx=Load()), Name(id='x', ctx=Load())])

<Assert>
assert *[3.589].Ir                       # Assert(test=Attribute(value=Starred(value=List(elts=[Constant(value=3.589)], ctx=Del()), ctx=Load()), attr='Ir', ctx=Load()))
assert 
, eSQD3uB78giRhc2nmLxoUVYEs_pvHKXefqP6Rxd((set(),), fI0dA5OCbjaNZJ4wyltTWz19FGkM) # Assert(test=Expr(value=BoolOp(op=And(), values=[])), msg=Call(func=Name(id='eSQD3uB78giRhc2nmLxoUVYEs_pvHKXefqP6Rxd', ctx=Del()), args=[Tuple(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Load()), Name(id='fI0dA5OCbjaNZJ4wyltTWz19FGkM', ctx=Load())], keywords=[]))
assert {set()[:]: set()}, (()) << (not set())[set():] # Assert(test=Dict(keys=[Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='_', ctx=Del())], values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), msg=BinOp(left=Compare(left=Tuple(elts=[], ctx=Del()), ops=[], comparators=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), op=LShift(), right=Subscript(value=UnaryOp(op=Not(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Load())))
assert *{set() + set(), set() % set()}, *[set() // set() >> set(), (set() & set()) - (set() | set())] # Assert(test=Starred(value=Set(elts=[BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Add(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]), ctx=Del()), msg=Starred(value=List(elts=[BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=FloorDiv(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=RShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Sub(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))], ctx=Load()), ctx=Del()))
assert True / 25 @ (None ^ '=')          # Assert(test=BinOp(left=BinOp(left=Constant(value=True), op=Div(), right=Constant(value=25)), op=MatMult(), right=BinOp(left=Constant(value=None), op=BitXor(), right=Constant(value="="))))
assert +~set(), -set()[set():set()][set()[:]():set():set()] # Assert(test=UnaryOp(op=UAdd(), operand=UnaryOp(op=Invert(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), msg=UnaryOp(op=USub(), operand=Subscript(value=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Load()), slice=Slice(Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), ctx=Load())))
assert () ** (set() // ()), p() * ()     # Assert(test=BinOp(left=BoolOp(op=Or(), values=[]), op=Pow(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=FloorDiv(), right=BoolOp(op=And(), values=[]))), msg=BinOp(left=Call(func=Name(id='p', ctx=Load()), args=[], keywords=[]), op=Mult(), right=BoolOp(op=Or(), values=[])))
assert False .k(12 == Lq <= xq, '', hQ=44) # Assert(test=Call(func=Attribute(value=Constant(value=False), attr='k', ctx=Load()), args=[Compare(left=Constant(value=12), ops=[Eq(), LtE()], comparators=[Name(id='Lq', ctx=Load()), Name(id='xq', ctx=Load())]), Constant(value='')], keywords=[keyword(arg='hQ', value=Constant(value=44))]))
assert Mg, s                             # Assert(test=Name(id='Mg', ctx=Del()), msg=Name(id='s', ctx=Del()))
assert a, cE                             # Assert(test=Name(id='a', ctx=Load()), msg=Name(id='cE', ctx=Del()))

<Pass>
pass                                     # Pass()
pass                                     # Pass()
pass                                     # Pass()
pass                                     # Pass()
pass                                     # Pass()
pass                                     # Pass()
pass                                     # Pass()
pass                                     # Pass()
pass                                     # Pass()
pass                                     # Pass()

<Break>
break                                    # Break()
break                                    # Break()
break                                    # Break()
break                                    # Break()
break                                    # Break()
break                                    # Break()
break                                    # Break()
break                                    # Break()
break                                    # Break()
break                                    # Break()

<Continue>
continue                                 # Continue()
continue                                 # Continue()
continue                                 # Continue()
continue                                 # Continue()
continue                                 # Continue()
continue                                 # Continue()
continue                                 # Continue()
continue                                 # Continue()
continue                                 # Continue()
continue                                 # Continue()

<With>
with not *C, set() as (Y, U):
    break
    t = s(V=set()) # type: 7 # With(items=[withitem(context_expr=UnaryOp(op=Not(), operand=Starred(value=Name(id='C', ctx=Load()), ctx=Del()))), withitem(context_expr=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[]), optional_vars=Tuple(elts=[Name(id='Y', ctx=Store()), Name(id='U', ctx=Store())], ctx=Store()))], body=[Break(), Assign(targets=[Name(id='t', ctx=Store())], value=Call(func=Name(id='s', ctx=Del()), args=[], keywords=[keyword(arg='V', value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]), type_comment='7')])
with : # type: 
    if 2:
        for g in set(): # type: 
            pass
        set()
        del 
        continue
        return
    else:
        assert set()
        return # With(items=[], body=[If(test=Constant(value=2), body=[For(target=Name(id='g', ctx=Store()), iter=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[Pass()], orelse=[], type_comment=""), Expr(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Delete(targets=[]), Continue(), Return()], orelse=[Assert(test=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return()])], type_comment="")
with set().W as _[:]: # type: O(
    while set()[:]:
        return
    else:
        return # With(items=[withitem(context_expr=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='W', ctx=Load()), optional_vars=Subscript(value=Name(id='_', ctx=Store()), slice=Slice(), ctx=Store()))], body=[While(test=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), body=[Return()], orelse=[Return()])], type_comment="O(")
with  as *T: # type: rYuH
    A.u *= ()  # With(items=[withitem(context_expr=BoolOp(op=Or(), values=[]), optional_vars=Starred(value=Name(id='T', ctx=Store()), ctx=Store()))], body=[AugAssign(target=Attribute(value=Name(id='A', ctx=Store()), attr='u', ctx=Store()), op=Mult(), value=Tuple(elts=[], ctx=Load()))], type_comment='rYuH')
with [set()] as [P]: # type: &IG
    with set(), set(): # type: 
        assert {}, set() # With(items=[withitem(context_expr=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Del()), optional_vars=List(elts=[Name(id='P', ctx=Store())], ctx=Store()))], body=[With(items=[withitem(context_expr=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), withitem(context_expr=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], body=[Assert(test=Dict(keys=[], values=[]), msg=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], type_comment='')], type_comment="&IG")
with *set()[:] + set()[:]: # type: KT"
    return 
    {set()}.m # With(items=[withitem(context_expr=BinOp(left=Starred(value=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Load()), ctx=Load()), op=Add(), right=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Load())))], body=[Return(value=Expr(value=Attribute(value=Set(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])]), attr='m', ctx=Del())))], type_comment='KT"')
with [] as i[:]: # type: J-K+Z/
    n /= ()
    L **= set()
    return # With(items=[withitem(context_expr=List(elts=[], ctx=Load()), optional_vars=Subscript(value=Name(id='i', ctx=Store()), slice=Slice(), ctx=Store()))], body=[AugAssign(target=Name(id='n', ctx=Store()), op=Div(), value=Tuple(elts=[], ctx=Del())), AugAssign(target=Name(id='L', ctx=Store()), op=Pow(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return()], type_comment="J-K+Z/")
with set() & set() as E[:]: # type: :x<
    G //= set()
    return # With(items=[withitem(context_expr=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitAnd(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), optional_vars=Subscript(value=Name(id='E', ctx=Store()), slice=Slice(), ctx=Store()))], body=[AugAssign(target=Name(id='G', ctx=Store()), op=FloorDiv(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return()], type_comment=':x<')
with set() @ set(): # type: WN
    D %= set()
    return
    return
    return # With(items=[withitem(context_expr=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=MatMult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])))], body=[AugAssign(target=Name(id='D', ctx=Store()), op=Mod(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return(), Return(), Return()], type_comment="WN")
with (set() | set()) - (set() << set()): # type: 6&f
    e[:] >>= None # With(items=[withitem(context_expr=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=BitOr(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Sub(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))))], body=[AugAssign(target=Subscript(value=Name(id='e', ctx=Store()), slice=Slice(), ctx=Store()), op=RShift(), value=Constant(value=None))], type_comment='6&f')

Let us see if we can also parse code properly. Here is a sample:

with_tree = ast.parse("""
with open('foo.txt') as myfile:
    content = myfile.readlines()
    if content is not None:
        print(content)
""")
python_ast_stmts_grammar = convert_ebnf_grammar(PYTHON_AST_STMTS_GRAMMAR)
with_tree_str = ast.dump(with_tree.body[0])  # get the `With(...)` subtree
with_solver = ISLaSolver(python_ast_stmts_grammar)
assert with_solver.check(with_tree_str)

It seems our grammar can also parse non-trivial code properly. We are doing well!

Function Definitions

Now for function definitions. Not too many surprises here.

print(ast.dump(ast.parse("""
def f(a, b=1):
    pass
"""
), indent=4))
Module(
    body=[
        FunctionDef(
            name='f',
            args=arguments(
                posonlyargs=[],
                args=[
                    arg(arg='a'),
                    arg(arg='b')],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[
                    Constant(value=1)]),
            body=[
                Pass()],
            decorator_list=[])],
    type_ignores=[])
PYTHON_AST_DEFS_GRAMMAR: Grammar = extend_grammar(PYTHON_AST_STMTS_GRAMMAR, {
    '<stmt>': PYTHON_AST_STMTS_GRAMMAR['<stmt>'] + [ '<FunctionDef>' ],

    '<FunctionDef>': [
        'FunctionDef(name=<identifier>, args=<arguments>, body=<nonempty_stmt_list>, decorator_list=<expr_list><returns>?<type_comment>?)'
    ],
    '<arguments>': [
        'arguments(posonlyargs=<arg_list>, args=<arg_list><vararg>?, kwonlyargs=<arg_list>, kw_defaults=<expr_list><kwarg>?, defaults=<expr_list>)'
    ],

    '<arg_list>': [ '[<args>?]' ],
    '<args>': [ '<arg>', '<arg>, <arg>' ],
    '<arg>': [ 'arg(arg=<identifier>)' ],

    '<vararg>': [ ', vararg=<arg>' ],
    '<kwarg>': [ ', kwarg=<arg>' ],
    '<returns>': [ ', returns=<expr>' ],

    # FIXME: Not handled: AsyncFunctionDef, ClassDef
})

In Python 3.12 and later, function definitions also have a type_param field:

import sys
# do import this unconditionally
if sys.version_info >= (3, 12):
    PYTHON_AST_DEFS_GRAMMAR: Grammar = extend_grammar(PYTHON_AST_DEFS_GRAMMAR, {
    '<FunctionDef>': [
        'FunctionDef(name=<identifier>, args=<arguments>, body=<nonempty_stmt_list>, decorator_list=<expr_list><returns>?<type_comment>?<type_params>?)'
    ],
    '<type_params>': [
        ', type_params=<type_param_list>',
    ],
    '<type_param_list>': [ '[<type_param>?]' ],
    '<type_param>': [ '<TypeVar>', '<ParamSpec>', '<TypeVarTuple>' ],
    '<TypeVar>': [
        'TypeVar(name=<identifier>(, bound=<expr>)?)'
    ],
    '<ParamSpec>': [
        'ParamSpec(name=<identifier>)'
    ],
    '<TypeVarTuple>': [
        'TypeVarTuple(name=<identifier>)'
    ]
    })
assert is_valid_grammar(PYTHON_AST_DEFS_GRAMMAR)
for elt in [ '<arguments>', '<FunctionDef>' ]:
    print(elt)
    test_samples(PYTHON_AST_DEFS_GRAMMAR, start_symbol=elt)
    print()
<arguments>
f, /, xXK31GmhN, dCoZpr                  # arguments(posonlyargs=[arg(arg='f')], args=[arg(arg='xXK31GmhN'), arg(arg='dCoZpr')], kwonlyargs=[], kw_defaults=[BinOp(left=Name(id='P4PQ97yLj', ctx=Del()), op=Mod(), right=Starred(value=BoolOp(op=And(), values=[Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), Constant(value=True)]), ctx=Load()))], defaults=[])
M, /, *L, b=set(), **ccU                 # arguments(posonlyargs=[arg(arg='M')], args=[], vararg=arg(arg='L'), kwonlyargs=[arg(arg='b'), arg(arg='D')], kw_defaults=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], kwarg=arg(arg='ccU'), defaults=[])
O, p, /, N, R, *l, **F                   # arguments(posonlyargs=[arg(arg='O'), arg(arg='p')], args=[arg(arg='N'), arg(arg='R')], vararg=arg(arg='l'), kwonlyargs=[arg(arg='S')], kw_defaults=[], kwarg=arg(arg='F'), defaults=[])
W, i=(), /, n={}, *B, I=[set().H], **T   # arguments(posonlyargs=[arg(arg='W'), arg(arg='i')], args=[arg(arg='n')], vararg=arg(arg='B'), kwonlyargs=[arg(arg='I')], kw_defaults=[List(elts=[Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='H', ctx=Del())], ctx=Del())], kwarg=arg(arg='T'), defaults=[Tuple(elts=[], ctx=Del()), Dict(keys=[], values=[])])
J, /, o, a=set(), *C, X=set(), **U       # arguments(posonlyargs=[arg(arg='J')], args=[arg(arg='o'), arg(arg='a')], vararg=arg(arg='C'), kwonlyargs=[arg(arg='X'), arg(arg='Q')], kw_defaults=[Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[])], kwarg=arg(arg='U'), defaults=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])
E=set(), Z=set(), *Y                     # arguments(posonlyargs=[], args=[arg(arg='E'), arg(arg='Z')], vararg=arg(arg='Y'), kwonlyargs=[arg(arg='t')], kw_defaults=[], defaults=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])
y, h, /, *j, s=set()                     # arguments(posonlyargs=[arg(arg='y'), arg(arg='h')], args=[], vararg=arg(arg='j'), kwonlyargs=[arg(arg='s')], kw_defaults=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], defaults=[])
V, /, m=+set(), *, v=set(), **G          # arguments(posonlyargs=[arg(arg='V')], args=[arg(arg='m')], kwonlyargs=[arg(arg='v'), arg(arg='r')], kw_defaults=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], kwarg=arg(arg='G'), defaults=[UnaryOp(op=UAdd(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
w, /, u=
set(), *A, K=set(), **_         # arguments(posonlyargs=[arg(arg='w')], args=[arg(arg='u')], vararg=arg(arg='A'), kwonlyargs=[arg(arg='K')], kw_defaults=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], kwarg=arg(arg='_'), defaults=[Expr(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])
k=set(), *g, **q                         # arguments(posonlyargs=[], args=[arg(arg='k')], vararg=arg(arg='g'), kwonlyargs=[], kw_defaults=[Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Load()), args=[], keywords=[])], kwarg=arg(arg='q'), defaults=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), Starred(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ctx=Del())])

<FunctionDef>
def s():
    return                      # FunctionDef(name='s', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[])
def J() -> set():
    continue           # FunctionDef(name='J', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Continue()], decorator_list=[], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))
@set()
def P() -> set(): # type: 
    break # FunctionDef(name='P', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Break()], decorator_list=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment="")
@set()
def C() -> set(): # type: 
    pass # FunctionDef(name='C', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Pass()], decorator_list=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment='')
def R() -> set(): # type: 
    return
    return # FunctionDef(name='R', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return(), Return()], decorator_list=[], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment="")
def o() -> set(): # type: 
    return    # FunctionDef(name='o', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment='')
def c():
    return                      # FunctionDef(name='c', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[])
def S():
    return                      # FunctionDef(name='S', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[])
def e():
    return                      # FunctionDef(name='e', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[])
def f() -> set().d: # type: 
    return  # FunctionDef(name='f', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[], returns=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='d', ctx=Load()), type_comment="")
Modules

We close with modules – sequences of definitions. After all the other definitions, this is now fairly straightforward.

PYTHON_AST_MODULE_GRAMMAR: Grammar = extend_grammar(PYTHON_AST_DEFS_GRAMMAR, {
    '<start>': [ '<mod>' ],
    '<mod>': [ '<Module>' ],
    '<Module>': [ 'Module(body=<nonempty_stmt_list>, type_ignores=<type_ignore_list>)'],

    '<type_ignore_list>': [ '[<type_ignores>?]' ],
    '<type_ignores>': [ '<type_ignore>', '<type_ignore>, <type_ignore>' ],
    '<type_ignore>': [ 'TypeIgnore(lineno=<integer>, tag=<string>)' ],
})
assert is_valid_grammar(PYTHON_AST_MODULE_GRAMMAR)
for elt in [ '<Module>' ]:
    print(elt)
    test_samples(PYTHON_AST_MODULE_GRAMMAR, start_symbol=elt)
    print()
<Module>
pass                                     # Module(body=[Pass()], type_ignores=[TypeIgnore(lineno=427, tag="5"), TypeIgnore(lineno=5, tag='M')])
for H in {}: # type: ,
    return
with set(): # type: .
    continue # Module(body=[For(target=Name(id='H', ctx=Store()), iter=Dict(keys=[], values=[]), body=[Return()], orelse=[], type_comment=','), With(items=[withitem(context_expr=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], body=[Continue()], type_comment=".")], type_ignores=[])
assert (set()[:](), p).u4UqXZs1[:]       # Module(body=[Assert(test=Subscript(value=Attribute(value=Tuple(elts=[Call(func=Subscript(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), slice=Slice(), ctx=Del()), args=[], keywords=[]), Name(id='p', ctx=Load())], ctx=Load()), attr='u4UqXZs1', ctx=Load()), slice=Slice(), ctx=Load()))], type_ignores=[TypeIgnore(lineno=348, tag='>. #sju8dCaHzS/')])
if [set()]:
    K = set()
else:
    del  # Module(body=[If(test=List(elts=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], ctx=Del()), body=[Assign(targets=[Name(id='K', ctx=Store())], value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], orelse=[Delete(targets=[])])], type_ignores=[TypeIgnore(lineno=90, tag="=cQa"), TypeIgnore(lineno=21, tag="q[")])
(O.J, *T, [k]) ^= -(set())               # Module(body=[AugAssign(target=Tuple(elts=[Attribute(value=Name(id='O', ctx=Store()), attr='J', ctx=Store()), Starred(value=Name(id='T', ctx=Store()), ctx=Store()), List(elts=[Name(id='k', ctx=Store())], ctx=Store())], ctx=Store()), op=BitXor(), value=UnaryOp(op=USub(), operand=BoolOp(op=And(), values=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])))], type_ignores=[TypeIgnore(lineno=59, tag="Ow"), TypeIgnore(lineno=63, tag="J<7d{VPe")])
def P() -> set(): # type: 
    break     # Module(body=[FunctionDef(name='P', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Break()], decorator_list=[], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment='')], type_ignores=[TypeIgnore(lineno=6, tag=''), TypeIgnore(lineno=86, tag='')])
while set() % *():

    False
else:
    set().o
    assert h, set() # Module(body=[While(test=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mod(), right=Starred(value=Tuple(elts=[], ctx=Del()), ctx=Load())), body=[Expr(value=Expr(value=Constant(value=False)))], orelse=[Expr(value=Compare(left=Attribute(value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), attr='o', ctx=Del()), ops=[], comparators=[])), Assert(test=Name(id='h', ctx=Del()), msg=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))])], type_ignores=[TypeIgnore(lineno=1966, tag='r0V2:~Jg][Z9{e')])
return [*set() ** set() / (set() << set())]
return {set() // set()} # Module(body=[Return(value=List(elts=[Starred(value=BinOp(left=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Pow(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), op=Div(), right=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=LShift(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Del())], ctx=Load())), Return(value=Set(elts=[BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=FloorDiv(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))]))], type_ignores=[TypeIgnore(lineno=75, tag="?`9pEA!RkSL~>")])
W += set() * set()
i[:] -= set()
x >>= set() # Module(body=[AugAssign(target=Name(id='W', ctx=Store()), op=Add(), value=BinOp(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), op=Mult(), right=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), AugAssign(target=Subscript(value=Name(id='i', ctx=Store()), slice=Slice(), ctx=Store()), op=Sub(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), AugAssign(target=Name(id='x', ctx=Store()), op=RShift(), value=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))], type_ignores=[TypeIgnore(lineno=2, tag="Gv")])
M[:][~set():] &= 387.68                  # Module(body=[AugAssign(target=Subscript(value=Subscript(value=Name(id='M', ctx=Store()), slice=Slice(), ctx=Store()), slice=Slice(UnaryOp(op=Invert(), operand=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]))), ctx=Store()), op=BitAnd(), value=Constant(value=387.68))], type_ignores=[TypeIgnore(lineno=7, tag='D"(pxn<_!-B')])

At this point, we have covered (almost) all AST elements of Python. There would be a few more Python elements to consider (marked as FIXME, above), but we'll leave these to the reader. Let us define PYTHON_AST_GRAMMAR as the official grammar coming out of this chapter.

PYTHON_AST_GRAMMAR = PYTHON_AST_MODULE_GRAMMAR
python_ast_grammar = convert_ebnf_grammar(PYTHON_AST_GRAMMAR)

Here are a few (very weird) examples of Python functions we can produce. All of these are valid, but only syntactically – very few of the code samples produced this way will actually result in something meaningful.

for elt in [ '<FunctionDef>' ]:
    print(elt)
    test_samples(PYTHON_AST_GRAMMAR, start_symbol=elt)
    print()
<FunctionDef>
def T(): # type: 
    break              # FunctionDef(name='T', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Break()], decorator_list=[], type_comment='')
@set()
def zv() -> ():
    pass          # FunctionDef(name='zv', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Pass()], decorator_list=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], returns=Tuple(elts=[], ctx=Del()))
def j():
    continue                    # FunctionDef(name='j', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Continue()], decorator_list=[])
def L():
    return                      # FunctionDef(name='L', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[])
@set()
def A() -> set(): # type: 
    return
    return # FunctionDef(name='A', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return(), Return()], decorator_list=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment="")
@set()
def X() -> 
set(): # type: 
    del 
    while set():
        return # FunctionDef(name='X', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Delete(targets=[]), While(test=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), body=[Return()], orelse=[])], decorator_list=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], returns=Expr(value=Compare(left=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), ops=[], comparators=[])), type_comment='')
def e():
    return                      # FunctionDef(name='e', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[])
@set()
def CQ(): # type: 
    return     # FunctionDef(name='CQ', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])], type_comment="")
def N():
    return                      # FunctionDef(name='N', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[])
def I() -> set(): # type: 
    return    # FunctionDef(name='I', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment='')

A Class for Fuzzing Python

For convenience, let us introduce a class PythonFuzzer that makes use of the above grammar in order to produce Python code. This will be fairly easy to use.

class PythonFuzzer(ISLaSolver):
    """Produce Python code."""

    def __init__(self,
                 start_symbol: Optional[str] = None, *,
                 grammar: Optional[Grammar] = None,
                 constraint: Optional[str] =None,
                 **kw_params) -> None:
        """Produce Python code. Parameters are:

        * `start_symbol`: The grammatical entity to be generated (default: `<FunctionDef>`)
        * `grammar`: The EBNF grammar to be used (default: `PYTHON__AST_GRAMMAR`); and
        * `constraint` an ISLa constraint (if any).

        Additional keyword parameters are passed to the `ISLaSolver` superclass.
        """
        if start_symbol is None:
            start_symbol = '<FunctionDef>'
        if grammar is None:
            grammar = PYTHON_AST_GRAMMAR
        assert start_symbol in grammar

        g = convert_ebnf_grammar(grammar)
        if constraint is None:
            super().__init__(g, start_symbol=start_symbol, **kw_params)
        else:
            super().__init__(g, constraint, start_symbol=start_symbol, **kw_params)

    def fuzz(self) -> str:
        """Produce a Python code string."""
        abstract_syntax_tree = eval(str(self.solve()))
        ast.fix_missing_locations(abstract_syntax_tree)
        return ast.unparse(abstract_syntax_tree)

By default, the PythonFuzzer will produce a function definition - that is, a function header and body.

fuzzer = PythonFuzzer()
print(fuzzer.fuzz())
def b(): # type: 
    return

By passing a start symbol as parameter, you can have PythonFuzzer produce arbitrary Python elements:

fuzzer = PythonFuzzer('<While>')
print(fuzzer.fuzz())
while o():
    pass
else:
    continue
    for [m, f[:]] in [*set(), +set()]: # type: 
        set()
        with :
            return
        break

Here is a list of all possible start symbols:

sorted(list(PYTHON_AST_GRAMMAR.keys()))
['<Assert>',
 '<Assign>',
 '<Attribute>',
 '<AugAssign>',
 '<BinOp>',
 '<BoolOp>',
 '<Break>',
 '<Call>',
 '<Compare>',
 '<Constant>',
 '<Continue>',
 '<Delete>',
 '<Dict>',
 '<EmptySet>',
 '<Expr>',
 '<For>',
 '<FunctionDef>',
 '<If>',
 '<List>',
 '<Module>',
 '<Name>',
 '<Pass>',
 '<Return>',
 '<Set>',
 '<Slice>',
 '<Starred>',
 '<Subscript>',
 '<Tuple>',
 '<UnaryOp>',
 '<While>',
 '<With>',
 '<arg>',
 '<arg_list>',
 '<args>',
 '<arguments>',
 '<bool>',
 '<boolop>',
 '<cmpop>',
 '<cmpop_list>',
 '<cmpops>',
 '<digit>',
 '<digits>',
 '<expr>',
 '<expr_list>',
 '<exprs>',
 '<float>',
 '<func>',
 '<id>',
 '<id_continue>',
 '<id_start>',
 '<identifier>',
 '<integer>',
 '<keyword>',
 '<keyword_list>',
 '<keywords>',
 '<kwarg>',
 '<lhs_Attribute>',
 '<lhs_List>',
 '<lhs_Name>',
 '<lhs_Starred>',
 '<lhs_Subscript>',
 '<lhs_Tuple>',
 '<lhs_expr>',
 '<lhs_exprs>',
 '<literal>',
 '<mod>',
 '<none>',
 '<nonempty_expr_list>',
 '<nonempty_lhs_expr_list>',
 '<nonempty_stmt_list>',
 '<nonzerodigit>',
 '<not_double_quotes>',
 '<not_single_quotes>',
 '<operator>',
 '<returns>',
 '<start>',
 '<stmt>',
 '<stmt_list>',
 '<stmts>',
 '<string>',
 '<type_comment>',
 '<type_ignore>',
 '<type_ignore_list>',
 '<type_ignores>',
 '<unaryop>',
 '<vararg>',
 '<withitem>',
 '<withitem_list>',
 '<withitems>']

Customizing the Python Fuzzer

When fuzzing, you may be interested in specific properties of the produced output. How can we influence the code that PythonFuzzer produces? We explore two ways:

  • By adjusting the grammar to our needs
  • By adding constraints that customize the output for us.

Adjusting the Grammar

A simple way to adjust output generation is to adapt the grammar.

Let us assume you'd like to have function definitions without decorators. To achieve this, you can alter the rule that produces function definitions:

PYTHON_AST_GRAMMAR['<FunctionDef>']
['FunctionDef(name=<identifier>, args=<arguments>, body=<nonempty_stmt_list>, decorator_list=<expr_list><returns>?<type_comment>?)']

As any AST rule, it comes in abstract syntax, so we first have to identify the element we'd like to adjust. In our case, this is decorator_list.

Since decorator_list is a list, we can alter the rule to produce empty lists only. To create a new adapted grammar, we do not alter the existing PYTHON_AST_GRAMMAR. Instead, we use the extend_grammar() function to create a new grammar with a new, adapted rule for <FunctionDef>:

python_ast_grammar_without_decorators: Grammar = extend_grammar(PYTHON_AST_GRAMMAR,
{
    '<FunctionDef>' :
        ['FunctionDef(name=<identifier>, args=<arguments>, body=<nonempty_stmt_list>, decorator_list=[])']
})

However, we're not done yet. We also need to ensure that our grammar is valid, as any misspelled nonterminal identifier will result in problems during production. For this, we use the is_valid_grammar() function:

from ExpectError import ExpectError
with ExpectError():
    assert is_valid_grammar(python_ast_grammar_without_decorators)
'<returns>': defined, but not used. Consider applying trim_grammar() on the grammar
'<returns>': unreachable from <start>. Consider applying trim_grammar() on the grammar
Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_4461/3611578183.py", line 2, in <cell line: 1>
    assert is_valid_grammar(python_ast_grammar_without_decorators)
AssertionError (expected)

We see that with our change, our grammar has an orphaned rule: The <returns> rule is no longer used. This is because <returns> is part of the <type_annotation> we just have deleted. (<type_annotation> is still used when defining types for variables.)

To fix this, we need to delete the <returns> rule from our grammar. Fortunately, we have a function trim_grammar(), which deletes all orphaned rules:

python_ast_grammar_without_decorators = trim_grammar(python_ast_grammar_without_decorators)

With this, our grammar becomes valid...

assert is_valid_grammar(python_ast_grammar_without_decorators)

... and we can use it for fuzzing - now without decorators:

fuzzer = PythonFuzzer(grammar=python_ast_grammar_without_decorators)
print(fuzzer.fuzz())
def A():
    return
    continue

Adjusting the grammar is straightforward once you understood the grammar structure, but the AST grammar is complex; also, your changes and extensions tie you closely to the grammar structure. Carefully study how the individual rules are defined, above.

Using Constraints for Customizing

A more elegant alternative to altering the grammar is to make use of constraints that tune the grammar to your needs. Since PythonFuzzer is derived from ISLaSolver, we can pass a constraint argument constraining the grammar, as discussed in the chapter on fuzzing with constraints.

If we want to have a function definition with 10 characters in each identifier, we make use of an ISLa constraint:

fuzzer = PythonFuzzer(constraint='str.len(<id>) = 10')
print(fuzzer.fuzz())
def ADJ4l5Tlq_():
    return

We can also constrain individual children – say, the actual identifier of the function.

# Also works (the <identifier> has quotes)
fuzzer = PythonFuzzer(constraint='<FunctionDef>.<identifier> = "\'my_favorite_function\'"')
print(fuzzer.fuzz())
@'K'
def my_favorite_function(I, /, *, Dq={set(): set(), set(): set()}, **Q) -> ([][():] in set() >= O) // (*[q6[*set():()]],):
    return (not ai.w(set() or set())).voetXABV

Assume we want to test how the compiler handles large numbers. Let us define a constraint such that the function body (<nonempty_stmt_list>) contains at least one integer (<integer>) with a value of at least 1000:

fuzzer = PythonFuzzer(constraint=
"""
    exists <integer> x:
        (inside(x, <nonempty_stmt_list>) and str.to.int(x) > 1000)
""")
print(fuzzer.fuzz())
@{}
@
{set()[:]}
@R(() - set())
def Fz(yw, l=~(), /, *, **rhnC) -> [Ep4vmVd3RqNGD1s87UXjKgLH9VY].u0Pl6bOkaorfZeucYIyiE2SQ5TAtW_MBFJx: # type: 
    del 1008

Assume we'd like to test compilers with non-trivial functions. Here's how to define a constraint such that the function body has exactly three statements (<stmt>). Note that this can take more than a minute to resolve, but the result definitely is a nontrivial function.

# This will not work with ISLa 2
fuzzer = PythonFuzzer(constraint="""
    forall <FunctionDef> def: count(def, "<stmt>", "3")
""")
print(fuzzer.fuzz())
@[{(*RkVG[(set(), set(), set()):]): 'F'}]
def Qg(_, *dd, V=set() | A, **H) -> (set() // set() ^ set() @ set() & set()) >> (set() - set()) ** (set() / set()): # type: Q-(U'@<*=j4`tH1R%u&L53ImK~Z/8$NEWr!l].o,_Xgf?#JvM d)PTGk[b07a}C>yVzS^A2qDnB|+;sw:eO{ixYc6h9mpA4:&
    with *
    set().z.DS0(set(), set(), (), l=set()[:]): # type: k
        return
    pass

And finally, if we want the decorator list to be empty, as in our grammar-altering example, we can constrain the decorator list to be empty:

fuzzer = PythonFuzzer(constraint='<FunctionDef>..<expr_list> = "[]"')
print(fuzzer.fuzz())
def j1(UWFJ_yVsHQOzbYEIth6DiGUSfwnM0NXPma8BdLvRgTlp7u423qcx5k9CAZeKjrBoKM, *Ol, **T):
    break

Mutating Code

When producing code for compilers (or actually, producing inputs in general), it is often a good idea to not just create everything from scratch, but rather to mutate existing inputs. This way, one can achieve a better balance between common inputs (the ones to mutate) and uncommon inputs (the new parts added via mutation).

Parsing Inputs

To mutate inputs, we first need to be able to parse them. This is where a grammar is really put to test - can it really parse all possible code? This is why relying on an existing parser that is tried and proven (in our case the Python parser) and operating on an abstraction (in our case the AST) is really handy.

We already have seen how to parse code into an AST, using ast.parse():

def sum(a, b):    # A simple example
    the_sum = a + b
    return the_sum
sum_source = inspect.getsource(sum)
sum_tree = ast.parse(sum_source)
print(ast.unparse(sum_tree))
def sum(a, b):
    the_sum = a + b
    return the_sum
sum_str = ast.dump(sum_tree)
sum_str
"Module(body=[FunctionDef(name='sum', args=arguments(posonlyargs=[], args=[arg(arg='a'), arg(arg='b')], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Assign(targets=[Name(id='the_sum', ctx=Store())], value=BinOp(left=Name(id='a', ctx=Load()), op=Add(), right=Name(id='b', ctx=Load()))), Return(value=Name(id='the_sum', ctx=Load()))], decorator_list=[])], type_ignores=[])"

Our grammar is able to parse this (non_trivial) string:

solver = ISLaSolver(python_ast_grammar)
solver.check(sum_str)
True

To mutate the input, we first have to parse it into a derivation tree structure. This is (again) a tree representation of the code, but this time, using the elements of our grammar.

sum_tree = solver.parse(sum_str)

Let us inspect what a derivation tree looks like. Alas, the string representation is very long and not that useful:

len(repr(sum_tree))
8356
repr(sum_tree)[:200]
"DerivationTree('<start>', (DerivationTree('<mod>', (DerivationTree('<Module>', (DerivationTree('Module(body=', (), id=490746), DerivationTree('<nonempty_stmt_list>', (DerivationTree('[', (), id=490744"

However, we can visualize the derivation tree:

from GrammarFuzzer import display_tree
display_tree(sum_tree)
0 <start> 1 <mod> 0->1 2 <Module> 1->2 3 Module(body= 2->3 4 <nonempty_stmt_list> 2->4 193 , type_ignores= 2->193 194 <type_ignore_list> 2->194 198 ) (41) 2->198 5 [ (91) 4->5 6 <stmts> 4->6 192 ] (93) 4->192 7 <stmt> 6->7 8 <FunctionDef> 7->8 9 FunctionDef(name= 8->9 10 <identifier> 8->10 23 , args= 8->23 24 <arguments> 8->24 75 , body= 8->75 76 <nonempty_stmt_list> 8->76 184 , decorator_list= 8->184 185 <expr_list> 8->185 189 <returns-1> 8->189 190 <type_comment-3> 8->190 191 ) (41) 8->191 11 ' (39) 10->11 12 <id> 10->12 22 ' (39) 10->22 13 <id_start> 12->13 15 <id_continue-1> 12->15 14 s (115) 13->14 16 <id_continue> 15->16 18 <id_continue-1> 15->18 17 u (117) 16->17 19 <id_continue> 18->19 21 <id_continue-1> 18->21 20 m (109) 19->20 25 arguments(posonlyargs= 24->25 26 <arg_list> 24->26 30 , args= 24->30 31 <arg_list> 24->31 57 <vararg-1> 24->57 58 , kwonlyargs= 24->58 59 <arg_list> 24->59 63 , kw_defaults= 24->63 64 <expr_list> 24->64 68 <kwarg-1> 24->68 69 , defaults= 24->69 70 <expr_list> 24->70 74 ) (41) 24->74 27 [ (91) 26->27 28 <args-1> 26->28 29 ] (93) 26->29 32 [ (91) 31->32 33 <args-1> 31->33 56 ] (93) 31->56 34 <args> 33->34 35 <arg> 34->35 45 , 34->45 46 <arg> 34->46 36 arg(arg= 35->36 37 <identifier> 35->37 44 ) (41) 35->44 38 ' (39) 37->38 39 <id> 37->39 43 ' (39) 37->43 40 <id_start> 39->40 42 <id_continue-1> 39->42 41 a (97) 40->41 47 arg(arg= 46->47 48 <identifier> 46->48 55 ) (41) 46->55 49 ' (39) 48->49 50 <id> 48->50 54 ' (39) 48->54 51 <id_start> 50->51 53 <id_continue-1> 50->53 52 b (98) 51->52 60 [ (91) 59->60 61 <args-1> 59->61 62 ] (93) 59->62 65 [ (91) 64->65 66 <exprs-1> 64->66 67 ] (93) 64->67 71 [ (91) 70->71 72 <exprs-1> 70->72 73 ] (93) 70->73 77 [ (91) 76->77 78 <stmts> 76->78 183 ] (93) 76->183 79 <stmt> 78->79 148 , 78->148 149 <stmts> 78->149 80 <Assign> 79->80 81 Assign(targets= 80->81 82 <nonempty_lhs_expr_list> 80->82 115 , value= 80->115 116 <expr> 80->116 146 <type_comment-1> 80->146 147 ) (41) 80->147 83 [ (91) 82->83 84 <lhs_exprs> 82->84 114 ] (93) 82->114 85 <lhs_expr> 84->85 86 <lhs_Name> 85->86 87 Name(id= 86->87 88 <identifier> 86->88 113 , ctx=Store()) 86->113 89 ' (39) 88->89 90 <id> 88->90 112 ' (39) 88->112 91 <id_start> 90->91 93 <id_continue-1> 90->93 92 t (116) 91->92 94 <id_continue> 93->94 96 <id_continue-1> 93->96 95 h (104) 94->95 97 <id_continue> 96->97 99 <id_continue-1> 96->99 98 e (101) 97->98 100 <id_continue> 99->100 102 <id_continue-1> 99->102 101 _ (95) 100->101 103 <id_continue> 102->103 105 <id_continue-1> 102->105 104 s (115) 103->104 106 <id_continue> 105->106 108 <id_continue-1> 105->108 107 u (117) 106->107 109 <id_continue> 108->109 111 <id_continue-1> 108->111 110 m (109) 109->110 117 <BinOp> 116->117 118 BinOp(left= 117->118 119 <expr> 117->119 130 , op= 117->130 131 <operator> 117->131 133 , right= 117->133 134 <expr> 117->134 145 ) (41) 117->145 120 <Name> 119->120 121 Name(id= 120->121 122 <identifier> 120->122 129 , ctx=Load()) 120->129 123 ' (39) 122->123 124 <id> 122->124 128 ' (39) 122->128 125 <id_start> 124->125 127 <id_continue-1> 124->127 126 a (97) 125->126 132 Add() 131->132 135 <Name> 134->135 136 Name(id= 135->136 137 <identifier> 135->137 144 , ctx=Load()) 135->144 138 ' (39) 137->138 139 <id> 137->139 143 ' (39) 137->143 140 <id_start> 139->140 142 <id_continue-1> 139->142 141 b (98) 140->141 150 <stmt> 149->150 151 <Return> 150->151 152 Return(value= 151->152 153 <expr> 151->153 182 ) (41) 151->182 154 <Name> 153->154 155 Name(id= 154->155 156 <identifier> 154->156 181 , ctx=Load()) 154->181 157 ' (39) 156->157 158 <id> 156->158 180 ' (39) 156->180 159 <id_start> 158->159 161 <id_continue-1> 158->161 160 t (116) 159->160 162 <id_continue> 161->162 164 <id_continue-1> 161->164 163 h (104) 162->163 165 <id_continue> 164->165 167 <id_continue-1> 164->167 166 e (101) 165->166 168 <id_continue> 167->168 170 <id_continue-1> 167->170 169 _ (95) 168->169 171 <id_continue> 170->171 173 <id_continue-1> 170->173 172 s (115) 171->172 174 <id_continue> 173->174 176 <id_continue-1> 173->176 175 u (117) 174->175 177 <id_continue> 176->177 179 <id_continue-1> 176->179 178 m (109) 177->178 186 [ (91) 185->186 187 <exprs-1> 185->187 188 ] (93) 185->188 195 [ (91) 194->195 196 <type_ignores-1> 194->196 197 ] (93) 194->197

We see that a derivation tree consists of nonterminal nodes whose children make up an expansion from the grammar. For instance, at the very top, we see that a <start> nonterminal expands into a <mod> nonterminal, which again expands into a <Module> nonterminal. This comes right from the grammar rules

python_ast_grammar['<start>']
['<mod>']

and

python_ast_grammar['<mod>']
['<Module>']

The child of <mod> is a <Module>, which expands into the nodes

  • (body=
  • <nonempty_stmt_list>
  • , type_ignores=
  • <type_ignore_list>
  • )

Here, nodes like (body= or , type_ignores= are called terminal nodes (because they have no more elements to expand). The nonterminals like <nonempty_stmt_list> get expanded further below – notably, <nonempty_stmt_list> expands into a <FunctionDef> node that represents the sum() definition.

Again, the structure exactly follows the <Module> definition in our grammar:

python_ast_grammar['<Module>']
['Module(body=<nonempty_stmt_list>, type_ignores=<type_ignore_list>)']

If we traverse the tree depth-first, left to right, and only collect the terminal symbols, we obtain the original string we parsed. Applying the str() function to the derivation tree gets us exactly that string:

str(sum_tree)
"Module(body=[FunctionDef(name='sum', args=arguments(posonlyargs=[], args=[arg(arg='a'), arg(arg='b')], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Assign(targets=[Name(id='the_sum', ctx=Store())], value=BinOp(left=Name(id='a', ctx=Load()), op=Add(), right=Name(id='b', ctx=Load()))), Return(value=Name(id='the_sum', ctx=Load()))], decorator_list=[])], type_ignores=[])"

And again, we can convert this string into an AST and thus obtain our original function:

sum_ast = ast.fix_missing_locations(eval(str(sum_tree)))
print(ast.unparse(sum_ast))
def sum(a, b):
    the_sum = a + b
    return the_sum

Mutating Inputs

With derivation trees, we can have a structured representation of our input. In our case, we already have that with ASTs, so why bother introducing a new one? The answer is simple: Derivation trees also allow us to synthesize new inputs, because we have a grammar that describes their structure.

Most notably, we can mutate inputs as follows:

  1. Parse the input into a derivation tree, as shown above.
  2. Randomly choose some node <symbol> in the derivation tree to be mutated.
  3. Use the grammar to produce a new expansion for <symbol>.
  4. Replace the children of <symbol> by the expansion just generated.
  5. Repeat the process as often as needed.

This is a decent programming task, and if you'd like a blueprint, have a look at the FragmentMutator in this tutorial on greybox fuzzing with grammars.

Fortunately, ISLa already provides us with functionality that does exactly this. The ISLaSolver.mutate() method takes an input and mutates it according to the rules in the grammar. The input to mutate can be given as a derivation tree, or as a string; its output is a derivation tree (which can again be converted into a string).

Let us apply mutate() on our sum() function. The min_mutations and max_mutations parameters define how many mutation steps should be performed; we set both to 1 in order to have exactly one mutation.

sum_mutated_tree = solver.mutate(sum_str, min_mutations=1, max_mutations=1)
sum_mutated_ast = ast.fix_missing_locations(eval(str(sum_mutated_tree)))
print(ast.unparse(sum_mutated_ast))
def sum(a, b):
    the_sum = a + b
    return iX

Toy with the above to see the effect of a mutation. Note if one of the top-level nodes (like <FunctionDef> or <Module>) is selected for mutation, then sum() will be replaced by something entirely different. Otherwise, though, the code will still be pretty similar to the original sum() code.

Of course, the more we increase the number of mutations, the more different the code will look like:

sum_mutated_tree = solver.mutate(sum_str, min_mutations=10, max_mutations=20)
sum_mutated_ast = ast.fix_missing_locations(eval(str(sum_mutated_tree)))
print(ast.unparse(sum_mutated_ast))
def suaGysF9c4Uww(b, b):
    ([p[:]].ESL, *a[:][{set(): ()}.L_([set()], set()[:] + (not set()), ''):][:]) = h = aheuj_suGm + nHiZXdkWQJw7AjoUBmIb_qxCOfVmRz2pYrK6n9gDMh0cE43PTNve15tl
    return t

By toying with the mutate() parameters, we can control how common and how uncommon our input should be.

How Effective is Mutation?

Does mutating existing code help us in finding bugs? Let us assume we have a buggy compiler that generates bad code for an expression of the form <elem> * (<elem> + <elem>). The code in has_distributive_law() checks an AST for the presence of this bug:

def has_distributive_law(tree) -> bool:
    for node in walk(tree):  # iterate over all nodes in `tree`
        # print(node)
        if isinstance(node, ast.BinOp):
            if isinstance(node.op, ast.Mult):
                if isinstance(node.right, ast.BinOp):
                    if isinstance(node.right.op, ast.Add):
                        return True

                if isinstance(node.left, ast.BinOp):
                    if isinstance(node.left.op, ast.Add):
                        return True

    return False

To understand how this works, a visualization of the AST comes in handy:

show_ast(ast.parse("1 + (2 * 3)"))
0 Expr 1 BinOp 0--1 2 Constant 1--2 4 Add 1--4 5 BinOp 1--5 3 1 2--3 6 Constant 5--6 8 Mult 5--8 9 Constant 5--9 7 2 6--7 10 3 9--10
has_distributive_law(ast.parse("1 * (2 + 3)"))
True
has_distributive_law(ast.parse("(1 + 2) * 3"))
True
has_distributive_law(ast.parse("1 + (2 * 3)"))
False
has_distributive_law(ast.parse("def f(a, b):\n    return a * (b + 10)"))
True

How many attempts does it take for each until we find a mutation that triggers the bug in has_distributive_law()? Let us write a function that computes this number.

def how_many_mutations(code: str) -> int:
    solver = ISLaSolver(python_ast_grammar)

    code_ast = ast.parse(code)
    code_ast = ast.fix_missing_locations(code_ast)
    code_ast_str = ast.dump(code_ast)
    code_derivation_tree = solver.parse(code_ast_str)
    mutations = 0
    mutated_code_ast = code_ast

    while not has_distributive_law(mutated_code_ast):
        mutations += 1
        if mutations % 100 == 0:
            print(f'{mutations}...', end='')

        mutated_code_str = str(solver.mutate(code_derivation_tree))
        mutated_code_ast = eval(mutated_code_str)
        # mutated_code_ast = ast.fix_missing_locations(mutated_code_ast)
        # print(ast.dump(mutated_code_ast))
        # print(ast.unparse(mutated_code_ast))

    return mutations

If we pass an input that already exhibits the bug, we do not need any mutation:

assert how_many_mutations('1 * (2 + 3)') == 0

However, the further we are away from the bug, the more mutations (and the more time) it takes to find it. Notably, mutating 2 + 2 until we have a distributive law still is much faster than mutating 2.

how_many_mutations('2 + 2')    # <-- Note: this can take a minute
65
how_many_mutations('2')  # <-- Note: this can take several minutes
91

We conclude that mutating existing code can indeed be helpful, especially if it is syntactically close to inputs that trigger bugs. If you want to have a good chance in finding bugs, focus on inputs that have triggered bugs before – sometimes a simple mutation of these already helps finding a new bug.

Evolutionary Fuzzing

One interesting application of mutating inputs is to use mutations for evolutionary fuzzing. The idea is to have a population of inputs, to apply mutations on them, and to check whether they improve on a particular goal (mostly code coverage). Those inputs that do improve are being retained ("survival of the fittest") as the next generation, and evolved further. By repeating this process often enough, we may obtain inputs that cover large parts of code and thus improve chances to uncover bugs.

Let us assume we have a buggy compiler that generates bad code for an expression of the form <elem> * (<elem> + <elem>). The function has_distributive_law(), above, checks an AST for the presence of this bug.

Our aim is to detect this bug via fuzzing. But if we simply generate random inputs from scratch, it may take a long time until we generate the exact copmbination of operators that triggers the bug.

Getting Coverage

To have our fuzzers guided by coverage, we first need to measure code coverage. We make use of the Coverage module from the Fuzzing Book, which is particularly easy to use. It simply uses a with clause to obtain coverage from the code in the with body. Here is how to obtain coverage for our has_distributive_law() code, above:

from Coverage import Coverage
mult_ast = ast.parse("1 * 2")
with Coverage() as cov:
    has_distributive_law(mult_ast)

The coverage() method tells us which lines in the code actually have been reached. This includes lines from has_distributive_law(), but also lines from other functions called.

cov.coverage()
{('_handle_fromlist', 1063),
 ('_handle_fromlist', 1064),
 ('_handle_fromlist', 1071),
 ('_handle_fromlist', 1075),
 ('_handle_fromlist', 1087),
 ('has_distributive_law', 2),
 ('has_distributive_law', 4),
 ('has_distributive_law', 5),
 ('has_distributive_law', 6),
 ('has_distributive_law', 10),
 ('has_distributive_law', 14),
 ('iter_child_nodes', 264),
 ('iter_child_nodes', 265),
 ('iter_child_nodes', 266),
 ('iter_child_nodes', 267),
 ('iter_child_nodes', 268),
 ('iter_child_nodes', 269),
 ('iter_child_nodes', 270),
 ('iter_fields', 252),
 ('iter_fields', 253),
 ('iter_fields', 254),
 ('walk', 378),
 ('walk', 379),
 ('walk', 380),
 ('walk', 381),
 ('walk', 382),
 ('walk', 383)}

Which are the lines executed? With a bit of code inspection, we can easily visualize the covered lines:

def show_coverage(cov, fun):
    fun_lines, fun_start = inspect.getsourcelines(fun)
    fun_name = fun.__name__
    coverage = cov.coverage()
    for line in range(len(fun_lines)):
        if (fun_name, line + fun_start) in coverage:
            print('# ', end='')  # covered lines
        else:
            print('  ', end='')  # uncovered lines
        print(line + fun_start, fun_lines[line], end='')
show_coverage(cov, has_distributive_law)
  1 def has_distributive_law(tree) -> bool:
# 2     for node in walk(tree):  # iterate over all nodes in `tree`
  3         # print(node)
# 4         if isinstance(node, ast.BinOp):
# 5             if isinstance(node.op, ast.Mult):
# 6                 if isinstance(node.right, ast.BinOp):
  7                     if isinstance(node.right.op, ast.Add):
  8                         return True
  9 
# 10                 if isinstance(node.left, ast.BinOp):
  11                     if isinstance(node.left.op, ast.Add):
  12                         return True
  13 
# 14     return False

In this listing, a # indicates that the code has been executed (covered). We see that our input "1 * 2" satisfies the conditions in Lines 4 and 5, but does not satisfy the conditions in later lines.

Fitness

Let us now use coverage as a fitness function to guide evolution. The higher the fitness (the coverage), the higher the chances of an input to be retained for further evolution. Our ast_fitness() function simply counts the number of lines covered in has_distributive_law().

def ast_fitness(code_ast) -> int:
    with Coverage() as cov:
        has_distributive_law(code_ast)
    lines = set()
    for (name, line) in cov.coverage():
        if name == has_distributive_law.__name__:
            lines.add(line)
    return len(lines)

Here is the fitness of a number of given inputs:

ast_fitness(ast.parse("1"))
3
ast_fitness(ast.parse("1 + 1"))
4
ast_fitness(ast.parse("1 * 2"))
6
ast_fitness(ast.parse("1 * (2 + 3)"))
6

Now, let's set up a fitness function that takes derivation trees. Essentially, our tree_fitness() function is based on the ast_fitness() function, above; however, we also add a small component 1 / len(code_str) to give extra fitness to shorter inputs. Otherwise, our inputs may grow and keep on growing, making mutations inefficient.

def tree_fitness(tree) -> float:
    code_str = str(tree)
    code_ast = ast.fix_missing_locations(eval(code_str))
    fitness = ast_fitness(code_ast)
    # print(ast.unparse(code_ast), f"\n=> Fitness = {fitness}\n")
    return fitness + 1 / len(code_str)
tree_fitness(sum_tree)
4.002666666666666

Evolving Inputs

Let us now make use of our fitness function to implement a simple evolutionary fuzzing algorithm. We start with evolution – that is, taking a population and adding offspring via mutations. Our initial population consists of a single candidate – in our case, sum_tree reflecting the sum() function, above.

def initial_population(tree):
    return [ (tree, tree_fitness(tree)) ]
sum_population = initial_population(sum_tree)
len(sum_population)
1

Our evolve() function adds two new children to each population member.

OFFSPRING = 2
def evolve(population, min_fitness=-1):
    solver = ISLaSolver(python_ast_grammar)

    for (candidate, _) in list(population):
        for i in range(OFFSPRING):
            child = solver.mutate(candidate, min_mutations=1, max_mutations=1)
            child_fitness = tree_fitness(child)
            if child_fitness > min_fitness:
                population.append((child, child_fitness))
    return population
sum_population = evolve(sum_population)
len(sum_population)
3

As we can evolve all these, too, we get an exponential growth.

sum_population = evolve(sum_population)
len(sum_population)
9
sum_population = evolve(sum_population)
len(sum_population)
27
sum_population = evolve(sum_population)
len(sum_population)
81
sum_population = evolve(sum_population)
len(sum_population)
243

Survival of the Fittest

No population can expand forever and still survive. Let us thus limit the population to a certain size.

POPULATION_SIZE = 100

The select() function implements survival of the fittest: It limits the population to at most POPULATION_SIZE elements, sorting them by their fitness (highest to lowest). Members with low fitness beyond POPULATION_SIZE do not survive.

def get_fitness(elem):
    (candidate, fitness) = elem
    return fitness

def select(population):
    population = sorted(population, key=get_fitness, reverse=True)
    population = population[:POPULATION_SIZE]
    return population

We can use the following call to trim our sum_population to the fittest members:

sum_population = select(sum_population)
len(sum_population)
100

Evolution

We now have everything in place:

  • We have a population (say, sum_population)
  • We can evolve the population (using evolve())
  • We can have only the fittest survive (using select())

Let us repeat this process over several generations. We track whenever we have found a new "best" candidate and log them. If we find a candidate that triggers the bug, we stop. Note that this may take a long time, and not necessarily yield a perfect result.

As common in search-based approaches, we stop and restart the search if we have not found a sufficient solution after a number of generations (here: GENERATIONS). Other than that, we keep searching until we have a solution.

GENERATIONS = 100  # Upper bound
trial = 1
found = False

while not found:
    sum_population = initial_population(sum_tree)
    prev_best_fitness = -1

    for generation in range(GENERATIONS):
        sum_population = evolve(sum_population, min_fitness=prev_best_fitness)
        sum_population = select(sum_population)
        best_candidate, best_fitness = sum_population[0]
        if best_fitness > prev_best_fitness:
            print(f"Generation {generation}: found new best candidate (fitness={best_fitness}):")
            best_ast = ast.fix_missing_locations(eval(str(best_candidate)))
            print(ast.unparse(best_ast))
            prev_best_fitness = best_fitness

            if has_distributive_law(best_ast):
                print("Done!")
                found = True
                break

    trial = trial + 1
    print(f"\n\nRestarting; trial #{trial}")
Generation 0: found new best candidate (fitness=4.002673796791444):
def sum(a, b):
    the_sum = a + b
    return the_su
Generation 1: found new best candidate (fitness=4.002695417789758):
def sum(a, b):
    thK = a + the_sum
    return b
Generation 5: found new best candidate (fitness=4.002770083102493):
def sum(q):
    the_sum = a + the_sum
    return b
Generation 8: found new best candidate (fitness=4.003039513677812):
def sum(q):
    the_sum = a + the_sum
    pass
Generation 9: found new best candidate (fitness=6.001290322580645):
def sum(a, b):
    for JQ in 
    {} * set(): # type: 
        the_sum = a + b
        return the_su
    u = *q = ~() # type: 
    continue
Generation 22: found new best candidate (fitness=6.001426533523538):
if Q():
    with :
        pass
    return
else:

    def sum(a, b):
        the_sum = a + b
        return the_sum
    while set() * set():
        continue
        break
Generation 26: found new best candidate (fitness=6.001432664756447):
if Q():
    with :
        pass
    return
else:

    def sum(a, b):
        the_ = a + b
        return the_sum
    while set() * set():
        continue
        break
Generation 28: found new best candidate (fitness=6.001438848920864):
if Q():
    with :
        pass
    return
else:

    def sum(a, b):
        the_ = a + b
        return the_
    while set() * set():
        continue
        break
Generation 45: found new best candidate (fitness=6.0014450867052025):
if Q():
    with :
        pass
    return
else:

    def sum(a, b):
        n = a + b
        return the_
    while set() * set():
        continue
        break
Generation 47: found new best candidate (fitness=6.001449275362319):
if Q():
    with :
        pass
    return
else:

    def sum(a, b):
        n = a + b
        return tS
    while set() * set():
        continue
        break
Generation 56: found new best candidate (fitness=6.001474926253687):
if Q():
    with :
        pass
    return
else:

    def sum(o):
        n = a + b
        return the_
    while set() * set():
        continue
        break
Generation 57: found new best candidate (fitness=6.001524390243903):
if Q():
    with :
        pass
    return
else:

    def sum(o):
        n = a + b
        return the_
    while set() * 
    Yy:
        continue
        break
Generation 72: found new best candidate (fitness=7.001300390117035):
if Q():
    with :
        pass
    return
else:

    def sum(o5):
        n = a + b
        return the_
    while set() * set() * set():
        continue
        break
Generation 79: found new best candidate (fitness=7.001302083333333):
if Q():
    with :
        pass
    return
else:

    def sum(o5):
        n = M + b
        return the_
    while set() * set() * set():
        continue
        break
Generation 89: found new best candidate (fitness=7.0013054830287205):
if Q():
    with :
        pass
    return
else:

    def sum(o5):
        n = M + b
        return iI
    while set() * set() * set():
        continue
        break
Generation 98: found new best candidate (fitness=7.001307189542484):
if Q():
    with :
        pass
    break
else:

    def sum(o5):
        n = M + b
        return iI
    while set() * set() * set():
        continue
        break

Restarting; trial #2
Generation 0: found new best candidate (fitness=4.002710027100271):
def sum(a, b):
    h = a + b
    return the_sum
Generation 7: found new best candidate (fitness=4.002739726027397):
def sum(a, b):
    h = a + b
    return the
Generation 9: found new best candidate (fitness=4.002949852507375):
def sum(a, b):
    the_sum = a + b
    return
Generation 13: found new best candidate (fitness=4.002994011976048):
def sum(a, b):
    tf = a + b
    return
Generation 16: found new best candidate (fitness=6.000926784059314):
def sum(a, b):
    the_sum = {}
    return
Generation 19: found new best candidate (fitness=6.000931966449208):
def sum(a, b):
    t = {}
    return
Generation 21: found new best candidate (fitness=6.001097694840834):
def sum(a, b):
    h = a + ly(b, rC=set()[set():set():set()], T_=set() * set())
    return the_sum
Generation 31: found new best candidate (fitness=6.001721170395869):
def sum(a, b):
    h = (
    set() // *()) * (a + b)
    return t5
Done!

Restarting; trial #3

Success! We found a piece of code that triggers the bug. Check for occurrences of the distributive law.

print(ast.unparse(best_ast))
def sum(a, b):
    h = (
    set() // *()) * (a + b)
    return t5
assert has_distributive_law(best_ast)

You may note that not all of the code is required to trigger the bug. We could run our evolutionary fuzzer a bit longer to see whether it can be further reduced, or use a dedicated input reduction technique such as Delta Debugging.

Chances of Evolutionary Fuzzing

Could the bug in distributive_law() have been found without evolutionary guidance - i.e., simply by applying one mutation to sum()?

When producing an expression (<expr>), we calculate how big the chances are to

  • Produce a binary operator, and
  • Produce a *, and
  • Produce another binary operator as one child, and
  • Produce a +

Let's do a few queries on our grammar to compute the chances.

assert '<BinOp>' in python_ast_grammar['<expr>']
len(python_ast_grammar['<expr>'])
15
assert 'Add()' in python_ast_grammar['<operator>']
assert 'Mult()' in python_ast_grammar['<operator>']
len(python_ast_grammar['<operator>'])
13
(len(python_ast_grammar['<expr>'])       # chances of choosing a `BinOp`
* len(python_ast_grammar['<operator>'])  # chances of choosing a `*`
* len(python_ast_grammar['<expr>'])      # chances of choosing a `BinOp` as a child
* len(python_ast_grammar['<operator>'])  # chances of choosing a `+`
/ 2)   # two chances - one for the left child, one for the right
19012.5

On average, it would take about 19000 (non-evolutionary) runs until we have an expression that triggers the distributive law. So it is definitely better to make use of additional information (say, coverage) in order to guide mutations towards a goal.

Lessons Learned

  • When creating and processing complex inputs such as program code,
    • try to rely on existing infrastructure to parse inputs into some abstract syntax, and then
    • have your grammars process that abstract syntax rather than the concrete syntax.
  • Specifically, program code is normally converted into abstract syntax trees before being compiled or interpreted, and you can (and should) make use of such conversions.
  • Once program code is turned into an AST, it is fairly easy to generate, mutate, and evolve despite its complexity.

Background

The seminal work on compiler testing is Csmith [Yang et al, 2011], a generator of C programs. Csmith has been used to thoroughly test compilers such as Clang or GCC; beyond producing code that is syntactically correct, it also aims at semantic correctness as well as avoiding undefined and unspecified behaviors. This is a must read for anyone in the field in compiler testing.

Creative Commons License The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The source code that is part of the content, as well as the source code used to format and display that content is licensed under the MIT License. Last change: 2024-01-17 22:45:32+01:00CiteImprint