Rendering the FST

This section is quite advanced, and you may never need what it covers. But if you want to process the whole rendered FST, or part of it, as a chunk, read on: several helpers are provided.

Understanding core rendering

Baron renders the FST back into source code by following the instructions given by the nodes_rendering_order dictionary. It gives, for every FST node, the order in which the node components must be rendered and the nature of those components.

In [1]: from baron import nodes_rendering_order, parse

In [2]: from baron.helpers import show_node

In [3]: nodes_rendering_order["name"]
Out[3]: [('string', 'value', True)]

In [4]: show_node(parse("a_name")[0])
{
    "type": "name", 
    "value": "a_name"
}

In [5]: nodes_rendering_order["tuple"]
Out[5]: 
[('formatting', 'first_formatting', 'with_parenthesis'),
 ('constant', '(', 'with_parenthesis'),
 ('formatting', 'second_formatting', 'with_parenthesis'),
 ('list', 'value', True),
 ('formatting', 'third_formatting', 'with_parenthesis'),
 ('constant', ')', 'with_parenthesis'),
 ('formatting', 'fourth_formatting', 'with_parenthesis'),
 ('bool', 'with_parenthesis', False)]

In [6]: show_node(parse("(a_name,another_name,yet_another_name)")[0])
{
    "first_formatting": [], 
    "fourth_formatting": [], 
    "value": [
        {
            "type": "name", 
            "value": "a_name"
        }, 
        {
            "first_formatting": [], 
            "type": "comma", 
            "second_formatting": []
        }, 
        {
            "type": "name", 
            "value": "another_name"
        }, 
        {
            "first_formatting": [], 
            "type": "comma", 
            "second_formatting": []
        }, 
        {
            "type": "name", 
            "value": "yet_another_name"
        }
    ], 
    "second_formatting": [], 
    "third_formatting": [], 
    "with_parenthesis": true, 
    "type": "tuple"
}

In [7]: nodes_rendering_order["comma"]
Out[7]: 
[('formatting', 'first_formatting', True),
 ('constant', ',', True),
 ('formatting', 'second_formatting', True)]

For a “name” node, the list contains a single component stored in a tuple, but it can contain several components, as it does for a “tuple” node.

To render a node, you just need to render each element of the list, one by one, in the given order. As you can see, they are all formatted as a 3-tuple. The first column is the type which is one of the following:

In [8]: from baron.render import node_types

In [9]: node_types
Out[9]: {'bool', 'constant', 'formatting', 'key', 'list', 'node', 'string'}

With the exception of the “constant” node, the second column contains the key of the FST node which must be rendered. The first column explains how that key must be rendered. We’ll see the third column later.

  • A node node is one of the nodes in the nodes_rendering_order we just introduced; it is rendered by following the rules mentioned here. This is indeed a recursive definition.
  • A key node is a branch of the tree that contains another node (a python dictionary). It is rendered by rendering its content.
  • A string node is a leaf of the tree that contains a variable value, like the name of a function. It is rendered by outputting that value.
  • A list node is like a key node but can contain 0, 1 or several other nodes stored in a python list. For example, Baron root node is a list node since a python program is a list of statements. It is rendered by rendering each of its elements in order.
  • A formatting node is similar in behaviour to a list node but contains only formatting nodes. This is basically where Baron distinguishes itself from other ASTs.
  • A constant node is a leaf of the FST tree. The second column always contains a string which is output directly. Unlike a string node, a constant node is identical for every instance of the node (e.g. the left parenthesis character ( in a function call node or the def keyword of a function definition), while the string node’s value can change (e.g. the name of the function in a function definition node).
  • A bool node is a node used exclusively for conditional rendering. Its exact use will be explained later on with the tuple’s third column, but the main point for now is that bool nodes are never rendered.
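
The dispatch on the first column can be illustrated with a minimal sketch. It does not import Baron: RENDERING_ORDER is a hand-copied subset of the entries shown above, render_node and render_list are hypothetical names, and the third column is ignored because every entry used here is True.

```python
# Hand-copied subset of the entries shown above; not imported from Baron.
RENDERING_ORDER = {
    "name": [("string", "value", True)],
    "space": [("string", "value", True)],
    "comma": [
        ("formatting", "first_formatting", True),
        ("constant", ",", True),
        ("formatting", "second_formatting", True),
    ],
}


def render_node(node):
    """Render a dict node by following its entry in RENDERING_ORDER."""
    out = []
    for kind, key, _dependency in RENDERING_ORDER[node["type"]]:
        if kind == "constant":
            out.append(key)                     # second column is the text itself
        elif kind == "string":
            out.append(node[key])               # leaf holding a variable value
        elif kind == "key":
            out.append(render_node(node[key]))  # a single child node
        elif kind in ("list", "formatting"):
            out.append(render_list(node[key]))  # zero or more child nodes
        # "bool" entries carry no text and are skipped
    return "".join(out)


def render_list(nodes):
    """Render a python list of dict nodes, in order."""
    return "".join(render_node(child) for child in nodes)


fst = [
    {"type": "name", "value": "a_name"},
    {"type": "comma", "first_formatting": [],
     "second_formatting": [{"type": "space", "value": " "}]},
    {"type": "name", "value": "another_name"},
]
print(render_list(fst))  # a_name, another_name
```

The two functions call each other, which mirrors the recursive definition of a node node given in the list above.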

Walkthrough

Let’s see all this in action by rendering a “lambda” node. First, the root node is always a “list” node and since we are only parsing one statement, the root node contains our “lambda” node at index 0:

In [10]: fst = parse("lambda x, y = 1: x + y")

In [11]: fst[0]["type"]
Out[11]: 'lambda'

For reference, you can find the (long) FST produced by the lambda node at the end of this section.

Now, let’s see how to render a “lambda” node:

In [12]: nodes_rendering_order["lambda"]
Out[12]: 
[('constant', 'lambda', True),
 ('formatting', 'first_formatting', True),
 ('list', 'arguments', True),
 ('formatting', 'second_formatting', True),
 ('constant', ':', True),
 ('formatting', 'third_formatting', True),
 ('key', 'value', True)]

Okay, first the string constant “lambda”, then a first_formatting node which represents the space between the string “lambda” and the variable “x”.

In [13]: fst[0]["first_formatting"]
Out[13]: [{'type': 'space', 'value': ' '}]

The “first_formatting” contains a list whose unique element is a “space” node.

In [14]: fst[0]["first_formatting"][0]
Out[14]: {'type': 'space', 'value': ' '}

In [15]: nodes_rendering_order["space"]
Out[15]: [('string', 'value', True)]

The “space” node is in turn rendered by outputting the string stored under its “value” key.

In [16]: fst[0]["first_formatting"][0]["value"]
Out[16]: ' '

So far we have outputted “lambda “. Tedious but exhaustive.

We have exhausted the “first_formatting” node so we go back up the tree. Next is the “list” node representing the arguments:

In [17]: fst[0]["arguments"]
Out[17]: 
[{'annotation': {},
  'annotation_first_formatting': [],
  'annotation_second_formatting': [],
  'first_formatting': [],
  'second_formatting': [],
  'target': {'type': 'name', 'value': 'x'},
  'type': 'def_argument',
  'value': {}},
 {'first_formatting': [],
  'second_formatting': [{'type': 'space', 'value': ' '}],
  'type': 'comma'},
 {'annotation': {},
  'annotation_first_formatting': [],
  'annotation_second_formatting': [],
  'first_formatting': [{'type': 'space', 'value': ' '}],
  'second_formatting': [{'type': 'space', 'value': ' '}],
  'target': {'type': 'name', 'value': 'y'},
  'type': 'def_argument',
  'value': {'section': 'number', 'type': 'int', 'value': '1'}}]

Rendering a “list” node is done one element at a time. First a “def_argument”, then a “comma” and again a “def_argument”.

In [18]: fst[0]["arguments"][0]
Out[18]: 
{'annotation': {},
 'annotation_first_formatting': [],
 'annotation_second_formatting': [],
 'first_formatting': [],
 'second_formatting': [],
 'target': {'type': 'name', 'value': 'x'},
 'type': 'def_argument',
 'value': {}}

In [19]: nodes_rendering_order["def_argument"]
Out[19]: 
[('key', 'target', True),
 ('formatting', 'annotation_first_formatting', 'annotation'),
 ('constant', ':', 'annotation'),
 ('formatting', 'annotation_second_formatting', 'annotation'),
 ('key', 'annotation', 'annotation'),
 ('formatting', 'first_formatting', 'value'),
 ('constant', '=', 'value'),
 ('formatting', 'second_formatting', 'value'),
 ('key', 'value', 'value')]

The first “def_argument” is rendered by first rendering its “target” key, a “name” node whose “string” value is outputted:

In [20]: fst[0]["arguments"][0]["target"]["value"]
Out[20]: 'x'

Now, we have outputted “lambda x”. At first glance we could say we should render the second element of the “def_argument” node but as we’ll see in the next section, it is not the case because of the third column of the tuple.

For reference, the FST of the lambda node:

In [21]: show_node(fst[0])
{
    "first_formatting": [
        {
            "type": "space", 
            "value": " "
        }
    ], 
    "value": {
        "first_formatting": [
            {
                "type": "space", 
                "value": " "
            }
        ], 
        "value": "+", 
        "second_formatting": [
            {
                "type": "space", 
                "value": " "
            }
        ], 
        "second": {
            "type": "name", 
            "value": "y"
        }, 
        "type": "binary_operator", 
        "first": {
            "type": "name", 
            "value": "x"
        }
    }, 
    "second_formatting": [], 
    "third_formatting": [
        {
            "type": "space", 
            "value": " "
        }
    ], 
    "arguments": [
        {
            "annotation_second_formatting": [], 
            "first_formatting": [], 
            "annotation_first_formatting": [], 
            "target": {
                "type": "name", 
                "value": "x"
            }, 
            "type": "def_argument", 
            "annotation": {}, 
            "value": {}, 
            "second_formatting": []
        }, 
        {
            "first_formatting": [], 
            "type": "comma", 
            "second_formatting": [
                {
                    "type": "space", 
                    "value": " "
                }
            ]
        }, 
        {
            "annotation_second_formatting": [], 
            "first_formatting": [
                {
                    "type": "space", 
                    "value": " "
                }
            ], 
            "annotation_first_formatting": [], 
            "target": {
                "type": "name", 
                "value": "y"
            }, 
            "type": "def_argument", 
            "annotation": {}, 
            "value": {
                "section": "number", 
                "type": "int", 
                "value": "1"
            }, 
            "second_formatting": [
                {
                    "type": "space", 
                    "value": " "
                }
            ]
        }
    ], 
    "type": "lambda"
}

Dependent rendering

Sometimes, some node elements must not be outputted. In our “def_argument” example, every entry but the first is conditional: the annotation entries are only rendered if the node’s “annotation” key exists and is not empty, and the default-value entries only if its “value” key exists and is not empty. Let’s compare the two “def_argument” FST nodes:

In [22]: fst[0]["arguments"][0]
Out[22]: 
{'annotation': {},
 'annotation_first_formatting': [],
 'annotation_second_formatting': [],
 'first_formatting': [],
 'second_formatting': [],
 'target': {'type': 'name', 'value': 'x'},
 'type': 'def_argument',
 'value': {}}

In [23]: fst[0]["arguments"][2]
Out[23]: 
{'annotation': {},
 'annotation_first_formatting': [],
 'annotation_second_formatting': [],
 'first_formatting': [{'type': 'space', 'value': ' '}],
 'second_formatting': [{'type': 'space', 'value': ' '}],
 'target': {'type': 'name', 'value': 'y'},
 'type': 'def_argument',
 'value': {'section': 'number', 'type': 'int', 'value': '1'}}

In [24]: nodes_rendering_order[fst[0]["arguments"][2]["type"]]
Out[24]: 
[('key', 'target', True),
 ('formatting', 'annotation_first_formatting', 'annotation'),
 ('constant', ':', 'annotation'),
 ('formatting', 'annotation_second_formatting', 'annotation'),
 ('key', 'annotation', 'annotation'),
 ('formatting', 'first_formatting', 'value'),
 ('constant', '=', 'value'),
 ('formatting', 'second_formatting', 'value'),
 ('key', 'value', 'value')]

The “value” is empty for the former “def_argument” but not for the latter because it has a default value of “= 1”.

In [25]: from baron import dumps

In [26]: dumps(fst[0]["arguments"][0])
Out[26]: 'x'

In [27]: dumps(fst[0]["arguments"][2])
Out[27]: 'y = 1'

The rule here is that the third column of a node is one of:

  • True, it is always rendered;
  • False, it is never rendered;
  • A string, it is rendered conditionally. It is not rendered if the key it references is either empty or False. It also must reference an existing key. In our example above, it references the existing “value” key which is empty in the first case and not empty in the second.

This is how “bool” nodes are never outputted: their third column is always False.
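
The third-column rule can be sketched in a few lines. This is an illustration only, not Baron's implementation: is_rendered and rendered_entries are hypothetical helpers, and the rendering order and the two nodes are copied by hand from the outputs shown above.

```python
# Copied from the nodes_rendering_order["def_argument"] output shown above.
DEF_ARGUMENT_ORDER = [
    ("key", "target", True),
    ("formatting", "annotation_first_formatting", "annotation"),
    ("constant", ":", "annotation"),
    ("formatting", "annotation_second_formatting", "annotation"),
    ("key", "annotation", "annotation"),
    ("formatting", "first_formatting", "value"),
    ("constant", "=", "value"),
    ("formatting", "second_formatting", "value"),
    ("key", "value", "value"),
]


def is_rendered(node, dependency):
    """Apply the third-column rule to one rendering-order entry."""
    if dependency is True:
        return True
    if dependency is False:
        return False
    # A string must name an existing key: node[...] raises KeyError otherwise.
    return bool(node[dependency])


def rendered_entries(node, order):
    """Return the second column of every entry that passes the rule."""
    return [key for _kind, key, dep in order if is_rendered(node, dep)]


space = {"type": "space", "value": " "}
x_arg = {
    "type": "def_argument",
    "target": {"type": "name", "value": "x"},
    "annotation": {}, "annotation_first_formatting": [],
    "annotation_second_formatting": [],
    "first_formatting": [], "second_formatting": [],
    "value": {},
}
y_arg = dict(
    x_arg,
    target={"type": "name", "value": "y"},
    first_formatting=[space], second_formatting=[space],
    value={"section": "number", "type": "int", "value": "1"},
)

print(rendered_entries(x_arg, DEF_ARGUMENT_ORDER))
# ['target']
print(rendered_entries(y_arg, DEF_ARGUMENT_ORDER))
# ['target', 'first_formatting', '=', 'second_formatting', 'value']
```

Only “target” survives for the first argument, which matches the 'x' returned by dumps, while the second argument keeps the four entries that produce ' = 1'.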

We will conclude here now that we have seen an example of every aspect of FST rendering. Understanding everything is not required to use Baron since several helpers like render, RenderWalker or dumps handle all the complexity under the hood.

Render Helper

Baron provides a render helper function which recursively walks the nodes_rendering_order dictionary for you:

baron.render.render(node, strict=False)

Recipe to render a given FST node.

The FST is composed of branch nodes, which are either lists or dicts, and of leaf nodes, which are strings. Branch nodes can have other lists, dicts or leaf nodes as children.

To render a string, simply output it. To render a list, render each of its elements in order. To render a dict, you must follow the node’s entry in the nodes_rendering_order dictionary and its dependency constraints.

This function hides all this algorithmic complexity by returning a structured rendering recipe, whatever the type of node. But even better, you should subclass the RenderWalker, which drastically simplifies working with the rendered FST.

The recipe is a list of steps. Each step corresponds to a child and is actually a 3-tuple composed of the following fields:

  • key_type is a string determining the type of the child in the second field (item) of the tuple. It can be one of:
    • ‘constant’: the child is a string
    • ‘node’: the child is a dict
    • ‘key’: the child is an element of a dict
    • ‘list’: the child is a list
    • ‘formatting’: the child is a list specialized in formatting
  • item is the child itself: either a string, a dict or a list.
  • render_key gives the key used to access this child from the parent node. It’s a string if the parent node is a dict or a number if it’s a list.

Please note that “bool” key_types are never rendered, that’s why they are not shown here.
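
The shape of such a recipe can be sketched without Baron. The recipe function below is a hypothetical stand-in for render, written only from the description above; in particular, using a constant as both its own item and its own render_key is an assumption, and ORDER is a hand-copied subset of nodes_rendering_order.

```python
# Hand-copied subset of nodes_rendering_order; not imported from Baron.
ORDER = {
    "space": [("string", "value", True)],
    "comma": [
        ("formatting", "first_formatting", True),
        ("constant", ",", True),
        ("formatting", "second_formatting", True),
    ],
}


def recipe(node):
    """Return a list of (key_type, item, render_key) steps for one node."""
    if isinstance(node, list):
        # Each element of a list is a dict node, reached by its index.
        return [("node", child, index) for index, child in enumerate(node)]
    steps = []
    for key_type, key, dependency in ORDER[node["type"]]:
        if dependency is False:
            continue  # never rendered, which covers "bool" entries
        if dependency is not True and not node[dependency]:
            continue  # conditional entry whose dependency is empty
        if key_type == "constant":
            # Assumption: a constant is its own item and its own render_key.
            steps.append((key_type, key, key))
        else:
            steps.append((key_type, node[key], key))
    return steps


space = {"type": "space", "value": " "}
comma = {"type": "comma", "first_formatting": [], "second_formatting": [space]}

for step in recipe(comma):
    print(step)
for step in recipe([comma]):
    print(step[0], step[2])
```

Note that the recipe is only one level deep: a caller has to call recipe again on each list or dict item to descend the tree, which is the job the RenderWalker takes over.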

RenderWalker Helper

But even easier, Baron provides a walker class whose job is to walk the fst while rendering it and to call user-provided callbacks at each step:

class baron.render.RenderWalker(strict=False)

Inherit me and overload the methods you want.

When calling walk() on a FST node, this class will traverse the node’s whole subtree by following the recipe given by the render function for the node and recursively for all its children. At each recipe step, it will call methods that you can override to do some specific processing.

For every “node”, “key”, “list”, “formatting” and “constant” child, it will call the before method when going down the tree and the after method when going up. There are also specific before_[node,key,list,formatting,constant] and after_[node,key,list,formatting,constant] methods provided for convenience.

The latter are called on specific steps:

  • before_list: called before encountering a list of nodes
  • after_list: called after encountering a list of nodes
  • before_formatting: called before encountering a formatting list
  • after_formatting: called after encountering a formatting list
  • before_node: called before encountering a node
  • after_node: called after encountering a node
  • before_key: called before encountering a key type entry
  • after_key: called after encountering a key type entry
  • before_leaf: called before encountering a leaf of the FST (can be a constant (like “def” in a function definition) or an actual value like the value of a name node)
  • after_leaf: called after encountering a leaf of the FST (can be a constant (like “def” in a function definition) or an actual value like the value of a name node)

Every method has the same signature: (self, node, render_pos, render_key).

Internally, Baron uses the RenderWalker for multiple tasks like for the dumps function:

from baron.render import RenderWalker

def dumps(tree):
    return Dumper().dump(tree)

class Dumper(RenderWalker):
    def before_constant(self, constant, key):
        self.dump += constant

    def before_string(self, string, key):
        self.dump += string

    def dump(self, tree):
        self.dump = ''
        self.walk(tree)
        return self.dump

As you can see, it is quite simple since it only needs the before_constant and the before_string methods, with exactly the same code.
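
To see why two leaf callbacks are enough, here is a self-contained toy version of the same pattern. ToyWalker and ToyDumper are hypothetical classes that only borrow the hook names of the Dumper above; they are not Baron's RenderWalker, and ORDER is a hand-copied subset of nodes_rendering_order in which every third column is True.

```python
# Hand-copied subset of nodes_rendering_order; not imported from Baron.
ORDER = {
    "name": [("string", "value", True)],
    "space": [("string", "value", True)],
    "comma": [
        ("formatting", "first_formatting", True),
        ("constant", ",", True),
        ("formatting", "second_formatting", True),
    ],
}


class ToyWalker:
    """Toy stand-in for a render walker: calls a hook on every leaf."""

    def walk(self, node):
        if isinstance(node, list):
            for child in node:
                self.walk(child)
            return
        for kind, key, _dependency in ORDER[node["type"]]:
            if kind == "constant":
                self.before_constant(key, key)
            elif kind == "string":
                self.before_string(node[key], key)
            else:
                # "key", "list" and "formatting" entries hold further nodes.
                self.walk(node[key])

    def before_constant(self, constant, key):
        pass

    def before_string(self, string, key):
        pass


class ToyDumper(ToyWalker):
    """Concatenate every leaf, in rendering order."""

    def before_constant(self, constant, key):
        self.text += constant

    def before_string(self, string, key):
        self.text += string

    def dump(self, tree):
        self.text = ""
        self.walk(tree)
        return self.text


tree = [
    {"type": "name", "value": "a"},
    {"type": "comma", "first_formatting": [],
     "second_formatting": [{"type": "space", "value": " "}]},
    {"type": "name", "value": "b"},
]
print(ToyDumper().dump(tree))  # a, b
```

All the text of the source lives in the leaves, so the branch hooks can stay empty. The buffer is named text rather than dump in this sketch so that the attribute does not shadow the dump method and an instance can be reused.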

PathWalker Helper

If while walking you need to know the current path of the node, then you should subclass PathWalker instead:

class baron.path.PathWalker(strict=False)

Gives the current path while walking the rendered tree

It adds an attribute “current_path” which is updated each time the walker takes a step.

Here is a succinct example of what you should expect when using the PathWalker:

In [28]: from baron.path import PathWalker

In [29]: fst = parse("a = 1")

In [30]: class PathWalkerPrinter(PathWalker):
   ....:     def before(self, key_type, item, render_key):
   ....:         super(PathWalkerPrinter, self).before(key_type, item, render_key)
   ....:         print(self.current_path)
   ....:     def after(self, key_type, item, render_key):
   ....:         print(self.current_path)
   ....:         super(PathWalkerPrinter, self).after(key_type, item, render_key)
   ....: 

As in the example, don’t forget to call the before and after methods of the parent class. Furthermore, you need to respect the order shown above, that is:

  • Calling super().before() should be done before your code using the self.current_path attribute.
  • Calling super().after() should be done after your code using the self.current_path attribute.
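
The reason for this ordering can be shown with a toy model. TinyPathWalker below is a hypothetical stand-in that tracks a path over nested dicts, under the assumption that the parent's before extends the path and its after shrinks it; it is not Baron's PathWalker and its method signatures are simplified.

```python
class TinyPathWalker:
    """Toy stand-in for a path-tracking walker; not Baron's PathWalker."""

    def __init__(self):
        self.current_path = []

    def before(self, key):
        self.current_path.append(key)  # entering a child: extend the path

    def after(self, key):
        self.current_path.pop()        # leaving a child: shrink the path

    def walk(self, tree):
        # tree is a nested dict; every key is one step down the tree.
        for key, child in tree.items():
            self.before(key)
            if isinstance(child, dict):
                self.walk(child)
            self.after(key)


class Recorder(TinyPathWalker):
    """Record the path seen on the way down and on the way up."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def before(self, key):
        super().before(key)            # the path is updated first
        self.seen.append(("before", list(self.current_path)))

    def after(self, key):
        self.seen.append(("after", list(self.current_path)))
        super().after(key)             # the path is restored last


recorder = Recorder()
recorder.walk({"target": {"value": {}}, "value": {}})
for event in recorder.seen:
    print(event)
```

With this ordering, the path recorded in after is the same one that was recorded in before for the same child. Swapping either call would make the subclass read the parent's path instead of the child's.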