Codebase 101 – Command Line Parsing, part 1

The command line parsing of Taskwarrior is the most complex part of the code, but only because its reach extends into many areas of the codebase.  The parser itself is scheduled to be replaced in version 2.4, but this is mostly about the pre- and post-processing that must occur.  The process of handling command line arguments deserves a detailed explanation.  Let’s start with an example, and pick it apart.  Here is a deceptively easy one:

$ task +work list

There are only two elements of consequence here, which is the “+work” filter and the command “list”.  But even with this simple example, the processing is complex.  The first step is to identify the command, which in this case is “list”.  The “list” report is a read-only command, which means if there are any other arguments, they are all part of a search filter, which includes “+work”.  But there are hidden filter elements – the “list” report defines it’s own filter:

$ task show report.list.filter
Config Variable    Value
------------------ --------------
report.list.filter status:pending

With the “list” report filter, and the “+work” specified, the command line is now effectively:

$ task status:pending +work list

This filter has two components – the task has to be pending, and must also contain the ‘work’ tag.  These are combined with the default ‘AND’ operator and evaluated for all tasks.  The resultant set of tasks that match this filter are displayed according to the definition of the “list” report.

The point of this example is to show that there are things going on that add complexity even to the simplest query.  Now we will select a slightly more complex example, and go deeper into the processing:

$ task /he[a-z]p/ next
ID  Proj   A   Age Urg  Description
--- ------ --- --- ---- ----------------------------------
230 td.100 ✓    6d 9.62 Help is not working without --data
 2013-08-13 Started task

The intention is to find a task containing a regular expression and display using the next report format.  One result is displayed.  In order to see the processing, we rerun that command in debug mode, which generates much more information (I/O, terminal details, timing, performance).  We are interested in the command line parsing, so the command now includes an rc override.

$ task /he[a-z]p/ next rc.debug=1

The debug mode output is voluminous, so only the parsing-related output is shown here.  The first step is a high-level categorization of the command line arguments, needed because subsequent parsing depends on whether this is a read-only or write command.

parsing.1

Here you see that the categorization recognizes the program name, the command and the rc override.  The program name is of no consequence here, although it has a purpose in other contexts.

[ As an aside, if you create a symbolic link to ‘task’, and call it ‘cal’, this invokes ‘task calendar’ automatically.  Try it! ]

As ‘next’ is a read-only command (all reports are), the parser knows that if it removes ‘program’, ‘command’ and ‘override’, all that remains can be safely assumed to be filter, shown in white.  This is actually an over-simplification, but serves our purpose here.

The “rc.debug:1” override is another special case.  The parser must first scan the command line for these, and apply the override to the values loaded from the .taskrc file, because, as in this case, the overrides change behavior, so the behavioral change must occur before processing continues.

Now that the ‘next’ command is known to be read-only, the next step is to inject the report filter terms.  Below you see the terms for “status:pending” and a “limit:page”.

parsing.2

The “limit:page” is a special pseudo-attribute.  It is equivalent to filtering, but instead is removed and processed later when the output is being generated.  Let’s ignore pseudo-attributes.

The A3::categorize step (A3 is just a silly way of writing “Args” as an A followed by three characters) handles re-categorization.

This leaves two terms in the filter, and these are now extracted and tokenized.  They are also re-classified so that type information is known.  Here is the result of re-classification:

parsing.3

You can see here that the “status:pending” term has been identified as an attribute (status) that is being compared to a string value (pending).  Furthermore, the / / characters in the second term identify this as a pattern.

Taskwarrior evaluates filters as Boolean expressions, which is to say an expression that evaluates to either True or False, and only by evaluating to True does the task pass the filter and become part of the results displayed.  The next step is to expand the filter terms into expression terms, so that the expression evaluator can run it.  But first it needs to be converted into something recognizable as an expression.

parsing.4

Now we see that “status:pending”  has been expanded to three tokens, “status”, “=” and “pending”.  The “status” attribute has been identified as a DOM reference (name of a task attribute), the “=” is identified as a comparison operator, and the “pending” as a string literal.

The second term is also expanded to three tokens (because these are binary operators), namely “description”, a DOM reference, “~” which is a regular expression match operator, and the pattern, here identified as a regular expression in string form.  Next the terms are scanned to see if any default operators are needed.  In this case there is an implied “AND” between the two terms, and because it was not specified, it is added now:

parsing.5

This is now a recognizable infix expression, and ready for evaluation.  However the expression evaluator (E9 – “Evaluation”) can only handle postfix expressions, so with the use of a Dijkstra shunting algorithm, the expression is converted to postfix.

parsing.6

Now in postfix form, the expression can be evaluated by a simple stack-based postfix calculator.  That calculator just needs to know how to access DOM values, and apply operators.  There is the question of operator precedence too, but that is built into the Dijkstra algorithm.  Incidentally, that is a fascinating algorithm, well worth reading about:

http://en.wikipedia.org/wiki/Shunting-yard_algorithm

We have looked at a read-only command, which is all we will cover in part 1.  For write commands it gets a little more complicated, and that will be discussed in part 2.

Advertisements