Documentation for "expecto"



4. Template Syntax

4.6. Branches inside Groups

Suppose we have a cron job that mails us entries from a log file every night. Most of these entries are harmless, so we want to filter them out. Here is an example:

    Lost database connection, reconnecting ...
    Connection established.
    Timeout reading proxy data, retrying ...
    Retry successful.

There are several things to note. First of all, we have to handle entries in groups (see the previous chapter). That's because the "Lost database connection" and "Timeout" messages are only harmless if they're followed by the appropriate "Connection established" and "Retry successful" entries. Our template must match them only if these lines appear together, so we need to use groups.

Second, these pairs of lines can occur multiple times, or not at all. This is not a problem: in the previous chapter we learned that a repetition mark can be applied to groups. So we could create two groups of two lines each, applying an asterisk "*" as repetition mark to both groups. But ...

Most importantly, we have to note that these pairs of lines can occur in any order. Therefore we cannot use two separate groups in our template. If we placed the group to match the database lines first in our template, followed by the group to match proxy lines, it would not match if a proxy message came first, followed by a database message.
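The ordering problem described above can be reproduced with a small Python model. This is only a sketch, not expecto's implementation: the function names are hypothetical, lines are compared as exact strings, and it assumes that a template matches only if the whole input is consumed.

```python
DB = ["Lost database connection, reconnecting ...", "Connection established."]
PROXY = ["Timeout reading proxy data, retrying ...", "Retry successful."]


def match_star_group(group, lines, pos):
    """Consume as many consecutive copies of `group` as possible, starting at `pos`."""
    size = len(group)
    while size > 0 and lines[pos:pos + size] == group:
        pos += size
    return pos


def matches_sequential(lines):
    """Model of two separate starred groups: first the database group, then the proxy group."""
    pos = match_star_group(DB, lines, 0)
    pos = match_star_group(PROXY, lines, pos)
    # Assumption: the template matches only if no input lines are left over.
    return pos == len(lines)


print(matches_sequential(DB + PROXY))  # True: database pair first, then proxy pair
print(matches_sequential(PROXY + DB))  # False: the database pair arrives too late
```

In this model the second call fails because the database group has already been passed by the time the database lines show up in the input.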

So we need something like an "or" operator that applies to multiple lines of a group. Guess what? Such a thing exists. In fact, this is one of the most powerful features of expecto.

    * (
        "Lost database connection, reconnecting ..."
        "Connection established."
    |
        "Timeout reading proxy data, retrying ..."
        "Retry successful."
    )

As you can see, the pipe character "|" is used to separate the cases that can occur in our cron job's output. These cases are called branches of a group. In the above example, the group has only two branches, but you can specify as many as you like. The branch separators (pipe characters) have to be on their own lines, just like the closing parenthesis. The only thing allowed on the same line is white space so you can use indentation.

Basically, branches are alternative cases. In order for the group to match as a whole, at least one of the branches inside has to match. To be more exact, expecto tries the branches from top to bottom until one branch matches. Therefore, if more than one branch could match, the order of the branches might matter.
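To illustrate why the order of branches can matter, here is a minimal Python sketch of the "first matching branch wins" rule. It is a simplified model and not expecto's actual code: `first_matching_branch` is a hypothetical helper, lines are compared as exact strings, and the one-line branch is made up for the sake of the example.

```python
def first_matching_branch(branches, lines, pos=0):
    """Try the branches from top to bottom; return (branch index, position after the match).

    Returns None if no branch matches at `pos`.
    """
    for index, branch in enumerate(branches):
        end = pos + len(branch)
        if lines[pos:end] == branch:
            return index, end
    return None


lines = ["Lost database connection, reconnecting ...", "Connection established."]

# Hypothetical branches: the short one is a prefix of the long one, so both could match.
short_branch = ["Lost database connection, reconnecting ..."]
long_branch = ["Lost database connection, reconnecting ...", "Connection established."]

print(first_matching_branch([short_branch, long_branch], lines))  # (0, 1)
print(first_matching_branch([long_branch, short_branch], lines))  # (0, 2)
```

With the short branch on top, only one input line is consumed and "Connection established." is left over. With the long branch on top, both lines are consumed.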

In order to explain how expecto handles branches, let's have a look at an example that is a little different from the above:

    The following problem occured:
    Lost database connection, reconnecting ...
    Connection established.
    Problem resolved.
    The following problem occured:
    Timeout reading proxy data, retrying ...
    Retry successful.
    Problem resolved.

Now the groups each consist of four lines, of which the first and the last line are identical. Our template for that kind of output looks like this:

    * (
        "The following problem occured:"
        "Lost database connection, reconnecting ..."
        "Connection established."
        "Problem resolved."
    |
        "The following problem occured:"
        "Timeout reading proxy data, retrying ..."
        "Retry successful."
        "Problem resolved."
    )

That's not surprising: we just added the additional lines to each branch of the group. So what's the big deal? The point is that this example demonstrates how expecto works in detail.

Let's assume that our cron job produced a message regarding a proxy timeout (i.e. the second case):

    The following problem occured:
    Timeout reading proxy data, retrying ...
    Retry successful.
    Problem resolved.

When expecto reads the first line, it starts by comparing it with the first line of the first branch of the group. This match is successful, even though we know it is the wrong branch. But expecto doesn't know yet. Both branches start with the same line, so, at this point, expecto has no way to know which branch this is.

    TEMPLATE                                          INPUT
    ========                                          =====
    * (
        "The following problem occured:"              <--- The following problem occured:
        "Lost database connection, reconnecting ..."
        "Connection established."
        "Problem resolved."
    |
        "The following problem occured:"
        "Timeout reading proxy data, retrying ..."
        "Retry successful."
        "Problem resolved."
    )

Now comes the second line. Of course, expecto tries to continue matching the first branch, but the second line of input does not match the second line of the first branch of the template. Is expecto in trouble now? No, not at all. Now it knows that the first branch does not match this group of lines, and that the first line cannot be part of this group either, even though it matched previously. Remember that a branch must always match as a whole.

    TEMPLATE                                          INPUT
    ========                                          =====
    * (
        "The following problem occured:"              <--- The following problem occured:
        "Lost database connection, reconnecting ..."  <-/- Timeout reading proxy data, retrying ...
        "Connection established."
        "Problem resolved."
    |
        "The following problem occured:"
        "Timeout reading proxy data, retrying ..."
        "Retry successful."
        "Problem resolved."
    )

So, at this point, expecto goes back to the first line and tries to match it against the first line of the second branch. This matches again, and now the remaining lines match, too, so this branch is accepted by expecto.

    TEMPLATE                                          INPUT
    ========                                          =====
    * (
        "The following problem occured:"
        "Lost database connection, reconnecting ..."
        "Connection established."
        "Problem resolved."
    |
        "The following problem occured:"              <--- The following problem occured:
        "Timeout reading proxy data, retrying ..."    <--- Timeout reading proxy data, retrying ...
        "Retry successful."                           <--- Retry successful.
        "Problem resolved."                           <--- Problem resolved.
    )

The important thing we learn here: expecto actually goes back in the input if a branch fails to match, so it can check if one of the other branches matches.
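The walkthrough above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not expecto's real implementation: the function names are hypothetical, template lines are compared as exact strings, and only a single starred group with branches is modelled.

```python
HEADER = "The following problem occured:"
FOOTER = "Problem resolved."

# The two branches of the group, as lists of expected lines.
BRANCHES = [
    [HEADER, "Lost database connection, reconnecting ...", "Connection established.", FOOTER],
    [HEADER, "Timeout reading proxy data, retrying ...", "Retry successful.", FOOTER],
]


def match_branch(branch, lines, start):
    """Return the position after the branch if every line matches, else None."""
    pos = start
    for expected in branch:
        if pos >= len(lines) or lines[pos] != expected:
            return None  # the branch fails as a whole; nothing is consumed
        pos += 1
    return pos


def match_repeated_group(branches, lines):
    """Match a starred group; return (indices of the branches used, final position)."""
    pos = 0
    chosen = []
    while pos < len(lines):
        for index, branch in enumerate(branches):
            # Every branch is tried from the same `pos`: this is the "going back" step.
            end = match_branch(branch, lines, pos)
            if end is not None and end > pos:
                chosen.append(index)
                pos = end
                break
        else:
            break  # no branch matched at this position, so the repetition ends
    return chosen, pos


proxy_only = [HEADER, "Timeout reading proxy data, retrying ...", "Retry successful.", FOOTER]
print(match_repeated_group(BRANCHES, proxy_only))  # ([1], 4)
```

For the proxy message, the first branch matches the first line and then fails on the second, so the model restarts at the first line with the second branch, which matches all four lines.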


