next up previous contents index
Next: Rewrite Definitions Up: Input Previous: Life Time of Terms

   
Function Definitions

The structure of the generated C data types (see Section 2.1) for the phyla is very regular. Nevertheless it appears tedious to write C functions over these data types. Therefore there is a mechanism that allows easier expression of functions over phyla. In this way case analysis and subterm selection is simplified. For example:   

int len(exprlist $el) { Nilexprlist: { return 0; } tt = Consexprlist(*, t): { return len(t) + 1; } }
Here an integer-valued function len is defined with one argument of type exprlist (for exprlist see Section 1.1). The function does pattern matching on the argument that is prefixed with $. In the case where more than one pattern matches, the most specific match is taken.   The patterns can be arbitrary terms with variables, string-literals (double-quoted) and int-literals. Non-leaf variables can be denoted as variable=subpattern, as tt in the example above. The construct * can be used to denote an `anonymous' variable. As degenerate pattern an operator name not followed by parentheses can be used when one is not interested in the (number of) subphyla. The Nilexprlist pattern above is an example of such a pattern. The `pattern' default can be used to indicate a default case. In case there is no default, the default becomes to give a run time error message.

For each pattern a piece of C code is given between curly brackets. If several patterns share the same piece of C code, the patterns can be grouped (with separating commas). In this C code, pattern variables denote the various components of the term. The values $1 etc. denote the subphyla of the term: in the example above len(t) could also be written as len($1). The value $0 denotes the term itself. Attributes can be referred to as e.g. variable $\rightarrow$value.

Alternatively, function bodies can be an arbitrary piece of C code. This code can contain with-statements , in which the same pattern matching can be expressed. For example, an alternative description of the function len is:   


int len(exprlist el) { with(el) { Nilexprlist: { return 0; } tt = Consexprlist(*, t): { return len(t) + 1; } } }

Another construct in function bodies and C code is the foreach-statement, which expresses the iteration over a list. Its components are the loop variable, which automatically gets the type of the list element, the list to loop over, and a body. Yet another example of the len function:


int len(exprlist el) { int length = 0; foreach( l; exprlist el ) { length++; } return length; }

The foreach-with-statement is useful if the body of the foreach loop consists of only a with-statement for the loop variable. Then the same syntactical shorthand as for the function definitions can be used:


expr sum(exprlist el) { expr sub_total = Zero(); foreach( $e; exprlist el ) { Add( x ): { sub_total = Plus( sub_total, x ); } Subtract( x ): { sub_total = Minus( sub_total, x ); } } return sub_total; }

The following foreach-pattern-statement is useful if there is only one interesting pattern. Instead of a loop variable it takes a pattern. The body is only executed for those list elements that match the pattern.


expr add_Adds(exprlist el) { expr all_Adds = Zero(); foreach( Add( x ); exprlist el ) { all_Adds = Plus( all_Adds, x ); } return all_Adds; }

It is also possible to do pattern matching over multiple expressions in one with-statement and loop over several lists in a foreach-statement. The syntax for it is a straightforward extension of the 'singular' case. For example:   


boolean equiv(expr $a, expr $b) { Add( asub ) & Add( bsub ), Subtract( asub ) & Subtract( bsub ), Const( * ) & Const( * ): { return equiv( asub, bsub ); } Plus( asub1, asub2 ) & Plus( bsub1, bsub2 ), Minus( asub1, asub2 ) & Minus( bsub1, bsub2 ): { return equiv( asub1, bsub1 ) && equiv( asub2, bsub2 ); } default: { return False; } }

Here we compare two expr trees: if they have the same tree structure (form) we say that they are equivalent (and we don't care about the constant values in the leaves). For each dollar-prefixed argument we have a pattern; the patterns are grouped together with separating ampersand (&). Pattern groups that share the same piece of C code are grouped together with separating commas.

Pattern-variables may appear multiple times in the patterns of an ampersand-linked pattern group, to indicate that subtrees have to be (structurally) equivalent. We can even use that to 'parameterize' a pattern:   


boolean has_sub(expr $a, expr $b) { Add( asub ) & asub = * : { /* code here will be executed if 'a' has top-operator 'Add', and asub == b */ } default: { return False; } }

If comma-separated pattern groups share a common pattern or ampersand-linked pattern group, it can be factored out. For example, the two pattern groups below are equivalent:   


Add( asub ) & Add ( bsub ) , Add( asub ) & Subtract ( bsub ) : { /* C-code */ }
Add( asub ) & ( Add ( bsub ) , Subtract ( bsub ) ) : { /* C-code */ }

The foreach-statement over multiple lists is actually a combination of the foreach-statement, the foreach-with-statement and the foreach-pattern-statement. It loops over all lists at the same time, as long as each list still contains elements. For example:   


boolean equiv_lists(exprlist el1, exprlist el2) { boolean result = True; foreach( e1 & e2; exprlist el1, explist el2 ) { /* this body is executed as long as both lists have elements */ result = result && equiv( e1, e2 ); } /* we don't know if one list is longer than the other; we only */ /* know the 'result' for the elements that we compared with 'equiv' */ return result; }
Here we have a function in which we check that the elements of two lists are pairwise equivalent. At the right of the semicolon two list expressions appear (separated by commas, and each prefixed with its type). At the left of the semicolon a corresponding number of foreach-items appears (seperated by ampersands), where each foreach-item is either a variable (as in a foreach-statement), a variable prefixed with a dollar (as in a foreach-with-statement) or a pattern (as in a foreach-pattern-statement). The body contains either C-code, or patterns with C-code (if one or more foreach-items was a dollar-prefixed variable).

Our equiv_lists function has one disadvantage: the body of the foreach is executed for each pair of elements, as long as all (both) lists are non-empty, so, to know whether the two lists have the same length, we would have to test explicitly for the length. There is an experimental feature to get around this: it is possible to specify a clause that will be executed after the end of the foreach body, in which variables are bound to (or patterns are matched with) the remaining sub-lists:


boolean equiv_lists(exprlist el1, exprlist el2) { boolean result = True; foreach( e1 & e2; exprlist el1, explist el2 ) { /* this body is executed as long as both lists have elements */ result = result && equiv( e1, e2 ); } afterforeach( $re1 & $re2 /* same number of items here as in foreach */ ) { Nilexprlist() & Nilexprlist() : { /* both lists same length: result unchanged */ } default: { /* lists have different length: result changed */ result = False; } } return result; }
The argument of the afterforeach has the same number of items as the corresponding foreach. The items can be patterns (over the listtype, not the listelement type), dollar-prefixed variables and ordinary variables. The body, which can contain C-code, or patterns (if there are dollar-prefixed variables), is exectuted exactely one time.

Actually, the use of patterns and dollar-prefixed variables are just syntactical sugar for a foreach statement with only variables, but with nested with-statements in its body. For example, the two foreach-statements below are equivalent:   


/* here we combine all kinds of items: patterns, dollar-prefixed variables * and ordinary variables. Of course, the items can appear in arbitrary order. */ foreach( pattern1 & ... & patternk & $dvar1 & ... & $dvarn & var1 & ... & varm ; ... ) { /* body, in which we can refer to * pattern variables, dollar-prefixed variables (dvar*) * and ordinary variables (var*) * but (most often) not to dollar-variables ($i, i >= 0) */ }
/* here we have the same statement, in which we only use ordinary variables, * (and we introduced 'anonymous' variables for the patterns pat* ), * together with nested with statements. */ foreach( var_pat1 & ... & var_patk & dvar1 & ... & dvarn & var1 & ... & varm ; ... ) { with( var_pat1, ..., var_patk ) { pattern1 & ... & patternk : { with( dvar1, ..., dvarn ) { /* body, in which we can refer to * pattern variables, dollar-prefixed variables (dvar*) * and ordinary variables (var*) * but (most often) not to dollar-variables ($i, i >= 0) */ } } } }

Most C code can contain so-called dollar-variables : expressions of the form $i where i is 0, 1, 2, .... The context in which the dollar-variables are used defines whether dollar-variables may be used, and to which values they are bound. The dollar-variables may not be used in a with-statement over more than one expression, or in a foreach-statements that induce such a with-statement: foreach-statements with more than one dollar-prefixed variable, and foreach-statements without dollar-prefixed variables but with more than one pattern. The dollar-variable $0 is bound to one of the following: the C-expression that is the argument of a with-statement, the function- or foreach-parameter that is preceded by a $, the pattern of a foreach-pattern-statement, or the phylum that is being created, when used in phylum-initialisation code. A dollar-variable $i, i > 0 is bound to the i-th subterm of the enclosing with-, foreach-with- or foreach-pattern-statement. Note that a foreach-statement does not introduce dollar-variables. In case of multiple candidates for bindings, the one from the smallest enclosing scope is chosen.

It is seldom necessary to use dollar-variables, in most cases pattern variables can be used; the only exception is $0 used in phylum-initialisation code. We strongly advise the use of patterns and pattern-variables, instead of degenerate patterns and dollar-variables, if only because it allows the term processor to warn against inconsistencies in its input. For example, if the number of subphyla of an operator is changed, the term processor will report an error if a pattern was not updated accordingly; however, it will not notice it when a dollar-variable was not updated to reflect the same change (lint may catch this, but there are cases that lint cannot catch).

There is one use of dollar-variables that cannot be achieved in another way: in-place modification of terms.  Try to avoid it, because it is quite easy to mess things up. Do not try to in-place modify terms maintained under unique storage. If you really need to in-place modify a term, you have the following options: In phylum initialization code (see Section 1.2), an assignment to $0 will modify the term under construction. In all other C texts an assignment to $0 is equivalent to a no-op. In those contexts, it is possible to assign to $i (i > 0), or to do something like: ... replacement_term = ...; *$0 = *replacement_term; ... , but this last solution may not be completely portable (it assigns a struct to a dereferenced (*) struct pointer).


next up previous contents index
Next: Rewrite Definitions Up: Input Previous: Life Time of Terms

2000-04-17