To describe the language syntax in detail, Extended Backus-Naur Form (EBNF) is used: Alternatives are separated by vertical bars |. Parentheses ( and ) are used for grouping. Expressions enclosed in square brackets [ and ] are optional. Curly braces { and } denote repetition (0 or more times) of the enclosed expression. Names of EBNF productions referring to terminal symbols start with a lower-case letter; all other production names start with a capital letter. Literal characters and strings are enclosed in single quotes ' and '. A precise definition of EBNF can be found in the Appendix.
Occasionally, the language syntax is presented in a slightly simplified form to facilitate reading. For instance, this is the case for the definition of string literals where a complete formal definition would become very cumbersome to understand, without adding value to the reader. Annotations in small font may be used to explain some of the fine points in those cases.
Since processing is independent, data about all the processed log entries is aggregated externally to a Sawzall program proper. To that end, Sawzall supports the declaration of output types and corresponding output variables. Output variables are the connection to specific aggregators (such as accumulators, collectors, filters, etc.) of data. A special construct, the emit statement, is used to send data to such an aggregator. The aggregator receives the data emitted to it, and aggregates it in a manner specific to its type. For example, a sum aggregator, referred to via an output variable of sum output type, will add up all the values it receives. A collection aggregator will simply collect all the data, etc. The Sawzall language doesn't specify a particular implementation for aggregators, but the Szl implementation does provide an implementation.
Processing of a large set of log entries may take a considerable amount of time even if they are processed in parallel. Furthermore, it is not unlikely to occasionally encouter corrupted log entries. It is important to be able to handle these cases gracefully, without abortion of the process. To that end, Sawzall supports the notion of defined and undefined values. Expressions and statements may be silently aborted if undefined values are encountered. It is the hope that most programs will be able to continue execution even in the presence of occasional errors caused by corrupted input data.
Program = { Declaration | Statement }.
# Program to find the three most commonly used word in a list.
# Input is text lines containing a word and a count, separated with a comma.
topwords: table top(3) of word: string weight count: int;
fields: array of bytes = splitcsvline(input);
w: string = string(fields[0]);
c: int = int(string(fields[1]), 10);
if (c != 0) {
emit topwords <- w weight c;
}
Declaration = TypeDecl | StaticVarDecl | VarDecl.Once a name has been introduced with a declaration, it can be used within the scope of the declaration to refer to the associated entity. The scope of a declaration extends from the point of the declaration of the name to the end of the immediate surrounding block. Blocks - and therefore scopes - may be nested. A name declared in a nested scope shadows equal names declared in outer scopes; thus a name always refers to its innermost declaration. A Sawzall program implicitly defines a block enclosing all declarations and statements of the program.
TypeDecl = 'type' type_name '=' Type ';'. type_name = identifier.
type my_bool = bool;
type Coordinates = { x: float, y: float };
type CityMap = map [city_name: string] of Coordinates;
The variable type may be specified explicitly, followed by an optional
initialization expression. Alternatively, the explicit type
specification may be elided in favor of the initialization expression,
in which case the type of the variable is the type of the initialization
expression. Thus, either the variable type or the initialization expression
(but not both) may be missing.
A variable may
be declared static by prefixing the declaration with the
keyword static.
StaticVarDecl = 'static' VarDecl. VarDecl = var_name ':' [Type] ['=' Expression | Block] ';'. var_name = identifier.The initialization expression for static variables may be evaluated only once per program run (implementation specific). In particular, if the value of an initialization expression for a static variable changes during program execution, the program is erroneous. Variables of an output type are always static, and the
static keyword may be omitted in that case.
Initialization expressions for static variables must not refer to non-static variables; if variable names appear in the expression, the associated variables must be static. In particular, if a (non-intrinsic) function is called in an initialization expression, that function must be stored in a static variable.
A variable declaration with an initialization expression of the form:
variable_name: variable_type = expression;is a shortcut for the variable declaration followed by an assignment, possibly with a conversion
(variable_type).
The conversion is omitted if expression
already is of type variable_type:
variable_name: variable_type; variable_name = variable_type(expression);If the type of
expression is exactly variable_type
the declaration may be further shortened to:
variable_name := expression;Notice that even though there is no explicit type specified here, it nonetheless declares a new variable (the
: indicates the variable declaration).
Obviously, in this case no conversion is implied.
If the initialization expression is a function, the variable type must be a function type. Therefore, a function declaration looks like this:
function_name: function_type = function_type {
# function body
};
Since there is no type conversion possible for functions, the function types in the
declaration must be identical and this syntax can always be
shortened to:
function_name := function_type {
# function body
};
A slightly different notation was previously supported as a special
case for functions:
function_name: function_type {
# function body
};
but that form has been obsoleted by the appearance of the := style.
The evaluation of the initialization expression may result in an undefined value. In that case, the undefined value may be silently assigned to the variable (implementation-specific, see also Undefined values below).
n := 10;
counter: int = 0;
static pi := 3.14159265;
hypot := sqrt(x*x + y*y);
static word := load("/home/szluser/current/word");
a: array of float = { 2.4, PI, float("1.234") };
unique_language_values: table unique (10) of {language: string, value: string};
average := function(list: array of float): float;
min := function(x: int, y: int): int { if (x < y) return x; return y; };
static country_codes: array of string = LoadCountries("countries.txt");
If an undefined value is used within an expression, the value of the expression becomes undefined. In particular, if an undefined value is used as an argument for a function call, the function is not called and the function result is undefined.
If an undefined value is assigned to a variable, the variable becomes undefined. In particular, if an undefined value is stored as array element, tuple field, or map entry, the entire array, tuple, or map becomes undefined. Thus, variables are either entirely defined or entirely undefined. Initially, all variables are undefined until a defined value is assigned to them. Function parameters are defined upon invocation of the function because it can only be called if all arguments are defined.
The intrinsic function def(x: T): bool can be used to test
an arbitrary expression x of any type T for defined'ness.
def(x) returns true if the value of x is defined;
it returns false otherwise. Thus, def() acts like a guard
against the propagation of undefined values.
def(),
by default the current implementation aborts with a run-time error if an
undefined value is encountered. When the --ignore_undefs
flag is specified with either the szl or saw application,
all undefined values are propagated silently, without
runtime errors.
Undefined initialization expressions don't cause a run-time trap so that it is possible to write code such as
x: T = expr; # potentially undefined if (def(x)) # x is defined; it can be used w/o runtime trap else # x is undefined; using it will cause a runtime trap # (unless used again in one of the exception cases described here)without the need to re-evaluate
expr if its value is defined.
Undefined return expressions don't cause a run-time trap so that it is
possible to return an undefined value from a user-defined function; usually
to indicate that the function failed. For instance, one might write a function
f that returns a result of type T, and also use
the result mechanism to indicate if the function was successful or not:
f := function(...): T {
...
# return an undefined value to indicate failure of f
u: T;
return u;
};
result: T = f(...);
if (def(result))
# function succeeded; result is defined
else
# function failed; result is undefined
}
x held in a variable v1 is assigned to a second
variable v2, changing v2 after the assignment -
for instance by modifying components of v2 - won't affect the
value of v1.
proto keyword.
In that case, the type specification must be a tuple type (either a type
name referring to a tuple type, or a tuple type specification). The resulting
tuple type is an automatic proto tuple type generated from the original
tuple type (see Automatic proto tuple types
for details).
Type = type_name | ArrayType | MapType | TupleType | OutputType | FunctionType | 'proto' Type. type_name = identifier.
bool # the boolean values true and false bytes # arrays of unsigned bytes int # 64-bit signed integer values float # 64-bit IEEE floating point values time # unsigned integral representations of time, with microsecond resolution fingerprint # unsigned hash values computed by an implementation-defined hash function string # arrays of unicode characters uint # 64-bit unsigned integer valuse
Component = [component_name ':'] ComponentType. component_name = identifier. ComponentType = Type.
ArrayType = 'array' 'of' Element. Element = Component.
array of int array of names: string array of array of point: Point
The type of a tuple field, static declaration or type declaration may refer to the name of an enclosing type; that is, type declarations may be recursive. When used in a field or type declaration, the inner type reference must occur within an array type or map type enclosed by the outer type declaration. (That is, there must be a way to omit the value of the field. Otherwise it would be impossible to represent an object of that type.)
Although types may be recursive, objects can neither contain nor refer to enclosing objects or themselves (directly or indirectly). This is because Sawzall is value-based and has no pointers or references.
Types cannot be mutually recursive. Since Sawzall does not have forward references it is not possible to have two types (e.g. tuples) each of which makes use of the other.
Named fields only may also have an integer value called (field) tag associated with it. Tags are used to associate the field with a field in the protocol buffer representation of the tuple (see also Protocol Buffers). No two fields within a tuple can have the same tag, and all tags must be > 0. Tuples with tagged fields are called proto tuples; tuples without tagged fields are called simple tuples. All fields of proto tuples must have tags.
Tagged fields of proto tuples may also
specify a constant default value. The default value is used to
initialize a field during proto conversion if the corresponding field value is
missing in the proto buffer being converted. The special intrinsic
inproto(field: T): bool can be used to test whether a proto
tuple field was present in the protocol buffer converted into the proto
tuple (see also
A Manual for the Sawzall Intrinsics).
Tagged fields of proto tuples may also specify a corresponding underlying protocol buffer field type. This is the field type the tuple field is converted into when the tuple is converted from a Sawzall tuple back into the protocol buffer representation. If no protocol buffer field type is specified, a default field type is assumed (see table below).
For historical reasons, a proto tuple type may start with
the keyword parsedmessage to indicate that the tuple is
a proto tuple. The keyword is treated as a comment.
TupleType = SimpleTupleType | ProtoTupleType.
SimpleTupleType = '{' [SimpleMemberList] '}'.
SimpleMemberList = SimpleMember {',' SimpleMember} [','].
SimpleMember = TypeDecl | StaticVarDecl | SimpleFieldDecl.
SimpleFieldDecl = Component.
ProtoTupleType = ['parsedmessage'] '{' [ProtoMemberList] '}'.
ProtoMemberList = ProtoMember {',' ProtoMember} [','].
ProtoMember = TypeDecl | StaticVarDecl | ProtoFieldDecl.
ProtoFieldDecl = Component ['=' ProtoFieldDefault] '@' proto_field_tag [':' proto_field_type].
ProtoFieldDefault = Expression.
proto_field_tag = integer.
proto_field_type = identifier.
The type declarations for proto tuples are usually automatically generated
by the protocol-compiler from a .proto file. The following table
shows the relationship between the Sawzall and protocol buffer field types:
| Protocol buffer type | Sawzall type | Comments |
bool |
bool |
default conversion |
boolean |
bool |
|
string |
bytes |
default conversion |
fixed32 |
int |
may overflow |
fixed64 |
fingerprint |
default conversion |
fixed64 |
int |
default conversion |
fixed64 |
time |
default conversion |
int32 |
int |
may overflow |
int64 |
int |
|
uint64 |
uint |
default conversion |
float |
float |
may overflow |
double |
float |
default conversion |
For each protocol buffer field type there is a corresponding Sawzall
field type; for each Sawzall field type there may be more than one
protocol buffer field type. Some Sawzall fields may lose information
(overflow may occur) if they are converted into a protocol buffer
field of a narrower type (note that all basic Sawzall types - except
bytes and string - are 64 bits wide).
The default conversions are used if no
protocol buffer field type is specified explicitly.
proto. The resulting
proto tuple type consists of the original tuple type converted into
a parsedmessage proto tuple type augmented with field
tags. The tags are assigned in increasing order, starting with tag value
1 for the first field, tag value 2 for the second field, and so forth. The
same conversion is applied to any nested tuples in the tuple type (field
tags start with tag value 1 again; see also the examples below).
A proto keyword applied to a proto tuple type
does not change its type (i.e., proto is idempotent).
{}
{ x: float, y: float, int }
# the Vector tuple type
{ x: float, y: float,
static Magnitude := function(p: Vector): float {
return sqrt(p.x*p.x + p.y*p.y);
}
}
{ ip: int = 0xffffff00 @ 1,
value: bytes = bytes("britney") @ 2, # proto strings are Sawzall bytes
timestamp: time @ 5,
type Server = {
id: int,
location: string
},
static location1 := "ROB",
static location2 := "GRI",
static location3 := "BGI",
}
parsedmessage {
g: array of TimeProtocol_G @ 1, # 11
debug: array of bytes @ 4: bytes # 34
}
proto T # T must be a tuple type
proto {
x: int,
y: float
}
proto proto proto { # proto is idempotent
x: int,
t: {
s: bytes,
t: bytes
}
}
parsedmessage { # this type is equivalent to the previous one
x: int @ 1,
t: parsedmessage {
s: bytes @ 1,
t: bytes @ 2
} @ 2
}
MapType = 'map' '[' Key ']' 'of' Value. Key = Component. Value = Component.
map [int] of bool
map [symbol: string] of int
map [point: {x: float, y: float}] of name: string
emit
statements are used to send data to an aggregator.
Output types may be parametrized and indexed. Parameters are used to
set up the aggregation variable. Indices are used to create "arrays" of
aggregators. Unlike with arrays, indices of various types (not just int)
may be specified. Some output types require the specification of a weight
type. When emitting to a variable of output type, the weight is used to
scale the emitted value for aggregation.
OutputType =
'table' table_type [Parameter] {Index} 'of' Element [Weight]
[FileSpec | ProcSpec ] [FormatSpec].
table_type = identifier.
Parameter = '(' Expression ')'.
Index = '[' Component ']'.
Element = Component.
Weight = 'weight' Component.
FileSpec = 'file' '(' ArgumentList ')'.
ProcSpec = 'proc' '(' ArgumentList ')'.
FormatSpec = 'format' '(' ArgumentList ')'.
ArgumentList = ExprList.
ExprList = Expression {',' Expression }.
If a file or proc specifier is present, data is not emitted
to an external aggregator, but to a file or process respectively, with the file
or process name computed via the specifier as described below. Unless
a format specifier is present as well, the element type must be
bytes.
file or proc specifiers are only allowed
for collection output types.
If a format specifier is present, any value emitted is
formatted into a string as expressed via the specifier.
The emitted values are always of type string, independently
of the element type specified in the output type; except if there is
a file or proc specifier present as well, in which
case the string is converted into (UTF-8 encoded) bytes before
emission.
The file, proc, and format specifier format
expressions similar to printf in C: The first argument must be
a string possibly containing format characters. The remaining arguments
must correspond in number and type to the format characters in the
format string. Within the arguments of file or proc
specifiers, output type index values may be referred to via the
corresponding component name of a particular index. Within the arguments
of format specifiers, the element value may be referred to via
its component name.
# Type of intrinsic output variable stdout
table collection of x: string file("/dev/stdout") format("%s\n", x)
# Type used to collect all the values into a single stream
table collection of string;
# Type used to count the number of times each value is seen
table sum[value: string] of count: int;
# Type used to record the top 10 values for each category
table top(10)[category: string] of value: string weight count: int;
# Type used to record the ten most expensive operations
table maximum(10)[category: string] of operatikon: string weight cost: float;
# Type used to count how many unique values there are, using an (internal)
# sampled table of 10000 values to estimate the distribution
table unique(10000) of value: string;
FunctionType = 'function' '(' [ParameterList] ')' [ResultSpec].
ParameterList = Parameter {',' Parameter}.
Parameter = identifier ':' Type.
ResultSpec = ':' Type.
function() function(n: int): int function(name: string, hint: Coordinates): Coordinates
Statement = Assignment | Block | BreakStatement | ContinueStatement | DoStatement | EmitStatement | ExprStatement | ForStatement | IfStatement | ResultStatement | ReturnStatement | SwitchStatement | WhenStatement | WhileStatement.
a[f()] = g(); is erroneous if calling f() before
g() changes the result of g(), and vice versa.
Assignment = Factor '=' Expression ';'If either the left-hand side or the right-hand side of the assignment is undefined, the variable designated by the left-hand side becomes entirely undefined. For example, in the assignment
a[i] = x; the array a becomes
undefined if either the index i or the value x is undefined.
Block = '{' { Declaration | Statement } '}'.
BreakStatement = 'break'.
ContinueStatement = 'continue'.
DoStatement = 'do' Statement 'while' '(' BoolExpression ')' ';'.
If the boolean expression has become undefined, the do statement
is aborted and execution continues with the next statement.
do d++; while (d*d < x);
EmitStatement = 'emit' OutputDesignator '<-' Expression ['weight' Expression] ';'.
OutputDesignator = var_name {'[' Expression ']'}.
emit stdout <- "Hello World!"; emit valuecount[values[somej]] <- 1;
++, or a decrement -- operator,
the variable, array element, or tuple field is incremented or decremented
by 1.
ExprStatement = Expression [ '++' | '--' ] ';'.If an undefined value occurs during expression evaluation, expression evaluation is aborted and execution continues with the next statement.
i++; a[i]--; f(x, y);
true as second expression.
ForStatement =
'for' '(' [ForDeclExpr] ';' [BoolExpression] ';' [ForDeclExpr] ')' Statement.
ForDeclExpr = Declaration | ExprStatement.
A for statement of the form:
for (expr1; cond; expr2) statement;is syntactic sugar for the block:
{ expr1;
while (cond) {
statement;
expr2;
}
}
IfStatement = 'if' '(' BoolExpression ')' Statement ['else' Statement].
If the boolean expression is undefined, the if statement is ignored and
execution continues with the next statement (following the if statement).
ResultStatement = 'result' Expression.A result statement cannot be used inside a function without an enclosing statement expression.
e := ?{
j := function(): int { result 1; }; # compile-time error
};
ReturnStatement = 'return' [ Expression ].If the return value is undefined, the function returns an undefined result. A return statement cannot be used inside a statement expression without an enclosing function.
x := function(): int {
n := ?{
return 2; # compile-time error
};
};
case keyword; and
semantically, the tag value is compared with individual case values in order
of appearance. Individual case values need not be compile-time constants.
In particular, the Sawzall switch statement works also for tag types other then int.
Furthermore, an implicit break statement is appended to each statement
list per case (no fall through).
SwitchStatement = 'switch' '(' Expression ')' '{' { Case } Default '}'.
Case = 'case' CaseLabelList ':' StatementList.
CaseLabelList = Expression {',' Expression}.
StatementList = Statement { Statement }.
Default = 'default' ':' StatementList.
A switch statement of the form:
switch (tag) {
case a0, a1, ... an:
as;
case b0, b1, ... bn:
bs;
...
case z0, z1, ... zn:
zs;
default:
s;
}
is syntactic sugar for the block (assuming no undefined values and
break statements):
{ $tag: typeof(tag) = tag;
if ($tag == a0 || $tag == a1 || ... $tag == an)
as;
else if ($tag == b0 || $tag == b1 || ... $tag == bn)
bs;
...
else if ($tag == z0 || $tag == z1 || ... $tag == zn)
zs;
else
s;
}
If tag is undefined, the switch statement is ignored, and execution continues
with the next statement. If any of the case labels
is undefined, the effect is as if that case were not mentioned. Since this can
lead to a potentially 'empty' switch statement, the default case is mandatory.
Break statements can be used to exit a switch statement before the end of a
statement list in a case (breaks are implicit at the end of each case).
WhenStatement = 'when' '(' {QuantifierDecl} BoolExpression ')' Statement.
QuantifierDecl = var_name ':' quantifier Type ';'.
quantifier = 'all' | 'each' | 'some'.
If the boolean expression is (or has become) undefined for all possible values
of the quantifiers, the when statement is aborted and execution continues
with the next statement.
break or continue statements are not permitted in a when statement unless the corresponding switch statement or loop is also in the when statement.
# Count the relevant values when ( somei: some int; somej: some int; lowercase(valuewords[somei]) == values[somej] ) emit valuecount[values[somej]] <- 1;
WhileStatement = 'while' '(' BoolExpression ')' Statement.
If the boolean expression is (or has become) undefined, the while statement
is aborted and execution continues with the next statement.
while (d*d < x) d++;
x - y - z
stands for (x - y) - z.
Expression = Conjunction {('||' | 'or') Conjunction}.
Conjunction = Comparison {('&&' | 'and') Comparison}.
Comparison = SimpleExpr [relation SimpleExpr].
relation = '==' | '!=' | '<' | '<=' | '>' | '>='.
SimpleExpr = Term {add_operator Term}.
add_operator = '+' | '-' | '|' | '^'.
Term = Factor {mul_operator Factor}.
mul_operator = '*' | '/' | '%' | '<<' | '>>' | '&'.
Factor = Operand { Selector | Index | Call | SawCall }.
Selector = '.' field_name.
field_name = identifier.
Index = '[' Expression [':' Expression] ']'.
Call = '(' ArgumentList ')'.
SawCall =
'(' [IterationCount ','] InputString
',' RegexpList [',' 'rest' Factor] ')'.
IterationCount = Expression.
InputString = Expression.
saw_name = identifier.
RegexpList = Regexp {',' Regexp}.
Regexp = ['skip'] ['submatch'] string.
T, then T.m
selects the tuple member m, and m must
refer to the type name of a type declaration, or the variable name of a static
variable declaration in the tuple type T.
If the type of an operand T is a tuple, then
T.f selects the field f of the tuple.
A is an array,
string, or bytes, then A[i] selects the element of A whose index corresponds
to the current value of i.
If the index i of A[i]
is not within the bounds of the array, the expression A[i]
becomes undefined.
Furthermore, A[i:j] selects a sub-array
or slice consisting of the elements A[m], A[m+1], ...
A[n-1] with m = max(0, i), and n = min(len(A), j).
Note that slices are never undefined, they simply become empty
(i.e., their length is 0) if the range is entirely outside the array boundaries.
M is a map, then M[k] selects the element of
M whose key corresponds to the current value of k. If M[k]
appears on the right-hand side of an assignment or in an expression, and there is no
map element for the key k, then M[k] is undefined. If M[k]
is the left-hand size of an assignment of the form M[k] = v;, then v
becomes the value associated with the key k.
F is a function, then F(p1, p2, ... pn) expresses
the call of F with parameters p1, p2, ... pn.
The parameter expressions
are evaluated and assigned to the formal function parameters before the call;
no evaluation order is specified. In particular,
a program that depends on a particular evaluation order of parameters is
erroneous. If F is defined by the
Sawzall program (i.e., it is not an intrinsic function),
the number and types of the parameter expressions must correspond (by position)
with the type of F. If F is an intrinsic function, function specific
rules apply. In particular, if F specifies a saw call (one of saw,
sawn, or sawzall), saw-specific syntax applies to
the function call. If the operand F is a type name, then the call
F(x, arg1, arg2, ...) is syntactic sugar for the call
convert(F, x, arg1, arg2, ...). Note that parameter passing is simply a special
form of assignment; and thus is done by value in Sawzall.
T is an identifier designating a type, then T(x, p1, ... pn)
is a shortcut for the conversion function call convert(T, x, p1, ... pn), where
convert is a special conversion intrinsic, T the type into which the
value x is to be converted, and p1, p2 ... pn,
are (potentially missing) parameters specifying the conversion more precisely.
For example,
int(x, 10) is the same as convert(int, x, 10). Conversions are defined between
most basic types, and between some array types as well. Currently a detailed list of all
implemented conversions is missing, but some additional information can be found in
Sawzall Language.
$ symbols,
nested expressions,
or statement expressions. A type identifier denotes the corresponding type. It
does not have a value and must be followed by a
member selector,
conversion call, or may be used as an argument for
certain special intrinsics. A variable
identifier stands for the current value of the variable. A literal value
specifies a constant value. A $ symbol must only be used within
an index expression and it stands for current length of the subscripted array,
string, or bytes value. Parentheses are used to specify different evaluation
orders than defined by the operator precedences.
Operand =
identifier | literal | Composite | Function | unary_operator Factor |
'$' | StatementExpr | '(' Expression ')'.
unary_operator = '+' | '-' | '~' | '!' | 'not'.
{} is equal to the type of any array, string, or bytes.
:'s, and if the all key
types equal the type of the map key, and if all value types equal the type of
the map value. Alternatively, the type of an empty map {:} is
equal to the type of any map.
Composite = '{' [ExprList | PairList | ':'] '}'.
PairList = Pair {',' Pair}.
Pair = Expression ':' Expression.
If any expression in the composite is undefined, the value of the composite
is undefined.
{}
{:}
{ 'a', 'b', 127 }
{ 12, "hello", 3.14, {} }
{ "foo" : 3, "bar" : 7 }
Function = Type Block.A function that is used as initialization expression for a static variable may not access non-static variables in surrounding scopes. Thus, such a function cannot produce any side-effects and its result depends only on the (global) state of static variables and the function arguments.
function() {}
function(radius: float): float { return 2.0 * PI * radius; }
Action { kill_bill(); }
format()), optional parameters (e.g., assert()),
and polymorphic parameter types (e.g., min()).
The intrinsic functions can be roughly grouped into the following
classes:
assert(), def(), convert(),
format(), len(), new(), etc.)max(), min(), ln(),
exp(), rand(), etc.)haskey(), addday(), trunctoday(),
now(), etc.)regex(), saw(), sawn(),
sawzall(), match(), etc.)StatementExpr = '?' Block .
# signum
signum := ?{ switch(true) { case x < 0: result -1; case x > 0: result 1; default: result 0; };
# array filtering
a = ?{
aa: array of int = {};
when (i: each int; Test(a[i])) aa = aa + {a[i]};
result aa;
};
The block in a statement expression must contain at least one
result statement, which determines the type of the statement expression.
symbol result + sum - difference * product / quotient % modulo & bitwise and | bitwise or ^ bitwise xor ~ bitwise negation << shift left >> shift right (logical)The behavior of shifts is unspecified if the shift count is greater than or equal to the word size. If an expression contains an integer division or modulo operation by 0, the expression becomes undefined. The operators
/ and % are related by the following formula:
x = (x / y) * y + (x % y)The operator
^ is related to the operators &, |,
and ~ as follows:
x ^ y = (x & ~y) | (y & ~x)
symbol comparison == equal != not equal < less <= less or equal > greater >= greater or equal
and true if and only if both arguments are true; false otherwise (deprecated) or false if and only if both arguments are false; true otherwise (deprecated) not true if operand is false; false otherwise && likeNote thatand, but evaluates rhs only if lhs is true || likeor, but evaluates rhs only if lhs is false ! same asnot
and and or evaluate both arguments unconditionally.
symbol result + concatenation
| Precedence | Category | Operators | Associativity |
| 6 (highest) | unary | + - ~ ! not |
right to left |
| 5 | multiplicative | * / % << >> & |
left to right |
| 4 | additive | + - | ^ |
left to right |
| 3 | comparison | == != < <= > >= |
left to right |
| 2 | conjuction | && and |
left to right |
| 1 (lowest) | disjunction | || or |
left to right |
2003 # an int
i + 1 # an int
0b100 << i # an int
(x - a) * (x + a) # a float
{"the", "answer", "is", 42} # a 4-tuple {string, string, string, int}
{1.2, 2.3, 3.4, 4.5, 5.6} # a 5-tuple and also an array of float holding 5 elements
a[i % $] # (i % len(a)).th element of array a
x.f # field f of tuple x
{1, 2, 3, 4, 5}[i] # i.th element of constructed array
(0 <= i) and (i < n) # true if i in [0, n)
min(x, y) # call of function min with arguments x and y
a[i:j] # the array (a[i], a[i+1], .. a[j-1])
# Copyright 2010 ################
Include = 'include' file_name. file_name = string.
file_name specifies a relative path (i.e., does not
start with '/'), the directory location of the file is relative to the including
file (i.e., the file containing the include clause), or relative to any
number of directories specified as include directories via
command line arguments; unless file_name starts with a '.' in
which case it must be located relative to the including file.
If other platforms are supported they do not necessarily follow this rule.
include "/home/szluser/stuff" include "lib/qsort.szl"
Proto = 'proto' file_name.
proto "/home/szluser/logfile.proto"
identifier = (letter | '_') {letter | '_' | digit}.
letter = 'A' .. 'Z' | 'a' .. 'z'.
digit = '0' .. '9'.
Various identifiers are predefined. They refer to basic data types, intrinsic
functions, incoming parameters, etc. The following is an incomplete list:
int float string time bytes bool fingerprint output stdout stderr _undef_cnt _undef_details true false PI SECOND SEC MINUTE MIN HOUR HR DEBUG def convert new regex saw sawn sawzall fingerprintof format len haskey inproto keys lookup max min assert addday addmonth addweek addyear dayofmonth dayofweek dayofyear hourof minuteof monthof secondof yearof trunctoday trunctohour trunctominute trunctomonth trunctosecond trunctoyear now highbit load getenv lowercase uppercase match strfind strrfind strreplace matchposns matchstrs ln log10 exp pow ___raise_segv rand nrand tobase64 frombase64 splitcsvline splitcsv
szl --explain=
literal = integer | unsigned_integer |
fingerprint | floating_point | char | string | time.
integer = ['-'] (binary_int | octal_int | decimal_int | hexadecimal_int).
binary_int = '0' ('B' | 'b') bin_digit {bin_digit}.
octal_int = '0' oct_digit {oct_digit}.
decimal_int = dec_digit {dec_digit}.
hexadecimal_int = '0' ('X' | 'x') hex_digit {hex_digit}.
bin_digit = '0' .. '1'.
oct_digit = '0' .. '7'.
dec_digit = '0' .. '9'.
hex_digit = '0' .. '9' | 'A' .. 'F' | 'a' .. 'f'.
1 # = 1 011 # = 9 0b1010 # = 10 42 # = 42 0x7f # = 127
- in front of
an expression; thus a negative integer value can be written in two different
ways. For instance, -3.14 can be written as -3.14 (signed
integer literal) or as - 3.14 (- operator followed
by unsigned integer literal). This apparent redundancy is required to permit
the entry of the most negative integer value -0x8000000000000000 as a literal
(the corresponding positive value 0x8000000000000000 is too large to be represented).
uint, not int.
unsigned_integer = (binary_int | octal_int | decimal_int | hexadecimal_int) ('U' | 'u').
1u # = 1 011u # = 9 0b1010U # = 10 42u # = 42 0x7fU # = 127 18446744073709551616U # = largest possible 64-bit value
fingerprint = integer ('P' | 'p').
0p # the fingerprint with value 0 0x6347P # the fingerprint with value 25415
floating_point = ['-'] [integer_part]['.' [fraction_part]] [scale_factor].
integer_part = decimal_int.
fraction_part = decimal_int.
scale_factor = ('E' | 'e') ['+'|'-'] exponent.
exponent = decimal_int.
.01 # = 0.01 1e-3 # = 0.001 2. # = 2.0 3.1415 # = 3.1415 2.18E5 # = 218000.0
char = ''' character '''.
character = ordinary_character | escaped_character.
escaped_character =
'\' oct_digit [oct_digit [oct_digit]] |
'\' 'x' hex_digit {hex_digit} |
'\' 'u' hex_digit hex_digit hex_digit hex_digit |
'\' 'U' hex_digit hex_digit hex_digit hex_digit hex_digit hex_digit hex_digit hex_digit |
'\' ('a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' | '\' | '"') |
'\' ordinary_character.
An ordinary character is any printable Unicode character except double quote
" or backslash \. The value of an ordinary character prefixed with '\' is
the value of the ordinary character (e.g., '\c' == 'c').
'\n' # = 10 'A' # = 65
string = '"' {character} '"' | '`' {character} '`'.
The above syntax for strings is somewhat simplified to
facilitate reading.
In particular, it doesn't exclude " or ' characters from the respective string
definitions. To include a double quote character in a string enclosed by
double quotes, escape the double quote character: "\"".
Back quotes cannot be included directly in a string enclosed by back quotes
since they cannot be escaped in such a string by definition.
Instead, use something like: `string1` + "`" + `string2`.
Also, it is illegal for to place a character with value 0 in a string literal.
"" # empty string "Sawzall\n" # Sawzall followed by a newline character `"Hello"` # "Hello" (including the "'s)
bytes = 'B' string | 'X' hex_string.
hex_string = '"' { hex_digit hex_digit } '"' .
B"" # empty bytes literal B"hello" # five bytes B`"` # one byte, a double quote B"\xff" # one byte, value 255 X"0011AB" # three bytes: 0x00 0x11 0xAB
time = integer ('T' | 't') | 'T' string.
0t # 0 microseconds 1000000T # 1 second T"Tue Jun 5 10:43:07 America/Los_Angeles 2007" T"Wed Feb 4 16:26:41 PST 2004"
Sawzall's time zone support is built upon the ICU C++ code. You can use any time zone recognized by that code. "America/Los_Angeles"-style identifiers are encouraged.
format which is also the identifier for an
intrinsic function).
+ - * / % & | ^
<< >> == != < <= > >=
( ) [ ] { } && ||
~ ! . , ; : = <-
$ -- ++
all and array break case continue default do each
else emit file for format function if include map
not of or parsedmessage proc proto rest result return
skip some static submatch switch table type weight when while
Production = production_name '=' Expression '.'.
Expression = Term { '|' Term }.
Term = Factor { Factor }.
Factor = production_name | symbol | Group | Option | Repetition.
Group = '(' Expression ')'.
Option = '[' Expression ']'.
Repetition = '{' Expression '}'.
production_name = identifier.