The Q Language

Per Bothner

Acknowledgements and licensing

Q is primarily written by Per Bothner (bothner@cygnus.com). However, it uses software from a number of other sources. Some code comes from the Free Software Foundation. It follows that Q is subject to the GNU general Public License, which should be in the file COPYING in the Q distribution.

Invoking Q

The way you run Q is very similar to how you run a typical shell.

Q [options] [file args]

First Q processes the options (these begin with a hyphen).

If there are no more arguments after processing the options, Q enters interactive mode: If prompts for an expression, evaluates it, and print it, before prompting for the next expression, and so on.

If there are arguments remaining after the options, the first argument must name a Q source file. Q evaluates that source file non-interactively, setting the argv array to whater arguments are left over.

These options are currently supported by Q:

-e expression: Evaluate the expresion
-f filename: Evaluate the code in the named file.
-i: Set interactive mode, regardless.
-Lnumber: Set various debugging/logging options.
--Lisp: Make Common Lisp the default language: User input is parsed in Common Lisp syntax, and output is printed in Common Lisp format.
--lisp: Synonum for --Lisp
--Scheme: Make Scheme the default language: User input is parsed in Scheme syntax, and output is printed in Scheme format.
--scheme: Synonum for --Scheme
--: Enter an interactive dialogue.

Syntax

Identifiers

Identifiers are used to name things in a program.

An identifier consists of one or more of the following:

A letter.
A digit (0 though 9).
The characters period (.) or underscore (_).
Any character (except end of line) if it is preceeded by a backslash. For example: \- means the same as -, except that the - as thought of as a letter.

An identifier cannot start with a digit, nor can the second character be a digit if the first character is a period. This is because such words are parsed as numbers.

Note that the identifiers . and .. are reserved (they are used to access the filesystem).

Comments

A comment is started by # , and continues to the end of the line. Note that the comment symbol # should be followed by a space (or other white space), though a letter or number is also OK. The reason is that # followed by certain special symbols has special (different) meanings.

For example, #( begins a nested comment, which extends to the next (unmatched) #).

To include the text in the file named filename, do: #<filename>. (This doesn't work quite right.)

Spacing

Statements may be delimited by : or (more usually) by a new line. Indentation is significant. ... A line may be continued with \.

A \ (unless inside a string literal written using double quotes, or at the end of the line) means that the next character is treated as if it were a letter.

Spaces are significant:

Q1> 2 + 3*4
14
Q2> 2+3 * 4
20

Precedence

A token is a simple expression which doesn't contain any sub-expressions. In this section we will examine the rules for how tokens are combined into more complex expressions.

A primary is either a token or an expression surrounded by parentheses, brackets, or braces.

A word is one or more primaries, with no spaces or other white space characters.

A statement is one or more words, separated by newlines or semi-colons.

Each word or statement consists of a number of sub-expressions (tokens or words, respectively).

Control structure

Q's mechanisms for control structure are uncommon. This is largely because Q is a logic programming language with functions.

Statement sequences

Q is an expression language, which means it does not distinuish statements from expressions. However, it is not a purely functional language: an expression may have a side effect, such as writing a value to a file. So it is useful to combine two or more expressions, evaluating them in order. Such a statement sequence is written as a list of expressions separated by semi-colons or new-lines.

The value that results from evaluating a statement sequence is essentially the concatenation of the values of the sub-expressions. This makes sense because the result of a command that runs a program is the text printed (on standard output) by the command, so the result of running two commands in sequence should be the concatendation of their outputs.

Declarations, assignments, and constraints are kinds of expressions that we evaluate for their side-effects. If they "succeed," we don't want anything printed. We do this by having such expressions evaluate to the null sequence. Concatenating a value with the null sequence yields the orginal value. It is useful to be able to evaluate a series of declarations, assignments, and constraints, and then evaluate a final expression, returning that final result. We can do this if when "concatenating" the sub-expressions of a statement sequence, if all of the results but one are null, return the non-null result. This is not what is normally meant by "concatenation," but it is a convenient extension.

So, in detail, the effect of evaluating expr1; expr2 is the following:

First expr1 is evaluated. Call the result value1.
Then expr2 is evaluated. Call the result value2.
If value1 is the null sequence, return value2.
If value2 is the null sequence, return value1. (If both are null, return null.)
If neither is null, each result is convented to a string, and the result is the concatenation of the two strings, as if computed by sprintf "%s%s" value1 value2.

Note: this definition satisfies the following properties:

Associativity: (a; b); c = a; (b; c)
[];a = a;[] = a
("string1"; "string2") = "string1string2"
sprintf "%s" (a;b) = (sprintf "%s" a; sprintf "%s" b)

Note that a statement list delimits the scope of a declaration within it: See section Block structure. Also, if any declaration defined within the list is exported, the block evaluates to a record.

False and multiple values

Normally, an expression evaluates to a single result. However, an expression may yield NO results. This is interpreted as "false." All other results are "true." This is similar to the Icon programming language Raising an exception is the same as yielding no results.

Operations that are consider "boolean" conventionally return the empty sequence to signify truth. (The rationale is that we don't constraints that are true to print anything.) (Contrast this to Lisp, where the empty list means false.)

An expression may also yield multiple values, perhaps using the | operator:

Q1> 3|4
3|4
Q2> print_readable:=1
Q3> (3>2)|5
[]|5
Q3> (3<2)|5
5

If the backtracking implementation of multiple values is used, multiple results are yielded on demand.

The implementation of multiple results is very liable to change.

The if-then-else construct has the folowing syntax:

Q4> if 3 > 4 => 5 || 6
6

Logic variables are also implemented. The _ evaluates to a new unbound logic variable cell. There is some preliminary support for logical constraints.

Iteration

Q does not have any special forms for writing loops. Instead, as a number of forms for manipulating sequences. These can be combined in various ways to express iterations.

Examples: ...

Function: X do

Get the elements of X, in order, until the end is reached. All the the elements are the converted to strings, which are then concatenated together. Informally, the same as: (X 0; X 1; X 2; ...).

Usually, do is evaluated for the side effects, when all the elements evaluate to "".

Numbers

Q has a fairly complete set of numeric types:

Integers

Integers can be arbitrarily large.

A fairly complete set of operations on Integers has been implemented.

Other bases than 10 can be specified with the r prefix:

Q1> 16rff
255

For convenience, the 0x prefix used in C is also recognized as being the same as 16r:

Q2> 0xFF
255

But a 0 prefix does not indicate an octal number: 0177 = 177

The variable print_base controls the base used for output:

Q3> print_base:=16
Q4> 255
FF
Q5> print_readable:=1
Q6> 255
0xFF
Q7> print_base:=8
Q8> 255
8r377

(Implementation note: Arbitrary-precision unsigned integers are implemented using the GNU Multi_Precision Package. Twos-complement integers are implemented using this package. FixInts are BigNums that fit in one word (usually, 32 bits). The most common FixInts (about -100 to +1000) are pre-allocated.)

Fractions

Fractions are always normalized:

Q1> 6/4
3/2

Floating point

Basic arithmetic on floating-point numbers is implemented, but many functions (such as the transcental ones) are missing.

Complex numbers

Complex numbers may be constructed with the %+ operator:

Q1> 3+%4 * 0+%1
-4+%3

Basic arithmetic on complex numbers is implemented, but many functions (such as the transcendental ones) are missing.

Units

There is preliminary support for "quantities" i.e. numbers involving "physical" units and dimensions:

Q1> 2ft+2yd
8ft

Infinities

Arithmetic

Standard arithmetic operators: a+b a-b a*b a/b

Q1> -3 gcd 6
3

Logical operations

These operations work on integers. Each operand is treated as an semi-infinite list of the bits in the integers twos-complement representation.

All 16 bit-wise operations of two integers are available. These can be specified either by the name of the operation, or by its "number":

x bit'opname y x (bit opnumber) y

x bit'clr y == x (bit 0) y: always 0 x bit'and y == x (bit 1) y: "and" of x and y etc

a shift amount == floor (a * 2 ** amount)

Base conversion

[x0 ... xn] decode y [x0 ... xn] decode [y0 ... yn] [xn .. x0] base y

NOT IMPLEMENTED YET: X encode Y X antibase Y

Symbols

A symbol is a atomic identifier. It is denoted with a ' followed by a word.

For example, 'foo is a symbol.

You can create symbol at runtime from a string. If s has the value "abc", then '$s evalues to 'abc.

A package system as in Common Lisp has been largely implemented, but isn't really used yet.

A character object is actually a kind of symbol that is interned in the alphabet package. (Needs fixing!) There are no character literals. Use an expression such as ("A"0) instead.

Mappings

A mapping maps a key into a value. A sequence is a special case of a mapping where all the keys are consequtive integers starting at 0. For example, the vector [5 4 3] is a mapping with 3 values, and whose keys are 0, 1, and 2.

Sequences

A sequence is a finite or infinite list of objects. The objects are numbered starting with 0.

There are a number of different kinds of sequence, but these are considered to be different ways of implementing the same abstract sequence type.

Higher-order arrays are considered to be sequences of sequences. (But ...)

A vector can be constructed with the [] constructor:

Q2> [[2 3 4] [5 6 7 4+4]]
2 3 4
5 6 7 8

Indexing uses an implicit "adjacency" operator, just like function calls:

Q1> [3 4 5] 2
5

Function: size seq

Returns the number of elements in seq: size [3 4 5] is 3.

Function: seq length

Returns the number of elements in seq: [3 4 5] length is 3.

Function: Q vector N

Allocate an updatable fixed-length vector of length N. Initialize the elements using the sequence Q. Default value for N is the Q length.

Indexing

To select the n'th element of a sequence s do: s n. Indexing starts at zero, thus the first element is s 0.

Negative subscripts count from the back, where s -2 is the last element, s -3 is the penultimate element, and so on. The index -1 represents the non-existent element at the end of the sequence.

Sometimes it is more helpful to think of the indexes as marking positions between elements. Thus index 0 represents the beginning of the sequence, index 1 is the position just after the first element, and so on. Thus a non-negative index counts the elements from the beginning of the sequence. Then the index -1 represents the end of the sequence (the position just after the last element), and a negative index represents the number of elements from the end of the sequence, counting down from -1.

If an "index" is a sequence, it is used to select a sequence of elements: S [I1 I2 ... In] evaluates to: [(S I1) (S I2) ... (S In)].

If extra arguments are given, they are distributed: S [I1 I2 ... In] X ... Z evaluates to the same value as: [(S I1 X ... Z) (S I2 X ... Z) ... (S In X ... Z)]. (However, the expressions X ... Y are only evaluated once. Also, the result applications are evaluated on demand.)

This feature is useful to get a "slice" of a multi-dimentional array: A [3 4] [1 2] evaluates to: [[(A 3 1) (A 3 2)] [(A 4 1) (A 4 2)]].

Selection

The where operator is used to select specified elements of a sequence: Function: Q where B

Here Q is a sequence of values, and B is a sequence of conditions. The result is those elements of Q whose corresponding element of B is true.

As a convenience, if an element of B is a function it is first applied with the corresponding element of Q as left argument. For example: Q where {F x} == Q where {Q? F x}.

As usual, if either argument is not a sequence, it is extended to a sequence. Thus the following selects the positive element of Q:

Q where (> 0)

since it is the same as Q where {> 0}, which is the same as Q where {Q? > 0}.

The result of a where is a lazy sequence. (??).

Truncation

Function: Q while B

The while operation is used to select the initial elements of a sequence, but leaving out the first element that fails to satisfy a condition, as well as all subsequent elements.

Truncates Q from with the first element (inclusive) that fails the condition B. See the usage of where for more details.

Function: Q until B

Similar as while, but with the condition negated: Truncate Q leaving out all elements after the first element that does satisfy B. Note that the first element (only) that satisfies B is included in the output.

Note that while can be used in place of the while loops in many programming languages: {expr1} while {expr2} do corresponds exactly to C's: while (expr2) { expr1; }

Similarly, until is like the repeat ... until or do .. until of many programming languages: {expr1} until {expr2} do corresponds exactly to C's: do { expr1; } until (expr2);

Q2> [3 4 5 -4 8 -1 10] while (> 0)
3 4 5
Q3> [3 4 5 -4 8 -1 10] until (> 0)
3
Q4> [3 4 5 -4 8 -1 10] until (<= 0)
3 4 5 -4

Enumerations

Function: first for count

The for operator returns a sequence containing a sub-range of the integers:

Q1> 1 for 10
1 for 10
Q2> (1 for 10)+100
101 102 103 104 105 106 107 108 109 110

Function: init by step

Returns an infinite sequence of integers starting with init, increasing by step.

Function: Q upto Limit

Same as: Q while {<= Limit} Example: 2 upto 5 == [2 3 4 5]. I upto Limit == (I by 1) upto Limit. Example: 1 by 2 upto 10 == [1 3 5 7 9]

Function: Q downto Limit

Same as: Q while {>= Limit} I downto Limit == (I by -1) downto Limit. Example: 5 downto 2 == [5 4 3 2].

Function: I to J

I to J is like I upto J-1. That is, the upper bound J is not included. Furthermore, the result is recognized specially by indexing. Thus for a sequence X, the expression X (I to J) select the sub-sequence of X between the indexes I and J. One or both of I and J can be negative, meaning indexing from the end of X. For example, X (I to -1) is the same as X drop I, if I>=0.

Strings

A string is a sequence of characters.

String constants are written using double quotes:

"This is a string"

A dollar-sign ($) in a string constant prefixes an expression that should be evaluated. The result is converted to a string, and inserted in place:

:a="abc"
"XX$(a)YY" # Evaluates to "XXabcYY"

Special characters can be notated using \, as in C. There are a few useful extensions:

\^X -- represents the control character Ctrl-X.

A newline following a \ means to skip the newline, and any following spaces or tabs. A space or tab following a \ means to skip all spaces and tabs, then, if the next character is a newline, skip the newline, and any following spaces and tabs.

String Matching

Macro: string match pattern [replacement]

Match string against the regular expression pattern. If string matches the entire pattern, return replacement, except that substitution occurs in replacement, as discused <<below>>. If the match fails, raise a comparison failure.

The string is evaluated; pattern and replement are not.

Regular Expression Patterns

Certain special characters in a word have special meaning.

? means to match any single character, except NewLine and NUL. Additionally, when matching against files names (as in a command line), the ? doesn't match a / or an initial ..

* means means to match any number of characters, with the same exceptions as for ?.

*(pattern) means to match any number of matches for pattern. Note that the left parenthesis must explicitly follow the star. This feature subsumes some of the use of the Unix find program when its used for matching files names:

Q1> ls -i ../dld/*(*/)Makefile
 53639 ../dld/Makefile                   51872 ../dld/test/overlay/Makefile
 32863 ../dld/test/Makefile              36295 ../dld/test/reload/Makefile
 64017 ../dld/test/add1/Makefile         53631 ../dld/test/simple/Makefile
 41491 ../dld/test/general/Makefile

?(pattern) means to optionally match the pattern (i.e. zero or one times).

+(pattern) means to match one or more matches of pattern.

[charset] means to matches any single character in the charset.

pattern1|pattern2

Quoting a character with \ turns off globbing for it:

Q1> echo Makefi\*
Makefi*

Alternatively, you can use string quotes:

Q1> echo "Makefi*"
Makefi*

Files

Unix and many other operating systems view files as just an unstructured sequence of bytes, Q views such a files as a string (which just happens to be stored in the file system, instead of in main memory).

A word that begins with / or ./ or ../ is considered to be a filename. The word is not evaluated.

As a convenience, an identifier that is unbound, but which names an existing file, is also treated as a file name.

Note that filenames follow the emacs convention, where two /-es means start again from the file system root, ignoring anything before. Thus foo//bar/baz means /bar/baz.

Hence, if the variable fn is a string, then ./$(fn) is the file named by the string fn.

An example of the use of file objects:

./a := ./b

will copy file b to a, just like

cp b a

Actually, since no flags are preserved, the semantics are more like the shell command:

cat <a >b  # /bin/sh syntax, not Q syntax!

One way to byte-reverse a file:

Q1> size ./Makefile
110
Q2> ./Makefile (109 downto 0)

Z.rat.Q>c- sserpmoc|derat-non elifekaM
\ tset knilQ elifekaM/xinu cod im derat-non X- - fc- v- rat
:Z.rat.Q
:Z.rat.Q

Cyclic infinite sequences

A cyclic sequence is an infinite sequences whose values repeat in a cyclic manner.

Function: F cyclic C

Both F and C are finite sequences. The result is an infinite sequence starting with the elements of F, followed by an infinite number of copies of C.

Function: Q finite_part

Function: Q cycle_part

Function: Q finite_length

Return the length of the finite prefix of Q.

Function: Q cycle_length

Return the length of the repeating prefix of Q. If Q is finite, return 0.

Lazy vectors

A lazy vector uses the {} construct. This takes an expression that gives the vector element corresponding for each index (given by the ? construct). Each element is only evaluated when it is needed, but once it has been evaluated, its value is remembered for the future:

Q2> :a = [3 4 5]
Q3> :b = {a? + 100}
Q4> b    # No elements are known yet.
 ...
Q5> b 2
105
Q6> b
_ _ 105 ...
Q7> b 1
104
Q8> b
_ 104 105 ...

Note that the operation ? form is the "inverse" of the operation {...}. That is: {exp?} computes the same value as: exp, if exp evaluates without side-effects to a sequence. In fact, if the form ? follows an expression, that expression is evaluated once only (eagerly), not once per item (lazily). The compiler is essence treats {... Exp? ...} as if it were: :Temp=CoerceToSequence(Exp); {... Temp? ...}.

Identifiers declared directly (i.e. not further nested within parentheses or function definitions) in a {...} form are visible to all {...} in the same block.

Re-structuring

Function: q reshape k

k is a vector of integers, which define the bounds of the resulting array. The elements of the aray are taken from the sequence q. (Similar to APL's rho operator.)

Atoms

(This feature may be removed in favor of using non-intered symbols.)

The expression :%name evaluates to a new atom, having the name given by the identifier name. The atom is different from all other atoms even those having the same name. The name is bound to the new atom globally within the module containing :%name. Therefore, a module cannot contain duplicate instances of :%name for any name. (Also note that the new atom is allocated at compile time. There should also a run-time genatom string.)

Union objects

The expression s || t evaluates to a union type.

y POSTFIX (a || b || ... || z) means (U y a) | (U y b) | ... | (U yz) where (U y a) is y if a is scalar and y POSTFIX a if a is non-scalar.

Declarations

The form :var is used to declare a new variable var. It is an expression which evaluates to the "value" of var. That value may be unknown when we evaluate :var, but at some point in the future it may get a value, so what we get is a "potential" value. Operationally, :var creates a new memory location, binds name name var to that location, and returns that location.

Usually, the first thing we may want to do with a new variable is to give it a value. This is easy using the = operator: :x = 10 declares that the variable x has the same value as 10. Operationally, the = operator is implemented as unification, which causes the value 10 to be placed into the memory location for x. (You can think of the expression x=y as a specification that x and y are equal.)

Through the power of unification, we can do complex pattern-matching: [:x :y]=[4 5] has the effect of setting x to 4, and y is set to 5. Here, [4 5] is a list containing the two elements (4 and 5), and [:x :y] yields a list whose two elements are the locations for x and y. Then the unification specifies that the two lists are equal, which means that x must equal 4 and y must equal 5.

Visibility

Q is statically scoped. A variable x is visible in the entire scope containing its declaration :x, not just after the :x is evaluated. So the "memory location" for all variables declared in a scope are actully allocated when the scope is entered. There can be only one defining instance of a variable in a scope (i.e. the name of variable must be prefixed by : once and once only). It is conventional that the defining instance be the first one, but that is not required. (It may not be possible in the case of mutually recursive function.) An example: (:x = y; :y = 10; x) evaluates to 10.

(We can say that all declarations are mutually recursive. The one exception is for Q commands typed in interactively. In that case, the declaration is not visible until the command contianing it has been read in, of course!)

Block structure

A declaration (i.e. a form like :identifier) is owned with smallest block containing it. The declaration is visible ("in scope") inside the entire block (unless it is overridden by another declararation inside a smaller block).

Blocks are: (incomplete list - may change slightly):

A statement sequence (see section Statement sequences); i.e. a list of two or more expressions, separated by semi-colon or new-line. (If you want a block with just a single expression, add an extra empty expression: ;expr.)
The body of a function definition.
A function definition comprising its parameter list and its body.
A program (source file).
The interior branch of certain control structures (such the the branches of an if statement). (The details have not been formalized yet.)

Example: The expression [:x x] will match a two-element sequence if and only if the two elements match. Thus [:x x]=[3 3] succeeds, but [:x x]=[3 4] fails. These expressions also declare x in the current (surrounding) scope, because [:x x] is not a block. This may be undesirable; we would like to limit the scope of the x. Just ([:x x]) will not prevent the unwanted declaration, but (;[:x x]) will do it.

Exporting declarations

(NOTE: WORK IN PROGRESS)

A declaration may have the form :|identifier instead of the usual :identifier. The presence of a | declares the identifier to be exported.

If a block owns (contains) a declaration that exports a name, the block is said to export that name. The result of a block that exports one or more names is a structure that binds each name to the whatever value it gets in the block. One can later use the notation structure'name to extract the bound value.

For example, the block in the parentheses here:

:B=([:|a :|b]=[100 200]; :x=10; :|c=(x+a))

exports the names a, b, and c.

The block evaluates to a record with those three names. Hence, we get that B'a=100, B'b=200, and B'c=110.

Note that the presence or absence of an exported declaration makes all the difference:

Q1> (:a=3; :b=a+1)
Q2> (:a=3; :|b=a+1)
[b: 4]
Q3> (:|a=3; :|b=a+1)
[a: 3 b: 4]
Q4> (:a=3; a+1)
4
Q5> (:|a=3; a+1) # Bad!
[a: 3]

Note that line 5 is poor style: None of the statements in an exporting block should evaluate to other than the null sequence. (This will probably be prohibited be some point.)

If there is an export declaration in a parameter list, the name is added to the export list of the body. (More ...)

Modules

(NOTE: WORK IN PROGRESS)

A Q program can either be typed in as interactive commands, or can be read from a file. A Q program in a file should have the form of an exporting block. Such a program is called a module. (By the way, a session of interactive commands is also a kind of module.)

You can "load" in a module either by naming it on the command line, or by using a load command.

Function: load module_name

Note: the module_name is quoted.

The named module is searched for (using search rules not yet specified). The searching can resolve to a source file or a binary (previously compiled) file. Either way, the value resulting from the load is (normally?) a record.

Loading a module involved a number of steps. If the module_name is a plain identifier or simple quoted string literal, then the initial parts of loading is done at parse time. This is normally the case. It is also required when there a several modules that mutually reference each other.

Loading a source files includes parsing, tree-traversal, and evaluation. If the module_name is known, then the module is parsed just after the load is parsed, and tree-traversal is done at the same time as for the module containing the load. In that case, the names exported by the module are known early, and the names can be inherited int some other block. Otherwise, all of the actions have to be postposed until evaluation time, and the type of the module is unknown until then.

Loading a binary file includes reading in the text and data segments, remembering the labels defined and needed, performing relocation, and running initialization code. If the module_name is known, most of these steps can be done at parse time.

Functions

Here is a simple example of how to define a new function:

Q1> :(Factorial :x) = if x=0 => 1 || x * (Factorial x-1)

This is the classical factorial function. Here is how to call it:

Q2> Factorial 30
265252859812191058636308480000000

Prefix and postfix functions

A prefix function can take any number of parameters specified after the function name:

:(F :x1 :x2)=x1-x2   # Definition of function F.
F 10 3               # Call of function F; yields 7.

A postfix function takes a single parameter specified before the function name:

:(:arg G)=arg*arg    # Definition of function G.
5 G                  # Call of function G; yields 25.

It is also possible for a function to take parameters specified both before and after the function name. This allows infix functions:

:(:x mod :y) = x - y*floor(x/y)
-8 mod 3 # Yields 1.

The arguments before the function name are called the left parameters, while those after are called the right parameters. In the mod example, x is a left parameter, and y is a right parameter.

Having both prefix and postfix functions could lead to ambiguity. See <not written> on the details.

Function definition syntax

A function definition has this general form:

:(positional_params name positional_params keyword_params)=body

Here name is the name of the function being defined; the first positional_params is the list of parameter specifiers for the left parameters; while the second positional_params together with the keyword_params is the list of parameter specifiers for the right parameters.

A positional_params is a list of positional_param, with one optional tuple_param.

Parameter lists

Parameter lists appear in a number of places:

Formal parameter lists.
Class definitions.

In general. each parameter is an expression which matches the value supplied when a function is called or an object is created. However, these expression may be interpreted in non-standard ways.

The most common parameter has the form:

:ident~type

Here, ident is an identifier, and type is some type-expression. This means that the actual parameter is coerced to the type type, and the result becomes the value of ident.

Tuples

Given a sequence X, the form X@ yields a tuple of the elements of X. A tuple is a list of elements, just like a sequence. The main difference is that a tuple of one elements is equal to that element: [X]@ = X. Tuples are not sequences, though you can convert between them with loss of information.

Tuples are used for the parameter lists of functions, as well the results of multi-valued functions. They are volatile, not real objects, and cannot be assigned to variables or be parts of data structures. (1)

If a parameter is a tuple, it is treated as if there were many parameters, one for each element of the tuple. Thus the following are equivalent:

F 10 [1 2 3]@ 20
F 10 1 2 3 20

In both cases, F is applied with 5 arguments.

In this respect, the list constructor form [...] is considered to be a function. Thus: [10 [1 2 3]@ 20] evaluates to [10 1 2 3 20], a sequence of 5 elements.

Note that [...] @ cancel each other: If X is a sequence, then: [X@] == X. If T is a tuple expression (there are no tuple objects, strictly speaking), then [T]@ == T.

A formal parameter can also be a tuple. This is how you define functions that take variable number of arguments. The tuple formal is bound to the arguments that are "left over" after the other arguments have been bound.

Q1> :(f :x :T :y)= printf "x=%S; T=%S; y=%S" x T y
Q2> f 2 3 4 5 6 7
x=2; T=[3 4 5 6]; y=7

Only one formal can be a tuple. Note that a tuple formal gives sequence value to the parameter name (T, in the example). This is how it works: The formal :T@ is unified with the sub-tuple [3 4 5 6]@ from the actual argument list. After cancelling the @, we get the unification: :T = [2 3 4 5].

Note that @ is also use for reduction: See section Reduce. This usage is related: You can think of the reduction converting the argument sequence to a tuple.

Keyword tuples

(NOT IMPLEMENTED!)

Positional parameters correspons to normal tuples. But what about keyword parameters? One possibility is to defined keyword tuples.

An environment is a mapping from a set of names to values, just like a sequence is a mapping from (part of) the integers to values. The tupling operator @ yields a normal tuple from a sequence; we can also define it to yield a keyword tuple from an environment.

A keyword parameter name:value is a keyword tuple (with one element/binding).

There can now be two tuple formals. The first bundles up the unused positional parameters, and defines a sequence. The second (if any) bundles up the keyword parameters, and defines an environment. (In addition, there can be a tuple formal for left-hand arguments.) (If you want no positional parameters, just make the first tuple formal be []@.)

[x:3 y:4] == (:|x=3; :|y=4).

Macros

A macro is a function that is called at compile time. It is given some input expressions, and uses them to create a new expression. That expression is then compiled as if it were part of the original input.

A macro definition has the form:

:(macro left_arg name right_args)=body

This defines name to be a macro function bound to name. A macro function is just like a regular function, except that it is executed by the compiler at bind-time, not at run-time. (Bind-time is the stage after parsing. and before execution or dumping.) When the compiler sees name, it calls the function, passing the other expressions in the same application as parameters. The result should be an expression, which is then traversed further.

An example should make the idea clear, but first we mention the parse function which is very useful when writing macros.

Function: parse arg1 ... argn

Each argi must be either a string or an expression. Informally, all the arguments are concatenated together. The result is parsed, yielding an expression. When, during the parse, an argi that is an expression is seen, that expression is inserted into the resulting expression.

Here are the actual definitions from the Q runtime system, defining the cd macro, which is use to change working directories.

:|(macro cd :x@)= parse "__cd (quote " x@ ")"
:|(__cd :args)=external CdAction

The function __cd does the actual "work:" Its argument is a string vector, usually of length one, which names the desired new directory. An instance of cd:

cd ~/bin

The binder invokes the cd macro with an array whose single element is the expression parsed for ~/bin. The parse function is invoked, and the result is as if the user had written:

__cd (quote ~bin)

which evaluates to the same as:

__cd ["~bin"]

C interface

When a Q function is compiled, the compiler emits various data structures the describe the function object (such as the kind of parameters it expects). It also emits a C++ function for the actual body of the Q function. Normally, the compiler generates a name for the emitted C++ function, but if you write external name after the parameter list, the emitted C++ function will be called name.

Example:

:(foo :x) external Foo = 3 + x

This emits C++ code something like:

Root * Foo(Root *x)
{
   return /* Code to calculate 3 + x */;
}

// Data structures that define foo function object
// These include the symbol 'foo, and a pointer to Foo.

The compiler also emits a .h file that includes a declaration for Foo. You can include this .h is some other C++ source program file, and in that program you can call Foo as if it were written in C++.

The converse problem is when you have a function written in C++ that you want to use in Q. You can do this by writing a dummy function definition whose body is just extern name:

:(bar :x)=external Bar

You can then call bar is your Q code, just as if it were a Q function, assuming you provide it a C++ function something like:

Root *Bar(Root *x)
{
    // Code you write to implement bar.
}

In both forms of external the name can be a quoted string instead of an identifier. In that case the string is the assembler label for the generated/needed C++ function, as opposed to the C++ name of the function.

Operators

Reduce

(NOTE: WORK IN PROGRESS!)

To reduce a sequence using a binary infix function is to combine the elements of the sequence by place the operator "between" each element. For example, the reduction of [2 3 4] using + is 2+3+4.

The symbol @ is used for reduction. This use of @ is related to its use for making tuples (See section Tuples).

A recursive definition:

[] @OP R == R
(x,L) @OP R == x OP (L @OP R)

[] @OP == left_identity OP # If defined
[x] @OP == x
(x,L) @OP == x OP (L @OP)

L OP@ [] == L
L OP@ (x,R) == (L OP x) OP@ R

Note that right(?)-reduction OP@ uses the same syntax as tupling: X@. However, the former requires OP to be a binary function, while the latter requires X to be a sequence.

Note that @, is append, while @@, appends all the elements.

The operator | is syntax, not a function. However, it is convenient to define @| and |@ using the same definitions as before (NOT IMPLEMENTED). The left_identity and right_identity of | are assumed to be (;) (i.e. the empty sequence).

It is also convenient to define ~|@ EXPR to collect each result from evaluating EXPR, and make a sequence of them all.

Inverse

The inverse of a function f is written ~f.

The inverse depends on what kind of a function f is.

Generally, for a postfix or infix function f: if x ~f a ... is y, then y f a ... is x. If f is a prefix function (or is applied as one), then if ~f x a ... is y, then f y a ... is x.

If the operator ~ is applied infix, as in x~f, the result is the same as x(~f), except that any application rule for x is ignored.

Streams

A stream is a sequence of values combined with a current position. You can read (or write) only the sequence element at teh current position; however, the position is changed by various stream options described below.

Reading characters in from files or writing characters out to files uses character streams. These are often called files.

Function: stream get

Get the next value from the stream, moving the current position forward. At end of file, return ???.

Function: stream peek

[NOT IMPLEMENTED?] Same as get, not does not change the current position.

Function: stream get n

Get the next n values from the stream, returning a list of the n values read. If the there are less than n characters after the current position, just return as many as are available. (That is, if the stream was at end of file, return an empty array.) Move the current position forward as many characters as were read.

Function: stream put value

Write value to the stream; that is, set the current element to value and move the current position forward. Returns the null sequence.

Function: stream put value1 ... valuen

Same as:

(stream put value1; ...; stream put valuen)

Function: stream seek pos

Move the current position of the stream to the position pos, an integer offset relative to the beginning of the stream.

Function: stream relseek pos

Move the current position of the stream to the integer offset pos relative to the current position of the stream.

Function: stream endseek pos

Move the current position of the stream to the integer offset pos relative to the end of the stream.

Iterators

Formatted output

To format data as a string, do:

sprintf format_string arg_1 ... arg_n

The args are formatted according to formatting directives in the string format_string.

To format data as a string, and send the result to the output text stream out_stream do:

out_stream printf format_string arg_1 ... arg_n

If out_stream is omitted, it defaults to the standard output file. Except for some pathological cases (and efficiency), this is the same as:

out_stream put (sprintf format_string arg_1 ... arg_n)@

An example:

:x=12
sprintf "[x=%05d]" 12      # Yields the string "[x=00012]"

The text "%05d" is a format directive which indicates that the argument value (in this case 12) is to be written using 5 decimal digits, filled with zeros on the left. The rest of the format string ("[x=" and "]") is emitted verbatim into the output.

A format control string is written in a form (syntax) similar to the printf control strings in the C language. However, the semantics is closer to the format control strings in Common Lisp: formatting is is quite powerful, and can take arguments that are typed.

Format interpretation

Each character in a format string is sent verbatim to the output, until a percent-sign (%) is seen. The % indicates the start of a format directive. The format directive is read and interpreted according to the rules given later. This usually "uses up" one or more of the given format arguments. Then the remaining characters of the format string are sent to the destination, with remaining format directives interpreted as before. This continues until there are no more characters in the format string.

Format directives

A format directive begins with a %. Then there may be an optional a #, which usually means that the argument should be printed in a "readable" format (that is: if the output is read back, the result should be equal to the orginal argument).

Finally, the format directive is terminated by a single format specification characer, which species what kind of formatting to do.

The format directive %% uses no arguments, and outputs a single %.

Formats for numbers

These all use a single argument, which should be a number.

%pd ("decimal") Print the argument as a decimal integer using at least p degits. The default for p is 0.

%pf ("fixed-point float")

%pe ("exponential float")

%pg ("general float")

Formats for general objects

%a ("anything") Print the argument (any type) in a nice format. Do not print quotation marks or other escape characters. (Similar to Common Lisp's ~A.)

%s ("string") Same as %a. %-width.precisions Print the first precision characters of the printed representation of the argument, followed by enough spaces to bring the total length to width, The default for width is 0, and for precision it is -1. If the - is missing, remove excess width at the left, and add needed padding at the left.

%#s Print the argument in a format that can be read back in. (Similar to Common Lisp's ~S.)

%c ("character") Same as %c, except that if the argument is an integer, its character value is printed instead.

%#c Same as %c, except that if the argument is an integer, its character value is printed instead.

Complex formats

%? Indirection The ? format directive uses up two arguments. It calls format recursively, using the first argument as a format control string, and the second argument (which must be a sequence) as the list of arguments to process.

Q1> printf "%s%?%s" "Before<" "(%d;%#x)" [10 20] ">After."
Before<(10;0x14)>After.

%#? Indirection. Same as %?, except that one one argument is consumed, which is used as a format control string to a recursive call to (s)printf. The recursive call may use arguments from the original call:

Q2> printf "%s%#?%s" "Before<" "[%d;%#x]" 10 20 ">After."
Before<[10;0x14]>After.

%#count{actions%} Iteration Keep repeating the actions, until there are no more arguments. Repeat at most count times (default is infinity).

%count{actions%} Iteration Same as %#count{actions%}, but uses one argument, which must be a sequence. That sequence used whenever arguments are needed by the actions.

Q1> printf "%{(%s)%}" [3 4 5]
(3)(4)(5)
Q2> printf "%#{(%s)%}" 3 4 5
(3)(4)(5)

%^ Break from iteration

Records

(THIS IS WORK IN PROGRESS!)

A structure expression is a statement list, where at least of one the identifiers declared is exported.

Assignment

An assignable variable is an object that has changable state.

NEW creates a new assignable object. Its value can be changed with the assignment operator: :v = NEW v := 10 :u = NEW u := v + 5

V value extracts the current value of V.

The :=> operator is as follows: V :=> a ... z assigns V to a, the old value of a to b, and so on, finishing by assigning the old value of y to z. The returned value is the old value of z. All of the assignments are done in parallell.

To increment an assignable variable, use the +:= contructor, as in x +:= 2 or x(+:=)1. This is a special of := used as a postfix operator, where the operand is a function. In that case, v F:= x1 .. xn is the same as: v := ((v value) F x1 .. xn) except that v is only evaluated once.

System interface

A design goal Q has to make it convenient to use as a command language. It has many of the features found in typical shells (sh, csh, bash, as well as languages like perl and rexx), and it can interact with a Unix-like system in many of the same ways.

Running programs

To run a program, do: run program args. The program (which is quoted) must name an executable file. The args (which are also quoted) are passed to the program.

Q1> run wc Q.C
     308     950    7986 Q.C

If the program is a simple identifier that is not in the scope of a Q variable, and the identifier names an executable file in the current search path, then the run may be omitted.

Q2> wc Q.C
     308     950    7986 Q.C

(Note that determination that program is an executable program is done at compile time, so you cannot omit run if the program hasn't been created yet.)

The standard output from run is coerced to a string, which you can use as you use any other string:

Q3> :X = wc Q.C
Q4> printf "{%#a}" X
{"     308     950    7986 Q.C\n"}

The left-hand argument to a run command is coerced to a string, and the result is used as the standard input of the program:

Q5> "Hello World!\n" run /usr/bin/tr a-zA-Z A-Za-z
hELLO wORLD!

If an argument contains the "globbing" characters *?[]()| these are use to match against file names, as in most shells:

Q1> echo Makefi*
Makefile Makefile~

(echo is a builtin command that just prints into arguments.) The globbing syntax is an extension of that used for most shells; see See section Regular Expression Patterns for details.

Tilde expansion is also done.

Q1> "ab cde fgh" run wc
       0       3      10

To replace Q by program, do: exec program args.

By the way, before execution Q translates run program args into something very much like: primitive_run "program" (globlist (quote args)). (However, you can't call invoke primitive_run directly.)

Job control

If you run Q interactively, you can run multiple jobs at once.

Macro: fg [job]

Move the specified job into the foreground.

Macro: bg [job]

Continue executing the specified job in the background.

Macro: jobs

List current jobs

You can interrupt the current forground job with ^Z or kill it with ^C.

Macro: command bg

Execute the current command in the background. (Most shells use command& syntax for this.)

Current directory

Function: pwd

Returns the current working directory (as a string).

Macro: cd dir

Charges the current working directory to dir (which is quoted).

If dir is relative (does not begin with a /), and the CDPATH environment variable is defined, then cd will search each of the directories in CDPATH for a sub-directory dir. Here, CDPATH is a string of directory names, separted by colons; it should begin with ..

cd will try to hide symbolic links: If you cd down a symbolic link, and later cd .., you will get back where you started. In contrast, the chdir routine will follow the physical .. link in the file system.

If dir is -, it is replaced by the value of $OLDPWD. Thus cd - is the same as cd $OLDPWD.

Macro: cd

Change the current working directory to the user's home directory.

Function: chdir dir

Charges the current working directory to dir (which is evaluated). If dir is "", same as chdir env'HOME. The chdir routine is a low-level routine that ignores CDPATH, and does not so anything special for symbolic links.

Variable: PWD

Current working directory as set by cd or chdir.

Variable: OLDPWD

Previous working directory as set by cd or chdir.

Miscellaneous

Variable: argv

The arguments in the command line used to invoke Q are in argv, which is an array of strings. The first element (i.e. argv 0) is the name of the program running, usually just Q.

If Q is invoked on as file (as in: Q -flags file arg1 ... argn), the value of argv is the file name followed by the non-flags arguments: ["file" "arg1" .. "argn"]. This is compatible with most shells.

Variable: env

The "environment" passed to Q (in the Unix sense; see "man 7 environ") is available in the variable env. This is an updatable table, containing strings that are indexed by symbols. When a program is executed, the environment passed to it is taken from env. For example:

Q1> env'TERM
dumb
Q2> env'TERM:="xterm"
Q3> run /bin/sh
$ echo TERM= $TERM
TERM= xterm

To remove an element from the environment, assign NONE to it:

env'TERM:=NONE

Function: exit [code]

Terminates the Q program. (Same as the Unix call exit(code).) The default for code is 0.

Function: quit

Same as exit 0.

Macro: echo [-qn] words...

The echo builtin prints its argument to the output. The -n flag suppresses a final new-line. The -q flag formats each argument as a quoted string literal that can be read in again:

Q1> echo prefix(XX YY)"\012"postfix
prefixXX
postfix prefixYY
postfix
Q2> echo -q prefix(XX YY)"\012"postfix
"prefixXX\npostfix" "prefixYY\npostfix"

Macro: quote words...

Informally, the quote macro puts string quotes areound each word in words...; return resulting code evaluates to a vector of strings.

For example quote 4+9 is $(4+9). becomes ["4+9" "is" "$(4+9)."], which evaluates to the vector: ["4+9" "is" "13."].

These routines are used internally (by run and other commands): Function: glob pattern

Return the list of file names that match the pattern. The input should be a string, and the result is a vector of strings. If there is no match, the result is [].

Q1> printf "<%S>" (glob "parse*")
<["parserule.o"
"parsemacros.o"
"parse.o"]>

Function: globlist names

Call glob on each element of names, which should be a sequence of strings. Append all the results into one long vector of strings. However, any elements of names that have no matches, are inserted directly into the output list (after parentheses and quoting characters have been removed).

Q2> globlist ["*.b" "parse*" "foo\"+\"(abc)"]
*.b
parserule.o
parsemacros.o
parse.o
foo+abc

This behavior mimics that used by /bin/sh.

Lisp support

An expression in Lisp/Scheme syntax may entered by prefixing it with the Scheme keyword. The expression is read using a programmable Lisp reader. Then the resulting Lisp object is converted to a Q expression for evaluation.

Q1> print_readable:=1
Q2> Scheme (quote (:foo #(3 4 5)))
['foo [3 4 5]]

To print an object in Lisp/Scheme syntax, set the print_lisp variable:

Q3> print_lisp:=1
Q4> Scheme '(:foo #(3 4 5)))
(:foo #(3 4 5))

The only special form that has been implemented is quote, so the Scheme keyword has limited usefulness. A number of issues also need to be resolved, such as the representation of the empty list/sequence (Scheme 'nil vs. Scheme '() vs. []).

A number of functions from Common Lisp and the Scheme R3 report have been implemented. You can use them in Q code:

Q5> Scheme (symbol->string 'abc)
"abc"
Q6> (Lisp #'find-package) "common-lisp"
package "common-lisp"

Right now, you can also just name Lisp function directly in Q code, since Scheme, Lisp, and Q use the same global name space. This will probably change:

Q7> symbol\-\>string 'abc
"abc"

Note that -> had to be "quoted" with \s.

The following Common Lisp functions and values have been implemented: car, cdr, cddr, cadr, cons lispread (calls the Lisp reader) nil, t vector integer-length getprop, putprop, remprop, symbol-plist, symbol-package make-symbol find-package symbol-function, symbol-value, set, boundp, fboundp, makunbound, fmakunbound

The following Scheme functions have been implemented: car, cdr, cddr, cadr, cons vector, make-vector, vector-fill! symbol->string, string->symbol

The only recognized declare form is a non-standard one: (declare (external Foo)) in the body of a function declares that when the function is compiled, the name of the emitted C++ function should be Foo. (declare (external "Foo")) is the same, except that it declares that the assembler label of the function is Foo. If the there are no forms in the function following a (declare (external ...)) form then the function is assumed to be defined in some external C++ file. See section C interface

Lisp Number Operations

(Section numbers refer to: Guy Steel jr. "Common Lisp: The Language", second edition, 1990.)

The "Predicates on Numbers" (from Section 12.2) are all implemented.

Function: zerop number

True if number is zero.

Function: plusp number

True if number is positive

Function: minusp number

True of number is negative

Function: oddp integer

True if integer is odd.

Function: evenp integer

True if integer is even.

The "Logical Operations on Numbers" (from Section 12.7) are all implemented. These all consider an integer as a semi-infinite bit string.

Function: logior &rest integers

Take the "inclusive or" of the integers.

Function: logxor &rest integers

Take the "exclusive or" of the integers.

Function: logand &rest integers

Take the logical "and" of the integers.

Function: logeqv &rest integers

Take the "exclusive nor" (bit-wise logival equivalence) of the integers.

Function: lognand integer1 integer2

Same as (lognot (logand integer1 integer2)).

Function: lognor integer1 integer2

Same as (lognot (logor integer1 integer2)).

Function: logandc1 integer1 integer2

Same as (logand (lognot integer1) integer2).

Function: logandc1 integer1 integer2

Same as (logand (lognot integer1) integer2).

Function: logandc2 integer1 integer2

Same as (logand integer1 (lognot integer2)).

Function: logorc1 integer1 integer2

Same as (logor (lognot integer1) integer2).

Function: logorc2 integer1 integer2

Same as (logor integer1 (lognot integer2)).

Function: boole operation integer1 integer2

Does the bit-wise boolean operation indicated by operation, on the two operands integer1 and integer2. The operation is one of the 16 constants below.

Constant: boole-clr

Constant: boole-set

Constant: boole-1

Constant: boole-2

Constant: boole-c1

Constant: boole-c2

Constant: boole-and

Constant: boole-ior

Constant: boole-xor

Constant: boole-eqv

Constant: boole-nand

Constant: boole-nor