Drafts

A programming language should have (at least) these two kinds of comments:

  • Comment extends to the end of the line.
  • Comment extends to an end-comment delimiter. Such comments should nest, unlike Java's /* ... */ comments.

An interesting option for nestable comments is for the start delimiter to be #!. The end delimiter could be !#. This allows:

#!/bin/sh
exec kawa --options "$0" "$@"
!#
(define ....)
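The nesting rule, and why the #!/bin/sh header above works, can be illustrated with a small scanner. This is a minimal sketch, not Kawa's actual reader: strip is a hypothetical helper that only handles #! ... !# comments (no line comments, no string literals), and the (define x 1) in main is a made-up stand-in for the rest of the file. It keeps a depth counter so that each #! must be closed by its own !#.

```java
/** Hypothetical scanner for nestable #! ... !# comments (nothing else is handled). */
public class NestedComments {
    /** Returns src with all #! ... !# comments removed, honoring nesting. */
    public static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int depth = 0;  // how many #! are currently open
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("#!", i)) {
                depth++;                       // a new (possibly nested) comment opens
                i += 2;
            } else if (depth > 0 && src.startsWith("!#", i)) {
                depth--;                       // closes the innermost open comment
                i += 2;
            } else {
                if (depth == 0) out.append(src.charAt(i));  // keep only text outside comments
                i++;
            }
        }
        if (depth != 0) throw new IllegalArgumentException("unterminated #! comment");
        return out.toString();
    }

    public static void main(String[] args) {
        String script = "#!/bin/sh\nexec kawa --options \"$0\" \"$@\"\n!#\n(define x 1)\n";
        System.out.print(strip(script));  // only the Scheme code after !# is left
    }
}
```

The shell sees the first line as an interpreter line and runs the exec command, while the Kawa reader sees everything up to !# as one comment.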
Created 14 Mar 2007 17:36 PDT. Last edited 14 Mar 2007 17:36 PDT. Tags: draft kawa

A pattern can be matched against a value. If it matches, one or more variables may be bound to parts of the matched value.

Patterns can be used in various declaration contexts, including variable declarations, parameter declarations, and the cases of a switch expression.

Abstract pattern grammar

Here is a classification of the patterns we should support. The concrete syntax is not fixed.

Variables

The simplest pattern is a variable. This declares that variable, and it is bound to the value being matched against. Question: It may make sense to use a special syntactic marker to indicate a variable being declared, as opposed to being used.

Type specification

pattern!type

Conjunction

pattern1&pattern2...
This matches pattern1 against the target, possibly binding some variables. Then pattern2 is matched against the same target; pattern2 may use variables bound in pattern1. Commonly, pattern2 will be a predicate or a type specifier. In fact, a special syntax for conjunction may not be needed, since it can be expressed using a predicate.

Predicate

{boolean-expression}
This matches if the boolean-expression evaluates to true. Typically, boolean-expression may contain variables declared previously in a conjunction. In fact, we could combine the syntaxes:
{pattern|boolean-expression}

Constructor

constructor-name(pattern1, pattern2, ...)
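To make the abstract grammar concrete, here is a small matcher in Java. It is a sketch under stated assumptions, not a proposed implementation: patterns are hypothetical closures over a Map of bindings, constructed values are a hypothetical Node class (a constructor name plus fields), and pattern!type is read as "the value is an instance of type and also matches pattern", which the draft does not spell out. Each kind of pattern from the list above gets one factory method.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class Patterns {
    /** A pattern tests a value and may add variable bindings to env. */
    public interface Pattern {
        boolean match(Object value, Map<String, Object> env);
    }

    /** Variable: always matches, binding name to the matched value. */
    public static Pattern variable(String name) {
        return (value, env) -> {
            env.put(name, value);
            return true;
        };
    }

    /** Type specification (pattern!type): value must be an instance of type, then match p. */
    public static Pattern typed(Pattern p, Class<?> type) {
        return (value, env) -> type.isInstance(value) && p.match(value, env);
    }

    /** Conjunction (pattern1&pattern2): both match the same target, left to right. */
    public static Pattern and(Pattern p1, Pattern p2) {
        return (value, env) -> p1.match(value, env) && p2.match(value, env);
    }

    /** Predicate ({boolean-expression}): the test may read bindings made earlier. */
    public static Pattern pred(Predicate<Map<String, Object>> test) {
        return (value, env) -> test.test(env);
    }

    /** Hypothetical constructed value: a constructor name applied to fields. */
    public static final class Node {
        final String ctor;
        final List<?> fields;

        public Node(String ctor, Object... fields) {
            this.ctor = ctor;
            this.fields = Arrays.asList(fields);
        }
    }

    /** Constructor (name(pattern1, pattern2, ...)): match each field with its sub-pattern. */
    public static Pattern ctor(String name, Pattern... args) {
        return (value, env) -> {
            if (!(value instanceof Node)) return false;
            Node node = (Node) value;
            if (!node.ctor.equals(name) || node.fields.size() != args.length) return false;
            for (int i = 0; i < args.length; i++) {
                if (!args[i].match(node.fields.get(i), env)) return false;
            }
            return true;
        };
    }

    /** Returns the bindings on success, or null; a fresh map means failures leave nothing behind. */
    public static Map<String, Object> matches(Pattern p, Object value) {
        Map<String, Object> env = new HashMap<>();
        return p.match(value, env) ? env : null;
    }

    public static void main(String[] args) {
        // Point(x!Integer, y!Integer) & {x < y}
        Pattern p = and(
                ctor("Point", typed(variable("x"), Integer.class), typed(variable("y"), Integer.class)),
                pred(env -> (Integer) env.get("x") < (Integer) env.get("y")));
        System.out.println(matches(p, new Node("Point", 3, 4)));  // bindings for x and y
        System.out.println(matches(p, new Node("Point", 5, 4)));  // null: the predicate fails
    }
}
```

The example in main also shows the point made under Conjunction: the predicate only makes sense because the conjunction matched its left side first and bound x and y.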
Created 14 Mar 2007 17:36 PDT. Last edited 14 Mar 2007 17:36 PDT. Tags: draft


Running commands

(run command arg ...)
The command is an executable program or script, and the args are command-line arguments. (For now, we leave open whether these are evaluated or quoted.)

The standard output of the command is effectively redirected to a temporary file, and the contents of this file, viewed as a string or text object, become the result of the run expression. If the output consumer for the run is an output port, then the command's standard output is redirected to that port instead. In the initial case, the output consumer is the standard output stream of the containing JVM, so no redirection is needed.

The standard error output of the command is piped to the current error port. If the error port matches the initial error port, no re-direction is needed.

The standard input of the command is connected to the current input port of the dynamic context.
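The I/O rules above map fairly directly onto java.lang.ProcessBuilder. The sketch below is an assumption about how a JVM implementation might look, not Kawa's actual run: runToString and runInherited are hypothetical names, captured output is decoded as UTF-8, and stdout is read from a pipe rather than a temporary file (ProcessBuilder.Redirect.to(File) would give the temporary-file variant). The general case of an arbitrary output port is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Run {
    private static ProcessBuilder builder(String command, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(command);
        cmd.addAll(Arrays.asList(args));
        return new ProcessBuilder(cmd);
    }

    /** Result-as-string case: capture the command's stdout and return it. */
    public static String runToString(String command, String... args)
            throws IOException, InterruptedException {
        ProcessBuilder pb = builder(command, args);
        // stderr and stdin are shared with the JVM: the "initial port" case, no re-direction.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        pb.redirectInput(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        byte[] out;
        try (InputStream in = p.getInputStream()) {
            out = in.readAllBytes();  // Java 9+; reads until the command closes its stdout
        }
        p.waitFor();
        return new String(out, StandardCharsets.UTF_8);  // assumes the command writes UTF-8
    }

    /** Initial-consumer case: stdout goes straight to the JVM's own stdout. */
    public static int runInherited(String command, String... args)
            throws IOException, InterruptedException {
        return builder(command, args).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        String s = runToString("echo", "hello");
        System.out.print("captured: " + s);
        runInherited("echo", "printed directly");
    }
}
```

Reading all of stdout before calling waitFor avoids the classic deadlock where the child blocks on a full pipe while the parent waits for it to exit.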

Discussion: An alternative is to define run so that the output from the command is written to the current output port. One could then reify the output from a command with some kind of with-output-to-string macro.

File name expansion

(glob regexp)

Returns the set of Path values that match regexp, as multiple values. These can be interpolated into a run argument list.
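One possible JVM reading of glob uses java.nio.file's PathMatcher with its "regex:" syntax. This sketch assumes the regexp is matched against the file names in a single directory (the draft does not say which directory, or whether matching is recursive), and it returns a sorted List, since Java has no multiple values; glob here is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Glob {
    /** Returns the entries of dir whose file name fully matches regexp, sorted. */
    public static List<Path> glob(Path dir, String regexp) throws IOException {
        // "regex:" selects java.util.regex syntax; the whole file name must match.
        PathMatcher matcher = dir.getFileSystem().getPathMatcher("regex:" + regexp);
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (matcher.matches(entry.getFileName())) {
                    result.add(entry);
                }
            }
        }
        result.sort(null);  // directory order is unspecified, so use natural Path order
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (Path p : glob(Paths.get("."), ".*\\.java")) {
            System.out.println(p);
        }
    }
}
```

Interpolating into a run argument list would then amount to converting each Path to a string and appending it to the command's argument array.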

Created 3 Feb 2007 11:49 PST. Last edited 3 Feb 2007 11:49 PST. Tags: draft kawa shell
Extending Qexo/Kawa for updates

A number of people are interested in extending XQuery for updates. Here are some useful notes.

Updating means at least two different things: modifying an in-core node object, and modifying a node in a persistent XML database. These are very different. Let's start with the former.

Qexo's node model

You might want to read the gnu.lists package description for an overview of the concepts of Kawa's sequence and node objects. A node, in the XML sense, is represented as a pair of an AbstractSequence and an index. The index (a position value) is just a unique number managed by the AbstractSequence. There are a number of implementation classes that extend AbstractSequence and use different ways of managing position indexes. The one used for XML nodes is NodeTree, which is an extension of TreeList. The nodes of a document or document fragment are all in a single NodeTree; each node is identified by a position index, which is basically an index into the TreeList's data array, but with the low-order bit used as a special flag. (See the above-mentioned description in gnu.lists.) When we need an actual object for a node, we use a KNode object. The idea is that most nodes are not actively referenced, so they need no KNode object, which saves a lot of space.
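The (sequence, position) idea can be shown with two tiny stand-in classes. Doc and NodeRef are hypothetical simplifications of NodeTree and KNode, not their real APIs: every array slot is treated as a node, and the low-order flag bit is carried along without modeling what it means. The point is that node identity is the pair, so a wrapper object can be created on demand and discarded.

```java
import java.util.Objects;

public class NodeModel {
    /** Hypothetical stand-in for a NodeTree: all node data lives in one flat array. */
    public static final class Doc {
        final Object[] data;

        public Doc(Object... data) { this.data = data; }

        // Hypothetical encoding: array index in the high bits, a flag in the low-order bit.
        public static int position(int index, boolean flag) { return (index << 1) | (flag ? 1 : 0); }
        public static int index(int pos) { return pos >> 1; }

        public Object valueAt(int pos) { return data[index(pos)]; }

        /** Allocates a node object only when a caller actually asks for one. */
        public NodeRef node(int pos) { return new NodeRef(this, pos); }
    }

    /** Hypothetical stand-in for KNode: nothing but a (sequence, position) pair. */
    public static final class NodeRef {
        final Doc seq;
        final int pos;

        NodeRef(Doc seq, int pos) { this.seq = seq; this.pos = pos; }

        public Object value() { return seq.valueAt(pos); }

        // Identity is the pair: two wrappers for the same pair denote the same node.
        @Override public boolean equals(Object o) {
            return o instanceof NodeRef && ((NodeRef) o).seq == seq && ((NodeRef) o).pos == pos;
        }

        @Override public int hashCode() { return Objects.hash(System.identityHashCode(seq), pos); }
    }
}
```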

Updating nodes in-place

To implement in-memory updating of a node object, we need to finish the update/insert/delete abilities in gnu.lists.TreeList. That class is basically a gap vector (as used in Emacs and Swing), but its data structures are more complicated because it stores a hierarchy rather than just characters. Once we can update the TreeList, we will need an extra level of indirection. The reason is that node identity is tied to the position indexes, but editing a NodeTree causes the nodes in it to move around. The solution is to use StableVector or something similar. Unfortunately, StableVector doesn't currently support TreeList. Perhaps TreeList should be changed to extend GapVector.
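For readers unfamiliar with gap vectors, here is a minimal gap buffer of objects. It is a flat, hypothetical GapBuffer class, not gnu.lists.GapVector or TreeList (it has no hierarchy). Each insert or delete first moves the gap to the edit point, so a run of nearby edits is cheap, but every element after the edit point changes its logical index, which is exactly why node identity cannot simply be the index.

```java
/** Hypothetical minimal gap buffer of objects (flat, no hierarchy). */
public class GapBuffer {
    private Object[] buf = new Object[8];
    private int gapStart = 0;         // first slot of the gap
    private int gapEnd = buf.length;  // first used slot after the gap

    public int size() { return buf.length - (gapEnd - gapStart); }

    /** Maps a logical index to a physical slot by skipping over the gap. */
    public Object get(int i) {
        check(i, size() - 1);
        return buf[i < gapStart ? i : i + (gapEnd - gapStart)];
    }

    public void insert(int i, Object x) {
        check(i, size());
        if (gapStart == gapEnd) grow();
        moveGap(i);
        buf[gapStart++] = x;  // fill the first slot of the gap
    }

    public void delete(int i) {
        check(i, size() - 1);
        moveGap(i);
        buf[gapEnd++] = null;  // element i sits just after the gap; widen the gap over it
    }

    /** Moves the gap so it starts at logical index i, copying only the elements in between. */
    private void moveGap(int i) {
        if (i < gapStart) {
            int n = gapStart - i;
            System.arraycopy(buf, i, buf, gapEnd - n, n);
            gapStart -= n;
            gapEnd -= n;
        } else if (i > gapStart) {
            int n = i - gapStart;
            System.arraycopy(buf, gapEnd, buf, gapStart, n);
            gapStart += n;
            gapEnd += n;
        }
    }

    /** Doubles the array when the gap is empty, keeping the tail at the end. */
    private void grow() {
        int tail = buf.length - gapEnd;
        Object[] bigger = new Object[buf.length * 2];
        System.arraycopy(buf, 0, bigger, 0, gapStart);
        System.arraycopy(buf, gapEnd, bigger, bigger.length - tail, tail);
        gapEnd = bigger.length - tail;
        buf = bigger;
    }

    private static void check(int i, int max) {
        if (i < 0 || i > max) throw new IndexOutOfBoundsException("index " + i);
    }
}
```

For example, after inserting at index 0, the element that used to be at index 1 is now at index 2, so any identity based on raw indexes would silently point at a different element.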

A more abstract way to think of it: a Node needs to be a pair of a NodeManager and an index that is managed by the NodeManager. The actual underlying storage is in a TreeList, but since indexes in a TreeList change on updates, the actual Node indexes are indexes in the NodeManager. Each time you read a property of a node, you use the node's index, which is an index in the NodeManager. You look that index up in the NodeManager's position array, which gives you an index in the TreeList, and get the value from the latter. To update a node, you similarly dereference the index in the NodeManager to get an index into the TreeList's data array, and update the latter. That may require things to move around in the TreeList, so the indexes in the NodeManager have to be updated.
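The NodeManager indirection can be sketched as follows. NodeManager here is a hypothetical class: an ArrayList stands in for the TreeList, and after each insert or delete it renumbers the affected positions with a linear scan, which is simple but O(n) per edit; StableVector or a real implementation may do this differently. Node ids never change, only the storage indexes they map to.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical NodeManager: stable node ids mapped to movable storage indexes. */
public class NodeManager {
    private final List<Object> storage = new ArrayList<>();     // stand-in for the TreeList
    private final List<Integer> positions = new ArrayList<>();  // node id -> storage index (-1 = deleted)

    /** Reading a node: node id -> position array -> storage index -> value. */
    public Object get(int node) { return storage.get(positionOf(node)); }

    /** Updating in place: the same dereference, then a write to the storage. */
    public void set(int node, Object value) { storage.set(positionOf(node), value); }

    /** Appends a value at the end of the storage and returns its new node id. */
    public int append(Object value) { return insertAt(storage.size(), value); }

    /** Inserts before an existing node; later nodes move, but their ids stay the same. */
    public int insertBefore(int node, Object value) { return insertAt(positionOf(node), value); }

    public void delete(int node) {
        int pos = positionOf(node);
        storage.remove(pos);
        positions.set(node, -1);
        shift(pos + 1, -1);  // everything after the hole moves left by one
    }

    private int insertAt(int pos, Object value) {
        storage.add(pos, value);
        shift(pos, +1);  // everything from pos onward moves right by one
        positions.add(pos);
        return positions.size() - 1;
    }

    /** Adds delta to every live position >= from (a linear scan, for clarity). */
    private void shift(int from, int delta) {
        for (int id = 0; id < positions.size(); id++) {
            int p = positions.get(id);
            if (p >= from) positions.set(id, p + delta);
        }
    }

    private int positionOf(int node) {
        int pos = positions.get(node);
        if (pos < 0) throw new IllegalStateException("node " + node + " was deleted");
        return pos;
    }
}
```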

Moving nodes from one document or fragment to another is tricky, because node indexes are relative to a TreeList. One solution is to use forwarding pointers. Another is a NodeManager that can handle multiple TreeLists.

Updating XML databases

Updating XML files or a database is more complicated. One approach is to read an XML document, update nodes in memory, and write out the modified document. That is practical for modest-sized XML documents, but expensive for small changes to large documents. Another issue is that it is difficult (but not impossible) to maintain node identity between the original document and the updated version, even for nodes that are unmodified.

Ideally, one would like to modify individual nodes in place in the database. This is doable in the Kawa node model. The basic idea is to create an AbstractSequence subclass, which we might call DatabaseDocument. The DatabaseDocument would be a proxy for either the entire database or an individual XML document. Each node has a database key. The DatabaseDocument object manages the mapping between position indexes and database keys.
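A rough sketch of the position/key bookkeeping a DatabaseDocument would do. Everything here is hypothetical: this DatabaseDocument does not extend AbstractSequence, the Store interface stands in for the database, keys are plain strings, positions are handed out the first time a key is seen, and there is no transaction handling (the ACID issues mentioned below are ignored).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: maps in-memory position indexes to database keys. */
public class DatabaseDocument {
    /** Hypothetical storage interface; a real one would talk to the database. */
    public interface Store {
        String load(String key);
        void save(String key, String value);
    }

    private final Store store;
    private final List<String> keys = new ArrayList<>();            // position -> key
    private final Map<String, Integer> positions = new HashMap<>();  // key -> position

    public DatabaseDocument(Store store) { this.store = store; }

    /** Returns the position for key, allocating one the first time the node is touched. */
    public int positionOf(String key) {
        return positions.computeIfAbsent(key, k -> {
            keys.add(k);
            return keys.size() - 1;
        });
    }

    public String keyAt(int pos) { return keys.get(pos); }

    /** Reads and writes go straight through to the database, one node at a time. */
    public String valueAt(int pos) { return store.load(keyAt(pos)); }

    public void setValueAt(int pos, String value) { store.save(keyAt(pos), value); }
}
```

Because positions are never reused or renumbered here, node identity stays stable for the lifetime of the DatabaseDocument, even across updates.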

Note that there are parts of the Qexo run-time that assume nodes are implemented using NodeTree. They would have to be fixed to support general AbstractSequences.

Of course once one is updating a database we also have to deal with transactions and related ACID issues.

Created 3 Feb 2007 10:26 PST. Last edited 3 Feb 2007 10:26 PST. Tags: draft kawa xml

Use the standard \ to escape special characters, both in string literals and outside them. In general (outside string literals), a \ followed by a non-letter character causes that character to be treated as a letter. E.g. \1\+2 is a 3-character identifier consisting of the characters 1, +, and 2, even if the language does not otherwise allow identifiers to start with digits or to contain +.
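The identifier rule can be illustrated with a small scanner. readIdentifier is hypothetical, and it assumes that unescaped identifiers consist of letters, _ and digits (digits not in first position); the draft leaves the unescaped character set to the language. A \ followed by a letter is rejected here, since those combinations are reserved for other uses.

```java
/** Hypothetical identifier scanner illustrating \-escapes outside string literals. */
public class IdentReader {
    /** Reads an identifier starting at src[start] and returns its name with escapes removed. */
    public static String readIdentifier(String src, int start) {
        StringBuilder name = new StringBuilder();
        int i = start;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                if (i + 1 >= src.length()) throw new IllegalArgumentException("dangling \\");
                char next = src.charAt(i + 1);
                // \ + letter is reserved for other escapes, so it is not valid in an identifier.
                if (Character.isLetter(next)) {
                    throw new IllegalArgumentException("\\" + next + " is not an identifier escape");
                }
                name.append(next);  // escaped non-letter: treated as a letter
                i += 2;
            } else if (Character.isLetter(c) || c == '_' || (Character.isDigit(c) && name.length() > 0)) {
                name.append(c);
                i++;
            } else {
                break;  // any other character ends the identifier
            }
        }
        return name.toString();
    }
}
```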

Letters don't need to be escaped, in either identifiers or names. So we're free to use \ followed by a letter for other purposes, including the standard C string escapes. I suggest at least the following:

\xNNNN - A Unicode escape, terminated by the first character that is neither a digit nor a letter. If that character is a space, it is ignored. Only a single space is ignored.

\n - A newline.

...
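Here is a sketch of decoding these escapes in the body of a string literal. decode is a hypothetical helper; it handles only \n and \xNNNN (the remaining escapes are elided in the draft), treats \ followed by any other non-letter as that character, and rejects other letter escapes. Note how the terminator rule plays out: in \x41B the B is read as another hex digit (giving U+041B), so "AB" is written \x41 B, with the single space swallowed.

```java
/** Hypothetical decoder for \n and \xNNNN escapes inside a string literal body. */
public class StringEscapes {
    public static String decode(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i++);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i >= body.length()) throw new IllegalArgumentException("dangling \\");
            char e = body.charAt(i++);
            switch (e) {
                case 'n':
                    out.append('\n');
                    break;
                case 'x': {
                    int digitsStart = i;
                    // The escape runs until the first character that is neither a digit nor a letter.
                    while (i < body.length() && Character.isLetterOrDigit(body.charAt(i))) i++;
                    // Non-hex letters are part of the escape, so parseInt rejects them.
                    out.appendCodePoint(Integer.parseInt(body.substring(digitsStart, i), 16));
                    if (i < body.length() && body.charAt(i) == ' ') i++;  // swallow exactly one space
                    break;
                }
                default:
                    if (Character.isLetter(e)) throw new IllegalArgumentException("unknown escape \\" + e);
                    out.append(e);  // \ + non-letter stands for the character itself, e.g. \" or \\
            }
        }
        return out.toString();
    }
}
```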

The string form of regular expressions should be compatible with this convention.

Created 3 Feb 2007 10:21 PST. Last edited 3 Feb 2007 10:21 PST. Tags: draft kawa
Tags: blog