CADiZ

Reference manual / Characters


Introduction

At the most elementary level, a Z specification can be viewed as a sequence of characters. Because most keyboards offer little beyond ASCII, a Z specification may have to be prepared one step removed from that sequence of characters, in a mark-up language such as LaTeX or troff. When preparing mark-up, it helps to understand the characters to which the mark-up is converted.

The characters comprising a Z specification are those of ISO/IEC 10646 Universal Multiple-Octet Coded Character Set (UCS). The code positions of characters in UCS are the same as their code positions in Unicode.

Characters are classified into several categories, such as LETTER, DIGIT, SYMBOL and SPECIAL, according to their UCS general property. This provides a basis for lexing a specification.

When characters are exchanged between tools, their code positions are encoded according to one of several alternative schemes. A specific scheme can be chosen using one of the following command-line options to the tools: -UTF8, -UTF16BE and -UCS4. The default scheme is UTF8. No other schemes are yet implemented by CADiZ. The well-known scheme UCS2 is not applicable to Z, as it cannot encode the \arithmos and \finset characters.
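The effect of the three schemes, and the reason UCS2 falls short, can be illustrated with a short Python sketch. It is not part of CADiZ. It assumes that \arithmos and \finset denote the code positions U+1D538 and U+1D53D, and that UCS4 is written most significant octet first; the function names are hypothetical.

```python
# Assumed code positions for the characters written \arithmos and \finset in mark-up.
ARITHMOS = "\U0001D538"  # MATHEMATICAL DOUBLE-STRUCK CAPITAL A
FINSET = "\U0001D53D"    # MATHEMATICAL DOUBLE-STRUCK CAPITAL F


def encode_all(ch):
    """Return the octets of one character under each of the three schemes."""
    return {
        "UTF8": ch.encode("utf-8"),
        "UTF16BE": ch.encode("utf-16-be"),
        # Assumption: UCS4 is taken to be big-endian, i.e. Python's utf-32-be.
        "UCS4": ch.encode("utf-32-be"),
    }


def fits_in_ucs2(ch):
    """UCS2 uses a single 16-bit unit, so it reaches only up to U+FFFF."""
    return ord(ch) <= 0xFFFF


if __name__ == "__main__":
    for name, ch in (("\\arithmos", ARITHMOS), ("\\finset", FINSET)):
        print(name, hex(ord(ch)), "fits in UCS2:", fits_in_ucs2(ch))
        for scheme, octets in encode_all(ch).items():
            print("  ", scheme, octets.hex())
```

Both characters lie above U+FFFF, so UTF16BE has to represent each of them as a surrogate pair, which UCS2 does not allow.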

More discussion of these issues may be found in [Toyn02].

The characters are formalized below using syntactic metalanguage.

ISO Standard characters

Formal definition of characters

This formal definition is public domain material, and is reproduced as it appears in ISO/IEC 13568:2002 (the Z standard).

ZCHAR = DIGIT | LETTER | SPECIAL | SYMBOL ;

DIGIT = DECIMAL
| ?any other UCS characters with Number property except Number, Decimal Digit (as supported)?
;

DECIMAL = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
| ?any other UCS chars with Number, Decimal Digit property (as supported)?
;

LETTER = LATIN | GREEK | OTHERLETTER
| ?any characters of the mathematical toolkit with letter property (as supported)?
| ?any other UCS characters with letter property (as supported)?
;

LATIN = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I'
| 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R'
| 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z'
| 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i'
| 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r'
| 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
;

GREEK = '\Delta' | '\Xi' | '\theta' | '\lambda' | '\mu' ;

OTHERLETTER = '\arithmos' | '\nat' | '\power' ;

SPECIAL = STROKECHAR | WORDGLUE | BRACKET | BOXCHAR | NLCHAR | SPACE ;

STROKECHAR = ''' | '!' | '?' ;

WORDGLUE = '\nearrow' | '\swarrow' | '\searrow' | '\nwarrow' | '_' ;

BRACKET = '(' | ')' | '[' | ']' | '{' | '}' | '\lblot' | '\rblot' | '\ldata' | '\rdata' ;

BOXCHAR = ZEDCHAR | AXCHAR | SCHCHAR | GENCHAR | ENDCHAR ;

SYMBOL = '&' | '\vdash' | '\land' | '\lor' | '\implies' | '\iff' | '\lnot'
| '\forall' | '\exists' | '/' | '=' | '\in' | ':' | ';' | ',' | '.'
| '\project' | '\semi' | '>>'
| ?any characters of the mathematical toolkit with neither letter nor
number property (as supported)?
| ?any other UCS characters with neither letter nor
number property and that are not in SPECIAL (as supported)?
;
;
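
The classification that this definition describes can be approximated in a few lines of Python, using the general category reported by the standard unicodedata module. This is a minimal sketch and not the CADiZ lexer: the SPECIAL set below is an assumed, partial one containing only the ASCII members listed above plus the space character, and the names classify and is_decimal are hypothetical.

```python
import unicodedata

# Assumption: a partial SPECIAL set. Only the ASCII stroke, word-glue and
# bracket characters and the space are listed; the non-ASCII members
# (arrows, blots, data brackets, box characters, NLCHAR) are omitted.
SPECIAL_ASCII = set("!?_()[]{} ")


def classify(ch):
    """Map one character to DIGIT, LETTER, SPECIAL or SYMBOL."""
    if ch in SPECIAL_ASCII:
        return "SPECIAL"
    category = unicodedata.category(ch)
    if category.startswith("N"):   # Nd, Nl, No: any Number property
        return "DIGIT"
    if category.startswith("L"):   # Lu, Ll, Lt, Lm, Lo: any Letter property
        return "LETTER"
    # Simplification: everything else, including unassigned and control
    # code positions, is reported as SYMBOL.
    return "SYMBOL"


def is_decimal(ch):
    """DECIMAL is the subset of DIGIT with the Number, Decimal Digit property."""
    return unicodedata.category(ch) == "Nd"


if __name__ == "__main__":
    for ch in "x7=(\u2115\u00bd":
        print(repr(ch), classify(ch), is_decimal(ch))
```

Testing SPECIAL first mirrors the wording of the last SYMBOL alternative, which excludes characters that are in SPECIAL.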

CADiZ-specific characters

CADiZ "supports" use of all UCS characters, each classified according to its general property. However, CADiZ can display only some characters as their proper glyphs; the others are displayed as nameplates showing their code numbers.

The CADiZ core language uses not only the characters enumerated in the above formal definition from ISO Standard Z but also '\xor', '\dagger', '\zovr' and '"' as additional characters in the SYMBOL class. The uses of these additional characters are documented in extensions. To check that a Z specification uses only ISO Standard notations, invoke cadiz with the -ws option.


IT 28-Jan-2002