CADiZ

A Z Specification of a Z Preprocessor


Introduction

A specification written in standard Z [FCD2] comprises a sequence of sections [Arthan95], each of which has a name and includes the paragraphs of other sections known as its parents. For compatibility with traditional Z [Spivey92], a bare sequence of paragraphs is also accepted, and is treated as comprising the sections of the mathematical toolkit and an anonymous section containing those paragraphs.

This document is a standard Z specification of a preprocessor for Z specifications. The preprocessor's job is to dispense with the non-standardised notion of file, in which sections are stored in a file system, and to permute sections into the definition-before-use order assumed by the Z standard.

This specification assumes that a file contains a sequence of paragraphs: the preprocessor needs to distinguish formal and informal paragraphs, and to identify section headers, but it does not need to parse the Z text within formal paragraphs. There can be several sections within a single file, in which case it can be useful for the file to have several names (links in UNIX terminology).

Any references to parent sections that have not yet been read are presumed to be in files of the same name, and are read from there. Within a file, any formal paragraphs that are not preceded by a section header are treated as if there had been a section header whose name is that of the file and which has standard_toolkit as parent. This is similar to the treatment of anonymous sections in the Z standard. A file's name need not be the same as any of the sections it contains, in which case that name is useless from the point of view of finding parent sections, but it is useful as a starting point for a whole specification.

Specification

This specification makes use of the standard mathematical toolkit.

section preprocessor parents standard_toolkit

Data types

Strings are encoded in ASCII (to be compatible with the treatment of string literal expressions by version 3.13 of the CADiZ tool [cadizurl], which has checked this specification).

String ::= string \ldata seq (0 .. 127) \rdata

Names (of files and sections) are represented by strings. (The form of names would be irrelevant to this specification but for "standard_toolkit".)

Name ::= name \ldata String \rdata

Only certain kinds of paragraphs need be distinguished. Informal text between formal paragraphs is retained for possible display in the same order between the formal paragraphs. Section headers are treated like paragraphs in this specification.

Paragraph ::= Informal \ldata String \rdata
    | Section_header \ldata [name : Name; parent_set : \finset Name] \rdata
    | Formal \ldata String \rdata

The file system is modelled as a function from pathnames (formed of directory and file names) to sequences of paragraphs. This avoids having to specify the parsing of files of text. Section headers are assumed to have already been distinguished by the section keyword.


Directory == Name

File_system ::= file_system \ldata Directory \cross Name \pfun seq Paragraph \rdata

Sections are represented as sequences of paragraphs in which an explicit section header begins each section.


Section ==
    { ps : seq1 Paragraph
    | head ps \in ran Section_header
    \land ran (tail ps) \cap ran Section_header = \emptyset }
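
As an informal illustration only, the data types above can be mirrored in Python. This is a sketch, not part of the specification: the class names, the use of plain `str` for String and Name, and the helper `is_section` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Informal:
    text: str


@dataclass(frozen=True)
class SectionHeader:
    name: str
    parent_set: FrozenSet[str]


@dataclass(frozen=True)
class Formal:
    text: str


# Corresponds to the free type Paragraph (names here are hypothetical).
Paragraph = Union[Informal, SectionHeader, Formal]


def is_section(ps: List[Paragraph]) -> bool:
    """True when ps is non-empty, starts with a section header,
    and has no further section header in its tail."""
    return (
        len(ps) > 0
        and isinstance(ps[0], SectionHeader)
        and not any(isinstance(p, SectionHeader) for p in ps[1:])
    )
```

The predicate `is_section` plays the role of membership in the set Section.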

Environment

This specification operates in an environment comprising: the file system fs; the current working directory name cwd; the name of the directory containing the toolkit sections toolkit_dir; and an environment variable SECTIONPATH giving the names of other directories from which sections may be read. The environment is modelled as the global state of the specification. Its value is not changed by the specification.


fs : File_system
cwd, toolkit_dir : Directory
SECTIONPATH : seq Directory

Functions

The function section_to_name is given a section and returns the name of that section. The name returned is that in the section header that is the section's first paragraph.


section_to_name ==
    \lambda s : Section @ ((Section_header ~) (head s)) . name

The function sections_to_parents is given a set of sections and returns the set containing the names of the parents referenced by those sections.


sections_to_parents ==
    \lambda ss : \finset Section @
    \bigcup { s : ss @ ((Section_header ~) (head s)) . parent_set }

The function filename_to_paras is given a search path of directory names and a file name and returns the sequence of paragraphs contained in the first file found with that name in the path of directories to be searched. If no file with that name is found, an empty sequence of paragraphs is returned (and an error should be reported by an implementation).


filename_to_paras : seq Directory \cross Name \pfun seq Paragraph
\where
\forall n : Name @
    filename_to_paras (\langle \rangle, n) = \langle \rangle
\forall d : Directory; path : seq Directory; n : Name @
    filename_to_paras (\langle d \rangle \cat path, n) =
    if (d, n) \in dom ((file_system ~) fs)
    then (file_system ~) fs (d, n)
    else filename_to_paras (path, n)

The function filename_to_paragraphs is given a filename and returns the sequence of paragraphs contained in the first file found with that name in the path of directories to be searched. The current working directory is always searched first, then whatever directories are explicitly listed in the SECTIONPATH environment variable, and finally the directory of toolkits.


filename_to_paragraphs ==
    \lambda n : Name @
    filename_to_paras (\langle cwd \rangle \cat SECTIONPATH \cat \langle toolkit_dir \rangle, n)
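
The search order can be sketched in Python as follows. This is only an illustration under stated assumptions: the file system is modelled as a dictionary keyed by (directory, name) pairs, paragraphs are plain strings, and the environment is passed as explicit parameters rather than held as global state.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for File_system: (directory, name) -> paragraphs.
FileSystem = Dict[Tuple[str, str], List[str]]


def filename_to_paras(fs: FileSystem, path: Sequence[str], name: str) -> List[str]:
    """Return the paragraphs of the first file called `name` found on `path`,
    or an empty list when no directory on the path contains it."""
    for directory in path:
        if (directory, name) in fs:
            return fs[(directory, name)]
    return []


def filename_to_paragraphs(
    fs: FileSystem,
    cwd: str,
    sectionpath: Sequence[str],
    toolkit_dir: str,
    name: str,
) -> List[str]:
    """Search cwd first, then SECTIONPATH, then the toolkit directory."""
    return filename_to_paras(fs, [cwd, *sectionpath, toolkit_dir], name)
```

A file in the current working directory therefore shadows a file of the same name in any later directory.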

The function add_header reads the named file and prefixes its sequence of paragraphs with a section header if the file starts with an anonymous section. If the anonymous section has any formal paragraphs, it is named after the file, otherwise it is given a different name in case the first named section has that name.


add_header ==
    \lambda n : Name @
    let ps == filename_to_paragraphs n @
    (\mu pref, suff : seq Paragraph | pref \cat suff = ps \land
    ran pref \cap ran Section_header = \emptyset \land
    (suff = \emptyset \lor head suff \in ran Section_header) @
    if pref = \emptyset then \langle \rangle
    else if ran pref \cap ran Formal \neq \emptyset then
    \langle Section_header \lblot name == n,
    parent_set == { name (string "standard_toolkit") } \rblot \rangle
    else
    \langle Section_header \lblot name ==
    name (string ((string ~) ((name ~) n) \cat "informal")),
    parent_set == { } \rblot \rangle)
    \cat ps
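
The same case analysis can be sketched in Python. The sketch makes two simplifying assumptions that are not in the specification: paragraphs are tagged tuples whose first element is "informal", "header" or "formal", and the paragraphs are passed in directly instead of being read from the named file.

```python
def add_header(name, ps):
    """Prefix ps with a section header when it starts with an anonymous section.

    Paragraphs are hypothetical tagged tuples:
    ("informal", text), ("formal", text) or ("header", name, parent_set).
    """
    # pref is the longest prefix that contains no section header.
    split = next((i for i, p in enumerate(ps) if p[0] == "header"), len(ps))
    pref = ps[:split]
    if not pref:
        # The file is empty or already starts with a section header.
        return list(ps)
    if any(p[0] == "formal" for p in pref):
        # An anonymous section with formal text is named after the file.
        header = ("header", name, frozenset({"standard_toolkit"}))
    else:
        # Informal text only: use a different name and no parents.
        header = ("header", name + "informal", frozenset())
    return [header] + list(ps)
```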

The function filename_to_sections reads the named file and partitions its sequence of paragraphs into a sequence of sections.


filename_to_sections ==
    \lambda n : Name @
    (\mu ss : seq Section | \dcat ss = add_header n)
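
Because every section begins with exactly one header, the partition is determined by the positions of the headers. A minimal Python sketch of that partitioning follows, assuming hypothetical tagged-tuple paragraphs (first element "informal", "header" or "formal") and an input that has already been given a leading header.

```python
def partition_sections(ps):
    """Split paragraphs into sections, starting a new section at each header.

    Concatenating the returned sections gives back ps.
    """
    sections = []
    for p in ps:
        if p[0] == "header":
            sections.append([p])      # a header opens a new section
        elif sections:
            sections[-1].append(p)    # other paragraphs join the current section
        else:
            raise ValueError("paragraphs must begin with a section header")
    return sections
```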

The function read_spec is given a set of names of files to be read and a set of sections already read from files. It returns the set of sections containing those already read, those read from the named files, and those read from files named as ancestors of other sections in this set. A file is read only if the named parent has not already been found in previous files and is not present anywhere in the current file; the parent section could be defined later in the current file, in which case any file with the name of the parent is not read. The sections should all have different names (otherwise an implementation should report an error); this specification merges sections that are identical.


read_spec : \finset Name \cross \finset Section \fun \finset Section
\where
\forall ss : \finset Section @
    read_spec (\emptyset, ss) = ss
\forall ns : \finset Name; ss : \finset Section @
    read_spec (ns, ss) =
    \mu ss2 == \bigcup { n : ns @ ran (filename_to_sections n) }
    | # ss2 = # (section_to_name \limg ss2 \rimg) @
    read_spec (sections_to_parents ss2 \setminus
    section_to_name \limg ss \rimg,
    ss \cup ss2)
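
The reading process can be sketched as an iterative loop in Python. This is an illustration, not a transcription of the formal definition, and it rests on several assumptions: a section is reduced to a hypothetical (name, parent names) pair, the function `load` stands in for filename_to_sections, names of sections found in the files just read are excluded from the next round (following the prose description), and a set of files already read is kept so that the loop always terminates.

```python
def read_spec(names, load):
    """Collect the sections reachable from the files in `names`.

    `load(name)` is a hypothetical callable returning the sections of a file
    as (section_name, frozenset_of_parent_names) pairs.
    """
    sections = set()
    files_read = set()
    pending = set(names)
    while pending:
        files_read |= pending
        new = set()
        for n in pending:
            new.update(load(n))
        sections |= new
        found = {name for name, _ in sections}
        if len(found) != len(sections):
            # Identical sections merge in the set; differing ones are an error.
            raise ValueError("two different sections share a name")
        wanted = set().union(*(parents for _, parents in new))
        # Read a file only for parents not yet found and not yet tried.
        pending = wanted - found - files_read
    return sections
```

In this sketch a parent that is defined later in the same file is found before the next round begins, so no file of that name is read.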

The function order_sections is given a set of sections and returns those sections in a sequence ordered so that every section appears before it is referenced as a parent. The function is partial because of the possibility of cycles in the parents relation, about which an implementation should report errors.


order_sections ==
    { ss : \finset Section; ss2 : seq Section
    | ran ss2 = ss
    \land (\forall ss3 : seq Section | ss3 prefix ss2 @
    { section_to_name (last ss3) } \cap
    sections_to_parents (ran (front ss3)) = \emptyset)
    @ (ss, ss2) }
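
One way an implementation might compute such an ordering is a topological sort. The Python sketch below uses the standard-library `graphlib` module (Python 3.9 or later). It is an illustration under assumptions: sections are reduced to hypothetical (name, parent names) pairs, the result is a list of section names rather than sections, and parents outside the given set are ignored.

```python
from graphlib import CycleError, TopologicalSorter


def order_sections(sections):
    """Return section names ordered so that parents precede their children.

    `sections` is an iterable of (name, frozenset_of_parent_names) pairs.
    Raises ValueError when the parents relation contains a cycle.
    """
    sections = list(sections)
    names = {name for name, _ in sections}
    # Map each section to those of its parents that are in the given set.
    graph = {name: parents & names for name, parents in sections}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"cycle in parents relation: {err.args[1]}") from err
```

The specification admits any ordering that satisfies the constraint; this sketch returns just one of them.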

The function preprocessor specifies the entire tool. It takes the name of a file, and returns the ordered sequence of sections from that file and the files of ancestral sections.


preprocessor ==
    \lambda n : Name @ order_sections (read_spec ({ n }, { }))

Further work

  1. The consistency of this specification has not been formally proven.
  2. A task that is best done by the preprocessor, but has not been specified here, is the extraction from operator template paragraphs of a mapping from words to tokens for each section, to be consumed by the Z parser, as implicitly required by the Z standard.

The preprocessor has been implemented and is in use within CADiZ. Some small bugs were found in this spec - did you spot them? Cadiz reads any cumulus (-l) file first, and passes the names of the sections so obtained to zpp, which omits them from its output. The preprocessor permutes operator template paragraphs to the beginnings of their sections, so that operators can be used before being introduced in the typeset presentation. (This fails to allow an operator word to be used in multiple operators.) It also moves all glyph directives to before the operator templates. Cadiz checks the paragraphs in the resulting order, but permutes them back into the original order for typesetting; file and line directives are inserted by the preprocessor to enable this. Any file or line directives in the preprocessor's input are ignored. Quiet and reckless directives are recognised, and these modes are recorded as attributes of paragraphs, so that the mode can be set appropriately after permutation.

Acknowledgements

Sam Valentine advised on the use of Z in this specification.