Definitions
Definition (Token)
A token is a part of the input that is delimited by
word separators. These word separators may be part of the
token. A token can be a word form, a multi-token word, or a
composed word, which consists of one or more word parts.
Definition (Word form)
A word form is a sequence of non-blank characters, starting
after or with a word separator, and ending before or with a
word separator. Therefore, each non-blank word separator
may be part of a word form, or be a word form by itself.
Definition (Multi-token word)
A multi-token word is a sequence of word forms separated
by blanks. Since it consists of word forms, it may also include
word separators.
Definition (Composed word)
A composed word is a composition of word parts conforming to
the following pattern:

This means a single infix can be a word form, but a single
prefix or suffix cannot.
Definition (Word part)
A word part is a word form or multi-token word with
an additional hyphen at the beginning, at the end, or at both sides.
This hyphen has no literal meaning; it only marks a word part.
A word part with a hyphen at the beginning is called a
prefix, a word part with a hyphen at the end is called a
suffix, and a word part with hyphens at both the beginning
and the end is called an infix.
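The hyphen convention can be illustrated with a small Python sketch. It follows the definition exactly as stated above (hyphen at the beginning means prefix, hyphen at the end means suffix). The function name `classify_word_part` and the returned labels are hypothetical and are not part of the system described here.

```python
def classify_word_part(entry):
    """Return (kind, text) for an entry written with marker hyphens.

    Hypothetical helper: the kinds follow the definition in the text,
    where a leading hyphen marks a prefix and a trailing hyphen a suffix.
    """
    # A lone "-" is treated as an ordinary word form, not as a marker.
    leading = len(entry) > 1 and entry.startswith("-")
    trailing = len(entry) > 1 and entry.endswith("-")

    if leading and trailing:
        kind = "infix"
    elif leading:
        kind = "prefix"
    elif trailing:
        kind = "suffix"
    else:
        kind = "word form"

    # The marker hyphens have no literal meaning, so they are removed.
    start = 1 if leading else 0
    end = len(entry) - 1 if trailing else len(entry)
    return kind, entry[start:end]
```

For instance, `classify_word_part("-ab-")` yields `("infix", "ab")`, while an entry without marker hyphens is returned unchanged as a word form.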
Definition (Blanks)
A blank is one of the characters space, tab, or newline. Any
blank or sequence of blanks is always reduced to a single
space.
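The reduction of blanks can be sketched in a few lines of Python. This is only an illustration of the rule, assuming the three blank characters named in the definition; the name `reduce_blanks` is hypothetical.

```python
import re

# The three blank characters named in the definition: space, tab, newline.
BLANKS = re.compile(r"[ \t\n]+")


def reduce_blanks(text):
    """Replace every run of one or more blanks by a single space."""
    return BLANKS.sub(" ", text)
```

With this sketch, `reduce_blanks("a \t\n b")` gives `"a b"`, and a single newline between two words also becomes one space.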
Definition (Word separator)
A word separator is a character used for indicating possible
boundaries between word forms. All blanks are considered
word separators. Furthermore, the user can specify non-blank
characters, which indicate word boundaries, as word separators.
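As a rough illustration, the following Python sketch lists the positions in an input string where a word separator occurs, that is, the possible boundaries between word forms. The function name `boundary_positions` and the parameter `extra_separators` (standing in for the user-specified non-blank separators) are assumptions made for this example.

```python
# Blanks are always word separators.
BLANKS = frozenset(" \t\n")


def boundary_positions(text, extra_separators=""):
    """Return the indices of all word separators in text.

    extra_separators is a hypothetical stand-in for the non-blank
    separator characters that the user may specify.
    """
    separators = BLANKS | set(extra_separators)
    return [i for i, ch in enumerate(text) if ch in separators]
```

For the input `"a-b c"`, only the space at index 3 is a possible boundary by default; declaring the hyphen as a separator adds index 1.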
Definition (Invisible character)
An invisible character is a character that should be skipped
automatically if it cannot be matched. All blanks are considered
invisible characters. Furthermore, the user can specify non-blank
characters which can be skipped if no match was found for them.
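The "skip only if it cannot be matched" rule can be sketched as a simple scanning loop in Python. This is a minimal illustration, not the actual lexer: it assumes a fixed list of terminal strings and a greedy longest-first match, and the names `scan`, `terminals`, and `extra_invisible` are hypothetical.

```python
# Blanks are always invisible characters.
BLANKS = frozenset(" \t\n")


def scan(text, terminals, extra_invisible=""):
    """Match terminals left to right, skipping unmatched invisible characters.

    Assumption for this sketch: terminals are plain strings and the
    longest terminal that fits at the current position wins.
    """
    invisible = BLANKS | set(extra_invisible)
    ordered = sorted((t for t in terminals if t), key=len, reverse=True)
    matched = []
    pos = 0
    while pos < len(text):
        for terminal in ordered:
            if text.startswith(terminal, pos):
                matched.append(terminal)
                pos += len(terminal)
                break
        else:
            # Nothing matched here: skip the character only if it is invisible.
            if text[pos] in invisible:
                pos += 1
            else:
                raise ValueError(f"cannot match {text[pos]!r} at position {pos}")
    return matched
```

With a comma declared invisible, `scan("a, b", ["a", "b"], ",")` skips the comma and returns `["a", "b"]`, whereas `scan("a, b", ["a", ",", "b"], ",")` returns `["a", ",", "b"]` because the comma can be matched and is therefore not skipped.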
2000-01-10