Diffstat (limited to 'WebKitTools/android/flex-2.5.4a/MISC/texinfo')
-rw-r--r-- WebKitTools/android/flex-2.5.4a/MISC/texinfo/flex.info | 2951
-rw-r--r-- WebKitTools/android/flex-2.5.4a/MISC/texinfo/flex.texi | 3448
2 files changed, 0 insertions, 6399 deletions
diff --git a/WebKitTools/android/flex-2.5.4a/MISC/texinfo/flex.info b/WebKitTools/android/flex-2.5.4a/MISC/texinfo/flex.info
deleted file mode 100644
index 9269418..0000000
--- a/WebKitTools/android/flex-2.5.4a/MISC/texinfo/flex.info
+++ /dev/null
@@ -1,2951 +0,0 @@
-This is Info file flex.info, produced by Makeinfo-1.55 from the input
-file flex.texi.
-
-START-INFO-DIR-ENTRY
-* Flex: (flex). A fast scanner generator.
-END-INFO-DIR-ENTRY
-
- This file documents Flex.
-
- Copyright (c) 1990 The Regents of the University of California. All
-rights reserved.
-
- This code is derived from software contributed to Berkeley by Vern
-Paxson.
-
- The United States Government has rights in this work pursuant to
-contract no. DE-AC03-76SF00098 between the United States Department of
-Energy and the University of California.
-
- Redistribution and use in source and binary forms with or without
-modification are permitted provided that: (1) source distributions
-retain this entire copyright notice and comment, and (2) distributions
-including binaries display the following acknowledgement: "This
-product includes software developed by the University of California,
-Berkeley and its contributors" in the documentation or other materials
-provided with the distribution and in all advertising materials
-mentioning features or use of this software. Neither the name of the
-University nor the names of its contributors may be used to endorse or
-promote products derived from this software without specific prior
-written permission.
-
- THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
-WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
-
-
-File: flex.info, Node: Top, Next: Name, Prev: (dir), Up: (dir)
-
-flex
-****
-
- This manual documents `flex'. It covers release 2.5.
-
-* Menu:
-
-* Name:: Name
-* Synopsis:: Synopsis
-* Overview:: Overview
-* Description:: Description
-* Examples:: Some simple examples
-* Format:: Format of the input file
-* Patterns:: Patterns
-* Matching:: How the input is matched
-* Actions:: Actions
-* Generated scanner:: The generated scanner
-* Start conditions:: Start conditions
-* Multiple buffers:: Multiple input buffers
-* End-of-file rules:: End-of-file rules
-* Miscellaneous:: Miscellaneous macros
-* User variables:: Values available to the user
-* YACC interface:: Interfacing with `yacc'
-* Options:: Options
-* Performance:: Performance considerations
-* C++:: Generating C++ scanners
-* Incompatibilities:: Incompatibilities with `lex' and POSIX
-* Diagnostics:: Diagnostics
-* Files:: Files
-* Deficiencies:: Deficiencies / Bugs
-* See also:: See also
-* Author:: Author
-
-
-File: flex.info, Node: Name, Next: Synopsis, Prev: Top, Up: Top
-
-Name
-====
-
- flex - fast lexical analyzer generator
-
-
-File: flex.info, Node: Synopsis, Next: Overview, Prev: Name, Up: Top
-
-Synopsis
-========
-
- flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix -Sskeleton]
- [--help --version] [FILENAME ...]
-
-
-File: flex.info, Node: Overview, Next: Description, Prev: Synopsis, Up: Top
-
-Overview
-========
-
- This manual describes `flex', a tool for generating programs that
-perform pattern-matching on text. The manual includes both tutorial
-and reference sections:
-
-Description
- a brief overview of the tool
-
-Some Simple Examples
-Format Of The Input File
-Patterns
- the extended regular expressions used by flex
-
-How The Input Is Matched
- the rules for determining what has been matched
-
-Actions
- how to specify what to do when a pattern is matched
-
-The Generated Scanner
- details regarding the scanner that flex produces; how to control
- the input source
-
-Start Conditions
- introducing context into your scanners, and managing
- "mini-scanners"
-
-Multiple Input Buffers
- how to manipulate multiple input sources; how to scan from strings
- instead of files
-
-End-of-file Rules
- special rules for matching the end of the input
-
-Miscellaneous Macros
- a summary of macros available to the actions
-
-Values Available To The User
- a summary of values available to the actions
-
-Interfacing With Yacc
- connecting flex scanners together with yacc parsers
-
-Options
- flex command-line options, and the "%option" directive
-
-Performance Considerations
- how to make your scanner go as fast as possible
-
-Generating C++ Scanners
- the (experimental) facility for generating C++ scanner classes
-
-Incompatibilities With Lex And POSIX
- how flex differs from AT&T lex and the POSIX lex standard
-
-Diagnostics
- those error messages produced by flex (or scanners it generates)
- whose meanings might not be apparent
-
-Files
- files used by flex
-
-Deficiencies / Bugs
- known problems with flex
-
-See Also
- other documentation, related tools
-
-Author
- includes contact information
-
-
-File: flex.info, Node: Description, Next: Examples, Prev: Overview, Up: Top
-
-Description
-===========
-
- `flex' is a tool for generating "scanners": programs which
-recognize lexical patterns in text. `flex' reads the given input
-files, or its standard input if no file names are given, for a
-description of a scanner to generate. The description is in the form
-of pairs of regular expressions and C code, called "rules". `flex'
-generates as output a C source file, `lex.yy.c', which defines a
-routine `yylex()'. This file is compiled and linked with the `-lfl'
-library to produce an executable. When the executable is run, it
-analyzes its input for occurrences of the regular expressions.
-Whenever it finds one, it executes the corresponding C code.
-
-
-File: flex.info, Node: Examples, Next: Format, Prev: Description, Up: Top
-
-Some simple examples
-====================
-
- First some simple examples to get the flavor of how one uses `flex'.
-The following `flex' input specifies a scanner which, whenever it
-encounters the string "username", will replace it with the user's login
-name:
-
- %%
- username printf( "%s", getlogin() );
-
- By default, any text not matched by a `flex' scanner is copied to
-the output, so the net effect of this scanner is to copy its input file
-to its output with each occurrence of "username" expanded. In this
-input, there is just one rule. "username" is the PATTERN and the
-"printf" is the ACTION. The "%%" marks the beginning of the rules.
-
- Here's another simple example:
-
- int num_lines = 0, num_chars = 0;
-
- %%
- \n ++num_lines; ++num_chars;
- . ++num_chars;
-
- %%
- int main( void )
- {
- yylex();
- printf( "# of lines = %d, # of chars = %d\n",
- num_lines, num_chars );
- return 0;
- }
-
- This scanner counts the number of characters and the number of lines
-in its input (it produces no output other than the final report on the
-counts). The first line declares two globals, "num_lines" and
-"num_chars", which are accessible both inside `yylex()' and in the
-`main()' routine declared after the second "%%". There are two rules,
-one which matches a newline ("\n") and increments both the line count
-and the character count, and one which matches any character other than
-a newline (indicated by the "." regular expression).
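For comparison, the following C fragment computes the same two counts by hand over a NUL-terminated string. It is only a sketch: `count_text' and `struct counts' are hypothetical names, and unlike the generated scanner it neither reads from `yyin' nor copes with NUL bytes in the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written equivalent of the two counting rules:
 *   \n   ++num_lines; ++num_chars;
 *   .    ++num_chars;
 */
struct counts { int lines; int chars; };

struct counts count_text( const char *s )
{
    struct counts c = { 0, 0 };

    assert( s != NULL );
    for ( ; *s != '\0'; ++s )
    {
        if ( *s == '\n' )
            ++c.lines;   /* rule "\n": a newline ends a line ...        */
        ++c.chars;       /* ... and, like rule ".", counts as a character */
    }
    return c;
}
```

On the input "ab\ncd\n" this reports 2 lines and 6 characters, which is what the scanner above would print.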
-
- A somewhat more complicated example:
-
- /* scanner for a toy Pascal-like language */
-
- %{
- /* need this for the calls to atoi() and atof() below */
- #include <stdlib.h>
- %}
-
- DIGIT [0-9]
- ID [a-z][a-z0-9]*
-
- %%
-
- {DIGIT}+ {
- printf( "An integer: %s (%d)\n", yytext,
- atoi( yytext ) );
- }
-
- {DIGIT}+"."{DIGIT}* {
- printf( "A float: %s (%g)\n", yytext,
- atof( yytext ) );
- }
-
- if|then|begin|end|procedure|function {
- printf( "A keyword: %s\n", yytext );
- }
-
- {ID} printf( "An identifier: %s\n", yytext );
-
- "+"|"-"|"*"|"/" printf( "An operator: %s\n", yytext );
-
- "{"[^}\n]*"}" /* eat up one-line comments */
-
- [ \t\n]+ /* eat up whitespace */
-
- . printf( "Unrecognized character: %s\n", yytext );
-
- %%
-
- int main( int argc, char **argv )
- {
- ++argv, --argc; /* skip over program name */
- if ( argc > 0 )
- {
- yyin = fopen( argv[0], "r" );
- if ( ! yyin )
- {
- perror( argv[0] );
- return 1;
- }
- }
- else
- yyin = stdin;
-
- yylex();
- return 0;
- }
-
- This is the beginning of a simple scanner for a language like
-Pascal. It identifies different types of TOKENS and reports on what it
-has seen.
-
- The details of this example will be explained in the following
-sections.
-
-
-File: flex.info, Node: Format, Next: Patterns, Prev: Examples, Up: Top
-
-Format of the input file
-========================
-
- The `flex' input file consists of three sections, separated by a
-line with just `%%' in it:
-
- definitions
- %%
- rules
- %%
- user code
-
- The "definitions" section contains declarations of simple "name"
-definitions to simplify the scanner specification, and declarations of
-"start conditions", which are explained in a later section. Name
-definitions have the form:
-
- name definition
-
- The "name" is a word beginning with a letter or an underscore ('_')
-followed by zero or more letters, digits, '_', or '-' (dash). The
-definition is taken to begin at the first non-white-space character
-following the name and to continue to the end of the line. The
-definition can subsequently be referred to using "{name}", which will
-expand to "(definition)". For example,
-
- DIGIT [0-9]
- ID [a-z][a-z0-9]*
-
-defines "DIGIT" to be a regular expression which matches a single
-digit, and "ID" to be a regular expression which matches a letter
-followed by zero-or-more letters-or-digits. A subsequent reference to
-
- {DIGIT}+"."{DIGIT}*
-
-is identical to
-
- ([0-9])+"."([0-9])*
-
-and matches one-or-more digits followed by a '.' followed by
-zero-or-more digits.
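The substitution itself can be sketched in C as below. This is a simplification: `expand_name' is a hypothetical helper that handles a single name and, unlike `flex', does not skip over quoted strings or character classes. The parentheses it adds are what keep an expanded definition grouped; if a definition were `ab', then `{X}*' would become `(ab)*' rather than `ab*'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand every "{NAME}" in PATTERN to "(DEFINITION)", writing the result
 * into OUT (of size OUT_SIZE).  Returns 0 on success, -1 if OUT is too
 * small.  Simplified: quoted strings and character classes are not
 * treated specially, unlike in flex itself. */
int expand_name( const char *pattern, const char *name,
                 const char *definition, char *out, size_t out_size )
{
    size_t name_len = strlen( name );
    size_t used = 0;

    assert( out_size > 0 );
    while ( *pattern != '\0' )
    {
        int n;

        if ( pattern[0] == '{' &&
             strncmp( pattern + 1, name, name_len ) == 0 &&
             pattern[1 + name_len] == '}' )
        {
            /* a reference to the name: substitute the parenthesized definition */
            n = snprintf( out + used, out_size - used, "(%s)", definition );
            pattern += name_len + 2;
        }
        else
        {
            /* any other character is copied unchanged */
            n = snprintf( out + used, out_size - used, "%c", *pattern );
            ++pattern;
        }
        if ( n < 0 || (size_t) n >= out_size - used )
            return -1;
        used += (size_t) n;
    }
    out[used] = '\0';   /* also covers an empty PATTERN */
    return 0;
}
```

Called with the pattern {DIGIT}+"."{DIGIT}* and the definition [0-9], it produces the expanded form shown above.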
-
- The RULES section of the `flex' input contains a series of rules of
-the form:
-
- pattern action
-
-where the pattern must be unindented and the action must begin on the
-same line.
-
- See below for a further description of patterns and actions.
-
- Finally, the user code section is simply copied to `lex.yy.c'
-verbatim. It is used for companion routines which call or are called
-by the scanner. The presence of this section is optional; if it is
-missing, the second `%%' in the input file may be skipped, too.
-
- In the definitions and rules sections, any *indented* text or text
-enclosed in `%{' and `%}' is copied verbatim to the output (with the
-`%{' and `%}' delimiters removed). The `%{' and `%}' must appear
-unindented on lines by themselves.
-
- In the rules section, any indented or %{} text appearing before the
-first rule may be used to declare variables which are local to the
-scanning routine and (after the declarations) code which is to be
-executed whenever the scanning routine is entered. Other indented or
-%{} text in the rule section is still copied to the output, but its
-meaning is not well-defined and it may well cause compile-time errors
-(this feature is present for `POSIX' compliance; see below for other
-such features).
-
- In the definitions section (but not in the rules section), an
-unindented comment (i.e., a line beginning with "/*") is also copied
-verbatim to the output up to the next "*/".
-
-
-File: flex.info, Node: Patterns, Next: Matching, Prev: Format, Up: Top
-
-Patterns
-========
-
- The patterns in the input are written using an extended set of
-regular expressions. These are:
-
-`x'
- match the character `x'
-
-`.'
- any character (byte) except newline
-
-`[xyz]'
- a "character class"; in this case, the pattern matches either an
- `x', a `y', or a `z'
-
-`[abj-oZ]'
- a "character class" with a range in it; matches an `a', a `b', any
- letter from `j' through `o', or a `Z'
-
-`[^A-Z]'
- a "negated character class", i.e., any character but those in the
- class. In this case, any character EXCEPT an uppercase letter.
-
-`[^A-Z\n]'
- any character EXCEPT an uppercase letter or a newline
-
-`R*'
- zero or more R's, where R is any regular expression
-
-`R+'
- one or more R's
-
-`R?'
- zero or one R's (that is, "an optional R")
-
-`R{2,5}'
- anywhere from two to five R's
-
-`R{2,}'
- two or more R's
-
-`R{4}'
- exactly 4 R's
-
-`{NAME}'
- the expansion of the "NAME" definition (see above)
-
-`"[xyz]\"foo"'
- the literal string: `[xyz]"foo'
-
-`\X'
- if X is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C
- interpretation of \X. Otherwise, a literal `X' (used to escape
- operators such as `*')
-
-`\0'
- a NUL character (ASCII code 0)
-
-`\123'
- the character with octal value 123
-
-`\x2a'
- the character with hexadecimal value `2a'
-
-`(R)'
- match an R; parentheses are used to override precedence (see below)
-
-`RS'
- the regular expression R followed by the regular expression S;
- called "concatenation"
-
-`R|S'
- either an R or an S
-
-`R/S'
- an R but only if it is followed by an S. The text matched by S is
- included when determining whether this rule is the "longest
- match", but is then returned to the input before the action is
- executed. So the action only sees the text matched by R. This
- type of pattern is called "trailing context". (There are some
- combinations of `R/S' that `flex' cannot match correctly; see
- notes in the Deficiencies / Bugs section below regarding
- "dangerous trailing context".)
-
-`^R'
- an R, but only at the beginning of a line (i.e., when just
- starting to scan, or right after a newline has been scanned).
-
-`R$'
- an R, but only at the end of a line (i.e., just before a newline).
- Equivalent to "R/\n".
-
- Note that flex's notion of "newline" is exactly whatever the C
- compiler used to compile flex interprets '\n' as; in particular,
- on some DOS systems you must either filter out \r's in the input
- yourself, or explicitly use R/\r\n for "R$".
-
-`<S>R'
- an R, but only in start condition S (see below for discussion of
- start conditions)
-
-`<S1,S2,S3>R'
- same, but in any of start conditions S1, S2, or S3
-
-`<*>R'
- an R in any start condition, even an exclusive one.
-
-`<<EOF>>'
- an end-of-file
-
-`<S1,S2><<EOF>>'
- an end-of-file when in start condition S1 or S2
-
- Note that inside of a character class, all regular expression
-operators lose their special meaning except escape ('\') and the
-character class operators, '-', ']', and, at the beginning of the
-class, '^'.
-
- The regular expressions listed above are grouped according to
-precedence, from highest precedence at the top to lowest at the bottom.
-Those grouped together have equal precedence. For example,
-
- foo|bar*
-
-is the same as
-
- (foo)|(ba(r*))
-
-since the '*' operator has higher precedence than concatenation, and
-concatenation higher than alternation ('|'). This pattern therefore
-matches *either* the string "foo" *or* the string "ba" followed by
-zero-or-more r's. To match "foo" or zero-or-more "bar"'s, use:
-
- foo|(bar)*
-
-and to match zero-or-more "foo"'s-or-"bar"'s:
-
- (foo|bar)*
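`flex' patterns are not POSIX regular expressions, but `*', concatenation, and `|' have the same relative precedence in POSIX extended regular expressions. That makes it possible to check the groupings above with the POSIX `regcomp()' and `regexec()' functions, used here as a stand-in for `flex'; `full_match' is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if PATTERN, read as a POSIX extended regular expression,
 * matches all of TEXT; 0 if it does not; -1 on an error. */
int full_match( const char *pattern, const char *text )
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert( pattern != NULL && text != NULL );

    /* Wrap the pattern as ^(...)$ so that only whole-string matches count. */
    n = snprintf( anchored, sizeof anchored, "^(%s)$", pattern );
    if ( n < 0 || (size_t) n >= sizeof anchored )
        return -1;

    if ( regcomp( &re, anchored, REG_EXTENDED | REG_NOSUB ) != 0 )
        return -1;
    rc = regexec( &re, text, 0, NULL, 0 );
    regfree( &re );
    return rc == 0 ? 1 : 0;
}
```

With this helper, "foo|bar*" accepts "barrr" and "ba" but rejects "barbar", while "foo|(bar)*" accepts "barbar".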
-
- In addition to characters and ranges of characters, character
-classes can also contain character class "expressions". These are
-expressions enclosed inside `[:' and `:]' delimiters (which themselves
-must appear between the '[' and ']' of the character class; other
-elements may occur inside the character class, too). The valid
-expressions are:
-
- [:alnum:] [:alpha:] [:blank:]
- [:cntrl:] [:digit:] [:graph:]
- [:lower:] [:print:] [:punct:]
- [:space:] [:upper:] [:xdigit:]
-
- These expressions all designate a set of characters equivalent to
-the corresponding standard C `isXXX' function. For example,
-`[:alnum:]' designates those characters for which `isalnum()' returns
-true - i.e., any alphabetic or numeric. Some systems don't provide
-`isblank()', so flex defines `[:blank:]' as a blank or a tab.
-
- For example, the following character classes are all equivalent:
-
- [[:alnum:]]
- [[:alpha:][:digit:]]
- [[:alpha:]0-9]
- [a-zA-Z0-9]
-
- If your scanner is case-insensitive (the `-i' flag), then
-`[:upper:]' and `[:lower:]' are equivalent to `[:alpha:]'.
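The equivalence of the four classes listed earlier can be checked against <ctype.h> directly. The sketch assumes an ASCII character set and the default "C" locale (in other locales `isalpha()' may accept additional characters); `alnum_spellings_agree' and `is_flex_blank' are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>

/* Return 1 if, for every character code 0..127, the four spellings of
 * the alphanumeric class agree when evaluated with <ctype.h>. */
int alnum_spellings_agree( void )
{
    int c;

    for ( c = 0; c < 128; ++c )
    {
        int by_alnum  = isalnum( c ) != 0;                         /* [[:alnum:]]          */
        int by_parts  = isalpha( c ) || isdigit( c );              /* [[:alpha:][:digit:]] */
        int by_mixed  = isalpha( c ) || ( c >= '0' && c <= '9' );  /* [[:alpha:]0-9]       */
        int by_ranges = ( c >= 'a' && c <= 'z' ) ||
                        ( c >= 'A' && c <= 'Z' ) ||
                        ( c >= '0' && c <= '9' );                  /* [a-zA-Z0-9]          */

        if ( by_alnum != by_parts || by_alnum != by_mixed ||
             by_alnum != by_ranges )
            return 0;
    }
    return 1;
}

/* flex's [:blank:]: a blank or a tab, whether or not isblank() exists. */
int is_flex_blank( int c )
{
    return c == ' ' || c == '\t';
}
```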
-
- Some notes on patterns:
-
- - A negated character class such as the example "[^A-Z]" above *will
- match a newline* unless "\n" (or an equivalent escape sequence) is
- one of the characters explicitly present in the negated character
- class (e.g., "[^A-Z\n]"). This is unlike how many other regular
- expression tools treat negated character classes, but
- unfortunately the inconsistency is historically entrenched.
- Matching newlines means that a pattern like [^"]* can match the
- entire input unless there's another quote in the input.
-
- - A rule can have at most one instance of trailing context (the '/'
- operator or the '$' operator). The start condition, '^', and
- "<<EOF>>" patterns can only occur at the beginning of a pattern,
- and, as well as with '/' and '$', cannot be grouped inside
- parentheses. A '^' which does not occur at the beginning of a
- rule or a '$' which does not occur at the end of a rule loses its
- special properties and is treated as a normal character.
-
- The following are illegal:
-
- foo/bar$
- <sc1>foo<sc2>bar
-
- Note that the first of these can be written "foo/bar\n".
-
- The following will result in '$' or '^' being treated as a normal
- character:
-
- foo|(bar$)
- foo|^bar
-
- If what's wanted is a "foo" or a bar-followed-by-a-newline, the
- following could be used (the special '|' action is explained
- below):
-
- foo |
- bar$ /* action goes here */
-
- A similar trick will work for matching a foo or a
- bar-at-the-beginning-of-a-line.
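In C, the standard `strcspn()' function computes exactly the length that a negated class followed by `*' would match: the number of leading characters not in a given set. The hypothetical wrapper below makes the point about newlines in the first note concrete.

```c
#include <assert.h>
#include <string.h>

/* Length of the longest prefix of TEXT that [^SET]* would match:
 * strcspn() counts the leading characters that are NOT in SET. */
size_t negated_class_span( const char *text, const char *set )
{
    assert( text != NULL && set != NULL );
    return strcspn( text, set );
}
```

On the input ab, newline, cd, quote, ef, the set containing only the quote gives 5 (the newline is crossed, as with [^"]*), while adding the newline to the set gives 2 (as with [^"\n]*).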
-
-
-File: flex.info, Node: Matching, Next: Actions, Prev: Patterns, Up: Top
-
-How the input is matched
-========================
-
- When the generated scanner is run, it analyzes its input looking for
-strings which match any of its patterns. If it finds more than one
-match, it takes the one matching the most text (for trailing context
-rules, this includes the length of the trailing part, even though it
-will then be returned to the input). If it finds two or more matches
-of the same length, the rule listed first in the `flex' input file is
-chosen.
-
- Once the match is determined, the text corresponding to the match
-(called the TOKEN) is made available in the global character pointer
-`yytext', and its length in the global integer `yyleng'. The ACTION
-corresponding to the matched pattern is then executed (a more detailed
-description of actions follows), and then the remaining input is
-scanned for another match.
-
- If no match is found, then the "default rule" is executed: the next
-character in the input is considered matched and copied to the standard
-output. Thus, the simplest legal `flex' input is:
-
- %%
-
- which generates a scanner that simply copies its input (one
-character at a time) to its output.
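The selection rule can be sketched in C as follows. The sketch assumes each rule is a hypothetical function (`match_if', `match_ident') returning how many leading characters it matches. It ignores trailing context, start conditions, and `REJECT', and models only three things: the longest match wins, ties go to the earlier rule, and the default rule consumes one character.

```c
#include <assert.h>
#include <stddef.h>

/* A rule reports how many leading characters of TEXT it matches (0 = none). */
typedef size_t (*rule_fn)( const char *text );

size_t match_if( const char *t )        /* pattern: if     */
{
    return ( t[0] == 'i' && t[1] == 'f' ) ? 2 : 0;
}

size_t match_ident( const char *t )     /* pattern: [a-z]+ */
{
    size_t n = 0;
    while ( t[n] >= 'a' && t[n] <= 'z' )
        ++n;
    return n;
}

/* Return the index of the rule matching the most text, preferring the
 * earlier rule on a tie, or -1 if no rule matches (the default rule then
 * consumes one character).  The match length is stored in *LENGTH. */
int choose_rule( const rule_fn rules[], int n_rules, const char *text,
                 size_t *length )
{
    int best = -1, i;
    size_t best_len = 0;

    assert( rules != NULL && text != NULL && length != NULL );
    for ( i = 0; i < n_rules; ++i )
    {
        size_t len = rules[i]( text );
        if ( len > best_len )   /* strictly longer, so earlier rules keep ties */
        {
            best = i;
            best_len = len;
        }
    }
    *length = ( best >= 0 ) ? best_len : ( text[0] != '\0' ? 1 : 0 );
    return best;
}
```

With the rules listed in the order if, then [a-z]+, the input "if x" selects the keyword rule (both match two characters), "iffy" selects the identifier rule (four characters beat two), and "+x" falls through to the default rule.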
-
- Note that `yytext' can be defined in two different ways: either as a
-character *pointer* or as a character *array*. You can control which
-definition `flex' uses by including one of the special directives
-`%pointer' or `%array' in the first (definitions) section of your flex
-input. The default is `%pointer', unless you use the `-l' lex
-compatibility option, in which case `yytext' will be an array. The
-advantage of using `%pointer' is substantially faster scanning and no
-buffer overflow when matching very large tokens (unless you run out of
-dynamic memory). The disadvantage is that you are restricted in how
-your actions can modify `yytext' (see the next section), and calls to
-the `unput()' function destroy the present contents of `yytext', which
-can be a considerable porting headache when moving between different
-`lex' versions.
-
- The advantage of `%array' is that you can then modify `yytext' to
-your heart's content, and calls to `unput()' do not destroy `yytext'
-(see below). Furthermore, existing `lex' programs sometimes access
-`yytext' externally using declarations of the form:
- extern char yytext[];
- This definition is erroneous when used with `%pointer', but correct
-for `%array'.
-
- `%array' defines `yytext' to be an array of `YYLMAX' characters,
-which defaults to a fairly large value. You can change the size by
-simply #define'ing `YYLMAX' to a different value in the first section
-of your `flex' input. As mentioned above, with `%pointer' yytext grows
-dynamically to accommodate large tokens. While this means your
-`%pointer' scanner can accommodate very large tokens (such as matching
-entire blocks of comments), bear in mind that each time the scanner
-must resize `yytext' it also must rescan the entire token from the
-beginning, so matching such tokens can prove slow. `yytext' presently
-does *not* dynamically grow if a call to `unput()' results in too much
-text being pushed back; instead, a run-time error results.
-
- Also note that you cannot use `%array' with C++ scanner classes (the
-`c++' option; see below).
-
-
-File: flex.info, Node: Actions, Next: Generated scanner, Prev: Matching, Up: Top
-
-Actions
-=======
-
- Each pattern in a rule has a corresponding action, which can be any
-arbitrary C statement. The pattern ends at the first non-escaped
-whitespace character; the remainder of the line is its action. If the
-action is empty, then when the pattern is matched the input token is
-simply discarded. For example, here is the specification for a program
-which deletes all occurrences of "zap me" from its input:
-
- %%
- "zap me"
-
- (It will copy all other characters in the input to the output since
-they will be matched by the default rule.)
-
- Here is a program which compresses multiple blanks and tabs down to
-a single blank, and throws away whitespace found at the end of a line:
-
- %%
- [ \t]+ putchar( ' ' );
- [ \t]+$ /* ignore this token */
-
- If the action contains a '{', then the action spans till the
-balancing '}' is found, and the action may cross multiple lines.
-`flex' knows about C strings and comments and won't be fooled by braces
-found within them, but also allows actions to begin with `%{' and will
-consider the action to be all the text up to the next `%}' (regardless
-of ordinary braces inside the action).
-
- An action consisting solely of a vertical bar ('|') means "same as
-the action for the next rule." See below for an illustration.
-
- Actions can include arbitrary C code, including `return' statements
-to return a value to whatever routine called `yylex()'. Each time
-`yylex()' is called it continues processing tokens from where it last
-left off until it either reaches the end of the file or executes a
-return.
-
- Actions are free to modify `yytext' except for lengthening it
-(adding characters to its end; these will overwrite later characters in
-the input stream). This however does not apply when using `%array'
-(see above); in that case, `yytext' may be freely modified in any way.
-
- Actions are free to modify `yyleng' except they should not do so if
-the action also includes use of `yymore()' (see below).
-
- There are a number of special directives which can be included
-within an action:
-
- - `ECHO' copies yytext to the scanner's output.
-
- - `BEGIN' followed by the name of a start condition places the
- scanner in the corresponding start condition (see below).
-
- - `REJECT' directs the scanner to proceed on to the "second best"
- rule which matched the input (or a prefix of the input). The rule
- is chosen as described above in "How the Input is Matched", and
- `yytext' and `yyleng' set up appropriately. It may either be one
- which matched as much text as the originally chosen rule but came
- later in the `flex' input file, or one which matched less text.
- For example, the following will both count the words in the input
- and call the routine special() whenever "frob" is seen:
-
- int word_count = 0;
- %%
-
- frob special(); REJECT;
- [^ \t\n]+ ++word_count;
-
- Without the `REJECT', any "frob"'s in the input would not be
- counted as words, since the scanner normally executes only one
- action per token. Multiple `REJECT's are allowed, each one
- finding the next best choice to the currently active rule. For
- example, when the following scanner scans the token "abcd", it
- will write "abcdabcaba" to the output:
-
- %%
- a |
- ab |
- abc |
- abcd ECHO; REJECT;
- .|\n /* eat up any unmatched character */
-
- (The first three rules share the fourth's action since they use
- the special '|' action.) `REJECT' is a particularly expensive
- feature in terms of scanner performance; if it is used in *any* of
- the scanner's actions it will slow down *all* of the scanner's
- matching. Furthermore, `REJECT' cannot be used with the `-Cf' or
- `-CF' options (see below).
-
- Note also that unlike the other special actions, `REJECT' is a
- *branch*; code immediately following it in the action will *not*
- be executed.
-
- - `yymore()' tells the scanner that the next time it matches a rule,
- the corresponding token should be *appended* onto the current
- value of `yytext' rather than replacing it. For example, given
- the input "mega-kludge" the following will write
- "mega-mega-kludge" to the output:
-
- %%
- mega- ECHO; yymore();
- kludge ECHO;
-
- First "mega-" is matched and echoed to the output. Then "kludge"
- is matched, but the previous "mega-" is still hanging around at
- the beginning of `yytext' so the `ECHO' for the "kludge" rule will
- actually write "mega-kludge".
-
- Two notes regarding use of `yymore()'. First, `yymore()' depends on
-the value of `yyleng' correctly reflecting the size of the current
-token, so you must not modify `yyleng' if you are using `yymore()'.
-Second, the presence of `yymore()' in the scanner's action entails a
-minor performance penalty in the scanner's matching speed.
-
- - `yyless(n)' returns all but the first N characters of the current
- token back to the input stream, where they will be rescanned when
- the scanner looks for the next match. `yytext' and `yyleng' are
- adjusted appropriately (e.g., `yyleng' will now be equal to N).
- For example, on the input "foobar" the following will write out
- "foobarbar":
-
- %%
- foobar ECHO; yyless(3);
- [a-z]+ ECHO;
-
- An argument of 0 to `yyless' will cause the entire current input
- string to be scanned again. Unless you've changed how the scanner
- will subsequently process its input (using `BEGIN', for example),
- this will result in an endless loop.
-
- Note that `yyless' is a macro and can only be used in the flex
- input file, not from other source files.
-
- - `unput(c)' puts the character `c' back onto the input stream. It
- will be the next character scanned. The following action will
- take the current token and cause it to be rescanned enclosed in
- parentheses.
-
- {
- int i;
- /* Copy yytext because unput() trashes yytext */
- char *yycopy = strdup( yytext );
- unput( ')' );
- for ( i = yyleng - 1; i >= 0; --i )
- unput( yycopy[i] );
- unput( '(' );
- free( yycopy );
- }
-
- Note that since each `unput()' puts the given character back at
- the *beginning* of the input stream, pushing back strings must be
- done back-to-front. An important potential problem when using
- `unput()' is that if you are using `%pointer' (the default), a
- call to `unput()' *destroys* the contents of `yytext', starting
- with its rightmost character and devouring one character to the
- left with each call. If you need the value of yytext preserved
- after a call to `unput()' (as in the above example), you must
- either first copy it elsewhere, or build your scanner using
- `%array' instead (see How The Input Is Matched).
-
- Finally, note that you cannot put back `EOF' to attempt to mark
- the input stream with an end-of-file.
-
- - `input()' reads the next character from the input stream. For
- example, the following is one way to eat up C comments:
-
- %%
- "/*" {
- register int c;
-
- for ( ; ; )
- {
- while ( (c = input()) != '*' &&
- c != EOF )
- ; /* eat up text of comment */
-
- if ( c == '*' )
- {
- while ( (c = input()) == '*' )
- ;
- if ( c == '/' )
- break; /* found the end */
- }
-
- if ( c == EOF )
- {
- fprintf( stderr, "EOF in comment\n" );
- break;
- }
- }
- }
-
- (Note that if the scanner is compiled using `C++', then `input()'
- is instead referred to as `yyinput()', in order to avoid a name
- clash with the `C++' stream by the name of `input'.)
-
- - `YY_FLUSH_BUFFER' flushes the scanner's internal buffer so that the
- next time the scanner attempts to match a token, it will first
- refill the buffer using `YY_INPUT' (see The Generated Scanner,
- below). This action is a special case of the more general
- `yy_flush_buffer()' function, described below in the section
- Multiple Input Buffers.
-
- - `yyterminate()' can be used in lieu of a return statement in an
- action. It terminates the scanner and returns a 0 to the
- scanner's caller, indicating "all done". By default,
- `yyterminate()' is also called when an end-of-file is encountered.
- It is a macro and may be redefined.
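The effect of `yyless()' and `unput()' on the input can be pictured with a toy model of the input buffer. Everything below (`struct toy_input', `toy_yyless', `toy_unput', `toy_unput_string') is hypothetical and much simpler than the buffer `flex' actually generates. It does reproduce the behaviors described above: `yyless(3)' on "foobar" leaves "bar" to be rescanned, pushed-back strings must go in back-to-front, and each `unput()' overwrites the end of the current token.

```c
#include <assert.h>
#include <string.h>

/* Toy model (hypothetical, not flex's real buffer layout): the current
 * token occupies the TOKEN_LEN bytes just before POS, and the unscanned
 * input starts at BUF[POS]. */
struct toy_input {
    char buf[64];
    size_t pos;        /* index of the next character to scan */
    size_t token_len;  /* plays the role of yyleng            */
};

/* Like yyless(n): keep the first N characters of the token and return
 * the rest of it to the input, to be rescanned. */
void toy_yyless( struct toy_input *in, size_t n )
{
    assert( n <= in->token_len );
    in->pos -= in->token_len - n;
    in->token_len = n;
}

/* Like unput(c): C becomes the next character scanned.  It is written
 * into the byte just before POS, i.e. over the end of the current token,
 * which is how unput() with %pointer ends up destroying yytext. */
void toy_unput( struct toy_input *in, char c )
{
    assert( in->pos > 0 );  /* a real scanner makes room or reports an error */
    in->buf[--in->pos] = c;
}

/* Push back a whole string so it is rescanned in order: each unput()
 * goes to the front, so the characters are pushed last to first. */
void toy_unput_string( struct toy_input *in, const char *s )
{
    size_t i = strlen( s );
    while ( i > 0 )
        toy_unput( in, s[--i] );
}
```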
-
-
-File: flex.info, Node: Generated scanner, Next: Start conditions, Prev: Actions, Up: Top
-
-The generated scanner
-=====================
-
- The output of `flex' is the file `lex.yy.c', which contains the
-scanning routine `yylex()', a number of tables used by it for matching
-tokens, and a number of auxiliary routines and macros. By default,
-`yylex()' is declared as follows:
-
- int yylex()
- {
- ... various definitions and the actions in here ...
- }
-
- (If your environment supports function prototypes, then it will be
-"int yylex( void )".) This definition may be changed by defining
-the "YY_DECL" macro. For example, you could use:
-
- #define YY_DECL float lexscan( a, b ) float a, b;
-
- to give the scanning routine the name `lexscan', returning a float,
-and taking two floats as arguments. Note that if you give arguments to
-the scanning routine using a K&R-style/non-prototyped function
-declaration, you must terminate the definition with a semi-colon (`;').
-
- Whenever `yylex()' is called, it scans tokens from the global input
-file `yyin' (which defaults to stdin). It continues until it either
-reaches an end-of-file (at which point it returns the value 0) or one
-of its actions executes a `return' statement.
-
- If the scanner reaches an end-of-file, subsequent calls are undefined
-unless either `yyin' is pointed at a new input file (in which case
-scanning continues from that file), or `yyrestart()' is called.
-`yyrestart()' takes one argument, a `FILE *' pointer (which can be nil,
-if you've set up `YY_INPUT' to scan from a source other than `yyin'),
-and initializes `yyin' for scanning from that file. Essentially there
-is no difference between just assigning `yyin' to a new input file or
-using `yyrestart()' to do so; the latter is available for compatibility
-with previous versions of `flex', and because it can be used to switch
-input files in the middle of scanning. It can also be used to throw
-away the current input buffer, by calling it with an argument of
-`yyin'; but it is better to use `YY_FLUSH_BUFFER' (see above). Note that
-`yyrestart()' does *not* reset the start condition to `INITIAL' (see
-Start Conditions, below).
-
- If `yylex()' stops scanning due to executing a `return' statement in
-one of the actions, the scanner may then be called again and it will
-resume scanning where it left off.
-
- By default (and for purposes of efficiency), the scanner uses
-block-reads rather than simple `getc()' calls to read characters from
-`yyin'. The nature of how it gets its input can be controlled by
-defining the `YY_INPUT' macro. YY_INPUT's calling sequence is
-"YY_INPUT(buf,result,max_size)". Its action is to place up to MAX_SIZE
-characters in the character array BUF and return in the integer
-variable RESULT either the number of characters read or the constant
-YY_NULL (0 on Unix systems) to indicate EOF. The default YY_INPUT
-reads from the global file-pointer "yyin".
-
- A sample definition of YY_INPUT (in the definitions section of the
-input file):
-
- %{
- #define YY_INPUT(buf,result,max_size) \
- { \
- int c = getchar(); \
- result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
- }
- %}
-
- This definition will change the input processing to occur one
-character at a time.
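The `YY_INPUT' contract described above is: fill BUF with at most MAX_SIZE characters, then report the count, or 0 (YY_NULL) for end-of-file. Below is a minimal sketch of a helper that honors this contract for an in-memory string. The names `input_src', `input_pos' and `read_from_string' are hypothetical and are not part of flex. For real in-memory scanning, the built-in `yy_scan_string()' family described later is usually the simpler choice.

```c
#include <string.h>

/* Hypothetical in-memory input source; not part of flex. */
static const char *input_src = "";
static size_t input_pos = 0;

/* Copies up to max_size bytes of the remaining source into buf and
   returns how many were copied, or 0 (the value of YY_NULL) once the
   source is exhausted. */
static int read_from_string(char *buf, int max_size)
{
    size_t remaining = strlen(input_src) - input_pos;
    size_t n = remaining < (size_t) max_size ? remaining : (size_t) max_size;

    memcpy(buf, input_src + input_pos, n);
    input_pos += n;
    return (int) n;
}

/* In the definitions section of a scanner this could be wired up as:
 *   #define YY_INPUT(buf,result,max_size) \
 *       { result = read_from_string(buf, max_size); }
 */
```

Unlike the one-character-at-a-time sample, this version fills as much of the buffer as it can on each call. That keeps the block-read efficiency of the default `YY_INPUT'.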
-
- When the scanner receives an end-of-file indication from YY_INPUT,
-it then checks the `yywrap()' function. If `yywrap()' returns false
-(zero), then it is assumed that the function has gone ahead and set up
-`yyin' to point to another input file, and scanning continues. If it
-returns true (non-zero), then the scanner terminates, returning 0 to
-its caller. Note that in either case, the start condition remains
-unchanged; it does *not* revert to `INITIAL'.
-
- If you do not supply your own version of `yywrap()', then you must
-either use `%option noyywrap' (in which case the scanner behaves as
-though `yywrap()' returned 1), or you must link with `-lfl' to obtain
-the default version of the routine, which always returns 1.
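As an illustration of the `yywrap()' protocol, here is a sketch of a user-supplied version that steps through a list of further input files. It returns 0 after pointing `yyin' at the next readable file and 1 when none remain. The `pending_files' list is a hypothetical name. `yyin' is declared here only so that the sketch compiles by itself; in a real scanner it comes from the generated `lex.yy.c'.

```c
#include <stdio.h>

/* Normally defined by the generated scanner; declared here only so
   this sketch compiles on its own. */
FILE *yyin;

/* Hypothetical NULL-terminated list of further input files. */
static const char **pending_files;

int yywrap(void)
{
    while (pending_files && *pending_files) {
        FILE *next = fopen(*pending_files++, "r");

        if (next) {
            if (yyin && yyin != stdin)
                fclose(yyin);   /* done with the exhausted file */
            yyin = next;
            return 0;           /* more input: scanning continues */
        }
        /* unreadable file: skip it and try the next one */
    }
    return 1;                   /* no more files: yylex() returns 0 */
}
```

With a definition like this linked in, neither `%option noyywrap' nor `-lfl' is needed.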
-
- Three routines are available for scanning from in-memory buffers
-rather than files: `yy_scan_string()', `yy_scan_bytes()', and
-`yy_scan_buffer()'. See the discussion of them below in the section
-Multiple Input Buffers.
-
- The scanner writes its `ECHO' output to the `yyout' global (default,
-stdout), which may be redefined by the user simply by assigning it to
-some other `FILE' pointer.
-
-
-File: flex.info, Node: Start conditions, Next: Multiple buffers, Prev: Generated scanner, Up: Top
-
-Start conditions
-================
-
- `flex' provides a mechanism for conditionally activating rules. Any
-rule whose pattern is prefixed with "<sc>" will only be active when the
-scanner is in the start condition named "sc". For example,
-
- <STRING>[^"]* { /* eat up the string body ... */
- ...
- }
-
-will be active only when the scanner is in the "STRING" start
-condition, and
-
- <INITIAL,STRING,QUOTE>\. { /* handle an escape ... */
- ...
- }
-
-will be active only when the current start condition is either
-"INITIAL", "STRING", or "QUOTE".
-
- Start conditions are declared in the definitions (first) section of
-the input using unindented lines beginning with either `%s' or `%x'
-followed by a list of names. The former declares *inclusive* start
-conditions, the latter *exclusive* start conditions. A start condition
-is activated using the `BEGIN' action. Until the next `BEGIN' action is
-executed, rules with the given start condition will be active and rules
-with other start conditions will be inactive. If the start condition
-is *inclusive*, then rules with no start conditions at all will also be
-active. If it is *exclusive*, then *only* rules qualified with the
-start condition will be active. A set of rules contingent on the same
-exclusive start condition describes a scanner which is independent of
-any of the other rules in the `flex' input. Because of this, exclusive
-start conditions make it easy to specify "mini-scanners" which scan
-portions of the input that are syntactically different from the rest
-(e.g., comments).
-
- If the distinction between inclusive and exclusive start conditions
-is still a little vague, here's a simple example illustrating the
-connection between the two. The set of rules:
-
- %s example
- %%
-
- <example>foo do_something();
-
- bar something_else();
-
-is equivalent to
-
- %x example
- %%
-
- <example>foo do_something();
-
- <INITIAL,example>bar something_else();
-
- Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
-second example wouldn't be active (i.e., couldn't match) when in start
-condition `example'. If we just used `<example>' to qualify `bar',
-though, then it would only be active in `example' and not in `INITIAL',
-while in the first example it's active in both, because in the first
-example the `example' start condition is an *inclusive* (`%s') start
-condition.
-
- Also note that the special start-condition specifier `<*>' matches
-every start condition. Thus, the above example could also have been
-written:
-
- %x example
- %%
-
- <example>foo do_something();
-
- <*>bar something_else();
-
- The default rule (to `ECHO' any unmatched character) remains active
-in start conditions. It is equivalent to:
-
- <*>.|\n ECHO;
-
- `BEGIN(0)' returns to the original state where only the rules with
-no start conditions are active. This state can also be referred to as
-the start-condition "INITIAL", so `BEGIN(INITIAL)' is equivalent to
-`BEGIN(0)'. (The parentheses around the start condition name are not
-required but are considered good style.)
-
- `BEGIN' actions can also be given as indented code at the beginning
-of the rules section. For example, the following will cause the
-scanner to enter the "SPECIAL" start condition whenever `yylex()' is
-called and the global variable `enter_special' is true:
-
- int enter_special;
-
- %x SPECIAL
- %%
- if ( enter_special )
- BEGIN(SPECIAL);
-
- <SPECIAL>blahblahblah
- ...more rules follow...
-
- To illustrate the uses of start conditions, here is a scanner which
-provides two different interpretations of a string like "123.456". By
-default it will treat it as three tokens, the integer "123", a dot
-('.'), and the integer "456". But if the string is preceded earlier in
-the line by the string "expect-floats" it will treat it as a single
-token, the floating-point number 123.456:
-
- %{
- #include <stdlib.h>
- %}
- %s expect
-
- %%
- expect-floats BEGIN(expect);
-
- <expect>[0-9]+"."[0-9]+ {
- printf( "found a float, = %f\n",
- atof( yytext ) );
- }
- <expect>\n {
- /* that's the end of the line, so
- * we need another "expect-floats"
- * before we'll recognize any more
- * numbers
- */
- BEGIN(INITIAL);
- }
-
- [0-9]+ {
- printf( "found an integer, = %d\n",
- atoi( yytext ) );
- }
-
- "." printf( "found a dot\n" );
-
- Here is a scanner which recognizes (and discards) C comments while
-maintaining a count of the current input line.
-
- %x comment
- %%
- int line_num = 1;
-
- "/*" BEGIN(comment);
-
- <comment>[^*\n]* /* eat anything that's not a '*' */
- <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
- <comment>\n ++line_num;
- <comment>"*"+"/" BEGIN(INITIAL);
-
- This scanner goes to a bit of trouble to match as much text as
-possible with each rule. In general, when attempting to write a
-high-speed scanner try to match as much as possible in each rule, as it's
-a big win.
-
- Note that start condition names are really integer values and can
-be stored as such. Thus, the above could be extended in the following
-fashion:
-
- %x comment foo
- %%
- int line_num = 1;
- int comment_caller;
-
- "/*" {
- comment_caller = INITIAL;
- BEGIN(comment);
- }
-
- ...
-
- <foo>"/*" {
- comment_caller = foo;
- BEGIN(comment);
- }
-
- <comment>[^*\n]* /* eat anything that's not a '*' */
- <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
- <comment>\n ++line_num;
- <comment>"*"+"/" BEGIN(comment_caller);
-
- Furthermore, you can access the current start condition using the
-integer-valued `YY_START' macro. For example, the above assignments to
-`comment_caller' could instead be written
-
- comment_caller = YY_START;
-
- Flex provides `YYSTATE' as an alias for `YY_START' (since that is
-what's used by AT&T `lex').
-
- Note that start conditions do not have their own name-space; %s's
-and %x's declare names in the same fashion as #define's.
-
- Finally, here's an example of how to match C-style quoted strings
-using exclusive start conditions, including expanded escape sequences
-(but not including checking for a string that's too long):
-
- %x str
-
- %%
- char string_buf[MAX_STR_CONST];
- char *string_buf_ptr;
-
- \" string_buf_ptr = string_buf; BEGIN(str);
-
- <str>\" { /* saw closing quote - all done */
- BEGIN(INITIAL);
- *string_buf_ptr = '\0';
- /* return string constant token type and
- * value to parser
- */
- }
-
- <str>\n {
- /* error - unterminated string constant */
- /* generate error message */
- }
-
- <str>\\[0-7]{1,3} {
- /* octal escape sequence */
- unsigned int result;
-
- (void) sscanf( yytext + 1, "%o", &result );
-
- if ( result > 0xff )
- {
- /* error, constant is out-of-bounds */
- }
-
- *string_buf_ptr++ = result;
- }
-
- <str>\\[0-9]+ {
- /* generate error - bad escape sequence; something
- * like '\48' or '\0777777'
- */
- }
-
- <str>\\n *string_buf_ptr++ = '\n';
- <str>\\t *string_buf_ptr++ = '\t';
- <str>\\r *string_buf_ptr++ = '\r';
- <str>\\b *string_buf_ptr++ = '\b';
- <str>\\f *string_buf_ptr++ = '\f';
-
- <str>\\(.|\n) *string_buf_ptr++ = yytext[1];
-
- <str>[^\\\n\"]+ {
- char *yptr = yytext;
-
- while ( *yptr )
- *string_buf_ptr++ = *yptr++;
- }
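The octal-escape rule above combines parsing with a range check. Below is a minimal, standalone sketch of that step, pulled out of the scanner so it can be tested on its own. `decode_octal_escape' is a hypothetical helper, and its TEXT argument stands in for `yytext' as matched by `<str>\\[0-7]{1,3}'.

```c
#include <stdio.h>

/* Decodes an escape such as "\101" into *out. Returns 1 on success
   and 0 if the text cannot be parsed or the value does not fit in a
   byte, mirroring the "result > 0xff" check in the rule above. */
static int decode_octal_escape(const char *text, int *out)
{
    unsigned int value;   /* %o expects a pointer to unsigned int */

    if (sscanf(text + 1, "%o", &value) != 1)   /* skip the backslash */
        return 0;
    if (value > 0xff)
        return 0;         /* e.g. "\777" is out of range */
    *out = (int) value;
    return 1;
}
```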
-
- Often, such as in some of the examples above, you wind up writing a
-whole bunch of rules all preceded by the same start condition(s). Flex
-makes this a little easier and cleaner by introducing a notion of start
-condition "scope". A start condition scope is begun with:
-
- <SCs>{
-
-where SCs is a list of one or more start conditions. Inside the start
-condition scope, every rule automatically has the prefix `<SCs>'
-applied to it, until a `}' which matches the initial `{'. So, for
-example,
-
- <ESC>{
- "\\n" return '\n';
- "\\r" return '\r';
- "\\f" return '\f';
- "\\0" return '\0';
- }
-
-is equivalent to:
-
- <ESC>"\\n" return '\n';
- <ESC>"\\r" return '\r';
- <ESC>"\\f" return '\f';
- <ESC>"\\0" return '\0';
-
- Start condition scopes may be nested.
-
- Three routines are available for manipulating stacks of start
-conditions:
-
-`void yy_push_state(int new_state)'
- pushes the current start condition onto the top of the start
- condition stack and switches to NEW_STATE as though you had used
- `BEGIN new_state' (recall that start condition names are also
- integers).
-
-`void yy_pop_state()'
- pops the top of the stack and switches to it via `BEGIN'.
-
-`int yy_top_state()'
- returns the top of the stack without altering the stack's contents.
-
- The start condition stack grows dynamically and so has no built-in
-size limitation. If memory is exhausted, program execution aborts.
-
- To use start condition stacks, your scanner must include a `%option
-stack' directive (see Options below).
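To make the stack semantics concrete, here is a sketch of how such a growable start-condition stack behaves. It is not flex's internal code: `current_sc', `push_state', `pop_state' and `top_state' are hypothetical stand-ins for the scanner's current condition and the three routines above. Note that the "top" of the stack is the most recently *saved* condition, not the one currently active.

```c
#include <stdio.h>
#include <stdlib.h>

static int current_sc = 0;          /* 0 plays the role of INITIAL */
static int *sc_stack = NULL;
static size_t sc_depth = 0, sc_capacity = 0;

/* Like yy_push_state(): save the current condition, then switch. */
static void push_state(int new_state)
{
    if (sc_depth == sc_capacity) {  /* grow on demand: no fixed limit */
        size_t new_cap = sc_capacity ? sc_capacity * 2 : 16;
        int *grown = realloc(sc_stack, new_cap * sizeof *grown);

        if (!grown) {               /* out of memory: abort */
            fprintf(stderr, "start-condition stack overflow\n");
            exit(1);
        }
        sc_stack = grown;
        sc_capacity = new_cap;
    }
    sc_stack[sc_depth++] = current_sc;
    current_sc = new_state;         /* what BEGIN(new_state) does */
}

/* Like yy_pop_state(): return to the most recently saved condition. */
static void pop_state(void)
{
    if (sc_depth == 0) {
        fprintf(stderr, "start-condition stack underflow\n");
        exit(1);
    }
    current_sc = sc_stack[--sc_depth];
}

/* Like yy_top_state(): peek at the saved condition without popping.
   The caller must ensure the stack is non-empty. */
static int top_state(void)
{
    return sc_stack[sc_depth - 1];
}
```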
-
-
-File: flex.info, Node: Multiple buffers, Next: End-of-file rules, Prev: Start conditions, Up: Top
-
-Multiple input buffers
-======================
-
- Some scanners (such as those which support "include" files) require
-reading from several input streams. As `flex' scanners do a large
-amount of buffering, one cannot control where the next input will be
-read from by simply writing a `YY_INPUT' which is sensitive to the
-scanning context. `YY_INPUT' is only called when the scanner reaches
-the end of its buffer, which may be a long time after scanning a
-statement such as an "include" which requires switching the input
-source.
-
- To negotiate these sorts of problems, `flex' provides a mechanism
-for creating and switching between multiple input buffers. An input
-buffer is created by using:
-
- YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
-
-which takes a `FILE' pointer and a size and creates a buffer associated
-with the given file and large enough to hold SIZE characters (when in
-doubt, use `YY_BUF_SIZE' for the size). It returns a `YY_BUFFER_STATE'
-handle, which may then be passed to other routines (see below). The
-`YY_BUFFER_STATE' type is a pointer to an opaque `struct'
-`yy_buffer_state' structure, so you may safely initialize
-YY_BUFFER_STATE variables to `((YY_BUFFER_STATE) 0)' if you wish, and
-also refer to the opaque structure in order to correctly declare input
-buffers in source files other than that of your scanner. Note that the
-`FILE' pointer in the call to `yy_create_buffer' is only used as the
-value of `yyin' seen by `YY_INPUT'; if you redefine `YY_INPUT' so it no
-longer uses `yyin', then you can safely pass a nil `FILE' pointer to
-`yy_create_buffer'. You select a particular buffer to scan from using:
-
- void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
-
- switches the scanner's input buffer so subsequent tokens will come
-from NEW_BUFFER. Note that `yy_switch_to_buffer()' may be used by
-`yywrap()' to set things up for continued scanning, instead of opening
-a new file and pointing `yyin' at it. Note also that switching input
-sources via either `yy_switch_to_buffer()' or `yywrap()' does *not*
-change the start condition.
-
- void yy_delete_buffer( YY_BUFFER_STATE buffer )
-
-is used to reclaim the storage associated with a buffer. You can also
-clear the current contents of a buffer using:
-
- void yy_flush_buffer( YY_BUFFER_STATE buffer )
-
- This function discards the buffer's contents, so the next time the
-scanner attempts to match a token from the buffer, it will first fill
-the buffer anew using `YY_INPUT'.
-
- `yy_new_buffer()' is an alias for `yy_create_buffer()', provided for
-compatibility with the C++ use of `new' and `delete' for creating and
-destroying dynamic objects.
-
- Finally, the `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE'
-handle to the current buffer.
-
- Here is an example of using these features for writing a scanner
-which expands include files (the `<<EOF>>' feature is discussed below):
-
- /* the "incl" state is used for picking up the name
- * of an include file
- */
- %x incl
-
- %{
- #define MAX_INCLUDE_DEPTH 10
- YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
- int include_stack_ptr = 0;
- %}
-
- %%
- include BEGIN(incl);
-
- [a-z]+ ECHO;
- [^a-z\n]*\n? ECHO;
-
- <incl>[ \t]* /* eat the whitespace */
- <incl>[^ \t\n]+ { /* got the include file name */
- if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
- {
- fprintf( stderr, "Includes nested too deeply\n" );
- exit( 1 );
- }
-
- include_stack[include_stack_ptr++] =
- YY_CURRENT_BUFFER;
-
- yyin = fopen( yytext, "r" );
-
- if ( ! yyin )
- error( ... );
-
- yy_switch_to_buffer(
- yy_create_buffer( yyin, YY_BUF_SIZE ) );
-
- BEGIN(INITIAL);
- }
-
- <<EOF>> {
- if ( --include_stack_ptr < 0 )
- {
- yyterminate();
- }
-
- else
- {
- yy_delete_buffer( YY_CURRENT_BUFFER );
- yy_switch_to_buffer(
- include_stack[include_stack_ptr] );
- }
- }
-
- Three routines are available for setting up input buffers for
-scanning in-memory strings instead of files. All of them create a new
-input buffer for scanning the string, and return a corresponding
-`YY_BUFFER_STATE' handle (which you should delete with
-`yy_delete_buffer()' when done with it). They also switch to the new
-buffer using `yy_switch_to_buffer()', so the next call to `yylex()' will
-start scanning the string.
-
-`yy_scan_string(const char *str)'
- scans a NUL-terminated string.
-
-`yy_scan_bytes(const char *bytes, int len)'
- scans `len' bytes (possibly including NULs) starting at location
- BYTES.
-
- Note that both of these functions create and scan a *copy* of the
-string or bytes. (This may be desirable, since `yylex()' modifies the
-contents of the buffer it is scanning.) You can avoid the copy by using:
-
-`yy_scan_buffer(char *base, yy_size_t size)'
- which scans in place the buffer starting at BASE, consisting of
- SIZE bytes, the last two bytes of which *must* be
- `YY_END_OF_BUFFER_CHAR' (ASCII NUL). These last two bytes are not
- scanned; thus, scanning consists of `base[0]' through
- `base[size-3]', inclusive.
-
- If you fail to set up BASE in this manner (i.e., forget the final
- two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()'
- returns a nil pointer instead of creating a new input buffer.
-
- The type `yy_size_t' is an integral type to which you can cast an
- integer expression reflecting the size of the buffer.
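Setting up a buffer for `yy_scan_buffer()' comes down to appending the two `YY_END_OF_BUFFER_CHAR' bytes and passing the total length as SIZE. Here is a minimal sketch. `make_scan_buffer' is a hypothetical helper. As far as we know, flex does not take ownership of a caller-supplied buffer, so the caller should free it after `yy_delete_buffer()'.

```c
#include <stdlib.h>
#include <string.h>

/* Copies len bytes of text into a new buffer, followed by the two NUL
   bytes that yy_scan_buffer() requires. *size_out receives the value
   to pass as its size argument. Returns NULL if allocation fails. */
static char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base = malloc(len + 2);

    if (!base)
        return NULL;
    memcpy(base, text, len);
    base[len] = '\0';       /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = '\0';   /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return base;
}
```

A scanner would then call `yy_scan_buffer(base, (yy_size_t) size)' and check the result for nil. Keep in mind that `yylex()' may modify the bytes in place while scanning.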
-
-
-File: flex.info, Node: End-of-file rules, Next: Miscellaneous, Prev: Multiple buffers, Up: Top
-
-End-of-file rules
-=================
-
- The special rule "<<EOF>>" indicates actions which are to be taken
-when an end-of-file is encountered and yywrap() returns non-zero (i.e.,
-indicates no further files to process). The action must finish by
-doing one of four things:
-
- - assigning `yyin' to a new input file (in previous versions of
- flex, after doing the assignment you had to call the special
- action `YY_NEW_FILE'; this is no longer necessary);
-
- - executing a `return' statement;
-
- - executing the special `yyterminate()' action;
-
- - or, switching to a new buffer using `yy_switch_to_buffer()' as
- shown in the example above.
-
- <<EOF>> rules may not be used with other patterns; they may only be
-qualified with a list of start conditions. If an unqualified <<EOF>>
-rule is given, it applies to *all* start conditions which do not
-already have <<EOF>> actions. To specify an <<EOF>> rule for only the
-initial start condition, use
-
- <INITIAL><<EOF>>
-
- These rules are useful for catching things like unclosed comments.
-An example:
-
- %x quote
- %%
-
- ...other rules for dealing with quotes...
-
- <quote><<EOF>> {
- error( "unterminated quote" );
- yyterminate();
- }
- <<EOF>> {
- if ( *++filelist )
- yyin = fopen( *filelist, "r" );
- else
- yyterminate();
- }
-
-
-File: flex.info, Node: Miscellaneous, Next: User variables, Prev: End-of-file rules, Up: Top
-
-Miscellaneous macros
-====================
-
- The macro `YY_USER_ACTION' can be defined to provide an action which
-is always executed prior to the matched rule's action. For example, it
-could be #define'd to call a routine to convert yytext to lower-case.
-When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
-number of the matched rule (rules are numbered starting with 1).
-Suppose you want to profile how often each of your rules is matched.
-The following would do the trick:
-
- #define YY_USER_ACTION ++ctr[yy_act]
-
- where `ctr' is an array to hold the counts for the different rules.
-Note that the macro `YY_NUM_RULES' gives the total number of rules
-(including the default rule, even if you use `-s'), so a correct
-declaration for `ctr' is:
-
- int ctr[YY_NUM_RULES];
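The other `YY_USER_ACTION' use mentioned above, folding `yytext' to lower case before each action, could look like this sketch. `lowercase_token' is a hypothetical helper. It changes the token in place without lengthening it, which is the kind of modification `yytext' permits.

```c
#include <ctype.h>

/* Folds the first len characters of text to lower case in place. */
static void lowercase_token(char *text, int len)
{
    for (int i = 0; i < len; ++i)
        text[i] = (char) tolower((unsigned char) text[i]);
}

/* In the definitions section, for example:
 *   #define YY_USER_ACTION lowercase_token(yytext, yyleng);
 */
```

The cast to `unsigned char' avoids undefined behavior when `tolower()' receives 8-bit characters on platforms where `char' is signed.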
-
- The macro `YY_USER_INIT' may be defined to provide an action which
-is always executed before the first scan (and before the scanner's
-internal initializations are done). For example, it could be used to
-call a routine to read in a data table or open a logging file.
-
- The macro `yy_set_interactive(is_interactive)' can be used to
-control whether the current buffer is considered *interactive*. An
-interactive buffer is processed more slowly, but must be used when the
-scanner's input source is indeed interactive to avoid problems due to
-waiting to fill buffers (see the discussion of the `-I' flag below). A
-non-zero value in the macro invocation marks the buffer as interactive,
-a zero value as non-interactive. Note that use of this macro overrides
-`%option always-interactive' or `%option never-interactive' (see
-Options below). `yy_set_interactive()' must be invoked prior to
-beginning to scan the buffer that is (or is not) to be considered
-interactive.
-
- The macro `yy_set_bol(at_bol)' can be used to control whether the
-current buffer's scanning context for the next token match is done as
-though at the beginning of a line. A non-zero macro argument makes
-rules anchored with `^' active, while a zero argument makes `^' rules
-inactive.
-
- The macro `YY_AT_BOL()' returns true if the next token scanned from
-the current buffer will have '^' rules active, false otherwise.
-
- In the generated scanner, the actions are all gathered in one large
-switch statement and separated using `YY_BREAK', which may be
-redefined. By default, it is simply a "break", to separate each rule's
-action from the following rule's. Redefining `YY_BREAK' allows, for
-example, C++ users to #define YY_BREAK to do nothing (while being very
-careful that every rule ends with a "break" or a "return"!) to avoid
-unreachable-statement warnings in rules whose action ends with
-"return", since the `YY_BREAK' that follows is then inaccessible.
-
-
-File: flex.info, Node: User variables, Next: YACC interface, Prev: Miscellaneous, Up: Top
-
-Values available to the user
-============================
-
- This section summarizes the various values available to the user in
-the rule actions.
-
- - `char *yytext' holds the text of the current token. It may be
- modified but not lengthened (you cannot append characters to the
- end).
-
- If the special directive `%array' appears in the first section of
- the scanner description, then `yytext' is instead declared `char
- yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
- redefine in the first section if you don't like the default value
- (generally 8KB). Using `%array' results in somewhat slower
- scanners, but the value of `yytext' becomes immune to calls to
- `input()' and `unput()', which potentially destroy its value when
- `yytext' is a character pointer. The opposite of `%array' is
- `%pointer', which is the default.
-
- You cannot use `%array' when generating C++ scanner classes (the
- `-+' flag).
-
- - `int yyleng' holds the length of the current token.
-
- - `FILE *yyin' is the file which by default `flex' reads from. It
- may be redefined but doing so only makes sense before scanning
- begins or after an EOF has been encountered. Changing it in the
- midst of scanning will have unexpected results since `flex'
- buffers its input; use `yyrestart()' instead. Once scanning
- terminates because an end-of-file has been seen, you can assign
- `yyin' to the new input file and then call the scanner again to
- continue scanning.
-
- - `void yyrestart( FILE *new_file )' may be called to point `yyin'
- at the new input file. The switch-over to the new file is
- immediate (any previously buffered-up input is lost). Note that
- calling `yyrestart()' with `yyin' as an argument thus throws away
- the current input buffer and continues scanning the same input
- file.
-
- - `FILE *yyout' is the file to which `ECHO' actions are done. It
- can be reassigned by the user.
-
- - `YY_CURRENT_BUFFER' returns a `YY_BUFFER_STATE' handle to the
- current buffer.
-
- - `YY_START' returns an integer value corresponding to the current
- start condition. You can subsequently use this value with `BEGIN'
- to return to that start condition.
-
-
-File: flex.info, Node: YACC interface, Next: Options, Prev: User variables, Up: Top
-
-Interfacing with `yacc'
-=======================
-
- One of the main uses of `flex' is as a companion to the `yacc'
-parser-generator. `yacc' parsers expect to call a routine named
-`yylex()' to find the next input token. The routine is supposed to
-return the type of the next token as well as putting any associated
-value in the global `yylval'. To use `flex' with `yacc', one specifies
-the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
-containing definitions of all the `%tokens' appearing in the `yacc'
-input. This file is then included in the `flex' scanner. For example,
-if one of the tokens is "TOK_NUMBER", part of the scanner might look
-like:
-
- %{
- #include "y.tab.h"
- %}
-
- %%
-
- [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
-
-
-File: flex.info, Node: Options, Next: Performance, Prev: YACC interface, Up: Top
-
-Options
-=======
-
- `flex' has the following options:
-
-`-b'
- Generate backing-up information to `lex.backup'. This is a list
- of scanner states which require backing up and the input
- characters on which they do so. By adding rules one can remove
- backing-up states. If *all* backing-up states are eliminated and
- `-Cf' or `-CF' is used, the generated scanner will run faster (see
- the `-p' flag). Only users who wish to squeeze every last cycle
- out of their scanners need worry about this option. (See the
- section on Performance Considerations below.)
-
-`-c'
- is a do-nothing, deprecated option included for POSIX compliance.
-
-`-d'
- makes the generated scanner run in "debug" mode. Whenever a
- pattern is recognized and the global `yy_flex_debug' is non-zero
- (which is the default), the scanner will write to `stderr' a line
- of the form:
-
- --accepting rule at line 53 ("the matched text")
-
- The line number refers to the location of the rule in the file
- defining the scanner (i.e., the file that was fed to flex).
- Messages are also generated when the scanner backs up, accepts the
- default rule, reaches the end of its input buffer (or encounters a
- NUL; at this point, the two look the same as far as the scanner's
- concerned), or reaches an end-of-file.
-
-`-f'
- specifies "fast scanner". No table compression is done and stdio
- is bypassed. The result is large but fast. This option is
- equivalent to `-Cfr' (see below).
-
-`-h'
- generates a "help" summary of `flex's' options to `stdout' and
- then exits. `-?' and `--help' are synonyms for `-h'.
-
-`-i'
- instructs `flex' to generate a *case-insensitive* scanner. The
- case of letters given in the `flex' input patterns will be
- ignored, and tokens in the input will be matched regardless of
- case. The text in `yytext' keeps its original case (i.e., it is
- not folded).
-
-`-l'
- turns on maximum compatibility with the original AT&T `lex'
- implementation. Note that this does not mean *full*
- compatibility. Use of this option costs a considerable amount of
- performance, and it cannot be used with the `-+, -f, -F, -Cf', or
- `-CF' options. For details on the compatibilities it provides, see
- the section "Incompatibilities With Lex And POSIX" below. This
- option also results in the name `YY_FLEX_LEX_COMPAT' being
- #define'd in the generated scanner.
-
-`-n'
- is another do-nothing, deprecated option included only for POSIX
- compliance.
-
-`-p'
- generates a performance report to stderr. The report consists of
- comments regarding features of the `flex' input file which will
- cause a serious loss of performance in the resulting scanner. If
- you give the flag twice, you will also get comments regarding
- features that lead to minor performance losses.
-
- Note that the use of `REJECT', `%option yylineno' and variable
- trailing context (see the Deficiencies / Bugs section below)
- entails a substantial performance penalty; use of `yymore()', the
- `^' operator, and the `-I' flag entail minor performance penalties.
-
-`-s'
- causes the "default rule" (that unmatched scanner input is echoed
- to `stdout') to be suppressed. If the scanner encounters input
- that does not match any of its rules, it aborts with an error.
- This option is useful for finding holes in a scanner's rule set.
-
-`-t'
- instructs `flex' to write the scanner it generates to standard
- output instead of `lex.yy.c'.
-
-`-v'
- specifies that `flex' should write to `stderr' a summary of
- statistics regarding the scanner it generates. Most of the
- statistics are meaningless to the casual `flex' user, but the
- first line identifies the version of `flex' (same as reported by
- `-V'), and the next line the flags used when generating the
- scanner, including those that are on by default.
-
-`-w'
- suppresses warning messages.
-
-`-B'
- instructs `flex' to generate a *batch* scanner, the opposite of
- *interactive* scanners generated by `-I' (see below). In general,
- you use `-B' when you are *certain* that your scanner will never
- be used interactively, and you want to squeeze a *little* more
- performance out of it. If your goal is instead to squeeze out a
- *lot* more performance, you should be using the `-Cf' or `-CF'
- options (discussed below), which turn on `-B' automatically anyway.
-
-`-F'
- specifies that the "fast" scanner table representation should be
- used (and stdio bypassed). This representation is about as fast
- as the full table representation `(-f)', and for some sets of
- patterns will be considerably smaller (and for others, larger).
- In general, if the pattern set contains both "keywords" and a
- catch-all, "identifier" rule, such as in the set:
-
- "case" return TOK_CASE;
- "switch" return TOK_SWITCH;
- ...
- "default" return TOK_DEFAULT;
- [a-z]+ return TOK_ID;
-
- then you're better off using the full table representation. If
- only the "identifier" rule is present and you then use a hash
- table or some such to detect the keywords, you're better off using
- `-F'.
-
- This option is equivalent to `-CFr' (see below). It cannot be
- used with `-+'.
-
-`-I'
- instructs `flex' to generate an *interactive* scanner. An
- interactive scanner is one that only looks ahead to decide what
- token has been matched if it absolutely must. It turns out that
- always looking one extra character ahead, even if the scanner has
- already seen enough text to disambiguate the current token, is a
- bit faster than only looking ahead when necessary. But scanners
- that always look ahead give dreadful interactive performance; for
- example, when a user types a newline, it is not recognized as a
- newline token until they enter *another* token, which often means
- typing in another whole line.
-
- `Flex' scanners default to *interactive* unless you use the `-Cf'
- or `-CF' table-compression options (see below). That's because if
- you're looking for high-performance you should be using one of
- these options, so if you didn't, `flex' assumes you'd rather trade
- off a bit of run-time performance for intuitive interactive
- behavior. Note also that you *cannot* use `-I' in conjunction
- with `-Cf' or `-CF'. Thus, this option is not really needed; it
- is on by default for all those cases in which it is allowed.
-
- You can force a scanner to *not* be interactive by using `-B' (see
- above).
-
-`-L'
- instructs `flex' not to generate `#line' directives. Without this
- option, `flex' peppers the generated scanner with #line directives
- so error messages in the actions will be correctly located with
- respect to either the original `flex' input file (if the errors
- are due to code in the input file), or `lex.yy.c' (if the errors
- are `flex's' fault - you should report these sorts of errors to
- the email address given below).
-
-`-T'
- makes `flex' run in `trace' mode. It will generate a lot of
- messages to `stderr' concerning the form of the input and the
- resultant non-deterministic and deterministic finite automata.
- This option is mostly for use in maintaining `flex'.
-
-`-V'
- prints the version number to `stdout' and exits. `--version' is a
- synonym for `-V'.
-
-`-7'
- instructs `flex' to generate a 7-bit scanner, i.e., one which can
- only recognize 7-bit characters in its input. The advantage of
- using `-7' is that the scanner's tables can be up to half the size
- of those generated using the `-8' option (see below). The
- disadvantage is that such scanners often hang or crash if their
- input contains an 8-bit character.
-
- Note, however, that unless you generate your scanner using the
- `-Cf' or `-CF' table compression options, use of `-7' will save
- only a small amount of table space, and make your scanner
- considerably less portable. `Flex's' default behavior is to
- generate an 8-bit scanner unless you use the `-Cf' or `-CF', in
- which case `flex' defaults to generating 7-bit scanners unless
- your site was always configured to generate 8-bit scanners (as
- will often be the case with non-USA sites). You can tell whether
- flex generated a 7-bit or an 8-bit scanner by inspecting the flag
- summary in the `-v' output as described above.
-
- Note that if you use `-Cfe' or `-CFe' (those table compression
- options, but also using equivalence classes, as discussed below),
- flex still defaults to generating an 8-bit scanner, since
- usually with these compression options full 8-bit tables are not
- much more expensive than 7-bit tables.
-
-`-8'
- instructs `flex' to generate an 8-bit scanner, i.e., one which can
- recognize 8-bit characters. This flag is only needed for scanners
- generated using `-Cf' or `-CF', as otherwise flex defaults to
- generating an 8-bit scanner anyway.
-
- See the discussion of `-7' above for flex's default behavior and
- the tradeoffs between 7-bit and 8-bit scanners.
-
-`-+'
- specifies that you want flex to generate a C++ scanner class. See
- the section on Generating C++ Scanners below for details.
-
-`-C[aefFmr]'
- controls the degree of table compression and, more generally,
- trade-offs between small scanners and fast scanners.
-
- `-Ca' ("align") instructs flex to trade off larger tables in the
- generated scanner for faster performance because the elements of
- the tables are better aligned for memory access and computation.
- On some RISC architectures, fetching and manipulating long-words
- is more efficient than with smaller-sized units such as
- shortwords. This option can double the size of the tables used by
- your scanner.
-
- `-Ce' directs `flex' to construct "equivalence classes", i.e.,
- sets of characters which have identical lexical properties (for
- example, if the only appearance of digits in the `flex' input is
- in the character class "[0-9]" then the digits '0', '1', ..., '9'
- will all be put in the same equivalence class). Equivalence
- classes usually give dramatic reductions in the final table/object
- file sizes (typically a factor of 2-5) and are pretty cheap
- performance-wise (one array look-up per character scanned).
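As an illustration of what `-Ce' computes, here is a minimal C sketch under simplifying assumptions: a 7-bit character set, and rules whose character classes are each a single `[lo-hi]' range. The `struct range' type and `compute_ecs()' function are hypothetical and are not flex's own algorithm. Characters that fall in exactly the same set of ranges receive the same class number; a table-driven scanner can then index its transition table by `ec[c]' rather than by `c', which is the single array look-up per character mentioned above.

```c
#define NCHARS 128 /* 7-bit character set, to keep the sketch small */

/* Hypothetical input format: each character class used by the rules
   is a single inclusive range [lo, hi]. */
struct range {
    unsigned char lo, hi;
};

/*
 * Assign every character an equivalence-class number so that two
 * characters share a number exactly when they belong to the same set
 * of ranges. Handles at most 32 ranges (one bit of sig per range).
 * Returns the number of distinct classes.
 */
static int compute_ecs(const struct range *r, int nranges, int ec[NCHARS])
{
    unsigned long class_sig[NCHARS]; /* membership signature per class */
    int nclasses = 0;

    for (int c = 0; c < NCHARS; c++) {
        /* Bit i is set when c lies inside range i. */
        unsigned long sig = 0;
        for (int i = 0; i < nranges; i++)
            if (c >= r[i].lo && c <= r[i].hi)
                sig |= 1UL << i;

        /* Reuse the class that has this signature, or open a new one. */
        int k = 0;
        while (k < nclasses && class_sig[k] != sig)
            k++;
        if (k == nclasses)
            class_sig[nclasses++] = sig;
        ec[c] = k;
    }
    return nclasses;
}
```

With the single range `[0-9]' this yields two classes (the digits and everything else). Adding `[a-z]' and a literal `x' yields four, because `x' now behaves differently from the other letters.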
-
- `-Cf' specifies that the *full* scanner tables should be generated;
- `flex' should not compress the tables by taking advantage of
- similar transition functions for different states.
-
- `-CF' specifies that the alternate fast scanner representation
- (described above under the `-F' flag) should be used. This option
- cannot be used with `-+'.
-
- `-Cm' directs `flex' to construct "meta-equivalence classes",
- which are sets of equivalence classes (or characters, if
- equivalence classes are not being used) that are commonly used
- together. Meta-equivalence classes are often a big win when using
- compressed tables, but they have a moderate performance impact
- (one or two "if" tests and one array look-up per character
- scanned).
-
- `-Cr' causes the generated scanner to *bypass* use of the standard
- I/O library (stdio) for input. Instead of calling `fread()' or
- `getc()', the scanner will use the `read()' system call, resulting
- in a performance gain which varies from system to system, but in
- general is probably negligible unless you are also using `-Cf' or
- `-CF'. Using `-Cr' can cause strange behavior if, for example,
- you read from `yyin' using stdio prior to calling the scanner
- (because the scanner will miss whatever text your previous reads
- left in the stdio input buffer).
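The stdio caveat can be reproduced outside of `flex'. The following POSIX C sketch (the helper `bytes_seen_by_read()' is hypothetical) reads one character from a file through stdio and then calls `read()' on the same descriptor, as a `-Cr' scanner would. Because stdio usually fills a whole buffer on the first `getc()', the text after that character is sitting in stdio's buffer, and `read()' does not see it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Put text in a temporary file, consume one character through stdio,
 * then read() the same descriptor directly, as a -Cr scanner would.
 * Returns how many bytes read() delivered, or -1 on error.
 */
static long bytes_seen_by_read(const char *text)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);          /* flush the write and seek back to the start */

    (void) getc(fp);     /* stdio refills its buffer: typically the whole file */

    char buf[256];
    ssize_t n = read(fileno(fp), buf, sizeof buf); /* bypasses stdio's buffer */
    fclose(fp);
    return (long) n;
}
```

Called on "hello world", the helper typically returns fewer than the 10 characters that remain after the `getc()' (often 0); the missing characters are exactly the text a `-Cr' scanner would skip.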
-
- `-Cr' has no effect if you define `YY_INPUT' (see The Generated
- Scanner above).
-
- A lone `-C' specifies that the scanner tables should be compressed
- but neither equivalence classes nor meta-equivalence classes
- should be used.
-
- The options `-Cf' or `-CF' and `-Cm' do not make sense together:
- there is no opportunity for meta-equivalence classes if the table
- is not being compressed. Otherwise the options may be freely
- mixed, and are cumulative.
-
- The default setting is `-Cem', which specifies that `flex' should
- generate equivalence classes and meta-equivalence classes. This
- setting provides the highest degree of table compression. You can
- trade off faster-executing scanners at the cost of larger tables
- with the following generally being true:
-
- slowest & smallest
- -Cem
- -Cm
- -Ce
- -C
- -C{f,F}e
- -C{f,F}
- -C{f,F}a
- fastest & largest
-
- Note that scanners with the smallest tables are usually generated
- and compiled the quickest, so during development you will usually
- want to use the default, maximal compression.
-
- `-Cfe' is often a good compromise between speed and size for
- production scanners.
-
-`-ooutput'
- directs flex to write the scanner to the file `output' instead
- of `lex.yy.c'. If you combine `-o' with the `-t' option, then the
- scanner is written to `stdout' but its `#line' directives (see the
- `-L' option above) refer to the file `output'.
-
-`-Pprefix'
- changes the default `yy' prefix used by `flex' for all
- globally-visible variable and function names to instead be PREFIX.
- For example, `-Pfoo' changes the name of `yytext' to `footext'.
- It also changes the name of the default output file from
- `lex.yy.c' to `lex.foo.c'. Here are all of the names affected:
-
- yy_create_buffer
- yy_delete_buffer
- yy_flex_debug
- yy_init_buffer
- yy_flush_buffer
- yy_load_buffer_state
- yy_switch_to_buffer
- yyin
- yyleng
- yylex
- yylineno
- yyout
- yyrestart
- yytext
- yywrap
-
- (If you are using a C++ scanner, then only `yywrap' and
- `yyFlexLexer' are affected.) Within your scanner itself, you can
- still refer to the global variables and functions using either
- version of their name; but externally, they have the modified name.
-
- This option lets you easily link together multiple `flex' programs
- into the same executable. Note, though, that using this option
- also renames `yywrap()', so you now *must* either provide your own
- (appropriately-named) version of the routine for your scanner, or
- use `%option noyywrap', as linking with `-lfl' no longer provides
- one for you by default.
-
-`-Sskeleton_file'
- overrides the default skeleton file from which `flex' constructs
- its scanners. You'll never need this option unless you are doing
- `flex' maintenance or development.
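One way to obtain the behavior described for `-P', where code inside the scanner may use either spelling while other modules see only the prefixed one, is preprocessor renaming. The sketch below is a hypothetical, hand-written fragment standing in for a scanner built with `-Pfoo'; it is not flex's generated code, and the toy `yylex()' body is invented. Because of the `#define' lines, the definitions written as `yytext', `yylex()', and `yywrap()' actually create `footext', `foolex()', and `foowrap()'.

```c
#include <string.h>

/* Hypothetical renaming for -Pfoo: the yy spellings become foo spellings. */
#define yytext footext
#define yylex  foolex
#define yywrap foowrap

/* Scanner code written with the yy names actually defines foo names. */
char *yytext = NULL;

int yylex(void)
{
    static char token[] = "hello";   /* stand-in for a matched token */
    yytext = token;
    return (int) strlen(yytext);
}

/* -lfl no longer supplies this under the new name, so provide it:
   returning 1 means "no more input files". */
int yywrap(void)
{
    return 1;
}

/* Another module sees only the prefixed names, e.g.:
 *     extern char *footext;
 *     extern int foolex(void);
 */
```

The `yywrap()' here returns 1 (no further input files), which has the same effect as using `%option noyywrap' instead.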
-
- `flex' also provides a mechanism for controlling options within the
-scanner specification itself, rather than from the flex command-line.
-This is done by including `%option' directives in the first section of
-the scanner specification. You can specify multiple options with a
-single `%option' directive, and multiple directives in the first
-section of your flex input file. Most options are given simply as
-names, optionally preceded by the word "no" (with no intervening
-whitespace) to negate their meaning. A number are equivalent to flex
-flags or their negation:
-
- 7bit -7 option
- 8bit -8 option
- align -Ca option
- backup -b option
- batch -B option
- c++ -+ option
-
- caseful or
- case-sensitive opposite of -i (default)
-
- case-insensitive or
- caseless -i option
-
- debug -d option
- default opposite of -s option
- ecs -Ce option
- fast -F option
- full -f option
- interactive -I option
- lex-compat -l option
- meta-ecs -Cm option
- perf-report -p option
- read -Cr option
- stdout -t option
- verbose -v option
- warn opposite of -w option
- (use "%option nowarn" for -w)
-
- array equivalent to "%array"
- pointer equivalent to "%pointer" (default)
-
- Some `%option' directives provide features otherwise not available:
-
-`always-interactive'
- instructs flex to generate a scanner which always considers its
- input "interactive". Normally, on each new input file the scanner
- calls `isatty()' in an attempt to determine whether the scanner's
- input source is interactive and thus should be read a character at
- a time. When this option is used, however, then no such call is
- made.
-
-`main'
- directs flex to provide a default `main()' program for the
- scanner, which simply calls `yylex()'. This option implies
- `noyywrap' (see below).
-
-`never-interactive'
- instructs flex to generate a scanner which never considers its
- input "interactive" (again, no call is made to `isatty()'). This is
- the opposite of `always-interactive'.
-
-`stack'
- enables the use of start condition stacks (see Start Conditions
- above).
-
-`stdinit'
- if unset (i.e., `%option nostdinit') initializes `yyin' and
- `yyout' to nil `FILE' pointers, instead of `stdin' and `stdout'.
-
-`yylineno'
- directs `flex' to generate a scanner that maintains the number of
- the current line read from its input in the global variable
- `yylineno'. This option is implied by `%option lex-compat'.
-
-`yywrap'
- if unset (i.e., `%option noyywrap'), makes the scanner not call
- `yywrap()' upon an end-of-file, but simply assume that there are
- no more files to scan (until the user points `yyin' at a new file
- and calls `yylex()' again).
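The difference between the default, `always-interactive', and `never-interactive' comes down to whether `isatty()' is consulted and how much input is requested at a time. A minimal POSIX C sketch of that decision follows; the `enum interactivity' names, the `chunk_size()' helper, and the block size are hypothetical and not part of the generated scanner.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical names for the three behaviors discussed above. */
enum interactivity { DETECT, ALWAYS, NEVER };

/*
 * How many bytes to request per read: one character at a time when the
 * input is treated as interactive, otherwise a whole block.
 */
static size_t chunk_size(FILE *in, enum interactivity mode, size_t block)
{
    int interactive;
    if (mode == ALWAYS)
        interactive = 1;                    /* always-interactive: no isatty() */
    else if (mode == NEVER)
        interactive = 0;                    /* never-interactive: no isatty() */
    else
        interactive = isatty(fileno(in));   /* default: ask once per input file */
    return interactive ? 1 : block;
}
```

For a regular file, `DETECT' and `NEVER' both request whole blocks, while `ALWAYS' forces single-character reads even though the input is not a terminal.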
-
- `flex' scans your rule actions to determine whether you use the
-`REJECT' or `yymore()' features. The `reject' and `yymore' options are
-available to override its decision as to whether you use the options,
-either by setting them (e.g., `%option reject') to indicate the feature
-is indeed used, or unsetting them to indicate it actually is not used
-(e.g., `%option noyymore').
-
- Three options take string-delimited values, offset with '=':
-
- %option outfile="ABC"
-
-is equivalent to `-oABC', and
-
- %option prefix="XYZ"
-
-is equivalent to `-PXYZ'.
-
- Finally,
-
- %option yyclass="foo"
-
-only applies when generating a C++ scanner (`-+' option). It informs
-`flex' that you have derived `foo' as a subclass of `yyFlexLexer' so
-`flex' will place your actions in the member function `foo::yylex()'
-instead of `yyFlexLexer::yylex()'. It also generates a
-`yyFlexLexer::yylex()' member function that emits a run-time error (by
-invoking `yyFlexLexer::LexerError()') if called. See Generating C++
-Scanners, below, for additional information.
-
- A number of options are available for lint purists who want to
-suppress the appearance of unneeded routines in the generated scanner.
-Each of the following, if unset, results in the corresponding routine
-not appearing in the generated scanner:
-
- input, unput
- yy_push_state, yy_pop_state, yy_top_state
- yy_scan_buffer, yy_scan_bytes, yy_scan_string
-
-(though `yy_push_state()' and friends won't appear anyway unless you
-use `%option stack').
-
-
-File: flex.info, Node: Performance, Next: C++, Prev: Options, Up: Top
-
-Performance considerations
-==========================
-
- The main design goal of `flex' is that it generate high-performance
-scanners. It has been optimized for dealing well with large sets of
-rules. Aside from the effects on scanner speed of the table
-compression `-C' options outlined above, there are a number of
-options/actions which degrade performance. These are, from most
-expensive to least:
-
- REJECT
- %option yylineno
- arbitrary trailing context
-
- pattern sets that require backing up
- %array
- %option interactive
- %option always-interactive
-
- '^' beginning-of-line operator
- yymore()
-
- with the first three all being quite expensive and the last two
-being quite cheap. Note also that `unput()' is implemented as a
-routine call that potentially does quite a bit of work, while
-`yyless()' is a quite-cheap macro; so if just putting back some excess
-text you scanned, use `yyless()'.
-
- `REJECT' should be avoided at all costs when performance is
-important. It is a particularly expensive option.
-
- Getting rid of backing up is messy and often may be an enormous
-amount of work for a complicated scanner. In principle, one begins by
-using the `-b' flag to generate a `lex.backup' file. For example, on
-the input
-
- %%
- foo return TOK_KEYWORD;
- foobar return TOK_KEYWORD;
-
-the file looks like:
-
- State #6 is non-accepting -
- associated rule line numbers:
- 2 3
- out-transitions: [ o ]
- jam-transitions: EOF [ \001-n p-\177 ]
-
- State #8 is non-accepting -
- associated rule line numbers:
- 3
- out-transitions: [ a ]
- jam-transitions: EOF [ \001-` b-\177 ]
-
- State #9 is non-accepting -
- associated rule line numbers:
- 3
- out-transitions: [ r ]
- jam-transitions: EOF [ \001-q s-\177 ]
-
- Compressed tables always back up.
-
- The first few lines tell us that there's a scanner state in which it
-can make a transition on an 'o' but not on any other character, and
-that in that state the currently scanned text does not match any rule.
-The state occurs when trying to match the rules found at lines 2 and 3
-in the input file. If the scanner is in that state and then reads
-something other than an 'o', it will have to back up to find a rule
-which is matched. With a bit of head-scratching one can see that this
-must be the state it's in when it has seen "fo". When this has
-happened, if anything other than another 'o' is seen, the scanner will
-have to back up to simply match the 'f' (by the default rule).
-
- The comment regarding State #8 indicates there's a problem when
-"foob" has been scanned. Indeed, on any character other than an 'a',
-the scanner will have to back up to accept "foo". Similarly, the
-comment for State #9 concerns when "fooba" has been scanned and an 'r'
-does not follow.
-
- The final comment reminds us that there's no point going to all the
-trouble of removing backing up from the rules unless we're using `-Cf'
-or `-CF', since there's no performance gain doing so with compressed
-scanners.
-
- The way to remove the backing up is to add "error" rules:
-
- %%
- foo return TOK_KEYWORD;
- foobar return TOK_KEYWORD;
-
- fooba |
- foob |
- fo {
- /* false alarm, not really a keyword */
- return TOK_ID;
- }
-
- Eliminating backing up among a list of keywords can also be done
-using a "catch-all" rule:
-
- %%
- foo return TOK_KEYWORD;
- foobar return TOK_KEYWORD;
-
- [a-z]+ return TOK_ID;
-
- This is usually the best solution when appropriate.
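The backing-up behavior in the `lex.backup' walkthrough above, and why a catch-all rule removes it, can be modeled in a few lines of C. This is a hypothetical longest-match matcher over the two keywords `foo' and `foobar', not flex's table-driven scanner: it keeps reading while the text could still grow into a match, remembers the last length at which a rule fully matched, and reports how many characters it had to give back. Passing a nonzero `catch_all' adds the equivalent of an `[a-z]+' rule; flex's default rule is not modeled.

```c
#include <stddef.h>
#include <string.h>

static const char *const keywords[] = { "foo", "foobar" };
#define NKEYWORDS (sizeof keywords / sizeof keywords[0])

/* True if s[0..n) is exactly the text of some rule's match. */
static int accepts(const char *s, size_t n, int catch_all)
{
    for (size_t i = 0; i < NKEYWORDS; i++)
        if (strlen(keywords[i]) == n && strncmp(keywords[i], s, n) == 0)
            return 1;
    if (!catch_all || n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)      /* models the rule [a-z]+ */
        if (s[i] < 'a' || s[i] > 'z')
            return 0;
    return 1;
}

/* True if s[0..n) could still be extended into some rule's match. */
static int viable(const char *s, size_t n, int catch_all)
{
    for (size_t i = 0; i < NKEYWORDS; i++)
        if (strlen(keywords[i]) >= n && strncmp(keywords[i], s, n) == 0)
            return 1;
    return accepts(s, n, catch_all);
}

/*
 * Longest match at the start of s. Returns the length of the longest
 * rule match (0 if none) and stores in *backed_up how many characters
 * were read past that match and must be given back.
 */
static size_t longest_match(const char *s, int catch_all, size_t *backed_up)
{
    size_t scanned = 0;      /* characters consumed so far */
    size_t last_accept = 0;  /* length at the most recent accepting point */

    while (s[scanned] != '\0' && viable(s, scanned + 1, catch_all)) {
        scanned++;
        if (accepts(s, scanned, catch_all))
            last_accept = scanned;
    }
    *backed_up = scanned - last_accept;
    return last_accept;
}
```

On "foobaz" without the catch-all, the sketch reads "fooba", finds no rule that continues with `z', and gives back two characters to return "foo", mirroring State #9. With `catch_all' set, every all-letter prefix is itself a match, so the same input is consumed as one six-character token and nothing is given back.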
-
- Backing up messages tend to cascade. With a complicated set of
-rules it's not uncommon to get hundreds of messages. If one can
-decipher them, though, it often only takes a dozen or so rules to
-eliminate the backing up (though it's easy to make a mistake and have
-an error rule accidentally match a valid token. A possible future
-`flex' feature will be to automatically add rules to eliminate backing
-up).
-
- It's important to keep in mind that you gain the benefits of
-eliminating backing up only if you eliminate *every* instance of
-backing up. Leaving just one means you gain nothing.
-
- VARIABLE trailing context (where both the leading and trailing parts
-do not have a fixed length) entails almost the same performance loss as
-`REJECT' (i.e., substantial). So when possible a rule like:
-
- %%
- mouse|rat/(cat|dog) run();
-
-is better written:
-
- %%
- mouse/cat|dog run();
- rat/cat|dog run();
-
-or as
-
- %%
- mouse|rat/cat run();
- mouse|rat/dog run();
-
- Note that here the special '|' action does *not* provide any
-savings, and can even make things worse (see Deficiencies / Bugs below).
-
- Another area where the user can increase a scanner's performance
-(and one that's easier to implement) arises from the fact that the
-longer the tokens matched, the faster the scanner will run. This is
-because with long tokens the processing of most input characters takes
-place in the (short) inner scanning loop, and does not often have to go
-through the additional work of setting up the scanning environment
-(e.g., `yytext') for the action. Recall the scanner for C comments:
-
- %x comment
- %%
- int line_num = 1;
-
- "/*" BEGIN(comment);
-
- <comment>[^*\n]*
- <comment>"*"+[^*/\n]*
- <comment>\n ++line_num;
- <comment>"*"+"/" BEGIN(INITIAL);
-
- This could be sped up by writing it as:
-
- %x comment
- %%
- int line_num = 1;
-
- "/*" BEGIN(comment);
-
- <comment>[^*\n]*
- <comment>[^*\n]*\n ++line_num;
- <comment>"*"+[^*/\n]*
- <comment>"*"+[^*/\n]*\n ++line_num;
- <comment>"*"+"/" BEGIN(INITIAL);
-
- Now instead of each newline requiring the processing of another
-action, recognizing the newlines is "distributed" over the other rules
-to keep the matched text as long as possible. Note that *adding* rules
-does *not* slow down the scanner! The speed of the scanner is
-independent of the number of rules or (modulo the considerations given
-at the beginning of this section) how complicated the rules are with
-regard to operators such as '*' and '|'.
-
- A final example in speeding up a scanner: suppose you want to scan
-through a file containing identifiers and keywords, one per line and
-with no other extraneous characters, and recognize all the keywords. A
-natural first approach is:
-
- %%
- asm |
- auto |
- break |
- ... etc ...
- volatile |
- while /* it's a keyword */
-
- .|\n /* it's not a keyword */
-
- To eliminate the back-tracking, introduce a catch-all rule:
-
- %%
- asm |
- auto |
- break |
- ... etc ...
- volatile |
- while /* it's a keyword */
-
- [a-z]+ |
- .|\n /* it's not a keyword */
-
- Now, if it's guaranteed that there's exactly one word per line, then
-we can halve the total number of matches by merging in the
-recognition of newlines with that of the other tokens:
-
- %%
- asm\n |
- auto\n |
- break\n |
- ... etc ...
- volatile\n |
- while\n /* it's a keyword */
-
- [a-z]+\n |
- .|\n /* it's not a keyword */
-
- One has to be careful here, as we have now reintroduced backing up
-into the scanner. In particular, while *we* know that there will never
-be any characters in the input stream other than letters or newlines,
-`flex' can't figure this out, and it will plan for possibly needing to
-back up when it has scanned a token like "auto" and then the next
-character is something other than a newline or a letter. Previously it
-would then just match the "auto" rule and be done, but now it has no
-"auto" rule, only an "auto\n" rule. To eliminate the possibility of
-backing up, we could either duplicate all rules but without final
-newlines, or, since we never expect to encounter such an input and
-therefore don't care how it's classified, we can introduce one more
-catch-all rule, this one which doesn't include a newline:
-
- %%
- asm\n |
- auto\n |
- break\n |
- ... etc ...
- volatile\n |
- while\n /* it's a keyword */
-
- [a-z]+\n |
- [a-z]+ |
- .|\n /* it's not a keyword */
-
- Compiled with `-Cf', this is about as fast as one can get a `flex'
-scanner to go for this particular problem.
-
- A final note: `flex' is slow when matching NUL's, particularly when
-a token contains multiple NUL's. It's best to write rules which match
-*short* amounts of text if it's anticipated that the text will often
-include NUL's.
-
- Another final note regarding performance: as mentioned above in the
-section How the Input is Matched, dynamically resizing `yytext' to
-accommodate huge tokens is a slow process because it presently requires
-that the (huge) token be rescanned from the beginning. Thus if
-performance is vital, you should attempt to match "large" quantities of
-text but not "huge" quantities, where the cutoff between the two is at
-about 8K characters/token.
-
-
-File: flex.info, Node: C++, Next: Incompatibilities, Prev: Performance, Up: Top
-
-Generating C++ scanners
-=======================
-
- `flex' provides two different ways to generate scanners for use with
-C++. The first way is to simply compile a scanner generated by `flex'
-using a C++ compiler instead of a C compiler. You should not encounter
-any compilation errors (please report any you find to the email address
-given in the Author section below). You can then use C++ code in your
-rule actions instead of C code. Note that the default input source for
-your scanner remains `yyin', and default echoing is still done to
-`yyout'. Both of these remain `FILE *' variables and not C++ `streams'.
-
- You can also use `flex' to generate a C++ scanner class, using the
-`-+' option (or, equivalently, `%option c++'), which is automatically
-specified if the name of the flex executable ends in a `+', such as
-`flex++'. When using this option, flex defaults to generating the
-scanner to the file `lex.yy.cc' instead of `lex.yy.c'. The generated
-scanner includes the header file `FlexLexer.h', which defines the
-interface to two C++ classes.
-
- The first class, `FlexLexer', provides an abstract base class
-defining the general scanner class interface. It provides the
-following member functions:
-
-`const char* YYText()'
- returns the text of the most recently matched token, the
- equivalent of `yytext'.
-
-`int YYLeng()'
- returns the length of the most recently matched token, the
- equivalent of `yyleng'.
-
-`int lineno() const'
- returns the current input line number (see `%option yylineno'), or
- 1 if `%option yylineno' was not used.
-
-`void set_debug( int flag )'
- sets the debugging flag for the scanner, equivalent to assigning to
- `yy_flex_debug' (see the Options section above). Note that you
- must build the scanner using `%option debug' to include debugging
- information in it.
-
-`int debug() const'
- returns the current setting of the debugging flag.
-
- Also provided are member functions equivalent to
-`yy_switch_to_buffer()', `yy_create_buffer()' (though the first argument
-is an `istream*' object pointer and not a `FILE*'), `yy_flush_buffer()',
-`yy_delete_buffer()', and `yyrestart()' (again, the first argument is an
-`istream*' object pointer).
-
- The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
-derived from `FlexLexer'. It defines the following additional member
-functions:
-
-`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
- constructs a `yyFlexLexer' object using the given streams for
- input and output. If not specified, the streams default to `cin'
- and `cout', respectively.
-
-`virtual int yylex()'
- performs the same role as `yylex()' does for ordinary flex
- scanners: it scans the input stream, consuming tokens, until a
- rule's action returns a value. If you derive a subclass S from
- `yyFlexLexer' and want to access the member functions and
- variables of S inside `yylex()', then you need to use `%option
- yyclass="S"' to inform `flex' that you will be using that subclass
- instead of `yyFlexLexer'. In this case, rather than generating
- `yyFlexLexer::yylex()', `flex' generates `S::yylex()' (and also
- generates a dummy `yyFlexLexer::yylex()' that calls
- `yyFlexLexer::LexerError()' if called).
-
-`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
- reassigns `yyin' to `new_in' (if non-nil) and `yyout' to `new_out'
- (ditto), deleting the previous input buffer if `yyin' is
- reassigned.
-
-`int yylex( istream* new_in = 0, ostream* new_out = 0 )'
- first switches the input streams via `switch_streams( new_in,
- new_out )' and then returns the value of `yylex()'.
-
- In addition, `yyFlexLexer' defines the following protected virtual
-functions which you can redefine in derived classes to tailor the
-scanner:
-
-`virtual int LexerInput( char* buf, int max_size )'
- reads up to `max_size' characters into BUF and returns the number
- of characters read. To indicate end-of-input, return 0
- characters. Note that "interactive" scanners (see the `-B' and
- `-I' flags) define the macro `YY_INTERACTIVE'. If you redefine
- `LexerInput()' and need to take different actions depending on
- whether or not the scanner might be scanning an interactive input
- source, you can test for the presence of this name via `#ifdef'.
-
-`virtual void LexerOutput( const char* buf, int size )'
- writes out SIZE characters from the buffer BUF, which, while
- NUL-terminated, may also contain "internal" NUL's if the scanner's
- rules can match text with NUL's in them.
-
-`virtual void LexerError( const char* msg )'
- reports a fatal error message. The default version of this
- function writes the message to the stream `cerr' and exits.
-
- Note that a `yyFlexLexer' object contains its *entire* scanning
-state. Thus you can use such objects to create reentrant scanners.
-You can instantiate multiple instances of the same `yyFlexLexer' class,
-and you can also combine multiple C++ scanner classes together in the
-same program using the `-P' option discussed above. Finally, note that
-the `%array' feature is not available to C++ scanner classes; you must
-use `%pointer' (the default).
-
- Here is an example of a simple C++ scanner:
-
- // An example of using the flex C++ scanner class.
-
- %{
- int mylineno = 0;
- %}
-
- string \"[^\n"]+\"
-
- ws [ \t]+
-
- alpha [A-Za-z]
- dig [0-9]
- name ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
- num1 [-+]?{dig}+\.?([eE][-+]?{dig}+)?
- num2 [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
- number {num1}|{num2}
-
- %%
-
- {ws} /* skip blanks and tabs */
-
- "/*" {
- int c;
-
- while((c = yyinput()) != 0 && c != EOF)
- {
- if(c == '\n')
- ++mylineno;
-
- else if(c == '*')
- {
- if((c = yyinput()) == '/')
- break;
- else
- unput(c);
- }
- }
- }
-
- {number} cout << "number " << YYText() << '\n';
-
- \n mylineno++;
-
- {name} cout << "name " << YYText() << '\n';
-
- {string} cout << "string " << YYText() << '\n';
-
- %%
-
-
- int main( int /* argc */, char** /* argv */ )
- {
- FlexLexer* lexer = new yyFlexLexer;
- while(lexer->yylex() != 0)
- ;
- return 0;
- }
-
- If you want to create multiple (different) lexer classes, you use
-the `-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
-some other `xxFlexLexer'. You then can include `<FlexLexer.h>' in your
-other sources once per lexer class, first renaming `yyFlexLexer' as
-follows:
-
- #undef yyFlexLexer
- #define yyFlexLexer xxFlexLexer
- #include <FlexLexer.h>
-
- #undef yyFlexLexer
- #define yyFlexLexer zzFlexLexer
- #include <FlexLexer.h>
-
- if, for example, you used `%option prefix="xx"' for one of your
-scanners and `%option prefix="zz"' for the other.
-
- IMPORTANT: the present form of the scanning class is *experimental*
-and may change considerably between major releases.
-
-
-File: flex.info, Node: Incompatibilities, Next: Diagnostics, Prev: C++, Up: Top
-
-Incompatibilities with `lex' and POSIX
-======================================
-
- `flex' is a rewrite of the AT&T Unix `lex' tool (the two
-implementations do not share any code, though), with some extensions
-and incompatibilities, both of which are of concern to those who wish
-to write scanners acceptable to either implementation. Flex is fully
-compliant with the POSIX `lex' specification, except that when using
-`%pointer' (the default), a call to `unput()' destroys the contents of
-`yytext', which is counter to the POSIX specification.
-
- In this section we discuss all of the known areas of incompatibility
-between flex, AT&T lex, and the POSIX specification.
-
- `flex's' `-l' option turns on maximum compatibility with the
-original AT&T `lex' implementation, at the cost of a major loss in the
-generated scanner's performance. We note below which incompatibilities
-can be overcome using the `-l' option.
-
- `flex' is fully compatible with `lex' with the following exceptions:
-
- - The undocumented `lex' scanner internal variable `yylineno' is not
- supported unless `-l' or `%option yylineno' is used. `yylineno'
- should be maintained on a per-buffer basis, rather than a
- per-scanner (single global variable) basis. `yylineno' is not
- part of the POSIX specification.
-
- - The `input()' routine is not redefinable, though it may be called
- to read characters following whatever has been matched by a rule.
- If `input()' encounters an end-of-file the normal `yywrap()'
- processing is done. A "real" end-of-file is returned by `input()'
- as `EOF'.
-
- Input is instead controlled by defining the `YY_INPUT' macro.
-
- The `flex' restriction that `input()' cannot be redefined is in
- accordance with the POSIX specification, which simply does not
- specify any way of controlling the scanner's input other than by
- making an initial assignment to `yyin'.
-
- - The `unput()' routine is not redefinable. This restriction is in
- accordance with POSIX.
-
- - `flex' scanners are not as reentrant as `lex' scanners. In
- particular, if you have an interactive scanner and an interrupt
- handler which long-jumps out of the scanner, and the scanner is
- subsequently called again, you may get the following message:
-
- fatal flex scanner internal error--end of buffer missed
-
- To reenter the scanner, first use
-
- yyrestart( yyin );
-
- Note that this call will throw away any buffered input; usually
- this isn't a problem with an interactive scanner.
-
- Also note that flex C++ scanner classes *are* reentrant, so if
- using C++ is an option for you, you should use them instead. See
- "Generating C++ Scanners" above for details.
-
- - `output()' is not supported. Output from the `ECHO' macro is done
- to the file-pointer `yyout' (default `stdout').
-
- `output()' is not part of the POSIX specification.
-
- - `lex' does not support exclusive start conditions (%x), though
- they are in the POSIX specification.
-
- - When definitions are expanded, `flex' encloses them in
- parentheses. With lex, the following:
-
- NAME [A-Z][A-Z0-9]*
- %%
- foo{NAME}? printf( "Found it\n" );
- %%
-
- will not match the string "foo" because when the macro is expanded
- the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence
- is such that the '?' is associated with "[A-Z0-9]*". With `flex',
- the rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the
- string "foo" will match.
-
- Note that if the definition begins with `^' or ends with `$' then
- it is *not* expanded with parentheses, to allow these operators to
- appear in definitions without losing their special meanings. But
- the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
- definition.
-
- Using `-l' results in the `lex' behavior of no parentheses around
- the definition.
-
- The POSIX specification is that the definition be enclosed in
- parentheses.
-
- - Some implementations of `lex' allow a rule's action to begin on a
- separate line, if the rule's pattern has trailing whitespace:
-
- %%
- foo|bar<space here>
- { foobar_action(); }
-
- `flex' does not support this feature.
-
- - The `lex' `%r' (generate a Ratfor scanner) option is not
- supported. It is not part of the POSIX specification.
-
- - After a call to `unput()', `yytext' is undefined until the next
- token is matched, unless the scanner was built using `%array'.
- This is not the case with `lex' or the POSIX specification. The
- `-l' option does away with this incompatibility.
-
- - The precedence of the `{}' (numeric range) operator is different.
- `lex' interprets "abc{1,3}" as "match one, two, or three
- occurrences of 'abc'", whereas `flex' interprets it as "match 'ab'
- followed by one, two, or three occurrences of 'c'". The latter is
- in agreement with the POSIX specification.
-
- - The precedence of the `^' operator is different. `lex' interprets
- "^foo|bar" as "match either 'foo' at the beginning of a line, or
- 'bar' anywhere", whereas `flex' interprets it as "match either
- 'foo' or 'bar' if they come at the beginning of a line". The
- latter is in agreement with the POSIX specification.
-
- - The special table-size declarations such as `%a' supported by
- `lex' are not required by `flex' scanners; `flex' ignores them.
-
- - The name `FLEX_SCANNER' is #define'd so scanners may be written for
- use with either `flex' or `lex'. Scanners also include
- `YY_FLEX_MAJOR_VERSION' and `YY_FLEX_MINOR_VERSION' indicating
- which version of `flex' generated the scanner (for example, for the
- 2.5 release, these defines would be 2 and 5 respectively).
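The POSIX precedence rules cited above for definition expansion and for the `{}' operator can be checked with the POSIX `regcomp()' and `regexec()' functions. These implement extended regular expressions rather than `lex' patterns, but the two agree on these points: an interval applies only to the single atom before it, and a parenthesized group can be made optional as a whole. (They do not agree on `^' inside an alternation, so that item is not checked here.) The helper `ere_match()' is hypothetical, and the `{NAME}' definition from the example above is substituted by hand, with parentheses, as `flex' would expand it.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/*
 * Returns 1 if text matches the POSIX extended regular expression
 * pattern, 0 if it does not, and -1 if the pattern fails to compile.
 * Anchor the pattern with ^...$ to require a whole-string match.
 */
static int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

For instance, `ere_match("^abc{1,3}$", "abccc")' returns 1 and `ere_match("^abc{1,3}$", "abcabc")' returns 0, whereas the pattern `^(abc){1,3}$' accepts "abcabc". The hand-expanded definition `^foo([A-Z][A-Z0-9]*)?$' accepts plain "foo".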
-
- The following `flex' features are not included in `lex' or the POSIX
-specification:
-
- C++ scanners
- %option
- start condition scopes
- start condition stacks
- interactive/non-interactive scanners
- yy_scan_string() and friends
- yyterminate()
- yy_set_interactive()
- yy_set_bol()
- YY_AT_BOL()
- <<EOF>>
- <*>
- YY_DECL
- YY_START
- YY_USER_ACTION
- YY_USER_INIT
- #line directives
- %{}'s around actions
- multiple actions on a line
-
-plus almost all of the flex flags. The last feature in the list refers
-to the fact that with `flex' you can put multiple actions on the same
-line, separated with semicolons, while with `lex', the following
-
- foo handle_foo(); ++num_foos_seen;
-
-is (rather surprisingly) truncated to
-
- foo handle_foo();
-
- `flex' does not truncate the action. Actions that are not enclosed
-in braces are simply terminated at the end of the line.
-
-
-File: flex.info, Node: Diagnostics, Next: Files, Prev: Incompatibilities, Up: Top
-
-Diagnostics
-===========
-
-`warning, rule cannot be matched'
- indicates that the given rule cannot be matched because it follows
- other rules that will always match the same text as it. For
- example, in the following "foo" cannot be matched because it comes
- after an identifier "catch-all" rule:
-
- [a-z]+ got_identifier();
- foo got_foo();
-
- Using `REJECT' in a scanner suppresses this warning.
-
-`warning, -s option given but default rule can be matched'
- means that it is possible (perhaps only in a particular start
- condition) that the default rule (match any single character) is
- the only one that will match a particular input. Since `-s' was
- given, presumably this is not intended.
-
-`reject_used_but_not_detected undefined'
-`yymore_used_but_not_detected undefined'
- These errors can occur at compile time. They indicate that the
- scanner uses `REJECT' or `yymore()' but that `flex' failed to
- notice the fact, meaning that `flex' scanned the first two sections
- looking for occurrences of these actions and failed to find any,
- but somehow you snuck some in (via a #include file, for example).
- Use `%option reject' or `%option yymore' to indicate to flex that
- you really do use these features.
-
-`flex scanner jammed'
- a scanner compiled with `-s' has encountered an input string which
- wasn't matched by any of its rules. This error can also occur due
- to internal problems.
-
-`token too large, exceeds YYLMAX'
- your scanner uses `%array' and one of its rules matched a string
- longer than the `YYLMAX' constant (8K bytes by default). You
- can increase the value by #define'ing `YYLMAX' in the definitions
- section of your `flex' input.
-
-`scanner requires -8 flag to use the character 'X''
- Your scanner specification includes recognizing the 8-bit
- character X, but you did not specify the `-8' flag, and your
- scanner defaulted to 7-bit because you used the `-Cf' or `-CF'
- table compression options. See the discussion of the `-7' flag
- for details.
-
-`flex scanner push-back overflow'
- you used `unput()' to push back so much text that the scanner's
- buffer could not hold both the pushed-back text and the current
- token in `yytext'. Ideally the scanner should dynamically resize
- the buffer in this case, but at present it does not.
-
-`input buffer overflow, can't enlarge buffer because scanner uses REJECT'
- the scanner was working on matching an extremely large token and
- needed to expand the input buffer. This doesn't work with
- scanners that use `REJECT'.
-
-`fatal flex scanner internal error--end of buffer missed'
- This can occur in a scanner which is reentered after a long-jump
- has jumped out (or over) the scanner's activation frame. Before
- reentering the scanner, use:
-
- yyrestart( yyin );
-
- or, as noted above, switch to using the C++ scanner class.
-
-`too many start conditions in <> construct!'
- you listed more start conditions in a <> construct than exist (so
- you must have listed at least one of them twice).
-
-
-File: flex.info, Node: Files, Next: Deficiencies, Prev: Diagnostics, Up: Top
-
-Files
-=====
-
-`-lfl'
- library with which scanners must be linked.
-
-`lex.yy.c'
- generated scanner (called `lexyy.c' on some systems).
-
-`lex.yy.cc'
- generated C++ scanner class, when using `-+'.
-
-`<FlexLexer.h>'
- header file defining the C++ scanner base class, `FlexLexer', and
- its derived class, `yyFlexLexer'.
-
-`flex.skl'
- skeleton scanner. This file is only used when building flex, not
- when flex executes.
-
-`lex.backup'
- backing-up information for `-b' flag (called `lex.bck' on some
- systems).
-
-
-File: flex.info, Node: Deficiencies, Next: See also, Prev: Files, Up: Top
-
-Deficiencies / Bugs
-===================
-
- Some trailing context patterns cannot be properly matched and
-generate warning messages ("dangerous trailing context"). These are
-patterns where the ending of the first part of the rule matches the
-beginning of the second part, such as "zx*/xy*", where the 'x*' matches
-the 'x' at the beginning of the trailing context. (Note that the POSIX
-draft states that the text matched by such patterns is undefined.)
-
- For some trailing context rules, parts which are actually
-fixed-length are not recognized as such, leading to the performance
-loss of variable trailing context described under Performance
-considerations. In particular, parts using '|' or {n} (such as
-"foo{3}") are always considered variable-length.
-
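- Combining trailing context with the special '|' action can result in
-*fixed* trailing context being turned into the more expensive *variable*
-trailing context. For example, in the following:
-
- %%
- abc |
- xyz/def
-
-the trailing context `def' has a fixed length, but because the rule
-shares its action with `abc', it is handled as variable trailing
-context.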
-
- Use of `unput()' invalidates yytext and yyleng, unless the `%array'
-directive or the `-l' option has been used.
-
- Pattern-matching of NUL's is substantially slower than matching
-other characters.
-
- Dynamic resizing of the input buffer is slow, as it entails
-rescanning all the text matched so far by the current (generally huge)
-token.
-
- Due to both buffering of input and read-ahead, you cannot intermix
-calls to <stdio.h> routines, such as `getchar()', with `flex' rules
-and expect it to work. Call `input()' instead.
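The following toy model, written in plain C rather than taken from the generated scanner, shows why the two readers disagree. The types and functions (`toy_stream', `toy_scanner', `TOY_BLOCK' and the rest) are hypothetical. The only behavior they share with `flex' is that the scanner pulls input in blocks, so the stream position runs ahead of the text the scanner has actually consumed.

```c
#include <string.h>

#define TOY_BLOCK 8              /* hypothetical refill size */

struct toy_stream {              /* stands in for a FILE */
    const char *data;
    size_t pos;                  /* next byte a getchar()-like call sees */
};

struct toy_scanner {             /* stands in for the scanner's buffer */
    char buf[TOY_BLOCK + 1];
    size_t len, cur;
};

/* Refill the scanner's buffer with up to TOY_BLOCK bytes at once. */
static void toy_fill(struct toy_scanner *s, struct toy_stream *in)
{
    size_t avail = strlen(in->data + in->pos);
    size_t n = avail < TOY_BLOCK ? avail : TOY_BLOCK;

    memcpy(s->buf, in->data + in->pos, n);
    s->buf[n] = '\0';
    s->len = n;
    s->cur = 0;
    in->pos += n;                /* the stream has moved past all n bytes */
}

/* Scanner side: hand out one byte, refilling when the buffer is empty. */
int toy_scanner_getc(struct toy_scanner *s, struct toy_stream *in)
{
    if (s->cur == s->len) {
        toy_fill(s, in);
        if (s->len == 0)
            return -1;           /* end of input */
    }
    return (unsigned char)s->buf[s->cur++];
}

/* Caller side: what a direct getchar()-style read would see. */
int toy_stream_getc(struct toy_stream *in)
{
    return in->data[in->pos] ? (unsigned char)in->data[in->pos++] : -1;
}
```

After the scanner has handed out a single character from "abcdefghijkl", a direct read from the same stream already returns `i': the bytes `b' through `h' sit in the scanner's buffer, where the direct read never sees them.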
-
- The total number of table entries listed by the `-v' flag excludes
-the table entries needed to determine which rule has been matched.
-The number of these entries is equal to the number of DFA states if
-the scanner does not use `REJECT', and somewhat greater than the
-number of states if it does.
-
- `REJECT' cannot be used with the `-f' or `-F' options.
-
- The `flex' internal algorithms need documentation.
-
-
-File: flex.info, Node: See also, Next: Author, Prev: Deficiencies, Up: Top
-
-See also
-========
-
- `lex'(1), `yacc'(1), `sed'(1), `awk'(1).
-
- John Levine, Tony Mason, and Doug Brown: Lex & Yacc; O'Reilly and
-Associates. Be sure to get the 2nd edition.
-
- M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator.
-
- Alfred Aho, Ravi Sethi and Jeffrey Ullman: Compilers: Principles,
-Techniques and Tools; Addison-Wesley (1986). Describes the
-pattern-matching techniques used by `flex' (deterministic finite
-automata).
-
-
-File: flex.info, Node: Author, Prev: See also, Up: Top
-
-Author
-======
-
- Vern Paxson, with the help of many ideas and much inspiration from
-Van Jacobson. Original version by Jef Poskanzer. The fast table
-representation is a partial implementation of a design done by Van
-Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
-
- Thanks to the many `flex' beta-testers, feedbackers, and
-contributors, especially Francois Pinard, Casey Leedom, Stan Adermann,
-Terry Allen, David Barker-Plummer, John Basrai, Nelson H.F. Beebe,
-`benson@odi.com', Karl Berry, Peter A. Bigot, Simon Blanchard, Keith
-Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher, Brian
-Clapper, J.T. Conklin, Jason Coughlin, Bill Cox, Nick Cropper, Dave
-Curtis, Scott David Daniels, Chris G. Demetriou, Theo Deraadt, Mike
-Donahue, Chuck Doucette, Tom Epperly, Leo Eskin, Chris Faylor, Chris
-Flatters, Jon Forrest, Joe Gayda, Kaveh R. Ghazi, Eric Goldman,
-Christopher M. Gould, Ulrich Grepel, Peer Griebel, Jan Hajic, Charles
-Hemphill, NORO Hideo, Jarkko Hietaniemi, Scott Hofmann, Jeff Honig,
-Dana Hudes, Eric Hughes, John Interrante, Ceriel Jacobs, Michal
-Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry Juengst, Klaus
-Kaempf, Jonathan I. Kamens, Terrence O Kane, Amir Katz,
-`ken@ken.hilco.com', Kevin B. Kenny, Steve Kirsch, Winfried Koenig,
-Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard, Craig Leres, John
-Levine, Steve Liddle, Mike Long, Mohamed el Lozy, Brian Madsen, Malte,
-Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn, Jim
-Meyering, R. Alexander Milowski, Erik Naggum, G.T. Nicol, Landon Noll,
-James Nordby, Marc Nozell, Richard Ohnemus, Karsten Pahnke, Sven Panne,
-Roland Pesch, Walter Pelissero, Gaumond Pierre, Esmond Pitt, Jef
-Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic Raimbault, Pat Rankin,
-Rick Richardson, Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto
-Santini, Andreas Scherer, Darrell Schiebel, Raf Schietekat, Doug
-Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex Siegel, Eckehard
-Stolz, Jan-Erik Strömquist, Mike Stump, Paul Stuart, Dave Tallman, Ian
-Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi Tsai, Paul
-Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
-Yap, Ron Zellar, Nathan Zelle, David Zuhn, and those whose names have
-slipped my marginal mail-archiving skills but whose contributions are
-appreciated all the same.
-
- Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore,
-Craig Leres, John Levine, Bob Mulcahy, G.T. Nicol, Francois Pinard,
-Rich Salz, and Richard Stallman for help with various distribution
-headaches.
-
- Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
-to Benson Margulies and Fred Burke for C++ support; to Kent Williams
-and Tom Epperly for C++ class support; to Ove Ewerlid for support of
-NUL's; and to Eric Hughes for support of multiple buffers.
-
- This work was primarily done when I was with the Real Time Systems
-Group at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks
-to all there for the support I received.
-
- Send comments to `vern@ee.lbl.gov'.
-
-
-
-Tag Table:
-Node: Top1430
-Node: Name2808
-Node: Synopsis2933
-Node: Overview3145
-Node: Description4986
-Node: Examples5748
-Node: Format8896
-Node: Patterns11637
-Node: Matching18138
-Node: Actions21438
-Node: Generated scanner30560
-Node: Start conditions34988
-Node: Multiple buffers45069
-Node: End-of-file rules50975
-Node: Miscellaneous52508
-Node: User variables55279
-Node: YACC interface57651
-Node: Options58542
-Node: Performance78234
-Node: C++87532
-Node: Incompatibilities94993
-Node: Diagnostics101853
-Node: Files105094
-Node: Deficiencies105715
-Node: See also107684
-Node: Author108216
-
-End Tag Table
diff --git a/WebKitTools/android/flex-2.5.4a/MISC/texinfo/flex.texi b/WebKitTools/android/flex-2.5.4a/MISC/texinfo/flex.texi
deleted file mode 100644
index 23280b1..0000000
--- a/WebKitTools/android/flex-2.5.4a/MISC/texinfo/flex.texi
+++ /dev/null
@@ -1,3448 +0,0 @@
-\input texinfo
-@c %**start of header
-@setfilename flex.info
-@settitle Flex - a scanner generator
-@c @finalout
-@c @setchapternewpage odd
-@c %**end of header
-
-@set EDITION 2.5
-@set UPDATED March 1995
-@set VERSION 2.5
-
-@c FIXME - Reread a printed copy with a red pen and patience.
-@c FIXME - Modify all "See ..." references and replace with @xref's.
-
-@ifinfo
-@format
-START-INFO-DIR-ENTRY
-* Flex: (flex). A fast scanner generator.
-END-INFO-DIR-ENTRY
-@end format
-@end ifinfo
-
-@c Define new indices for commands, filenames, and options.
-@c @defcodeindex cm
-@c @defcodeindex fl
-@c @defcodeindex op
-
-@c Put everything in one index (arbitrarily chosen to be the concept index).
-@c @syncodeindex cm cp
-@c @syncodeindex fl cp
-@syncodeindex fn cp
-@syncodeindex ky cp
-@c @syncodeindex op cp
-@syncodeindex pg cp
-@syncodeindex vr cp
-
-@ifinfo
-This file documents Flex.
-
-Copyright (c) 1990 The Regents of the University of California.
-All rights reserved.
-
-This code is derived from software contributed to Berkeley by
-Vern Paxson.
-
-The United States Government has rights in this work pursuant
-to contract no. DE-AC03-76SF00098 between the United States
-Department of Energy and the University of California.
-
-Redistribution and use in source and binary forms with or without
-modification are permitted provided that: (1) source distributions
-retain this entire copyright notice and comment, and (2)
-distributions including binaries display the following
-acknowledgement: ``This product includes software developed by the
-University of California, Berkeley and its contributors'' in the
-documentation or other materials provided with the distribution and
-in all advertising materials mentioning features or use of this
-software. Neither the name of the University nor the names of its
-contributors may be used to endorse or promote products derived
-from this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
-IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-PURPOSE.
-
-@ignore
-Permission is granted to process this file through TeX and print the
-results, provided the printed document carries copying permission
-notice identical to this one except for the removal of this paragraph
-(this paragraph not being relevant to the printed manual).
-
-@end ignore
-@end ifinfo
-
-@titlepage
-@title Flex, version @value{VERSION}
-@subtitle A fast scanner generator
-@subtitle Edition @value{EDITION}, @value{UPDATED}
-@author Vern Paxson
-
-@page
-@vskip 0pt plus 1filll
-Copyright @copyright{} 1990 The Regents of the University of California.
-All rights reserved.
-
-This code is derived from software contributed to Berkeley by
-Vern Paxson.
-
-The United States Government has rights in this work pursuant
-to contract no. DE-AC03-76SF00098 between the United States
-Department of Energy and the University of California.
-
-Redistribution and use in source and binary forms with or without
-modification are permitted provided that: (1) source distributions
-retain this entire copyright notice and comment, and (2)
-distributions including binaries display the following
-acknowledgement: ``This product includes software developed by the
-University of California, Berkeley and its contributors'' in the
-documentation or other materials provided with the distribution and
-in all advertising materials mentioning features or use of this
-software. Neither the name of the University nor the names of its
-contributors may be used to endorse or promote products derived
-from this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
-IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-PURPOSE.
-@end titlepage
-
-@ifinfo
-
-@node Top, Name, (dir), (dir)
-@top flex
-
-@cindex scanner generator
-
-This manual documents @code{flex}. It covers release @value{VERSION}.
-
-@menu
-* Name:: Name
-* Synopsis:: Synopsis
-* Overview:: Overview
-* Description:: Description
-* Examples:: Some simple examples
-* Format:: Format of the input file
-* Patterns:: Patterns
-* Matching:: How the input is matched
-* Actions:: Actions
-* Generated scanner:: The generated scanner
-* Start conditions:: Start conditions
-* Multiple buffers:: Multiple input buffers
-* End-of-file rules:: End-of-file rules
-* Miscellaneous:: Miscellaneous macros
-* User variables:: Values available to the user
-* YACC interface:: Interfacing with @code{yacc}
-* Options:: Options
-* Performance:: Performance considerations
-* C++:: Generating C++ scanners
-* Incompatibilities:: Incompatibilities with @code{lex} and POSIX
-* Diagnostics:: Diagnostics
-* Files:: Files
-* Deficiencies:: Deficiencies / Bugs
-* See also:: See also
-* Author:: Author
-@c * Index:: Index
-@end menu
-
-@end ifinfo
-
-@node Name, Synopsis, Top, Top
-@section Name
-
-flex - fast lexical analyzer generator
-
-@node Synopsis, Overview, Name, Top
-@section Synopsis
-
-@example
-flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix -Sskeleton]
-[--help --version] [@var{filename} @dots{}]
-@end example
-
-@node Overview, Description, Synopsis, Top
-@section Overview
-
-This manual describes @code{flex}, a tool for generating programs
-that perform pattern-matching on text. The manual
-includes both tutorial and reference sections:
-
-@table @asis
-@item Description
-a brief overview of the tool
-
-@item Some Simple Examples
-
-@item Format Of The Input File
-
-@item Patterns
-the extended regular expressions used by flex
-
-@item How The Input Is Matched
-the rules for determining what has been matched
-
-@item Actions
-how to specify what to do when a pattern is matched
-
-@item The Generated Scanner
-details regarding the scanner that flex produces;
-how to control the input source
-
-@item Start Conditions
-introducing context into your scanners, and
-managing "mini-scanners"
-
-@item Multiple Input Buffers
-how to manipulate multiple input sources; how to
-scan from strings instead of files
-
-@item End-of-file Rules
-special rules for matching the end of the input
-
-@item Miscellaneous Macros
-a summary of macros available to the actions
-
-@item Values Available To The User
-a summary of values available to the actions
-
-@item Interfacing With Yacc
-connecting flex scanners together with yacc parsers
-
-@item Options
-flex command-line options, and the "%option"
-directive
-
-@item Performance Considerations
-how to make your scanner go as fast as possible
-
-@item Generating C++ Scanners
-the (experimental) facility for generating C++
-scanner classes
-
-@item Incompatibilities With Lex And POSIX
-how flex differs from AT&T lex and the POSIX lex
-standard
-
-@item Diagnostics
-those error messages produced by flex (or scanners
-it generates) whose meanings might not be apparent
-
-@item Files
-files used by flex
-
-@item Deficiencies / Bugs
-known problems with flex
-
-@item See Also
-other documentation, related tools
-
-@item Author
-includes contact information
-@end table
-
-@node Description, Examples, Overview, Top
-@section Description
-
-@code{flex} is a tool for generating @dfn{scanners}: programs which
-recognize lexical patterns in text. @code{flex} reads the given
-input files, or its standard input if no file names are
-given, for a description of a scanner to generate. The
-description is in the form of pairs of regular expressions
-and C code, called @dfn{rules}. @code{flex} generates as output a C
-source file, @file{lex.yy.c}, which defines a routine @samp{yylex()}.
-This file is compiled and linked with the @samp{-lfl} library to
-produce an executable. When the executable is run, it
-analyzes its input for occurrences of the regular
-expressions. Whenever it finds one, it executes the
-corresponding C code.
-
-@node Examples, Format, Description, Top
-@section Some simple examples
-
-First, some simple examples to get the flavor of how one
-uses @code{flex}. The following @code{flex} input specifies a scanner
-which, whenever it encounters the string "username", will
-replace it with the user's login name:
-
-@example
-%%
-username printf( "%s", getlogin() );
-@end example
-
-By default, any text not matched by a @code{flex} scanner is
-copied to the output, so the net effect of this scanner is
-to copy its input file to its output with each occurrence
-of "username" expanded. In this input, there is just one
-rule. "username" is the @var{pattern} and the "printf" is the
-@var{action}. The "%%" marks the beginning of the rules.
-
-Here's another simple example:
-
-@example
- int num_lines = 0, num_chars = 0;
-
-%%
-\n ++num_lines; ++num_chars;
-. ++num_chars;
-
-%%
-int main( void )
- @{
- yylex();
- printf( "# of lines = %d, # of chars = %d\n",
- num_lines, num_chars );
- return 0;
- @}
-@end example
-
-This scanner counts the number of characters and the
-number of lines in its input (it produces no output other
-than the final report on the counts). The first line
-declares two globals, "num_lines" and "num_chars", which
-are accessible both inside @samp{yylex()} and in the @samp{main()}
-routine declared after the second "%%". There are two rules,
-one which matches a newline ("\n") and increments both the
-line count and the character count, and one which matches
-any character other than a newline (indicated by the "."
-regular expression).
-
-A somewhat more complicated example:
-
-@example
-/* scanner for a toy Pascal-like language */
-
-%@{
-/* need this for the calls to atoi() and atof() below */
-#include <stdlib.h>
-%@}
-
-DIGIT [0-9]
-ID [a-z][a-z0-9]*
-
-%%
-
-@{DIGIT@}+ @{
- printf( "An integer: %s (%d)\n", yytext,
- atoi( yytext ) );
- @}
-
-@{DIGIT@}+"."@{DIGIT@}* @{
- printf( "A float: %s (%g)\n", yytext,
- atof( yytext ) );
- @}
-
-if|then|begin|end|procedure|function @{
- printf( "A keyword: %s\n", yytext );
- @}
-
-@{ID@} printf( "An identifier: %s\n", yytext );
-
-"+"|"-"|"*"|"/" printf( "An operator: %s\n", yytext );
-
-"@{"[^@}\n]*"@}" /* eat up one-line comments */
-
-[ \t\n]+ /* eat up whitespace */
-
-. printf( "Unrecognized character: %s\n", yytext );
-
-%%
-
-int main( int argc, char **argv )
- @{
- ++argv, --argc; /* skip over program name */
- if ( argc > 0 )
- @{
- yyin = fopen( argv[0], "r" );
- if ( ! yyin )
- @{
- perror( argv[0] );
- return 1;
- @}
- @}
- else
- yyin = stdin;
-
- yylex();
- return 0;
- @}
-@end example
-
-This is the beginning of a simple scanner for a language
-like Pascal. It identifies different types of @var{tokens} and
-reports on what it has seen.
-
-The details of this example will be explained in the
-following sections.
-
-@node Format, Patterns, Examples, Top
-@section Format of the input file
-
-The @code{flex} input file consists of three sections, separated
-by a line with just @samp{%%} in it:
-
-@example
-definitions
-%%
-rules
-%%
-user code
-@end example
-
-The @dfn{definitions} section contains declarations of simple
-@dfn{name} definitions to simplify the scanner specification,
-and declarations of @dfn{start conditions}, which are explained
-in a later section.
-Name definitions have the form:
-
-@example
-name definition
-@end example
-
-The "name" is a word beginning with a letter or an
-underscore ('_') followed by zero or more letters, digits, '_',
-or '-' (dash). The definition is taken to begin at the
-first non-white-space character following the name and
-to continue to the end of the line. The definition can
-subsequently be referred to using "@{name@}", which will
-expand to "(definition)". For example,
-
-@example
-DIGIT [0-9]
-ID [a-z][a-z0-9]*
-@end example
-
-@noindent
-defines "DIGIT" to be a regular expression which matches a
-single digit, and "ID" to be a regular expression which
-matches a letter followed by zero-or-more
-letters-or-digits. A subsequent reference to
-
-@example
-@{DIGIT@}+"."@{DIGIT@}*
-@end example
-
-@noindent
-is identical to
-
-@example
-([0-9])+"."([0-9])*
-@end example
-
-@noindent
-and matches one-or-more digits followed by a '.' followed
-by zero-or-more digits.
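As an illustration only, here is a small C function that performs the textual substitution described above for a single definition. The name @code{expand_name} and its signature are hypothetical. @code{flex} itself keeps a table of definitions and must also tell @samp{@{name@}} apart from repeat counts such as @samp{@{2,5@}}, which this sketch does not attempt.

```c
#include <stdio.h>
#include <string.h>

/* Append len bytes of s to out; returns 0, or -1 if out would overflow. */
static int append(char *out, size_t out_size, size_t *used,
                  const char *s, size_t len)
{
    if (*used + len >= out_size)         /* keep room for the final NUL */
        return -1;
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
    return 0;
}

/* Hypothetical: copy pattern to out, replacing each "{name}" with
 * "(def)".  Returns 0 on success, -1 if out is too small. */
int expand_name(const char *pattern, const char *name, const char *def,
                char *out, size_t out_size)
{
    char ref[64];
    size_t used = 0;
    int n = snprintf(ref, sizeof ref, "{%s}", name);

    if (out_size == 0 || n < 0 || (size_t)n >= sizeof ref)
        return -1;
    out[0] = '\0';
    while (*pattern) {
        if (strncmp(pattern, ref, (size_t)n) == 0) {
            /* {name} becomes (definition) */
            if (append(out, out_size, &used, "(", 1) ||
                append(out, out_size, &used, def, strlen(def)) ||
                append(out, out_size, &used, ")", 1))
                return -1;
            pattern += n;
        } else {
            if (append(out, out_size, &used, pattern, 1))
                return -1;
            pattern++;
        }
    }
    return 0;
}
```

Called with the pattern @samp{@{DIGIT@}+"."@{DIGIT@}*}, the name @samp{DIGIT} and the definition @samp{[0-9]}, it produces @samp{([0-9])+"."([0-9])*}, the expansion shown above. The added parentheses make an operator that follows the reference, such as @samp{+}, apply to the whole definition.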
-
-The @var{rules} section of the @code{flex} input contains a series of
-rules of the form:
-
-@example
-pattern action
-@end example
-
-@noindent
-where the pattern must be unindented and the action must
-begin on the same line.
-
-See below for a further description of patterns and
-actions.
-
-Finally, the user code section is simply copied to
-@file{lex.yy.c} verbatim. It is used for companion routines
-which call or are called by the scanner. The presence of
-this section is optional; if it is missing, the second @samp{%%}
-in the input file may be skipped, too.
-
-In the definitions and rules sections, any @emph{indented} text or
-text enclosed in @samp{%@{} and @samp{%@}} is copied verbatim to the
-output (with the @samp{%@{@}}'s removed). The @samp{%@{@}}'s must
-appear unindented on lines by themselves.
-
-In the rules section, any indented or %@{@} text appearing
-before the first rule may be used to declare variables
-which are local to the scanning routine and (after the
-declarations) code which is to be executed whenever the
-scanning routine is entered. Other indented or %@{@} text
-in the rule section is still copied to the output, but its
-meaning is not well-defined and it may well cause
-compile-time errors (this feature is present for @code{POSIX} compliance;
-see below for other such features).
-
-In the definitions section (but not in the rules section),
-an unindented comment (i.e., a line beginning with "/*")
-is also copied verbatim to the output up to the next "*/".
-
-@node Patterns, Matching, Format, Top
-@section Patterns
-
-The patterns in the input are written using an extended
-set of regular expressions. These are:
-
-@table @samp
-@item x
-match the character @samp{x}
-@item .
-any character (byte) except newline
-@item [xyz]
-a "character class"; in this case, the pattern
-matches either an @samp{x}, a @samp{y}, or a @samp{z}
-@item [abj-oZ]
-a "character class" with a range in it; matches
-an @samp{a}, a @samp{b}, any letter from @samp{j} through @samp{o},
-or a @samp{Z}
-@item [^A-Z]
-a "negated character class", i.e., any character
-but those in the class. In this case, any
-character EXCEPT an uppercase letter.
-@item [^A-Z\n]
-any character EXCEPT an uppercase letter or
-a newline
-@item @var{r}*
-zero or more @var{r}'s, where @var{r} is any regular expression
-@item @var{r}+
-one or more @var{r}'s
-@item @var{r}?
-zero or one @var{r}'s (that is, "an optional @var{r}")
-@item @var{r}@{2,5@}
-anywhere from two to five @var{r}'s
-@item @var{r}@{2,@}
-two or more @var{r}'s
-@item @var{r}@{4@}
-exactly 4 @var{r}'s
-@item @{@var{name}@}
-the expansion of the "@var{name}" definition
-(see above)
-@item "[xyz]\"foo"
-the literal string: @samp{[xyz]"foo}
-@item \@var{x}
-if @var{x} is an @samp{a}, @samp{b}, @samp{f}, @samp{n}, @samp{r}, @samp{t}, or @samp{v},
-then the ANSI-C interpretation of \@var{x}.
-Otherwise, a literal @samp{@var{x}} (used to escape
-operators such as @samp{*})
-@item \0
-a NUL character (ASCII code 0)
-@item \123
-the character with octal value 123
-@item \x2a
-the character with hexadecimal value @code{2a}
-@item (@var{r})
-match an @var{r}; parentheses are used to override
-precedence (see below)
-@item @var{r}@var{s}
-the regular expression @var{r} followed by the
-regular expression @var{s}; called "concatenation"
-@item @var{r}|@var{s}
-either an @var{r} or an @var{s}
-@item @var{r}/@var{s}
-an @var{r} but only if it is followed by an @var{s}. The text
-matched by @var{s} is included when determining whether this rule is
-the @dfn{longest match}, but is then returned to the input before
-the action is executed. So the action only sees the text matched
-by @var{r}. This type of pattern is called @dfn{trailing context}.
-(There are some combinations of @samp{@var{r}/@var{s}} that @code{flex}
-cannot match correctly; see notes in the Deficiencies / Bugs section
-below regarding "dangerous trailing context".)
-@item ^@var{r}
-an @var{r}, but only at the beginning of a line (i.e.,
-when just starting to scan, or right after a
-newline has been scanned).
-@item @var{r}$
-an @var{r}, but only at the end of a line (i.e., just
-before a newline). Equivalent to "@var{r}/\n".
-
-Note that flex's notion of "newline" is exactly
-whatever the C compiler used to compile flex
-interprets '\n' as; in particular, on some DOS
-systems you must either filter out \r's in the
-input yourself, or explicitly use @var{r}/\r\n for "r$".
-@item <@var{s}>@var{r}
-an @var{r}, but only in start condition @var{s} (see
-below for discussion of start conditions)
-@item <@var{s1},@var{s2},@var{s3}>@var{r}
-same, but in any of start conditions @var{s1},
-@var{s2}, or @var{s3}
-@item <*>@var{r}
-an @var{r} in any start condition, even an exclusive one.
-@item <<EOF>>
-an end-of-file
-@item <@var{s1},@var{s2}><<EOF>>
-an end-of-file when in start condition @var{s1} or @var{s2}
-@end table
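The escape forms in the table can be modeled with a short C decoder. This is a sketch under stated assumptions, not @code{flex}'s own parser. The function @code{decode_escape} is hypothetical. It assumes octal escapes use at most three digits and hexadecimal escapes at most two, and it treats @samp{\x} with no following hex digit as a literal @samp{x}.

```c
#include <ctype.h>

/* Decode one escape starting at s, which must point at a backslash.
 * Stores the character in *out and returns the number of bytes consumed,
 * or 0 if s does not start with an escape. */
int decode_escape(const char *s, int *out)
{
    int value = 0, i = 1;

    if (s[0] != '\\' || s[1] == '\0')
        return 0;
    if (s[1] >= '0' && s[1] <= '7') {                    /* \0, \123 */
        while (i <= 3 && s[i] >= '0' && s[i] <= '7')
            value = value * 8 + (s[i++] - '0');
        *out = value;
        return i;
    }
    if (s[1] == 'x' && isxdigit((unsigned char)s[2])) {  /* \x2a */
        i = 2;
        while (i <= 3 && isxdigit((unsigned char)s[i])) {
            int c = tolower((unsigned char)s[i++]);
            value = value * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
        }
        *out = value;
        return i;
    }
    switch (s[1]) {                                      /* ANSI-C letters */
    case 'a': *out = '\a'; break;
    case 'b': *out = '\b'; break;
    case 'f': *out = '\f'; break;
    case 'n': *out = '\n'; break;
    case 'r': *out = '\r'; break;
    case 't': *out = '\t'; break;
    case 'v': *out = '\v'; break;
    default:  *out = (unsigned char)s[1]; break;         /* \* is a literal '*' */
    }
    return 2;
}
```

With this decoder, @samp{\123} yields the character with value 83 (octal 123), and @samp{\x2a} yields 42, which is @samp{*} in ASCII.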
-
-Note that inside of a character class, all regular
-expression operators lose their special meaning except escape
-('\') and the character class operators, '-', ']', and, at
-the beginning of the class, '^'.
-
-The regular expressions listed above are grouped according
-to precedence, from highest precedence at the top to
-lowest at the bottom. Those grouped together have equal
-precedence. For example,
-
-@example
-foo|bar*
-@end example
-
-@noindent
-is the same as
-
-@example
-(foo)|(ba(r*))
-@end example
-
-@noindent
-since the '*' operator has higher precedence than
-concatenation, and concatenation higher than alternation ('|').
-This pattern therefore matches @emph{either} the string "foo" @emph{or}
-the string "ba" followed by zero-or-more r's. To match
-"foo" or zero-or-more "bar"'s, use:
-
-@example
-foo|(bar)*
-@end example
-
-@noindent
-and to match zero-or-more "foo"'s-or-"bar"'s:
-
-@example
-(foo|bar)*
-@end example
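These groupings can be checked with the C library's POSIX @code{<regex.h>} interface, whose extended regular expressions give @samp{*}, concatenation and @samp{|} the same relative precedence. This is a sketch, not @code{flex}. The helper @code{matches_whole} is hypothetical, and the code assumes a POSIX system that provides @code{regcomp()} and @code{regexec()}.

```c
#include <regex.h>
#include <string.h>

/* Returns 1 if the POSIX extended regular expression re matches all of s,
 * 0 if it does not, and -1 if re is too long or fails to compile. */
int matches_whole(const char *re, const char *s)
{
    char anchored[256];
    regex_t rx;
    int ok;

    /* ^( ... )$ forces a whole-string match; the extra parentheses do
     * not change how the operators inside re group. */
    if (strlen(re) + 5 > sizeof anchored)
        return -1;
    strcpy(anchored, "^(");
    strcat(anchored, re);
    strcat(anchored, ")$");
    if (regcomp(&rx, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    ok = regexec(&rx, s, 0, NULL, 0) == 0;
    regfree(&rx);
    return ok;
}
```

Here @samp{foo|bar*} accepts "barrr" and "ba" but rejects "barbar", while @samp{foo|(bar)*} accepts "barbar" and @samp{(foo|bar)*} accepts "foobarbar".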
-
-In addition to characters and ranges of characters,
-character classes can also contain character class
-@dfn{expressions}. These are expressions enclosed inside @samp{[:} and @samp{:]}
-delimiters (which themselves must appear between the '['
-and ']' of the character class; other elements may occur
-inside the character class, too). The valid expressions
-are:
-
-@example
-[:alnum:] [:alpha:] [:blank:]
-[:cntrl:] [:digit:] [:graph:]
-[:lower:] [:print:] [:punct:]
-[:space:] [:upper:] [:xdigit:]
-@end example
-
-These expressions all designate a set of characters
-equivalent to the corresponding standard C @samp{isXXX} function. For
-example, @samp{[:alnum:]} designates those characters for which
-@samp{isalnum()} returns true, i.e., any alphabetic or numeric character.
-Some systems don't provide @samp{isblank()}, so flex defines
-@samp{[:blank:]} as a blank or a tab.
-
-For example, the following character classes are all
-equivalent:
-
-@example
-[[:alnum:]]
-[[:alpha:][:digit:]]
-[[:alpha:]0-9]
-[a-zA-Z0-9]
-@end example
-
-If your scanner is case-insensitive (the @samp{-i} flag), then
-@samp{[:upper:]} and @samp{[:lower:]} are equivalent to @samp{[:alpha:]}.
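Here is a short C check of the correspondence. It assumes an ASCII character set and the default @code{"C"} locale, which is the locale a C program starts in. The function names are hypothetical, and @code{flex_blank} simply restates the @samp{[:blank:]} definition given above.

```c
#include <ctype.h>

/* [:blank:] as flex defines it: a blank or a tab. */
int flex_blank(int c)
{
    return c == ' ' || c == '\t';
}

/* Returns 1 if [[:alnum:]] (isalnum) and [a-zA-Z0-9] agree on every
 * 7-bit character, 0 at the first disagreement. */
int alnum_matches_explicit_ranges(void)
{
    int c;

    for (c = 0; c < 128; c++) {
        int in_ranges = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                        || (c >= '0' && c <= '9');
        if ((isalnum(c) != 0) != in_ranges)
            return 0;
    }
    return 1;
}
```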
-
-Some notes on patterns:
-
-@itemize -
-@item
-A negated character class such as the example
-"[^A-Z]" above @emph{will match a newline} unless "\n" (or an
-equivalent escape sequence) is one of the
-characters explicitly present in the negated character
-class (e.g., "[^A-Z\n]"). This is unlike how many
-other regular expression tools treat negated
-character classes, but unfortunately the inconsistency
-is historically entrenched. Matching newlines
-means that a pattern like [^"]* can match the
-entire input unless there's another quote in the
-input.
-
-@item
-A rule can have at most one instance of trailing
-context (the '/' operator or the '$' operator).
-The start condition, '^', and "<<EOF>>" patterns
-can only occur at the beginning of a pattern, and,
-like '/' and '$', cannot be grouped
-inside parentheses. A '^' which does not occur at
-the beginning of a rule or a '$' which does not
-occur at the end of a rule loses its special
-properties and is treated as a normal character.
-
-The following are illegal:
-
-@example
-foo/bar$
-<sc1>foo<sc2>bar
-@end example
-
-Note that the first of these can be written
-"foo/bar\n".
-
-The following will result in '$' or '^' being
-treated as a normal character:
-
-@example
-foo|(bar$)
-foo|^bar
-@end example
-
-If what's wanted is a "foo" or a
-bar-followed-by-a-newline, the following could be used (the special
-'|' action is explained below):
-
-@example
-foo |
-bar$ /* action goes here */
-@end example
-
-A similar trick will work for matching a foo or a
-bar-at-the-beginning-of-a-line.
-@end itemize
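For a pattern of the form @samp{[^...]*}, the longest match from a given position is the run of bytes up to the first excluded byte, which is what the standard C function @code{strcspn()} computes. The sketch below (function names hypothetical) contrasts @samp{[^"]*}, which runs across newlines, with @samp{[^"\n]*}, which stops at the first one.

```c
#include <string.h>

/* Length of the longest prefix of s that [^"]* would match:
 * every byte, newline included, up to the next double quote. */
size_t span_not_quote(const char *s)
{
    return strcspn(s, "\"");
}

/* The same for [^"\n]*, which also stops at a newline. */
size_t span_not_quote_or_newline(const char *s)
{
    return strcspn(s, "\"\n");
}
```

On the input @samp{ab}, newline, @samp{cd"ef}, the first function spans five bytes, across the newline, while the second stops after two.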
-
-@node Matching, Actions, Patterns, Top
-@section How the input is matched
-
-When the generated scanner is run, it analyzes its input
-looking for strings which match any of its patterns. If
-it finds more than one match, it takes the one matching
-the most text (for trailing context rules, this includes
-the length of the trailing part, even though it will then
-be returned to the input). If it finds two or more
-matches of the same length, the rule listed first in the
-@code{flex} input file is chosen.
-
-Once the match is determined, the text corresponding to
-the match (called the @var{token}) is made available in the
-global character pointer @code{yytext}, and its length in the
-global integer @code{yyleng}. The @var{action} corresponding to the
-matched pattern is then executed (a more detailed
-description of actions follows), and then the remaining input is
-scanned for another match.
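The selection policy can be sketched in C. This models the rule-choice logic only. The generated scanner does not try rules one at a time; it runs a single DFA. The rule functions and @code{pick_rule} are hypothetical. Each rule reports how much of the input it matches, the longest match wins, and on a tie the earlier rule is kept.

```c
#include <string.h>

/* A hypothetical rule: how many leading bytes of s it matches (0 = none). */
typedef size_t (*rule_fn)(const char *s);

size_t rule_foo(const char *s)       /* the literal  foo  */
{
    return strncmp(s, "foo", 3) == 0 ? 3 : 0;
}

size_t rule_ident(const char *s)     /* [a-z]+ */
{
    size_t n = 0;

    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Returns the index of the chosen rule and stores its match length in
 * *len, or returns -1 if no rule matched (the default rule case). */
int pick_rule(const rule_fn *rules, int nrules, const char *s, size_t *len)
{
    int best = -1, i;
    size_t best_len = 0;

    for (i = 0; i < nrules; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {          /* strictly longer: ties keep the earlier rule */
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With @samp{[a-z]+} listed before the literal @samp{foo}, @code{pick_rule} never chooses the @samp{foo} rule, which is the situation @code{flex} reports as @samp{warning, rule cannot be matched}.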
-
-If no match is found, then the @dfn{default rule} is executed:
-the next character in the input is considered matched and
-copied to the standard output. Thus, the simplest legal
-@code{flex} input is:
-
-@example
-%%
-@end example
-
-which generates a scanner that simply copies its input
-(one character at a time) to its output.
-
-Note that @code{yytext} can be defined in two different ways:
-either as a character @emph{pointer} or as a character @emph{array}.
-You can control which definition @code{flex} uses by including
-one of the special directives @samp{%pointer} or @samp{%array} in the
-first (definitions) section of your flex input. The
-default is @samp{%pointer}, unless you use the @samp{-l} lex
-compatibility option, in which case @code{yytext} will be an array. The
-advantage of using @samp{%pointer} is substantially faster
-scanning and no buffer overflow when matching very large
-tokens (unless you run out of dynamic memory). The
-disadvantage is that you are restricted in how your actions can
-modify @code{yytext} (see the next section), and calls to the
-@samp{unput()} function destroy the present contents of @code{yytext},
-which can be a considerable porting headache when moving
-between different @code{lex} versions.
-
-The advantage of @samp{%array} is that you can then modify @code{yytext}
-to your heart's content, and calls to @samp{unput()} do not
-destroy @code{yytext} (see below). Furthermore, existing @code{lex}
-programs sometimes access @code{yytext} externally using
-declarations of the form:
-@example
-extern char yytext[];
-@end example
-This definition is erroneous when used with @samp{%pointer}, but
-correct for @samp{%array}.
-
-@samp{%array} defines @code{yytext} to be an array of @code{YYLMAX} characters,
-which defaults to a fairly large value. You can change
-the size by simply #define'ing @code{YYLMAX} to a different value
-in the first section of your @code{flex} input. As mentioned
-above, with @samp{%pointer} @code{yytext} grows dynamically to
-accommodate large tokens. While this means your @samp{%pointer} scanner
-can accommodate very large tokens (such as matching entire
-blocks of comments), bear in mind that each time the
-scanner must resize @code{yytext} it also must rescan the entire
-token from the beginning, so matching such tokens can
-prove slow. @code{yytext} presently does @emph{not} dynamically grow if
-a call to @samp{unput()} results in too much text being pushed
-back; instead, a run-time error results.
-
-Also note that you cannot use @samp{%array} with C++ scanner
-classes (the @code{c++} option; see below).
-
-@node Actions, Generated scanner, Matching, Top
-@section Actions
-
-Each pattern in a rule has a corresponding action, which
-can be any arbitrary C statement. The pattern ends at the
-first non-escaped whitespace character; the remainder of
-the line is its action. If the action is empty, then when
-the pattern is matched the input token is simply
-discarded. For example, here is the specification for a
-program which deletes all occurrences of "zap me" from its
-input:
-
-@example
-%%
-"zap me"
-@end example
-
-(It will copy all other characters in the input to the
-output since they will be matched by the default rule.)
-
-Here is a program which compresses multiple blanks and
-tabs down to a single blank, and throws away whitespace
-found at the end of a line:
-
-@example
-%%
-[ \t]+ putchar( ' ' );
-[ \t]+$ /* ignore this token */
-@end example
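-
-The effect of these two rules on a line can be modelled in plain C.
-In the sketch below (@code{squeeze_blanks} is a hypothetical name),
-a run of blanks and tabs directly followed by a newline is dropped,
-because the trailing context of @samp{[ \t]+$} makes that match
-longer. Any other run becomes a single blank. As with @samp{$} in
-the scanner, a run at the very end of input with no newline after
-it still becomes one blank.
-
```c
#include <string.h>

/* Models "[ \t]+  putchar(' ');" and "[ \t]+$  (empty action)".
 * Characters other than blanks and tabs are copied, as the default
 * rule would.  `out` needs strlen(in) + 1 bytes. */
void squeeze_blanks(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == ' ' || *in == '\t') {
            const char *end = in;
            while (*end == ' ' || *end == '\t')
                end++;
            if (*end != '\n')       /* no newline follows: first rule */
                *out++ = ' ';
            in = end;               /* the run is consumed either way */
        } else {
            *out++ = *in++;         /* default rule: ECHO */
        }
    }
    *out = '\0';
}
```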
-
-If the action contains a '@{', then the action spans until
-the balancing '@}' is found, and the action may cross
-multiple lines. @code{flex} knows about C strings and comments and
-won't be fooled by braces found within them, but also
-allows actions to begin with @samp{%@{} and will consider the
-action to be all the text up to the next @samp{%@}} (regardless of
-ordinary braces inside the action).
-
-An action consisting solely of a vertical bar ('|') means
-"same as the action for the next rule." See below for an
-illustration.
-
-Actions can include arbitrary C code, including @code{return}
-statements to return a value to whatever routine called
-@samp{yylex()}. Each time @samp{yylex()} is called it continues
-processing tokens from where it last left off until it either
-reaches the end of the file or executes a return.
-
-Actions are free to modify @code{yytext} except for lengthening
-it (adding characters to its end--these will overwrite
-later characters in the input stream). This however does
-not apply when using @samp{%array} (see above); in that case,
-@code{yytext} may be freely modified in any way.
-
-Actions are free to modify @code{yyleng} except they should not
-do so if the action also includes use of @samp{yymore()} (see
-below).
-
-There are a number of special directives which can be
-included within an action:
-
-@itemize -
-@item
-@samp{ECHO} copies @code{yytext} to the scanner's output.
-
-@item
-@code{BEGIN} followed by the name of a start condition
-places the scanner in the corresponding start
-condition (see below).
-
-@item
-@code{REJECT} directs the scanner to proceed on to the
-"second best" rule which matched the input (or a
-prefix of the input). The rule is chosen as
-described above in "How the Input is Matched", and
-@code{yytext} and @code{yyleng} set up appropriately. It may
-either be one which matched as much text as the
-originally chosen rule but came later in the @code{flex}
-input file, or one which matched less text. For
-example, the following will both count the words in
-the input and call the routine @samp{special()} whenever
-"frob" is seen:
-
-@example
- int word_count = 0;
-%%
-
-frob special(); REJECT;
-[^ \t\n]+ ++word_count;
-@end example
-
-Without the @code{REJECT}, any occurrences of "frob" in the input would
-not be counted as words, since the scanner normally
-executes only one action per token. Multiple
-@code{REJECT}s are allowed, each one finding the next
-best choice to the currently active rule. For
-example, when the following scanner scans the token
-"abcd", it will write "abcdabcaba" to the output:
-
-@example
-%%
-a |
-ab |
-abc |
-abcd ECHO; REJECT;
-.|\n /* eat up any unmatched character */
-@end example
-
-(The first three rules share the fourth's action
-since they use the special '|' action.) @code{REJECT} is
-a particularly expensive feature in terms of
-scanner performance; if it is used in @emph{any} of the
-scanner's actions it will slow down @emph{all} of the
-scanner's matching. Furthermore, @code{REJECT} cannot be used
-with the @samp{-Cf} or @samp{-CF} options (see below).
-
-Note also that unlike the other special actions,
-@code{REJECT} is a @emph{branch}; code immediately following it
-in the action will @emph{not} be executed.
-
-@item
-@samp{yymore()} tells the scanner that the next time it
-matches a rule, the corresponding token should be
-@emph{appended} onto the current value of @code{yytext} rather
-than replacing it. For example, given the input
-"mega-kludge" the following will write
-"mega-mega-kludge" to the output:
-
-@example
-%%
-mega- ECHO; yymore();
-kludge ECHO;
-@end example
-
-First "mega-" is matched and echoed to the output.
-Then "kludge" is matched, but the previous "mega-"
-is still hanging around at the beginning of @code{yytext}
-so the @samp{ECHO} for the "kludge" rule will actually
-write "mega-kludge".
-@end itemize
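-
-The output of the @code{REJECT} example can be reproduced with a
-short C model. Like that example, it assumes that every prefix of
-the token is matched by a rule whose action is
-@samp{ECHO; REJECT;}, so each @code{REJECT} falls back to the next
-shorter prefix. @code{echo_with_reject} is a hypothetical name.
-
```c
#include <string.h>

/* Models the "a | ab | abc | abcd  ECHO; REJECT;" rules: every
 * prefix of `token` is echoed, longest first.  The final fallback
 * (the ".|\n" rule that eats a character) produces no output.
 * `out` must hold n*(n+1)/2 + 1 bytes for an n-byte token. */
void echo_with_reject(const char *token, char *out)
{
    out[0] = '\0';
    for (size_t len = strlen(token); len > 0; len--)
        strncat(out, token, len);   /* ECHO the current best match */
}
```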
-
-Two notes regarding use of @samp{yymore()}. First, @samp{yymore()}
-depends on the value of @code{yyleng} correctly reflecting the
-size of the current token, so you must not modify @code{yyleng}
-if you are using @samp{yymore()}. Second, the presence of
-@samp{yymore()} in any of the scanner's actions entails a minor
-performance penalty in the scanner's matching speed.
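-
-The effect of @samp{yymore()} on @code{yytext} can be modelled with
-a small token buffer. In this sketch, @code{struct token_buf},
-@code{set_token} and @code{run_mega_kludge} are hypothetical stand-ins
-for the scanner's internal state and rules. The last one replays the
-"mega-kludge" example.
-
```c
#include <string.h>

/* Hypothetical model of yytext under yymore(). */
struct token_buf {
    char text[128];
    int  more;          /* set by a yymore()-like call */
};

/* Store a newly matched token, appending it if yymore() was called. */
void set_token(struct token_buf *tb, const char *match)
{
    size_t used = tb->more ? strlen(tb->text) : 0;

    /* Copy as much of `match` as fits after the kept prefix. */
    strncpy(tb->text + used, match, sizeof tb->text - used - 1);
    tb->text[sizeof tb->text - 1] = '\0';
    tb->more = 0;       /* yymore() affects only the next match */
}

/* Replays "mega-kludge" through the two rules; `out` gets the ECHOs. */
void run_mega_kludge(char *out)
{
    struct token_buf tb = { "", 0 };

    out[0] = '\0';
    set_token(&tb, "mega-");    /* rule: mega-   ECHO; yymore(); */
    strcat(out, tb.text);
    tb.more = 1;
    set_token(&tb, "kludge");   /* rule: kludge  ECHO; */
    strcat(out, tb.text);
}
```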
-
-@itemize -
-@item
-@samp{yyless(n)} returns all but the first @var{n} characters of
-the current token back to the input stream, where
-they will be rescanned when the scanner looks for
-the next match. @code{yytext} and @code{yyleng} are adjusted
-appropriately (e.g., @code{yyleng} will now be equal to @var{n}).
-For example, on the input "foobar" the
-following will write out "foobarbar":
-
-@example
-%%
-foobar ECHO; yyless(3);
-[a-z]+ ECHO;
-@end example
-
-An argument of 0 to @code{yyless} will cause the entire
-current input string to be scanned again. Unless
-you've changed how the scanner will subsequently
-process its input (using @code{BEGIN}, for example), this
-will result in an endless loop.
-
-Note that @code{yyless} is a macro and can only be used in the
-flex input file, not from other source files.
-
-@item
-@samp{unput(c)} puts the character @code{c} back onto the input
-stream. It will be the next character scanned.
-The following action will take the current token
-and cause it to be rescanned enclosed in
-parentheses.
-
-@example
-@{
-int i;
-/* Copy yytext because unput() trashes yytext */
-char *yycopy = strdup( yytext );
-unput( ')' );
-for ( i = yyleng - 1; i >= 0; --i )
- unput( yycopy[i] );
-unput( '(' );
-free( yycopy );
-@}
-@end example
-
-Note that since each @samp{unput()} puts the given
-character back at the @emph{beginning} of the input stream,
-pushing back strings must be done back-to-front.
-An important potential problem when using @samp{unput()} is that
-if you are using @samp{%pointer} (the default), a call to @samp{unput()}
-@emph{destroys} the contents of @code{yytext}, starting with its
-rightmost character and devouring one character to the left
-with each call. If you need the value of @code{yytext} preserved
-after a call to @samp{unput()} (as in the above example), you
-must either first copy it elsewhere, or build your scanner
-using @samp{%array} instead (see How The Input Is Matched).
-
-Finally, note that you cannot put back @code{EOF} to attempt to
-mark the input stream with an end-of-file.
-
-@item
-@samp{input()} reads the next character from the input
-stream. For example, the following is one way to
-eat up C comments:
-
-@example
-%%
-"/*" @{
- register int c;
-
- for ( ; ; )
- @{
- while ( (c = input()) != '*' &&
- c != EOF )
- ; /* eat up text of comment */
-
- if ( c == '*' )
- @{
- while ( (c = input()) == '*' )
- ;
- if ( c == '/' )
- break; /* found the end */
- @}
-
- if ( c == EOF )
- @{
- error( "EOF in comment" );
- break;
- @}
- @}
- @}
-@end example
-
-(Note that if the scanner is compiled using @samp{C++},
-then @samp{input()} is instead referred to as @samp{yyinput()},
-in order to avoid a name clash with the @samp{C++} stream
-by the name of @code{input}.)
-
-@item
-@code{YY_FLUSH_BUFFER} flushes the scanner's internal buffer so that the next time the scanner
-attempts to match a token, it will first refill the buffer using
-@code{YY_INPUT} (see The Generated Scanner, below). This action is
-a special case of the more general @samp{yy_flush_buffer()} function,
-described below in the section Multiple Input Buffers.
-
-@item
-@samp{yyterminate()} can be used in lieu of a return
-statement in an action. It terminates the scanner
-and returns a 0 to the scanner's caller, indicating
-"all done". By default, @samp{yyterminate()} is also
-called when an end-of-file is encountered. It is a
-macro and may be redefined.
-@end itemize
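-
-The reason strings must be unput back-to-front becomes clear in a
-model of an input stream whose pushed-back characters form a stack.
-The @code{struct stream} type and the @code{stream_input},
-@code{stream_unput} and @code{rescan_in_parens} functions are
-hypothetical illustrations rather than @code{flex} routines. Here
--1 stands in for @code{EOF}.
-
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input stream with push-back: characters given to
 * stream_unput() are stacked and read again before `rest`. */
struct stream {
    const char *rest;        /* unread input */
    char pushed[64];         /* push-back stack */
    size_t npushed;
};

/* Push c back; returns -1 if the push-back stack is full. */
int stream_unput(struct stream *s, char c)
{
    if (s->npushed == sizeof s->pushed)
        return -1;
    s->pushed[s->npushed++] = c;
    return 0;
}

/* Next character, or -1 (standing in for EOF) at end of input. */
int stream_input(struct stream *s)
{
    if (s->npushed > 0)
        return (unsigned char) s->pushed[--s->npushed];
    if (*s->rest == '\0')
        return -1;
    return (unsigned char) *s->rest++;
}

/* Mirror of the parenthesizing action: the token is pushed from its
 * last character to its first, so it is read back in forward order. */
int rescan_in_parens(struct stream *s, const char *tok, size_t len)
{
    if (stream_unput(s, ')') != 0)
        return -1;
    for (size_t i = len; i > 0; i--)
        if (stream_unput(s, tok[i - 1]) != 0)
            return -1;
    return stream_unput(s, '(');
}
```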
-
-@node Generated scanner, Start conditions, Actions, Top
-@section The generated scanner
-
-The output of @code{flex} is the file @file{lex.yy.c}, which contains
-the scanning routine @samp{yylex()}, a number of tables used by
-it for matching tokens, and a number of auxiliary routines
-and macros. By default, @samp{yylex()} is declared as follows:
-
-@example
-int yylex()
- @{
- @dots{} various definitions and the actions in here @dots{}
- @}
-@end example
-
-(If your environment supports function prototypes, then it
-will be @samp{int yylex( void )}.) This definition may be
-changed by defining the @code{YY_DECL} macro. For example, you
-could use:
-
-@example
-#define YY_DECL float lexscan( a, b ) float a, b;
-@end example
-
-to give the scanning routine the name @code{lexscan}, returning a
-float, and taking two floats as arguments. Note that if
-you give arguments to the scanning routine using a
-K&R-style/non-prototyped function declaration, you must
-terminate the definition with a semi-colon (@samp{;}).
-
-Whenever @samp{yylex()} is called, it scans tokens from the
-global input file @code{yyin} (which defaults to stdin). It
-continues until it either reaches an end-of-file (at which
-point it returns the value 0) or one of its actions
-executes a @code{return} statement.
-
-If the scanner reaches an end-of-file, subsequent calls are undefined
-unless either @code{yyin} is pointed at a new input file (in which case
-scanning continues from that file), or @samp{yyrestart()} is called.
-@samp{yyrestart()} takes one argument, a @samp{FILE *} pointer (which
-can be nil, if you've set up @code{YY_INPUT} to scan from a source
-other than @code{yyin}), and initializes @code{yyin} for scanning from
-that file. Essentially there is no difference between just assigning
-@code{yyin} to a new input file or using @samp{yyrestart()} to do so;
-the latter is available for compatibility with previous versions of
-@code{flex}, and because it can be used to switch input files in the
-middle of scanning. It can also be used to throw away the current
-input buffer, by calling it with an argument of @code{yyin}; but
-better is to use @code{YY_FLUSH_BUFFER} (see above). Note that
-@samp{yyrestart()} does @emph{not} reset the start condition to
-@code{INITIAL} (see Start Conditions, below).
-
-
-If @samp{yylex()} stops scanning due to executing a @code{return}
-statement in one of the actions, the scanner may then be called
-again and it will resume scanning where it left off.
-
-By default (and for purposes of efficiency), the scanner
-uses block-reads rather than simple @samp{getc()} calls to read
-characters from @code{yyin}. The nature of how it gets its input
-can be controlled by defining the @code{YY_INPUT} macro.
-@code{YY_INPUT}'s calling sequence is
-@samp{YY_INPUT(buf,result,max_size)}. Its action is to place
-up to @var{max_size} characters in the character array @var{buf} and
-return in the integer variable @var{result} either the number of
-characters read or the constant @code{YY_NULL} (0 on Unix
-systems) to indicate EOF. The default @code{YY_INPUT} reads from
-the global file-pointer @code{yyin}.
-
-A sample definition of YY_INPUT (in the definitions
-section of the input file):
-
-@example
-%@{
-#define YY_INPUT(buf,result,max_size) \
- @{ \
- int c = getchar(); \
- result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
- @}
-%@}
-@end example
-
-This definition will change the input processing to occur
-one character at a time.
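-
-The same calling convention also works for sources other than
-@code{yyin}. The sketch below assumes the input is a NUL-terminated
-string held in a hypothetical global @code{input_src}. The equally
-hypothetical @code{string_input} copies at most @var{max_size} bytes
-per call and returns 0 (@code{YY_NULL} on Unix) once the string is
-used up. The trailing comment shows how a @code{YY_INPUT} definition
-might call it.
-
```c
#include <string.h>

/* Hypothetical input source: a NUL-terminated string instead of yyin. */
static const char *input_src = "";

/* YY_INPUT-style reader: copies up to max_size bytes into buf and
 * returns how many were copied, or 0 (YY_NULL on Unix) at the end. */
static int string_input(char *buf, int max_size)
{
    size_t n = strlen(input_src);

    if (n > (size_t) max_size)
        n = (size_t) max_size;
    memcpy(buf, input_src, n);
    input_src += n;
    return (int) n;
}

/* Wiring it into a scanner (definitions section, inside %{ ... %}):
 *   #define YY_INPUT(buf,result,max_size) { result = string_input( (buf), (max_size) ); }
 */
```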
-
-When the scanner receives an end-of-file indication from
-YY_INPUT, it then checks the @samp{yywrap()} function. If
-@samp{yywrap()} returns false (zero), then it is assumed that the
-function has gone ahead and set up @code{yyin} to point to
-another input file, and scanning continues. If it returns
-true (non-zero), then the scanner terminates, returning 0
-to its caller. Note that in either case, the start
-condition remains unchanged; it does @emph{not} revert to @code{INITIAL}.
-
-If you do not supply your own version of @samp{yywrap()}, then you
-must either use @samp{%option noyywrap} (in which case the scanner
-behaves as though @samp{yywrap()} returned 1), or you must link with
-@samp{-lfl} to obtain the default version of the routine, which always
-returns 1.
-
-Three routines are available for scanning from in-memory
-buffers rather than files: @samp{yy_scan_string()},
-@samp{yy_scan_bytes()}, and @samp{yy_scan_buffer()}. See the discussion
-of them below in the section Multiple Input Buffers.
-
-The scanner writes its @samp{ECHO} output to the @code{yyout} global
-(default, stdout), which may be redefined by the user
-simply by assigning it to some other @code{FILE} pointer.
-
-@node Start conditions, Multiple buffers, Generated scanner, Top
-@section Start conditions
-
-@code{flex} provides a mechanism for conditionally activating
-rules. Any rule whose pattern is prefixed with "<sc>"
-will only be active when the scanner is in the start
-condition named "sc". For example,
-
-@example
-<STRING>[^"]* @{ /* eat up the string body ... */
- @dots{}
- @}
-@end example
-
-@noindent
-will be active only when the scanner is in the "STRING"
-start condition, and
-
-@example
-<INITIAL,STRING,QUOTE>\. @{ /* handle an escape ... */
- @dots{}
- @}
-@end example
-
-@noindent
-will be active only when the current start condition is
-either "INITIAL", "STRING", or "QUOTE".
-
-Start conditions are declared in the definitions (first)
-section of the input using unindented lines beginning with
-either @samp{%s} or @samp{%x} followed by a list of names. The former
-declares @emph{inclusive} start conditions, the latter @emph{exclusive}
-start conditions. A start condition is activated using
-the @code{BEGIN} action. Until the next @code{BEGIN} action is
-executed, rules with the given start condition will be active
-and rules with other start conditions will be inactive.
-If the start condition is @emph{inclusive}, then rules with no
-start conditions at all will also be active. If it is
-@emph{exclusive}, then @emph{only} rules qualified with the start
-condition will be active. A set of rules contingent on the
-same exclusive start condition describes a scanner which is
-independent of any of the other rules in the @code{flex} input.
-Because of this, exclusive start conditions make it easy
-to specify "mini-scanners" which scan portions of the
-input that are syntactically different from the rest
-(e.g., comments).
-
-If the distinction between inclusive and exclusive start
-conditions is still a little vague, here's a simple
-example illustrating the connection between the two. The set
-of rules:
-
-@example
-%s example
-%%
-
-<example>foo do_something();
-
-bar something_else();
-@end example
-
-@noindent
-is equivalent to
-
-@example
-%x example
-%%
-
-<example>foo do_something();
-
-<INITIAL,example>bar something_else();
-@end example
-
-Without the @samp{<INITIAL,example>} qualifier, the @samp{bar} pattern
-in the second example wouldn't be active (i.e., couldn't match) when
-in start condition @samp{example}. If we just used @samp{<example>}
-to qualify @samp{bar}, though, then it would only be active in
-@samp{example} and not in @code{INITIAL}, while in the first example
-it's active in both, because in the first example the @samp{example}
-start condition is an @emph{inclusive} (@samp{%s}) start condition.
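-
-The activation rule can be written as a small C predicate. In this
-hypothetical model a rule carries the start conditions from its
-@samp{<...>} prefix (none for an unqualified rule), and
-@code{inclusive[c]} records whether condition @var{c} was declared
-with @samp{%s}. @code{INITIAL} (0) counts as inclusive. The
-@samp{<*>} specifier is not modelled.
-
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of which rules are active in a start condition. */
struct rule {
    const int *sc;      /* conditions in the rule's <...> prefix */
    size_t     n_sc;    /* 0 for a rule with no prefix */
};

/* inclusive[c] is true if condition c was declared with %s
 * (and for INITIAL), false if it was declared with %x. */
bool rule_active(const struct rule *r, int current, const bool *inclusive)
{
    if (r->n_sc == 0)
        return inclusive[current];   /* unqualified: depends on %s/%x */
    for (size_t i = 0; i < r->n_sc; i++)
        if (r->sc[i] == current)
            return true;             /* explicitly qualified */
    return false;
}
```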
-
-Also note that the special start-condition specifier @samp{<*>}
-matches every start condition. Thus, the above example
-could also have been written:
-
-@example
-%x example
-%%
-
-<example>foo do_something();
-
-<*>bar something_else();
-@end example
-
-The default rule (to @samp{ECHO} any unmatched character) remains
-active in start conditions. It is equivalent to:
-
-@example
-<*>.|\n ECHO;
-@end example
-
-@samp{BEGIN(0)} returns to the original state where only the
-rules with no start conditions are active. This state can
-also be referred to as the start-condition "INITIAL", so
-@samp{BEGIN(INITIAL)} is equivalent to @samp{BEGIN(0)}. (The
-parentheses around the start condition name are not required but
-are considered good style.)
-
-@code{BEGIN} actions can also be given as indented code at the
-beginning of the rules section. For example, the
-following will cause the scanner to enter the "SPECIAL" start
-condition whenever @samp{yylex()} is called and the global
-variable @code{enter_special} is true:
-
-@example
- int enter_special;
-
-%x SPECIAL
-%%
- if ( enter_special )
- BEGIN(SPECIAL);
-
-<SPECIAL>blahblahblah
-@dots{}more rules follow@dots{}
-@end example
-
-To illustrate the uses of start conditions, here is a
-scanner which provides two different interpretations of a
-string like "123.456". By default it will treat it as
-three tokens, the integer "123", a dot ('.'), and the
-integer "456". But if the string is preceded earlier in
-the line by the string "expect-floats" it will treat it as
-a single token, the floating-point number 123.456:
-
-@example
-%@{
-#include <stdlib.h>
-%@}
-%s expect
-
-%%
-expect-floats BEGIN(expect);
-
-<expect>[0-9]+"."[0-9]+ @{
- printf( "found a float, = %f\n",
- atof( yytext ) );
- @}
-<expect>\n @{
- /* that's the end of the line, so
- * we need another "expect-floats"
- * before we'll recognize any more
- * numbers
- */
- BEGIN(INITIAL);
- @}
-
-[0-9]+ @{
- printf( "found an integer, = %d\n",
- atoi( yytext ) );
- @}
-
-"." printf( "found a dot\n" );
-@end example
-
-Here is a scanner which recognizes (and discards) C
-comments while maintaining a count of the current input line.
-
-@example
-%x comment
-%%
- int line_num = 1;
-
-"/*" BEGIN(comment);
-
-<comment>[^*\n]* /* eat anything that's not a '*' */
-<comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
-<comment>\n ++line_num;
-<comment>"*"+"/" BEGIN(INITIAL);
-@end example
-
-This scanner goes to a bit of trouble to match as much
-text as possible with each rule. In general, when
-attempting to write a high-speed scanner, try to match as
-much as possible in each rule, as it's a big win.
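-
-For comparison, here is a hypothetical plain-C version of the
-@code{comment} scanner, with the start condition reduced to a flag.
-The rules above only see newlines inside comments; newlines elsewhere
-are copied by the default rule without being counted. This sketch
-counts every newline, so @code{line_num} tracks the current input
-line throughout.
-
```c
#include <string.h>

/* Plain-C model of the "comment" exclusive start condition: text
 * outside comments is copied, comment text is discarded, and every
 * newline increments *line_num.  `out` needs strlen(in) + 1 bytes. */
void strip_comments(const char *in, char *out, int *line_num)
{
    int in_comment = 0;                  /* the start condition */

    while (*in != '\0') {
        if (*in == '\n')
            ++*line_num;

        if (!in_comment && in[0] == '/' && in[1] == '*') {
            in_comment = 1;              /* BEGIN(comment) */
            in += 2;
        } else if (in_comment && in[0] == '*' && in[1] == '/') {
            in_comment = 0;              /* BEGIN(INITIAL) */
            in += 2;
        } else if (in_comment) {
            in++;                        /* eat comment text */
        } else {
            *out++ = *in++;              /* default rule: ECHO */
        }
    }
    *out = '\0';
}
```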
-
-Note that start condition names are really integer values
-and can be stored as such. Thus, the above could be
-extended in the following fashion:
-
-@example
-%x comment foo
-%%
- int line_num = 1;
- int comment_caller;
-
-"/*" @{
- comment_caller = INITIAL;
- BEGIN(comment);
- @}
-
-@dots{}
-
-<foo>"/*" @{
- comment_caller = foo;
- BEGIN(comment);
- @}
-
-<comment>[^*\n]* /* eat anything that's not a '*' */
-<comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
-<comment>\n ++line_num;
-<comment>"*"+"/" BEGIN(comment_caller);
-@end example
-
-Furthermore, you can access the current start condition
-using the integer-valued @code{YY_START} macro. For example, the
-above assignments to @code{comment_caller} could instead be
-written
-
-@example
-comment_caller = YY_START;
-@end example
-
-Flex provides @code{YYSTATE} as an alias for @code{YY_START} (since that
-is what's used by AT&T @code{lex}).
-
-Note that start conditions do not have their own
-name-space; %s's and %x's declare names in the same fashion as
-#define's.
-
-Finally, here's an example of how to match C-style quoted
-strings using exclusive start conditions, including
-expanded escape sequences (but not including checking for
-a string that's too long):
-
-@example
-%x str
-
-%%
- char string_buf[MAX_STR_CONST];
- char *string_buf_ptr;
-
-\" string_buf_ptr = string_buf; BEGIN(str);
-
-<str>\" @{ /* saw closing quote - all done */
- BEGIN(INITIAL);
- *string_buf_ptr = '\0';
- /* return string constant token type and
- * value to parser
- */
- @}
-
-<str>\n @{
- /* error - unterminated string constant */
- /* generate error message */
- @}
-
-<str>\\[0-7]@{1,3@} @{
- /* octal escape sequence */
- int result;
-
- (void) sscanf( yytext + 1, "%o", &result );
-
- if ( result > 0xff )
- @{
- /* error, constant is out-of-bounds */
- @}
-
- *string_buf_ptr++ = result;
- @}
-
-<str>\\[0-9]+ @{
- /* generate error - bad escape sequence; something
- * like '\48' or '\0777777'
- */
- @}
-
-<str>\\n *string_buf_ptr++ = '\n';
-<str>\\t *string_buf_ptr++ = '\t';
-<str>\\r *string_buf_ptr++ = '\r';
-<str>\\b *string_buf_ptr++ = '\b';
-<str>\\f *string_buf_ptr++ = '\f';
-
-<str>\\(.|\n) *string_buf_ptr++ = yytext[1];
-
-<str>[^\\\n\"]+ @{
- char *yptr = yytext;
-
- while ( *yptr )
- *string_buf_ptr++ = *yptr++;
- @}
-@end example
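-
-The escape handling above can be cross-checked against a plain-C
-sketch that decodes a string body the same way. @code{decode_string}
-is a hypothetical name. It stops at the closing quote and returns the
-number of bytes produced. The result is not NUL-terminated, since
-@samp{\0} is a legal escape. It returns -1 wherever the rules report
-an error, or when more than @var{cap} bytes would be written. The
-digit handling mimics the longest-match choice between the octal rule
-and @samp{\\[0-9]+}: an escape is rejected when its run of decimal
-digits is longer than the octal match, as in @samp{\48}.
-
```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical plain-C model of the <str> rules.  `in` points just past
 * the opening quote.  Returns the number of bytes written to `out`, or
 * -1 for an unterminated string, a bad or out-of-range escape, or more
 * than `cap` bytes of output. */
long decode_string(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    while (*in != '"') {
        int c;

        if (*in == '\0' || *in == '\n' || n == cap)
            return -1;
        if (*in != '\\') {               /* ordinary text */
            out[n++] = *in++;
            continue;
        }
        in++;                            /* skip the backslash */
        if (isdigit((unsigned char) *in)) {
            size_t digits = 0, oct = 0;
            while (isdigit((unsigned char) in[digits]))
                digits++;
            while (oct < 3 && in[oct] >= '0' && in[oct] <= '7')
                oct++;
            if (digits > oct)            /* "\\[0-9]+" matched longer */
                return -1;
            c = 0;
            for (size_t i = 0; i < oct; i++)
                c = c * 8 + (in[i] - '0');
            if (c > 0xff)                /* constant out of bounds */
                return -1;
            in += oct;
        } else {
            switch (*in) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            case 'r': c = '\r'; break;
            case 'b': c = '\b'; break;
            case 'f': c = '\f'; break;
            case '\0': return -1;        /* backslash at end of input */
            default:  c = *in; break;    /* e.g. a quote, backslash, newline */
            }
            in++;
        }
        out[n++] = (char) c;
    }
    return (long) n;
}
```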
-
-Often, such as in some of the examples above, you wind up
-writing a whole bunch of rules all preceded by the same
-start condition(s). Flex makes this a little easier and
-cleaner by introducing a notion of start condition @dfn{scope}.
-A start condition scope is begun with:
-
-@example
-<SCs>@{
-@end example
-
-@noindent
-where SCs is a list of one or more start conditions.
-Inside the start condition scope, every rule automatically
-has the prefix @samp{<SCs>} applied to it, until a @samp{@}} which
-matches the initial @samp{@{}. So, for example,
-
-@example
-<ESC>@{
- "\\n" return '\n';
- "\\r" return '\r';
- "\\f" return '\f';
- "\\0" return '\0';
-@}
-@end example
-
-@noindent
-is equivalent to:
-
-@example
-<ESC>"\\n" return '\n';
-<ESC>"\\r" return '\r';
-<ESC>"\\f" return '\f';
-<ESC>"\\0" return '\0';
-@end example
-
-Start condition scopes may be nested.
-
-Three routines are available for manipulating stacks of
-start conditions:
-
-@table @samp
-@item void yy_push_state(int new_state)
-pushes the current start condition onto the top of
-the start condition stack and switches to @var{new_state}
-as though you had used @samp{BEGIN new_state} (recall that
-start condition names are also integers).
-
-@item void yy_pop_state()
-pops the top of the stack and switches to it via
-@code{BEGIN}.
-
-@item int yy_top_state()
-returns the top of the stack without altering the
-stack's contents.
-@end table
-
-The start condition stack grows dynamically and so has no
-built-in size limitation. If memory is exhausted, program
-execution aborts.
-
-To use start condition stacks, your scanner must include a
-@samp{%option stack} directive (see Options below).
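-
-A stack with the properties just described might be sketched as
-follows: push saves the current condition, pop restores it, storage
-grows on demand, and running out of memory aborts. The names
-@code{push_state}, @code{pop_state}, @code{top_state} and
-@code{cur_state} are hypothetical; a real scanner uses the routines
-listed above.
-
```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of a start condition stack. */
static int    cur_state = 0;      /* 0 plays the role of INITIAL */
static int   *sc_stack = NULL;
static size_t sc_depth = 0, sc_cap = 0;

static void fatal(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    abort();
}

/* Save the current condition and switch, as with BEGIN(new_state). */
void push_state(int new_state)
{
    if (sc_depth == sc_cap) {           /* grow geometrically */
        size_t new_cap = sc_cap ? 2 * sc_cap : 8;
        int *p = realloc(sc_stack, new_cap * sizeof *p);
        if (p == NULL)
            fatal("out of memory expanding start-condition stack");
        sc_stack = p;
        sc_cap = new_cap;
    }
    sc_stack[sc_depth++] = cur_state;
    cur_state = new_state;
}

/* Restore the most recently saved condition. */
void pop_state(void)
{
    if (sc_depth == 0)
        fatal("start-condition stack underflow");
    cur_state = sc_stack[--sc_depth];
}

/* Peek at the saved condition without changing the stack. */
int top_state(void)
{
    if (sc_depth == 0)
        fatal("start-condition stack is empty");
    return sc_stack[sc_depth - 1];
}
```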
-
-@node Multiple buffers, End-of-file rules, Start conditions, Top
-@section Multiple input buffers
-
-Some scanners (such as those which support "include"
-files) require reading from several input streams. As
-@code{flex} scanners do a large amount of buffering, one cannot
-control where the next input will be read from by simply
-writing a @code{YY_INPUT} which is sensitive to the scanning
-context. @code{YY_INPUT} is only called when the scanner reaches
-the end of its buffer, which may be a long time after
-scanning a statement such as an "include" which requires
-switching the input source.
-
-To negotiate these sorts of problems, @code{flex} provides a
-mechanism for creating and switching between multiple
-input buffers. An input buffer is created by using:
-
-@example
-YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
-@end example
-
-@noindent
-which takes a @code{FILE} pointer and a size and creates a buffer
-associated with the given file and large enough to hold
-@var{size} characters (when in doubt, use @code{YY_BUF_SIZE} for the
-size). It returns a @code{YY_BUFFER_STATE} handle, which may
-then be passed to other routines (see below). The
-@code{YY_BUFFER_STATE} type is a pointer to an opaque @code{struct}
-@code{yy_buffer_state} structure, so you may safely initialize
-@code{YY_BUFFER_STATE} variables to @samp{((YY_BUFFER_STATE) 0)} if you
-wish, and also refer to the opaque structure in order to
-correctly declare input buffers in source files other than
-that of your scanner. Note that the @code{FILE} pointer in the
-call to @code{yy_create_buffer} is only used as the value of @code{yyin}
-seen by @code{YY_INPUT}; if you redefine @code{YY_INPUT} so it no longer
-uses @code{yyin}, then you can safely pass a nil @code{FILE} pointer to
-@code{yy_create_buffer}. You select a particular buffer to scan
-from using:
-
-@example
-void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
-@end example
-
-switches the scanner's input buffer so subsequent tokens
-will come from @var{new_buffer}. Note that
-@samp{yy_switch_to_buffer()} may be used by @samp{yywrap()} to set
-things up for continued scanning, instead of opening a new
-file and pointing @code{yyin} at it. Note also that switching
-input sources via either @samp{yy_switch_to_buffer()} or @samp{yywrap()}
-does @emph{not} change the start condition.
-
-@example
-void yy_delete_buffer( YY_BUFFER_STATE buffer )
-@end example
-
-@noindent
-is used to reclaim the storage associated with a buffer.
-You can also clear the current contents of a buffer using:
-
-@example
-void yy_flush_buffer( YY_BUFFER_STATE buffer )
-@end example
-
-This function discards the buffer's contents, so the next time the
-scanner attempts to match a token from the buffer, it will first fill
-the buffer anew using @code{YY_INPUT}.
-
-@samp{yy_new_buffer()} is an alias for @samp{yy_create_buffer()},
-provided for compatibility with the C++ use of @code{new} and @code{delete}
-for creating and destroying dynamic objects.
-
-Finally, the @code{YY_CURRENT_BUFFER} macro returns a
-@code{YY_BUFFER_STATE} handle to the current buffer.
-
-Here is an example of using these features for writing a
-scanner which expands include files (the @samp{<<EOF>>} feature
-is discussed below):
-
-@example
-/* the "incl" state is used for picking up the name
- * of an include file
- */
-%x incl
-
-%@{
-#define MAX_INCLUDE_DEPTH 10
-YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
-int include_stack_ptr = 0;
-%@}
-
-%%
-include BEGIN(incl);
-
-[a-z]+ ECHO;
-[^a-z\n]*\n? ECHO;
-
-<incl>[ \t]* /* eat the whitespace */
-<incl>[^ \t\n]+ @{ /* got the include file name */
- if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
- @{
- fprintf( stderr, "Includes nested too deeply\n" );
- exit( 1 );
- @}
-
- include_stack[include_stack_ptr++] =
- YY_CURRENT_BUFFER;
-
- yyin = fopen( yytext, "r" );
-
- if ( ! yyin )
- error( @dots{} );
-
- yy_switch_to_buffer(
- yy_create_buffer( yyin, YY_BUF_SIZE ) );
-
- BEGIN(INITIAL);
- @}
-
-<<EOF>> @{
- if ( --include_stack_ptr < 0 )
- @{
- yyterminate();
- @}
-
- else
- @{
- yy_delete_buffer( YY_CURRENT_BUFFER );
- yy_switch_to_buffer(
- include_stack[include_stack_ptr] );
- @}
- @}
-@end example
-
-Three routines are available for setting up input buffers
-for scanning in-memory strings instead of files. All of
-them create a new input buffer for scanning the string,
-and return a corresponding @code{YY_BUFFER_STATE} handle (which
-you should delete with @samp{yy_delete_buffer()} when done with
-it). They also switch to the new buffer using
-@samp{yy_switch_to_buffer()}, so the next call to @samp{yylex()} will
-start scanning the string.
-
-@table @samp
-@item yy_scan_string(const char *str)
-scans a NUL-terminated string.
-
-@item yy_scan_bytes(const char *bytes, int len)
-scans @code{len} bytes (including possibly NUL's) starting
-at location @var{bytes}.
-@end table
-
-Note that both of these functions create and scan a @emph{copy}
-of the string or bytes. (This may be desirable, since
-@samp{yylex()} modifies the contents of the buffer it is
-scanning.) You can avoid the copy by using:
-
-@table @samp
-@item yy_scan_buffer(char *base, yy_size_t size)
-which scans in place the buffer starting at @var{base},
-consisting of @var{size} bytes, the last two bytes of
-which @emph{must} be @code{YY_END_OF_BUFFER_CHAR} (ASCII NUL).
-These last two bytes are not scanned; thus,
-scanning consists of @samp{base[0]} through @samp{base[size-3]},
-inclusive.
-
-If you fail to set up @var{base} in this manner (i.e.,
-forget the final two @code{YY_END_OF_BUFFER_CHAR} bytes),
-then @samp{yy_scan_buffer()} returns a nil pointer instead
-of creating a new input buffer.
-
-The type @code{yy_size_t} is an integral type to which you
-can cast an integer expression reflecting the size
-of the buffer.
-@end table
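-
-Memory in the layout @samp{yy_scan_buffer()} expects can be prepared
-by a small helper. The hypothetical @code{make_scan_buffer} below
-copies the data, appends the two required NUL bytes, and reports the
-total to pass as @var{size}. Because the buffer is scanned in place
-rather than copied, the caller should @code{free()} it only after the
-corresponding buffer has been deleted with @samp{yy_delete_buffer()}.
-
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns malloc'd memory holding `len` bytes of
 * `data` followed by two YY_END_OF_BUFFER_CHAR (ASCII NUL) bytes, and
 * stores the total size (len + 2) in *size_out.  NULL on failure. */
char *make_scan_buffer(const char *data, size_t len, size_t *size_out)
{
    char *base = malloc(len + 2);

    if (base == NULL)
        return NULL;
    memcpy(base, data, len);
    base[len] = '\0';           /* first end-of-buffer byte */
    base[len + 1] = '\0';       /* second end-of-buffer byte */
    *size_out = len + 2;        /* size includes both NULs */
    return base;
}
```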
-
-@node End-of-file rules, Miscellaneous, Multiple buffers, Top
-@section End-of-file rules
-
-The special rule @samp{<<EOF>>} indicates actions which are to
-be taken when an end-of-file is encountered and @samp{yywrap()}
-returns non-zero (i.e., indicates no further files to
-process). The action must finish by doing one of four
-things:
-
-@itemize -
-@item
-assigning @code{yyin} to a new input file (in previous
-versions of flex, after doing the assignment you
-had to call the special action @code{YY_NEW_FILE}; this is
-no longer necessary);
-
-@item
-executing a @code{return} statement;
-
-@item
-executing the special @samp{yyterminate()} action;
-
-@item
-or, switching to a new buffer using
-@samp{yy_switch_to_buffer()} as shown in the example
-above.
-@end itemize
-
-<<EOF>> rules may not be used with other patterns; they
-may only be qualified with a list of start conditions. If
-an unqualified <<EOF>> rule is given, it applies to @emph{all}
-start conditions which do not already have <<EOF>>
-actions. To specify an <<EOF>> rule for only the initial
-start condition, use
-
-@example
-<INITIAL><<EOF>>
-@end example
-
-These rules are useful for catching things like unclosed
-comments. An example:
-
-@example
-%x quote
-%%
-
-@dots{}other rules for dealing with quotes@dots{}
-
-<quote><<EOF>> @{
- error( "unterminated quote" );
- yyterminate();
- @}
-<<EOF>> @{
- if ( *++filelist )
- yyin = fopen( *filelist, "r" );
- else
- yyterminate();
- @}
-@end example
-
-@node Miscellaneous, User variables, End-of-file rules, Top
-@section Miscellaneous macros
-
-The macro @code{YY_USER_ACTION} can be defined to provide an
-action which is always executed prior to the matched
-rule's action. For example, it could be #define'd to call
-a routine to convert yytext to lower-case. When
-@code{YY_USER_ACTION} is invoked, the variable @code{yy_act} gives the
-number of the matched rule (rules are numbered starting
-with 1). Suppose you want to profile how often each of
-your rules is matched. The following would do the trick:
-
-@example
-#define YY_USER_ACTION ++ctr[yy_act]
-@end example
-
-where @code{ctr} is an array to hold the counts for the different
-rules. Note that the macro @code{YY_NUM_RULES} gives the total number
-of rules (including the default rule, even if you use @samp{-s}), so
-a correct declaration for @code{ctr} is:
-
-@example
-int ctr[YY_NUM_RULES + 1]; /* yy_act runs from 1 to YY_NUM_RULES */
-@end example
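-The counting hook can also be modeled outside a generated scanner. In the
-sketch below, @code{NUM_RULES}, @code{USER_ACTION} and @code{run_action()}
-are hypothetical stand-ins for @code{YY_NUM_RULES}, @code{YY_USER_ACTION}
-and the generated action switch. The counter array has one more slot than
-there are rules, so that @code{yy_act}, which starts at 1, can index it
-directly.

```c
/* Stand-in for the generated scanner's YY_NUM_RULES; the value 3 is an
 * arbitrary assumption (two user rules plus the default rule). */
#define NUM_RULES 3

/* Rules are numbered from 1, so slot 0 is unused. */
static int ctr[NUM_RULES + 1];

/* Mirrors "#define YY_USER_ACTION ++ctr[yy_act]": a hook expanded just
 * before each rule's action, with the matched rule number in scope. */
#define USER_ACTION ++ctr[yy_act]

/* Hypothetical dispatcher standing in for the generated switch on yy_act. */
void run_action(int yy_act)
{
    USER_ACTION;
    switch (yy_act) {
    case 1:  /* first user rule's action */  break;
    case 2:  /* second user rule's action */ break;
    default: /* default rule (ECHO) */       break;
    }
}

/* Number of times RULE has matched so far, or 0 for an invalid rule. */
int rule_count(int rule)
{
    return (rule >= 1 && rule <= NUM_RULES) ? ctr[rule] : 0;
}
```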
-
-The macro @code{YY_USER_INIT} may be defined to provide an action
-which is always executed before the first scan (and before
-the scanner's internal initializations are done). For
-example, it could be used to call a routine to read in a
-data table or open a logging file.
-
-The macro @samp{yy_set_interactive(is_interactive)} can be used
-to control whether the current buffer is considered
-@emph{interactive}. An interactive buffer is processed more slowly,
-but must be used when the scanner's input source is indeed
-interactive to avoid problems due to waiting to fill
-buffers (see the discussion of the @samp{-I} flag below). A
-non-zero value in the macro invocation marks the buffer as
-interactive, a zero value as non-interactive. Note that
-use of this macro overrides @samp{%option always-interactive} or
-@samp{%option never-interactive} (see Options below).
-@samp{yy_set_interactive()} must be invoked prior to beginning to
-scan the buffer that is (or is not) to be considered
-interactive.
-
-The macro @samp{yy_set_bol(at_bol)} can be used to control
-whether the current buffer's scanning context for the next
-token match is done as though at the beginning of a line.
-A non-zero macro argument makes rules anchored with
-@samp{^} active, while a zero argument makes @samp{^} rules
-inactive.
-
-The macro @samp{YY_AT_BOL()} returns true if the next token
-scanned from the current buffer will have '^' rules
-active, false otherwise.
-
-In the generated scanner, the actions are all gathered in
-one large switch statement and separated using @code{YY_BREAK},
-which may be redefined. By default, it is simply a
-@code{break}, which separates each rule's action from the following
-rule's. Redefining @code{YY_BREAK} lets C++ users, for example,
-#define @code{YY_BREAK} to do nothing (while being very
-careful that every rule ends with a @code{break} or a
-@code{return}!) to avoid unreachable-statement warnings: when a
-rule's action ends with @code{return}, the @code{YY_BREAK} that
-follows it can never be reached.
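-The following is a small sketch of that pattern. Here @code{YY_BREAK} is
-redefined to expand to nothing, and each case of the hypothetical
-@code{classify()} switch ends in its own @code{return}, so no case can fall
-through into the next. In a real scanner, flex generates the switch. This
-example only illustrates how the macro expands.

```c
/* In a generated scanner YY_BREAK defaults to a break statement.
 * Here it is redefined to expand to nothing, as a C++ user might do. */
#define YY_BREAK

/* Hypothetical stand-in for the generated action switch. */
int classify(int yy_act)
{
    switch (yy_act) {
    case 1:
        return 10;  /* with the default YY_BREAK, the break after this is unreachable */
        YY_BREAK
    case 2:
        return 20;
        YY_BREAK
    default:
        return -1;  /* every action must now end in return or break */
        YY_BREAK
    }
}
```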
-
-@node User variables, YACC interface, Miscellaneous, Top
-@section Values available to the user
-
-This section summarizes the various values available to
-the user in the rule actions.
-
-@itemize -
-@item
-@samp{char *yytext} holds the text of the current token.
-It may be modified but not lengthened (you cannot
-append characters to the end).
-
-If the special directive @samp{%array} appears in the
-first section of the scanner description, then
-@code{yytext} is instead declared @samp{char yytext[YYLMAX]},
-where @code{YYLMAX} is a macro definition that you can
-redefine in the first section if you don't like the
-default value (generally 8KB). Using @samp{%array}
-results in somewhat slower scanners, but the value
-of @code{yytext} becomes immune to calls to @samp{input()} and
-@samp{unput()}, which potentially destroy its value when
-@code{yytext} is a character pointer. The opposite of
-@samp{%array} is @samp{%pointer}, which is the default.
-
-You cannot use @samp{%array} when generating C++ scanner
-classes (the @samp{-+} flag).
-
-@item
-@samp{int yyleng} holds the length of the current token.
-
-@item
-@samp{FILE *yyin} is the file which by default @code{flex} reads
-from. It may be redefined but doing so only makes
-sense before scanning begins or after an EOF has
-been encountered. Changing it in the midst of
-scanning will have unexpected results since @code{flex}
-buffers its input; use @samp{yyrestart()} instead. Once
-scanning terminates because an end-of-file has been
-seen, you can assign @code{yyin} to the new input file and
-then call the scanner again to continue scanning.
-
-@item
-@samp{void yyrestart( FILE *new_file )} may be called to
-point @code{yyin} at the new input file. The switch-over
-to the new file is immediate (any previously
-buffered-up input is lost). Note that calling
-@samp{yyrestart()} with @code{yyin} as an argument thus throws
-away the current input buffer and continues
-scanning the same input file.
-
-@item
-@samp{FILE *yyout} is the file to which @samp{ECHO} actions are
-done. It can be reassigned by the user.
-
-@item
-@code{YY_CURRENT_BUFFER} returns a @code{YY_BUFFER_STATE} handle
-to the current buffer.
-
-@item
-@code{YY_START} returns an integer value corresponding to
-the current start condition. You can subsequently
-use this value with @code{BEGIN} to return to that start
-condition.
-@end itemize
-
-@node YACC interface, Options, User variables, Top
-@section Interfacing with @code{yacc}
-
-One of the main uses of @code{flex} is as a companion to the @code{yacc}
-parser-generator. @code{yacc} parsers expect to call a routine
-named @samp{yylex()} to find the next input token. The routine
-is supposed to return the type of the next token as well
-as putting any associated value in the global @code{yylval}. To
-use @code{flex} with @code{yacc}, one specifies the @samp{-d} option to @code{yacc} to
-instruct it to generate the file @file{y.tab.h} containing
-definitions of all the @samp{%tokens} appearing in the @code{yacc} input.
-This file is then included in the @code{flex} scanner. For
-example, if one of the tokens is "TOK_NUMBER", part of the
-scanner might look like:
-
-@example
-%@{
-#include "y.tab.h"
-%@}
-
-%%
-
-[0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
-@end example
-
-@node Options, Performance, YACC interface, Top
-@section Options
-@code{flex} has the following options:
-
-@table @samp
-@item -b
-Generate backing-up information to @file{lex.backup}.
-This is a list of scanner states which require
-backing up and the input characters on which they
-do so. By adding rules one can remove backing-up
-states. If @emph{all} backing-up states are eliminated
-and @samp{-Cf} or @samp{-CF} is used, the generated scanner will
-run faster (see the @samp{-p} flag). Only users who wish
-to squeeze every last cycle out of their scanners
-need worry about this option. (See the section on
-Performance Considerations below.)
-
-@item -c
-is a do-nothing, deprecated option included for
-POSIX compliance.
-
-@item -d
-makes the generated scanner run in @dfn{debug} mode.
-Whenever a pattern is recognized and the global
-@code{yy_flex_debug} is non-zero (which is the default),
-the scanner will write to @code{stderr} a line of the
-form:
-
-@example
---accepting rule at line 53 ("the matched text")
-@end example
-
-The line number refers to the location of the rule
-in the file defining the scanner (i.e., the file
-that was fed to flex). Messages are also generated
-when the scanner backs up, accepts the default
-rule, reaches the end of its input buffer (or
-encounters a NUL; at this point, the two look the
-same as far as the scanner's concerned), or reaches
-an end-of-file.
-
-@item -f
-specifies @dfn{fast scanner}. No table compression is
-done and stdio is bypassed. The result is large
-but fast. This option is equivalent to @samp{-Cfr} (see
-below).
-
-@item -h
-generates a "help" summary of @code{flex's} options to
-@code{stdout} and then exits. @samp{-?} and @samp{--help} are synonyms
-for @samp{-h}.
-
-@item -i
-instructs @code{flex} to generate a @emph{case-insensitive}
-scanner. The case of letters given in the @code{flex} input
-patterns will be ignored, and tokens in the input
-will be matched regardless of case. The matched
-text given in @code{yytext} will have its original case preserved
-(i.e., it will not be folded).
-
-@item -l
-turns on maximum compatibility with the original
-AT&T @code{lex} implementation. Note that this does not
-mean @emph{full} compatibility. Use of this option costs
-a considerable amount of performance, and it cannot
-be used with the @samp{-+}, @samp{-f}, @samp{-F}, @samp{-Cf}, or @samp{-CF} options.
-For details on the compatibilities it provides, see
-the section "Incompatibilities With Lex And POSIX"
-below. This option also results in the name
-@code{YY_FLEX_LEX_COMPAT} being #define'd in the generated
-scanner.
-
-@item -n
-is another do-nothing, deprecated option included
-only for POSIX compliance.
-
-@item -p
-generates a performance report to stderr. The
-report consists of comments regarding features of
-the @code{flex} input file which will cause a serious loss
-of performance in the resulting scanner. If you
-give the flag twice, you will also get comments
-regarding features that lead to minor performance
-losses.
-
-Note that the use of @code{REJECT}, @samp{%option yylineno} and
-variable trailing context (see the Deficiencies / Bugs section below)
-entails a substantial performance penalty; use of @samp{yymore()},
-the @samp{^} operator, and the @samp{-I} flag entails minor performance
-penalties.
-
-@item -s
-causes the @dfn{default rule} (that unmatched scanner
-input is echoed to @code{stdout}) to be suppressed. If
-the scanner encounters input that does not match
-any of its rules, it aborts with an error. This
-option is useful for finding holes in a scanner's
-rule set.
-
-@item -t
-instructs @code{flex} to write the scanner it generates to
-standard output instead of @file{lex.yy.c}.
-
-@item -v
-specifies that @code{flex} should write to @code{stderr} a
-summary of statistics regarding the scanner it
-generates. Most of the statistics are meaningless to
-the casual @code{flex} user, but the first line identifies
-the version of @code{flex} (same as reported by @samp{-V}), and
-the next line the flags used when generating the
-scanner, including those that are on by default.
-
-@item -w
-suppresses warning messages.
-
-@item -B
-instructs @code{flex} to generate a @emph{batch} scanner, the
-opposite of @emph{interactive} scanners generated by @samp{-I}
-(see below). In general, you use @samp{-B} when you are
-@emph{certain} that your scanner will never be used
-interactively, and you want to squeeze a @emph{little} more
-performance out of it. If your goal is instead to
-squeeze out a @emph{lot} more performance, you should be
-using the @samp{-Cf} or @samp{-CF} options (discussed below),
-which turn on @samp{-B} automatically anyway.
-
-@item -F
-specifies that the @dfn{fast} scanner table
-representation should be used (and stdio bypassed). This
-representation is about as fast as the full table
-representation (@samp{-f}), and for some sets of patterns
-will be considerably smaller (and for others,
-larger). In general, if the pattern set contains
-both "keywords" and a catch-all "identifier" rule,
-such as in the set:
-
-@example
-"case" return TOK_CASE;
-"switch" return TOK_SWITCH;
-@dots{}
-"default" return TOK_DEFAULT;
-[a-z]+ return TOK_ID;
-@end example
-
-@noindent
-then you're better off using the full table
-representation. If only the "identifier" rule is
-present and you then use a hash table or some such to
-detect the keywords, you're better off using @samp{-F}.
-
-This option is equivalent to @samp{-CFr} (see below). It
-cannot be used with @samp{-+}.
-
-@item -I
-instructs @code{flex} to generate an @emph{interactive} scanner.
-An interactive scanner is one that only looks ahead
-to decide what token has been matched if it
-absolutely must. It turns out that always looking one
-extra character ahead, even if the scanner has
-already seen enough text to disambiguate the
-current token, is a bit faster than only looking ahead
-when necessary. But scanners that always look
-ahead give dreadful interactive performance; for
-example, when a user types a newline, it is not
-recognized as a newline token until they enter
-@emph{another} token, which often means typing in another
-whole line.
-
-@code{Flex} scanners default to @emph{interactive} unless you use
-the @samp{-Cf} or @samp{-CF} table-compression options (see
-below). That's because if you're looking for
-high-performance you should be using one of these
-options, so if you didn't, @code{flex} assumes you'd
-rather trade off a bit of run-time performance for
-intuitive interactive behavior. Note also that you
-@emph{cannot} use @samp{-I} in conjunction with @samp{-Cf} or @samp{-CF}.
-Thus, this option is not really needed; it is on by
-default for all those cases in which it is allowed.
-
-You can force a scanner to @emph{not} be interactive by
-using @samp{-B} (see above).
-
-@item -L
-instructs @code{flex} not to generate @samp{#line} directives.
-Without this option, @code{flex} peppers the generated
-scanner with #line directives so error messages in
-the actions will be correctly located with respect
-to either the original @code{flex} input file (if the
-errors are due to code in the input file), or
-@file{lex.yy.c} (if the errors are @code{flex's} fault -- you
-should report these sorts of errors to the email
-address given below).
-
-@item -T
-makes @code{flex} run in @code{trace} mode. It will generate a
-lot of messages to @code{stderr} concerning the form of
-the input and the resultant non-deterministic and
-deterministic finite automata. This option is
-mostly for use in maintaining @code{flex}.
-
-@item -V
-prints the version number to @code{stdout} and exits.
-@samp{--version} is a synonym for @samp{-V}.
-
-@item -7
-instructs @code{flex} to generate a 7-bit scanner, i.e.,
-one which can only recognize 7-bit characters in
-its input. The advantage of using @samp{-7} is that the
-scanner's tables can be up to half the size of
-those generated using the @samp{-8} option (see below).
-The disadvantage is that such scanners often hang
-or crash if their input contains an 8-bit
-character.
-
-Note, however, that unless you generate your
-scanner using the @samp{-Cf} or @samp{-CF} table compression options,
-use of @samp{-7} will save only a small amount of table
-space, and make your scanner considerably less
-portable. @code{Flex's} default behavior is to generate
-an 8-bit scanner unless you use the @samp{-Cf} or @samp{-CF} options, in
-which case @code{flex} defaults to generating 7-bit
-scanners unless your site was always configured to
-generate 8-bit scanners (as will often be the case
-with non-USA sites). You can tell whether flex
-generated a 7-bit or an 8-bit scanner by inspecting
-the flag summary in the @samp{-v} output as described
-above.
-
-Note that if you use @samp{-Cfe} or @samp{-CFe} (those table
-compression options, but also using equivalence
-classes as discussed below), flex still
-defaults to generating an 8-bit scanner, since
-usually with these compression options full 8-bit
-tables are not much more expensive than 7-bit
-tables.
-
-@item -8
-instructs @code{flex} to generate an 8-bit scanner, i.e.,
-one which can recognize 8-bit characters. This
-flag is only needed for scanners generated using
-@samp{-Cf} or @samp{-CF}, as otherwise flex defaults to
-generating an 8-bit scanner anyway.
-
-See the discussion of @samp{-7} above for flex's default
-behavior and the tradeoffs between 7-bit and 8-bit
-scanners.
-
-@item -+
-specifies that you want flex to generate a C++
-scanner class. See the section on Generating C++
-Scanners below for details.
-
-@item -C[aefFmr]
-controls the degree of table compression and, more
-generally, trade-offs between small scanners and
-fast scanners.
-
-@samp{-Ca} ("align") instructs flex to trade off larger
-tables in the generated scanner for faster
-performance because the elements of the tables are better
-aligned for memory access and computation. On some
-RISC architectures, fetching and manipulating
-long-words is more efficient than with smaller-sized
-units such as shortwords. This option can double
-the size of the tables used by your scanner.
-
-@samp{-Ce} directs @code{flex} to construct @dfn{equivalence classes},
-i.e., sets of characters which have identical
-lexical properties (for example, if the only appearance
-of digits in the @code{flex} input is in the character
-class "[0-9]" then the digits '0', '1', @dots{}, '9'
-will all be put in the same equivalence class).
-Equivalence classes usually give dramatic
-reductions in the final table/object file sizes
-(typically a factor of 2-5) and are pretty cheap
-performance-wise (one array look-up per character
-scanned).
-
-@samp{-Cf} specifies that the @emph{full} scanner tables should
-be generated - @code{flex} should not compress the tables
-by taking advantage of similar transition
-functions for different states.
-
-@samp{-CF} specifies that the alternate fast scanner
-representation (described above under the @samp{-F} flag)
-should be used. This option cannot be used with
-@samp{-+}.
-
-@samp{-Cm} directs @code{flex} to construct @dfn{meta-equivalence
-classes}, which are sets of equivalence classes (or
-characters, if equivalence classes are not being
-used) that are commonly used together.
-Meta-equivalence classes are often a big win when using
-compressed tables, but they have a moderate
-performance impact (one or two "if" tests and one array
-look-up per character scanned).
-
-@samp{-Cr} causes the generated scanner to @emph{bypass} use of
-the standard I/O library (stdio) for input.
-Instead of calling @samp{fread()} or @samp{getc()}, the scanner
-will use the @samp{read()} system call, resulting in a
-performance gain which varies from system to
-system, but in general is probably negligible unless
-you are also using @samp{-Cf} or @samp{-CF}. Using @samp{-Cr} can cause
-strange behavior if, for example, you read from
-@code{yyin} using stdio prior to calling the scanner
-(because the scanner will miss whatever text your
-previous reads left in the stdio input buffer).
-
-@samp{-Cr} has no effect if you define @code{YY_INPUT} (see The
-Generated Scanner above).
-
-A lone @samp{-C} specifies that the scanner tables should
-be compressed but neither equivalence classes nor
-meta-equivalence classes should be used.
-
-The options @samp{-Cf} or @samp{-CF} and @samp{-Cm} do not make sense
-together - there is no opportunity for
-meta-equivalence classes if the table is not being
-compressed. Otherwise the options may be freely
-mixed, and are cumulative.
-
-The default setting is @samp{-Cem}, which specifies that
-@code{flex} should generate equivalence classes and
-meta-equivalence classes. This setting provides the
-highest degree of table compression. You can trade
-off faster-executing scanners at the cost of larger
-tables with the following generally being true:
-
-@example
-slowest & smallest
- -Cem
- -Cm
- -Ce
- -C
- -C@{f,F@}e
- -C@{f,F@}
- -C@{f,F@}a
-fastest & largest
-@end example
-
-Note that scanners with the smallest tables are
-usually generated and compiled the quickest, so
-during development you will usually want to use the
-default, maximal compression.
-
-@samp{-Cfe} is often a good compromise between speed and
-size for production scanners.
-
-@item -ooutput
-directs flex to write the scanner to the file @file{output}
-instead of @file{lex.yy.c}. If you combine @samp{-o} with
-the @samp{-t} option, then the scanner is written to
-@code{stdout} but its @samp{#line} directives (see the @samp{-L} option
-above) refer to the file @file{output}.
-
-@item -Pprefix
-changes the default @samp{yy} prefix used by @code{flex} for all
-globally-visible variable and function names to
-instead be @var{prefix}. For example, @samp{-Pfoo} changes the
-name of @code{yytext} to @code{footext}. It also changes the
-name of the default output file from @file{lex.yy.c} to
-@file{lex.foo.c}. Here are all of the names affected:
-
-@example
-yy_create_buffer
-yy_delete_buffer
-yy_flex_debug
-yy_init_buffer
-yy_flush_buffer
-yy_load_buffer_state
-yy_switch_to_buffer
-yyin
-yyleng
-yylex
-yylineno
-yyout
-yyrestart
-yytext
-yywrap
-@end example
-
-(If you are using a C++ scanner, then only @code{yywrap}
-and @code{yyFlexLexer} are affected.) Within your scanner
-itself, you can still refer to the global variables
-and functions using either version of their name;
-but externally, they have the modified name.
-
-This option lets you easily link together multiple
-@code{flex} programs into the same executable. Note,
-though, that using this option also renames
-@samp{yywrap()}, so you now @emph{must} either provide your own
-(appropriately-named) version of the routine for
-your scanner, or use @samp{%option noyywrap}, as linking
-with @samp{-lfl} no longer provides one for you by
-default.
-
-@item -Sskeleton_file
-overrides the default skeleton file from which @code{flex}
-constructs its scanners. You'll never need this
-option unless you are doing @code{flex} maintenance or
-development.
-@end table
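-The following sketch makes the @samp{-Ce} description above more concrete.
-It is a simplified model of equivalence-class construction and is not the
-algorithm that @code{flex} itself uses. Each string in @code{sets} lists the
-characters of one character class or literal in the rules. Two characters
-share a class exactly when every set contains both of them or neither. The
-name @code{build_ecs()} and the 7-bit alphabet are assumptions of the sketch.

```c
#include <stdbool.h>
#include <string.h>

#define NCHARS 128  /* 7-bit characters only, to keep the sketch small */

/* Simplified model: SETS[i] lists the characters of one character class
 * (or one literal character) appearing in the rules.  Two characters get
 * the same class number iff every set contains both or neither of them.
 * Writes a class number per character into ECS and returns the number of
 * distinct classes. */
int build_ecs(const char *const *sets, int nsets, int ecs[NCHARS])
{
    int nclasses = 0;
    for (int c = 0; c < NCHARS; c++) {
        ecs[c] = -1;
        /* Reuse the class of an earlier character with the same membership. */
        for (int d = 0; d < c && ecs[c] < 0; d++) {
            bool same = true;
            for (int s = 0; s < nsets && same; s++) {
                /* strchr() finds the terminating NUL, so treat 0 specially. */
                bool has_c = c != 0 && strchr(sets[s], c) != NULL;
                bool has_d = d != 0 && strchr(sets[s], d) != NULL;
                same = (has_c == has_d);
            }
            if (same)
                ecs[c] = ecs[d];
        }
        if (ecs[c] < 0)
            ecs[c] = nclasses++;
    }
    return nclasses;
}
```

-With the sets @samp{0123456789} and @samp{abc}, the 128 characters collapse
-into three classes: the digits, @samp{a} through @samp{c}, and everything
-else. A transition table indexed by class would therefore need three
-columns instead of 128.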
-
-@code{flex} also provides a mechanism for controlling options
-within the scanner specification itself, rather than from
-the flex command-line. This is done by including @samp{%option}
-directives in the first section of the scanner
-specification. You can specify multiple options with a single
-@samp{%option} directive, and multiple directives in the first
-section of your flex input file. Most options are given
-simply as names, optionally preceded by the word "no"
-(with no intervening whitespace) to negate their meaning.
-A number are equivalent to flex flags or their negation:
-
-@example
-7bit -7 option
-8bit -8 option
-align -Ca option
-backup -b option
-batch -B option
-c++ -+ option
-
-caseful or
-case-sensitive opposite of -i (default)
-
-case-insensitive or
-caseless -i option
-
-debug -d option
-default opposite of -s option
-ecs -Ce option
-fast -F option
-full -f option
-interactive -I option
-lex-compat -l option
-meta-ecs -Cm option
-perf-report -p option
-read -Cr option
-stdout -t option
-verbose -v option
-warn opposite of -w option
- (use "%option nowarn" for -w)
-
-array equivalent to "%array"
-pointer equivalent to "%pointer" (default)
-@end example
-
-Some @samp{%option} settings provide features otherwise not available:
-
-@table @samp
-@item always-interactive
-instructs flex to generate a scanner which always
-considers its input "interactive". Normally, on
-each new input file the scanner calls @samp{isatty()} in
-an attempt to determine whether the scanner's input
-source is interactive and thus should be read a
-character at a time. When this option is used,
-however, then no such call is made.
-
-@item main
-directs flex to provide a default @samp{main()} program
-for the scanner, which simply calls @samp{yylex()}. This
-option implies @code{noyywrap} (see below).
-
-@item never-interactive
-instructs flex to generate a scanner which never
-considers its input "interactive" (again, no call is
-made to @samp{isatty()}). This is the opposite of
-@samp{always-interactive}.
-
-@item stack
-enables the use of start condition stacks (see
-Start Conditions above).
-
-@item stdinit
-if unset (i.e., @samp{%option nostdinit}) initializes @code{yyin}
-and @code{yyout} to nil @code{FILE} pointers, instead of @code{stdin}
-and @code{stdout}.
-
-@item yylineno
-directs @code{flex} to generate a scanner that maintains the number
-of the current line read from its input in the global variable
-@code{yylineno}. This option is implied by @samp{%option lex-compat}.
-
-@item yywrap
-if unset (i.e., @samp{%option noyywrap}), makes the
-scanner not call @samp{yywrap()} upon an end-of-file, but
-simply assume that there are no more files to scan
-(until the user points @code{yyin} at a new file and calls
-@samp{yylex()} again).
-@end table
-
-@code{flex} scans your rule actions to determine whether you use
-the @code{REJECT} or @samp{yymore()} features. The @code{reject} and @code{yymore}
-options are available to override its decision as to
-whether you use the features, either by setting them (e.g.,
-@samp{%option reject}) to indicate the feature is indeed used, or
-unsetting them to indicate it actually is not used (e.g.,
-@samp{%option noyymore}).
-
-Three options take string-delimited values, offset with '=':
-
-@example
-%option outfile="ABC"
-@end example
-
-@noindent
-is equivalent to @samp{-oABC}, and
-
-@example
-%option prefix="XYZ"
-@end example
-
-@noindent
-is equivalent to @samp{-PXYZ}.
-
-Finally,
-
-@example
-%option yyclass="foo"
-@end example
-
-@noindent
-only applies when generating a C++ scanner (@samp{-+} option). It
-informs @code{flex} that you have derived @samp{foo} as a subclass of
-@code{yyFlexLexer} so @code{flex} will place your actions in the member
-function @samp{foo::yylex()} instead of @samp{yyFlexLexer::yylex()}.
-It also generates a @samp{yyFlexLexer::yylex()} member function that
-emits a run-time error (by invoking @samp{yyFlexLexer::LexerError()})
-if called. See Generating C++ Scanners, below, for additional
-information.
-
-A number of options are available for lint purists who
-want to suppress the appearance of unneeded routines in
-the generated scanner. Each of the following, if unset,
-results in the corresponding routine not appearing in the
-generated scanner:
-
-@example
-input, unput
-yy_push_state, yy_pop_state, yy_top_state
-yy_scan_buffer, yy_scan_bytes, yy_scan_string
-@end example
-
-@noindent
-(though @samp{yy_push_state()} and friends won't appear anyway
-unless you use @samp{%option stack}).
-
-@node Performance, C++, Options, Top
-@section Performance considerations
-
-The main design goal of @code{flex} is that it generate
-high-performance scanners. It has been optimized for dealing
-well with large sets of rules. Aside from the effects on
-scanner speed of the table compression @samp{-C} options outlined
-above, there are a number of options/actions which degrade
-performance. These are, from most expensive to least:
-
-@example
-REJECT
-%option yylineno
-arbitrary trailing context
-
-pattern sets that require backing up
-%array
-%option interactive
-%option always-interactive
-
-'^' beginning-of-line operator
-yymore()
-@end example
-
-with the first three all being quite expensive and the
-last two being quite cheap. Note also that @samp{unput()} is
-implemented as a routine call that potentially does quite
-a bit of work, while @samp{yyless()} is a quite-cheap macro; so
-if just putting back some excess text you scanned, use
-@samp{yyless()}.
-
-@code{REJECT} should be avoided at all costs when performance is
-important. It is a particularly expensive option.
-
-Getting rid of backing up is messy and often may be an
-enormous amount of work for a complicated scanner. In
-principle, one begins by using the @samp{-b} flag to generate a
-@file{lex.backup} file. For example, on the input
-
-@example
-%%
-foo return TOK_KEYWORD;
-foobar return TOK_KEYWORD;
-@end example
-
-@noindent
-the file looks like:
-
-@example
-State #6 is non-accepting -
- associated rule line numbers:
- 2 3
- out-transitions: [ o ]
- jam-transitions: EOF [ \001-n p-\177 ]
-
-State #8 is non-accepting -
- associated rule line numbers:
- 3
- out-transitions: [ a ]
- jam-transitions: EOF [ \001-` b-\177 ]
-
-State #9 is non-accepting -
- associated rule line numbers:
- 3
- out-transitions: [ r ]
- jam-transitions: EOF [ \001-q s-\177 ]
-
-Compressed tables always back up.
-@end example
-
-The first few lines tell us that there's a scanner state
-in which it can make a transition on an 'o' but not on any
-other character, and that in that state the currently
-scanned text does not match any rule. The state occurs
-when trying to match the rules found at lines 2 and 3 in
-the input file. If the scanner is in that state and then
-reads something other than an 'o', it will have to back up
-to find a rule which is matched. With a bit of
-head-scratching one can see that this must be the state it's in
-when it has seen "fo". When this has happened, if
-anything other than another 'o' is seen, the scanner will
-have to back up to simply match the 'f' (by the default
-rule).
-
-The comment regarding State #8 indicates there's a problem
-when "foob" has been scanned. Indeed, on any character
-other than an 'a', the scanner will have to back up to
-accept "foo". Similarly, the comment for State #9
-concerns when "fooba" has been scanned and an 'r' does not
-follow.
-
-The final comment reminds us that there's no point going
-to all the trouble of removing backing up from the rules
-unless we're using @samp{-Cf} or @samp{-CF}, since there's no
-performance gain doing so with compressed scanners.
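-The backing-up behavior described above can be reproduced with a toy
-longest-match scanner that handles the same two rules plus the default
-rule. The scanner checks prefixes directly instead of using DFA tables,
-and @code{scan_token()} is a hypothetical name. Even so, it reports how
-many already-scanned characters must be given back.

```c
#include <stddef.h>
#include <string.h>

/* Keywords from the example rule set. */
static const char *const keywords[] = { "foo", "foobar" };
#define NKEYWORDS (sizeof keywords / sizeof keywords[0])

/* True if the first LEN characters of S are a prefix of some keyword,
 * i.e. the simulated automaton still has a transition to take. */
static int live_prefix(const char *s, size_t len)
{
    for (size_t k = 0; k < NKEYWORDS; k++)
        if (strlen(keywords[k]) >= len && strncmp(keywords[k], s, len) == 0)
            return 1;
    return 0;
}

/* Longest-match scan at S.  Returns the length of the accepted token
 * (a keyword, or one character via the default rule) and stores in
 * *BACKUP how many scanned characters past that token were given back. */
size_t scan_token(const char *s, size_t *backup)
{
    size_t scanned = 0;
    size_t accepted = (s[0] != '\0') ? 1 : 0;  /* default rule: any one char */

    while (s[scanned] != '\0' && live_prefix(s, scanned + 1)) {
        scanned++;
        for (size_t k = 0; k < NKEYWORDS; k++)
            if (strlen(keywords[k]) == scanned &&
                strncmp(keywords[k], s, scanned) == 0)
                accepted = scanned;  /* reached an accepting state */
    }
    /* scanned can be 0 while accepted is 1 (first char starts no keyword). */
    *backup = scanned > accepted ? scanned - accepted : 0;
    return accepted;
}
```

-For the input @samp{foobaz}, the sketch accepts @samp{foo} and backs up
-2 characters. For @samp{fox}, it backs up 1 character and accepts only
-@samp{f}, matched by the default rule.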
-
-The way to remove the backing up is to add "error" rules:
-
-@example
-%%
-foo return TOK_KEYWORD;
-foobar return TOK_KEYWORD;
-
-fooba |
-foob |
-fo @{
- /* false alarm, not really a keyword */
- return TOK_ID;
- @}
-@end example
-
-Eliminating backing up among a list of keywords can also
-be done using a "catch-all" rule:
-
-@example
-%%
-foo return TOK_KEYWORD;
-foobar return TOK_KEYWORD;
-
-[a-z]+ return TOK_ID;
-@end example
-
-This is usually the best solution when appropriate.
-
-Backing up messages tend to cascade. With a complicated
-set of rules it's not uncommon to get hundreds of
-messages. If one can decipher them, though, it often only
-takes a dozen or so rules to eliminate the backing up
-(though it's easy to make a mistake and have an error rule
-accidentally match a valid token. A possible future @code{flex}
-feature will be to automatically add rules to eliminate
-backing up).
-
-It's important to keep in mind that you gain the benefits
-of eliminating backing up only if you eliminate @emph{every}
-instance of backing up. Leaving just one means you gain
-nothing.
-
-@emph{Variable} trailing context (where both the leading and
-trailing parts do not have a fixed length) entails almost
-the same performance loss as @code{REJECT} (i.e., substantial).
-So when possible a rule like:
-
-@example
-%%
-mouse|rat/(cat|dog) run();
-@end example
-
-@noindent
-is better written:
-
-@example
-%%
-mouse/cat|dog run();
-rat/cat|dog run();
-@end example
-
-@noindent
-or as
-
-@example
-%%
-mouse|rat/cat run();
-mouse|rat/dog run();
-@end example
-
-Note that here the special '|' action does @emph{not} provide any
-savings, and can even make things worse (see Deficiencies
-/ Bugs below).
-
-Another area where the user can increase a scanner's
-performance (and one that's easier to implement) arises from
-the fact that the longer the tokens matched, the faster
-the scanner will run. This is because with long tokens
-the processing of most input characters takes place in the
-(short) inner scanning loop, and does not often have to go
-through the additional work of setting up the scanning
-environment (e.g., @code{yytext}) for the action. Recall the
-scanner for C comments:
-
-@example
-%x comment
-%%
- int line_num = 1;
-
-"/*" BEGIN(comment);
-
-<comment>[^*\n]*
-<comment>"*"+[^*/\n]*
-<comment>\n ++line_num;
-<comment>"*"+"/" BEGIN(INITIAL);
-@end example
-
-This could be sped up by writing it as:
-
-@example
-%x comment
-%%
- int line_num = 1;
-
-"/*" BEGIN(comment);
-
-<comment>[^*\n]*
-<comment>[^*\n]*\n ++line_num;
-<comment>"*"+[^*/\n]*
-<comment>"*"+[^*/\n]*\n ++line_num;
-<comment>"*"+"/" BEGIN(INITIAL);
-@end example
-
-Now instead of each newline requiring the processing of
-another action, recognizing the newlines is "distributed"
-over the other rules to keep the matched text as long as
-possible. Note that @emph{adding} rules does @emph{not} slow down the
-scanner! The speed of the scanner is independent of the
-number of rules or (modulo the considerations given at the
-beginning of this section) how complicated the rules are
-with regard to operators such as '*' and '|'.
-
-A final example in speeding up a scanner: suppose you want
-to scan through a file containing identifiers and
-keywords, one per line and with no other extraneous
-characters, and recognize all the keywords. A natural first
-approach is:
-
-@example
-%%
-asm |
-auto |
-break |
-@dots{} etc @dots{}
-volatile |
-while /* it's a keyword */
-
-.|\n /* it's not a keyword */
-@end example
-
-To eliminate the backing up, introduce a catch-all
-rule:
-
-@example
-%%
-asm |
-auto |
-break |
-@dots{} etc @dots{}
-volatile |
-while /* it's a keyword */
-
-[a-z]+ |
-.|\n /* it's not a keyword */
-@end example
-
-Now, if it's guaranteed that there's exactly one word per
-line, then we can reduce the total number of matches by
-half by merging in the recognition of newlines with that
-of the other tokens:
-
-@example
-%%
-asm\n |
-auto\n |
-break\n |
-@dots{} etc @dots{}
-volatile\n |
-while\n /* it's a keyword */
-
-[a-z]+\n |
-.|\n /* it's not a keyword */
-@end example
-
-One has to be careful here, as we have now reintroduced
-backing up into the scanner. In particular, while @emph{we} know
-that there will never be any characters in the input
-stream other than letters or newlines, @code{flex} can't figure
-this out, and it will plan for possibly needing to back up
-when it has scanned a token like "auto" and then the next
-character is something other than a newline or a letter.
-Previously it would then just match the "auto" rule and be
-done, but now it has no "auto" rule, only an "auto\n" rule.
-To eliminate the possibility of backing up, we could
-either duplicate all rules but without final newlines, or,
-since we never expect to encounter such an input and
-therefore don't know how it's classified, we can introduce one
-more catch-all rule, this one without a trailing
-newline:
-
-@example
-%%
-asm\n |
-auto\n |
-break\n |
-@dots{} etc @dots{}
-volatile\n |
-while\n /* it's a keyword */
-
-[a-z]+\n |
-[a-z]+ |
-.|\n /* it's not a keyword */
-@end example
-
-Compiled with @samp{-Cf}, this is about as fast as one can get a
-@code{flex} scanner to go for this particular problem.
-
-A final note: @code{flex} is slow when matching NUL's,
-particularly when a token contains multiple NUL's. It's best to
-write rules which match @emph{short} amounts of text if it's
-anticipated that the text will often include NUL's.
-
-Another final note regarding performance: as mentioned
-above in the section How the Input is Matched, dynamically
-resizing @code{yytext} to accommodate huge tokens is a slow
-process because it presently requires that the (huge) token
-be rescanned from the beginning. Thus if performance is
-vital, you should attempt to match "large" quantities of
-text but not "huge" quantities, where the cutoff between
-the two is at about 8K characters/token.
-
-@node C++, Incompatibilities, Performance, Top
-@section Generating C++ scanners
-
-@code{flex} provides two different ways to generate scanners for
-use with C++. The first way is to simply compile a
-scanner generated by @code{flex} using a C++ compiler instead of a C
-compiler. You should not encounter any compilation
-errors (please report any you find to the email address
-given in the Author section below). You can then use C++
-code in your rule actions instead of C code. Note that
-the default input source for your scanner remains @code{yyin},
-and default echoing is still done to @code{yyout}. Both of these
-remain @samp{FILE *} variables and not C++ @code{streams}.
-
-You can also use @code{flex} to generate a C++ scanner class, using
-the @samp{-+} option (or, equivalently, @samp{%option c++}), which
-is automatically specified if the name of the flex executable ends
-in a @samp{+}, such as @code{flex++}. When using this option, flex
-defaults to generating the scanner to the file @file{lex.yy.cc} instead
-of @file{lex.yy.c}. The generated scanner includes the header file
-@file{FlexLexer.h}, which defines the interface to two C++ classes.
-
-The first class, @code{FlexLexer}, provides an abstract base
-class defining the general scanner class interface. It
-provides the following member functions:
-
-@table @samp
-@item const char* YYText()
-returns the text of the most recently matched
-token, the equivalent of @code{yytext}.
-
-@item int YYLeng()
-returns the length of the most recently matched
-token, the equivalent of @code{yyleng}.
-
-@item int lineno() const
-returns the current input line number (see @samp{%option yylineno}),
-or 1 if @samp{%option yylineno} was not used.
-
-@item void set_debug( int flag )
-sets the debugging flag for the scanner, equivalent to assigning to
-@code{yy_flex_debug} (see the Options section above). Note that you
-must build the scanner using @samp{%option debug} to include debugging
-information in it.
-
-@item int debug() const
-returns the current setting of the debugging flag.
-@end table
-
-Also provided are member functions equivalent to
-@samp{yy_switch_to_buffer()}, @samp{yy_create_buffer()} (though the
-first argument is an @samp{istream*} object pointer and not a
-@samp{FILE*}), @samp{yy_flush_buffer()}, @samp{yy_delete_buffer()},
-and @samp{yyrestart()} (again, the first argument is an @samp{istream*}
-object pointer).
-
-The second class defined in @file{FlexLexer.h} is @code{yyFlexLexer},
-which is derived from @code{FlexLexer}. It defines the following
-additional member functions:
-
-@table @samp
-@item yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
-constructs a @code{yyFlexLexer} object using the given
-streams for input and output. If not specified,
-the streams default to @code{cin} and @code{cout}, respectively.
-
-@item virtual int yylex()
-performs the same role as @samp{yylex()} does for ordinary
-flex scanners: it scans the input stream, consuming
-tokens, until a rule's action returns a value. If you derive a
-subclass @var{S} from @code{yyFlexLexer} and want to access the
-member functions and variables of @var{S} inside @samp{yylex()},
-then you need to use @samp{%option yyclass="@var{S}"} to inform
-@code{flex} that you will be using that subclass instead of
-@code{yyFlexLexer}. In this case, rather than generating
-@samp{yyFlexLexer::yylex()}, @code{flex} generates
-@samp{@var{S}::yylex()} (and also generates a dummy
-@samp{yyFlexLexer::yylex()} that calls
-@samp{yyFlexLexer::LexerError()} if called).
-
-@item virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)
-reassigns @code{yyin} to @code{new_in} (if non-nil) and
-@code{yyout} to @code{new_out} (ditto), deleting the previous
-input buffer if @code{yyin} is reassigned.
-
-@item int yylex( istream* new_in = 0, ostream* new_out = 0 )
-first switches the input streams via @samp{switch_streams( new_in, new_out )}
-and then returns the value of @samp{yylex()}.
-@end table
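-
-The @samp{yyclass} arrangement can be sketched as follows, assuming
-a hypothetical subclass named @code{WordLexer} with a hypothetical
-member @code{words}, declared in an equally hypothetical header
-@file{WordLexer.h}. This is a minimal illustration, not a complete
-program. The header declares the subclass and its @samp{yylex()}:
-
-@example
-// WordLexer.h (hypothetical)
-#include <FlexLexer.h>
-
-class WordLexer : public yyFlexLexer @{
-public:
-    WordLexer() : words( 0 ) @{@}
-    int yylex();   // flex generates WordLexer::yylex()
-    int words;     // hypothetical member, visible to rule actions
-@};
-@end example
-
-@noindent
-and the scanner specification names the subclass, so that rule
-actions can refer to @code{words} directly:
-
-@example
-%option c++ yyclass="WordLexer"
-%@{
-#include "WordLexer.h"
-%@}
-%%
-[A-Za-z]+    ++words;
-.|\n         /* ignore everything else */
-@end example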
-
-In addition, @code{yyFlexLexer} defines the following protected
-virtual functions which you can redefine in derived
-classes to tailor the scanner:
-
-@table @samp
-@item virtual int LexerInput( char* buf, int max_size )
-reads up to @samp{max_size} characters into @var{buf} and
-returns the number of characters read. To indicate
-end-of-input, return 0 characters. Note that
-"interactive" scanners (see the @samp{-B} and @samp{-I} flags)
-define the macro @code{YY_INTERACTIVE}. If you redefine
-@code{LexerInput()} and need to take different actions
-depending on whether or not the scanner might be
-scanning an interactive input source, you can test
-for the presence of this name via @samp{#ifdef}.
-
-@item virtual void LexerOutput( const char* buf, int size )
-writes out @var{size} characters from the buffer @var{buf},
-which, while NUL-terminated, may also contain
-"internal" NUL's if the scanner's rules can match
-text with NUL's in them.
-
-@item virtual void LexerError( const char* msg )
-reports a fatal error message. The default version
-of this function writes the message to the stream
-@code{cerr} and exits.
-@end table
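-
-As a sketch of redefining @samp{LexerInput()}, the following
-subclass scans a NUL-terminated string held in memory instead of
-reading from @code{cin}. The names @code{StringLexer} and @code{src}
-are hypothetical. Each call copies at most @samp{max_size}
-characters into @var{buf} and returns 0 once the string is
-exhausted, which the scanner treats as end-of-input:
-
-@example
-#include <FlexLexer.h>
-
-class StringLexer : public yyFlexLexer @{
-public:
-    StringLexer( const char* text ) : src( text ) @{@}
-
-protected:
-    // Called by the scanner whenever it needs more input.
-    int LexerInput( char* buf, int max_size )
-        @{
-        int n = 0;
-        while ( n < max_size && *src != '\0' )
-            buf[n++] = *src++;
-        return n;   // 0 signals end-of-input
-        @}
-
-private:
-    const char* src;   // next unread character
-@};
-@end example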
-
-Note that a @code{yyFlexLexer} object contains its @emph{entire}
-scanning state. Thus you can use such objects to create
-reentrant scanners. You can instantiate multiple instances of
-the same @code{yyFlexLexer} class, and you can also combine
-multiple C++ scanner classes together in the same program
-using the @samp{-P} option discussed above.
-Finally, note that the @samp{%array} feature is not available to
-C++ scanner classes; you must use @samp{%pointer} (the default).
-
-Here is an example of a simple C++ scanner:
-
-@example
- // An example of using the flex C++ scanner class.
-
-%@{
-int mylineno = 0;
-%@}
-
-string \"[^\n"]+\"
-
-ws [ \t]+
-
-alpha [A-Za-z]
-dig [0-9]
-name (@{alpha@}|@{dig@}|\$)(@{alpha@}|@{dig@}|[_.\-/$])*
-num1 [-+]?@{dig@}+\.?([eE][-+]?@{dig@}+)?
-num2 [-+]?@{dig@}*\.@{dig@}+([eE][-+]?@{dig@}+)?
-number @{num1@}|@{num2@}
-
-%%
-
-@{ws@} /* skip blanks and tabs */
-
-"/*" @{
- int c;
-
- while((c = yyinput()) != 0)
- @{
- if(c == '\n')
- ++mylineno;
-
- else if(c == '*')
- @{
- if((c = yyinput()) == '/')
- break;
- else
- unput(c);
- @}
- @}
- @}
-
-@{number@} cout << "number " << YYText() << '\n';
-
-\n mylineno++;
-
-@{name@} cout << "name " << YYText() << '\n';
-
-@{string@} cout << "string " << YYText() << '\n';
-
-%%
-
-int main( int /* argc */, char** /* argv */ )
- @{
- FlexLexer* lexer = new yyFlexLexer;
- while(lexer->yylex() != 0)
- ;
- return 0;
- @}
-@end example
-
-If you want to create multiple (different) lexer classes,
-you use the @samp{-P} flag (or the @samp{prefix=} option) to rename each
-@code{yyFlexLexer} to some other @code{xxFlexLexer}. You then can
-include @samp{<FlexLexer.h>} in your other sources once per lexer
-class, first renaming @code{yyFlexLexer} as follows:
-
-@example
-#undef yyFlexLexer
-#define yyFlexLexer xxFlexLexer
-#include <FlexLexer.h>
-
-#undef yyFlexLexer
-#define yyFlexLexer zzFlexLexer
-#include <FlexLexer.h>
-@end example
-
-if, for example, you used @samp{%option prefix="xx"} for one of
-your scanners and @samp{%option prefix="zz"} for the other.
-
-IMPORTANT: the present form of the scanning class is
-@emph{experimental} and may change considerably between major
-releases.
-
-@node Incompatibilities, Diagnostics, C++, Top
-@section Incompatibilities with @code{lex} and POSIX
-
-@code{flex} is a rewrite of the AT&T Unix @code{lex} tool (the two
-implementations do not share any code, though), with some
-extensions and incompatibilities, both of which are of
-concern to those who wish to write scanners acceptable to
-either implementation. Flex is fully compliant with the
-POSIX @code{lex} specification, except that when using @samp{%pointer}
-(the default), a call to @samp{unput()} destroys the contents of
-@code{yytext}, which is counter to the POSIX specification.
-
-In this section we discuss all of the known areas of
-incompatibility between flex, AT&T lex, and the POSIX
-specification.
-
-@code{flex}'s @samp{-l} option turns on maximum compatibility with the
-original AT&T @code{lex} implementation, at the cost of a major
-loss in the generated scanner's performance. We note
-below which incompatibilities can be overcome using the @samp{-l}
-option.
-
-@code{flex} is fully compatible with @code{lex} with the following
-exceptions:
-
-@itemize -
-@item
-The undocumented @code{lex} scanner internal variable @code{yylineno}
-is not supported unless @samp{-l} or @samp{%option yylineno} is used.
-Ideally, @code{yylineno} would be maintained on a per-buffer basis
-rather than a per-scanner (single global variable) basis, but
-@code{flex} does not do so. @code{yylineno} is not part of the POSIX
-specification.
-
-@item
-The @samp{input()} routine is not redefinable, though it
-may be called to read characters following whatever
-has been matched by a rule. If @samp{input()} encounters
-an end-of-file the normal @samp{yywrap()} processing is
-done. A ``real'' end-of-file is returned by
-@samp{input()} as @code{EOF}.
-
-Input is instead controlled by defining the
-@code{YY_INPUT} macro.
-
-The @code{flex} restriction that @samp{input()} cannot be
-redefined is in accordance with the POSIX
-specification, which simply does not specify any way of
-controlling the scanner's input other than by making
-an initial assignment to @code{yyin}.
-
-@item
-The @samp{unput()} routine is not redefinable. This
-restriction is in accordance with POSIX.
-
-@item
-@code{flex} scanners are not as reentrant as @code{lex} scanners.
-In particular, if you have an interactive scanner
-and an interrupt handler which long-jumps out of
-the scanner, and the scanner is subsequently called
-again, you may get the following message:
-
-@example
-fatal flex scanner internal error--end of buffer missed
-@end example
-
-To reenter the scanner, first use
-
-@example
-yyrestart( yyin );
-@end example
-
-Note that this call will throw away any buffered
-input; usually this isn't a problem with an
-interactive scanner.
-
-Also note that flex C++ scanner classes @emph{are}
-reentrant, so if using C++ is an option for you, you
-should use them instead. See "Generating C++
-Scanners" above for details.
-
-@item
-@samp{output()} is not supported. Output from the @samp{ECHO}
-macro is done to the file-pointer @code{yyout} (default
-@code{stdout}).
-
-@samp{output()} is not part of the POSIX specification.
-
-@item
-@code{lex} does not support exclusive start conditions
-(%x), though they are in the POSIX specification.
-
-@item
-When definitions are expanded, @code{flex} encloses them
-in parentheses. With lex, the following:
-
-@example
-NAME [A-Z][A-Z0-9]*
-%%
-foo@{NAME@}? printf( "Found it\n" );
-%%
-@end example
-
-will not match the string "foo" because when the
-macro is expanded the rule is equivalent to
-"foo[A-Z][A-Z0-9]*?" and the precedence is such that the
-'?' is associated with "[A-Z0-9]*". With @code{flex}, the
-rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and
-so the string "foo" will match.
-
-Note that if the definition begins with @samp{^} or ends
-with @samp{$} then it is @emph{not} expanded with parentheses, to
-allow these operators to appear in definitions
-without losing their special meanings. But the
-@samp{<s>}, @samp{/}, and @samp{<<EOF>>} operators cannot be used in a
-@code{flex} definition.
-
-Using @samp{-l} results in the @code{lex} behavior of no
-parentheses around the definition.
-
-The POSIX specification is that the definition be enclosed in
-parentheses.
-
-@item
-Some implementations of @code{lex} allow a rule's action to begin on
-a separate line, if the rule's pattern has trailing whitespace:
-
-@example
-%%
-foo|bar<space here>
- @{ foobar_action(); @}
-@end example
-
-@code{flex} does not support this feature.
-
-@item
-The @code{lex} @samp{%r} (generate a Ratfor scanner) option is
-not supported. It is not part of the POSIX
-specification.
-
-@item
-After a call to @samp{unput()}, @code{yytext} is undefined until
-the next token is matched, unless the scanner was
-built using @samp{%array}. This is not the case with @code{lex}
-or the POSIX specification. The @samp{-l} option does
-away with this incompatibility.
-
-@item
-The precedence of the @samp{@{@}} (numeric range) operator
-is different. @code{lex} interprets "abc@{1,3@}" as "match
-one, two, or three occurrences of 'abc'", whereas
-@code{flex} interprets it as "match 'ab' followed by one,
-two, or three occurrences of 'c'". The latter is
-in agreement with the POSIX specification.
-
-@item
-The precedence of the @samp{^} operator is different. @code{lex}
-interprets "^foo|bar" as "match either 'foo' at the
-beginning of a line, or 'bar' anywhere", whereas
-@code{flex} interprets it as "match either 'foo' or 'bar'
-if they come at the beginning of a line". The
-latter is in agreement with the POSIX specification.
-
-@item
-The special table-size declarations such as @samp{%a}
-supported by @code{lex} are not required by @code{flex} scanners;
-@code{flex} ignores them.
-
-@item
-The name @code{FLEX_SCANNER} is @samp{#define}'d so that scanners may
-be written for use with either @code{flex} or @code{lex}.
-Scanners also include @code{YY_FLEX_MAJOR_VERSION} and
-@code{YY_FLEX_MINOR_VERSION} indicating which version of
-@code{flex} generated the scanner (for example, for the
-2.5 release, these defines would be 2 and 5
-respectively).
-@end itemize
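-
-Two of the precedence rules above that @code{flex} shares with POSIX
-(parenthesized expansion of definitions, and @samp{@{@}} binding only
-to the preceding item) can be checked against a POSIX extended
-regular expression engine. The following C sketch uses the standard
-@samp{regcomp()} and @samp{regexec()} functions from
-@file{<regex.h>}; the helper name @code{full_match} is hypothetical,
-and POSIX regular expressions serve only as a stand-in for
-@code{flex} patterns here, not as the matcher @code{flex} itself uses:
-
```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the POSIX extended regular expression `pattern`
   matches `text`, 0 if it does not, and -1 if the pattern fails
   to compile. Anchor the pattern with ^...$ to require that the
   whole of `text` be matched. */
int full_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```
-
-With it, @samp{full_match("^foo([A-Z][A-Z0-9]*)?$", "foo")} returns 1,
-just as the parenthesized @code{flex} expansion of @samp{foo@{NAME@}?}
-matches "foo", and @samp{full_match("^abc@{1,3@}$", "abccc")} returns 1
-while @samp{full_match("^abc@{1,3@}$", "abcabc")} returns 0.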
-
-The following @code{flex} features are not included in @code{lex} or the
-POSIX specification:
-
-@example
-C++ scanners
-%option
-start condition scopes
-start condition stacks
-interactive/non-interactive scanners
-yy_scan_string() and friends
-yyterminate()
-yy_set_interactive()
-yy_set_bol()
-YY_AT_BOL()
-<<EOF>>
-<*>
-YY_DECL
-YY_START
-YY_USER_ACTION
-YY_USER_INIT
-#line directives
-%@{@}'s around actions
-multiple actions on a line
-@end example
-
-@noindent
-plus almost all of the flex flags. The last feature in
-the list refers to the fact that with @code{flex} you can put
-multiple actions on the same line, separated with
-semicolons, while with @code{lex}, the following
-
-@example
-foo handle_foo(); ++num_foos_seen;
-@end example
-
-@noindent
-is (rather surprisingly) truncated to
-
-@example
-foo handle_foo();
-@end example
-
-@code{flex} does not truncate the action. Actions that are not
-enclosed in braces are simply terminated at the end of the
-line.
-
-@node Diagnostics, Files, Incompatibilities, Top
-@section Diagnostics
-
-@table @samp
-@item warning, rule cannot be matched
-indicates that the given
-rule cannot be matched because it follows other rules that
-will always match the same text as it. For example, in
-the following, "foo" cannot be matched because it comes
-after an identifier "catch-all" rule:
-
-@example
-[a-z]+ got_identifier();
-foo got_foo();
-@end example
-
-Using @code{REJECT} in a scanner suppresses this warning.
-
-@item warning, -s option given but default rule can be matched
-means that it is possible (perhaps only in a particular
-start condition) that the default rule (match any single
-character) is the only one that will match a particular
-input. Since @samp{-s} was given, presumably this is not
-intended.
-
-@item reject_used_but_not_detected undefined
-@itemx yymore_used_but_not_detected undefined
-These errors can
-occur at compile time. They indicate that the scanner
-uses @code{REJECT} or @samp{yymore()} but that @code{flex} failed to notice the
-fact, meaning that @code{flex} scanned the first two sections
-looking for occurrences of these actions and failed to
-find any, but somehow you snuck some in (via a #include
-file, for example). Use @samp{%option reject} or @samp{%option yymore}
-to indicate to flex that you really do use these features.
-
-@item flex scanner jammed
-a scanner compiled with @samp{-s} has
-encountered an input string which wasn't matched by any of
-its rules. This error can also occur due to internal
-problems.
-
-@item token too large, exceeds YYLMAX
-your scanner uses @samp{%array}
-and one of its rules matched a string longer than the @code{YYLMAX}
-constant (8K bytes by default). You can increase the
-value by #define'ing @code{YYLMAX} in the definitions section of
-your @code{flex} input.
-
-@item scanner requires -8 flag to use the character '@var{x}'
-Your scanner specification includes recognizing the 8-bit
-character @var{x} and you did not specify the @samp{-8} flag, and your
-scanner defaulted to 7-bit because you used the @samp{-Cf} or @samp{-CF}
-table compression options. See the discussion of the @samp{-7}
-flag for details.
-
-@item flex scanner push-back overflow
-you used @samp{unput()} to push
-back so much text that the scanner's buffer could not hold
-both the pushed-back text and the current token in @code{yytext}.
-Ideally the scanner should dynamically resize the buffer
-in this case, but at present it does not.
-
-@item input buffer overflow, can't enlarge buffer because scanner uses REJECT
-the scanner was working on matching an
-extremely large token and needed to expand the input
-buffer. This doesn't work with scanners that use @code{REJECT}.
-
-@item fatal flex scanner internal error--end of buffer missed
-This can occur in a scanner which is reentered after a
-long-jump has jumped out (or over) the scanner's
-activation frame. Before reentering the scanner, use:
-
-@example
-yyrestart( yyin );
-@end example
-
-@noindent
-or, as noted above, switch to using the C++ scanner class.
-
-@item too many start conditions in <> construct!
-you listed
-more start conditions in a <> construct than exist (so you
-must have listed at least one of them twice).
-@end table
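-
-For the @samp{token too large, exceeds YYLMAX} diagnostic, a
-definitions section along the following lines raises the limit for a
-scanner built with @samp{%array}. The value 16384 is only an example;
-choose a bound that suits the longest token you expect:
-
-@example
-%array
-%@{
-#define YYLMAX 16384   /* example value; the default is 8K */
-%@}
-@end example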
-
-@node Files, Deficiencies, Diagnostics, Top
-@section Files
-
-@table @file
-@item -lfl
-library with which scanners must be linked.
-
-@item lex.yy.c
-generated scanner (called @file{lexyy.c} on some systems).
-
-@item lex.yy.cc
-generated C++ scanner class, when using @samp{-+}.
-
-@item <FlexLexer.h>
-header file defining the C++ scanner base class,
-@code{FlexLexer}, and its derived class, @code{yyFlexLexer}.
-
-@item flex.skl
-skeleton scanner. This file is only used when
-building flex, not when flex executes.
-
-@item lex.backup
-backing-up information for @samp{-b} flag (called @file{lex.bck}
-on some systems).
-@end table
-
-@node Deficiencies, See also, Files, Top
-@section Deficiencies / Bugs
-
-Some trailing context patterns cannot be properly matched
-and generate warning messages ("dangerous trailing
-context"). These are patterns where the ending of the first
-part of the rule matches the beginning of the second part,
-such as "zx*/xy*", where the 'x*' matches the 'x' at the
-beginning of the trailing context. (Note that the POSIX
-draft states that the text matched by such patterns is
-undefined.)
-
-For some trailing context rules, parts which are actually
-fixed-length are not recognized as such, leading to the
-abovementioned performance loss. In particular, parts
-using '|' or @{n@} (such as "foo@{3@}") are always considered
-variable-length.
-
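-Combining trailing context with the special '|' action can
-result in @emph{fixed} trailing context being turned into the
-more expensive @emph{variable} trailing context. For example, in
-the following:
-
-@example
-%%
-abc |
-xyz/def
-@end example
-
-@noindent
-the trailing context @samp{def} has a fixed length, but because the
-rule shares its action with @samp{abc} via @samp{|}, it is handled
-as variable trailing context.
-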
-Use of @samp{unput()} invalidates yytext and yyleng, unless the
-@samp{%array} directive or the @samp{-l} option has been used.
-
-Pattern-matching of NUL's is substantially slower than
-matching other characters.
-
-Dynamic resizing of the input buffer is slow, as it
-entails rescanning all the text matched so far by the
-current (generally huge) token.
-
-Due to both buffering of input and read-ahead, you cannot
-intermix calls to <stdio.h> routines, such as @samp{getchar()},
-with @code{flex} rules and expect it to work. Call @samp{input()}
-instead.
-
-The total table entries listed by the @samp{-v} flag exclude the
-number of table entries needed to determine what rule has
-been matched. The number of entries is equal to the
-number of DFA states if the scanner does not use @code{REJECT}, and
-somewhat greater than the number of states if it does.
-
-@code{REJECT} cannot be used with the @samp{-f} or @samp{-F} options.
-
-The @code{flex} internal algorithms need documentation.
-
-@node See also, Author, Deficiencies, Top
-@section See also
-
-@code{lex}(1), @code{yacc}(1), @code{sed}(1), @code{awk}(1).
-
-John Levine, Tony Mason, and Doug Brown: Lex & Yacc;
-O'Reilly and Associates. Be sure to get the 2nd edition.
-
-M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator.
-
-Alfred Aho, Ravi Sethi and Jeffrey Ullman: Compilers:
-Principles, Techniques and Tools; Addison-Wesley (1986).
-Describes the pattern-matching techniques used by @code{flex}
-(deterministic finite automata).
-
-@node Author, , See also, Top
-@section Author
-
-Vern Paxson, with the help of many ideas and much inspiration from
-Van Jacobson. Original version by Jef Poskanzer. The fast table
-representation is a partial implementation of a design done by Van
-Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
-
-Thanks to the many @code{flex} beta-testers, feedbackers, and
-contributors, especially Francois Pinard, Casey Leedom, Stan
-Adermann, Terry Allen, David Barker-Plummer, John Basrai, Nelson
-H.F. Beebe, @samp{benson@@odi.com}, Karl Berry, Peter A. Bigot,
-Simon Blanchard, Keith Bostic, Frederic Brehm, Ian Brockbank, Kin
-Cho, Nick Christopher, Brian Clapper, J.T. Conklin, Jason Coughlin,
-Bill Cox, Nick Cropper, Dave Curtis, Scott David Daniels, Chris
-G. Demetriou, Theo Deraadt, Mike Donahue, Chuck Doucette, Tom Epperly,
-Leo Eskin, Chris Faylor, Chris Flatters, Jon Forrest, Joe Gayda, Kaveh
-R. Ghazi, Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer
-Griebel, Jan Hajic, Charles Hemphill, NORO Hideo, Jarkko Hietaniemi,
-Scott Hofmann, Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
-Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
-Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
-Amir Katz, @samp{ken@@ken.hilco.com}, Kevin B. Kenny, Steve Kirsch,
-Winfried Koenig, Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard,
-Craig Leres, John Levine, Steve Liddle, Mike Long, Mohamed el Lozy,
-Brian Madsen, Malte, Joe Marshall, Bengt Martensson, Chris Metcalf,
-Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
-G.T. Nicol, Landon Noll, James Nordby, Marc Nozell, Richard Ohnemus,
-Karsten Pahnke, Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
-Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic
-Raimbault, Pat Rankin, Rick Richardson, Kevin Rodgers, Kai Uwe Rommel,
-Jim Roskind, Alberto Santini, Andreas Scherer, Darrell Schiebel, Raf
-Schietekat, Doug Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex
-Siegel, Eckehard Stolz, Jan-Erik Str@"omquist, Mike Stump, Paul Stuart,
-Dave Tallman, Ian Lance Taylor, Chris Thewalt, Richard M. Timoney,
-Jodi Tsai, Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms,
-Kent Williams, Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn, and
-those whose names have slipped my marginal mail-archiving skills but
-whose contributions are appreciated all the same.
-
-Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore,
-Craig Leres, John Levine, Bob Mulcahy, G.T. Nicol, Francois Pinard,
-Rich Salz, and Richard Stallman for help with various distribution
-headaches.
-
-Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
-to Benson Margulies and Fred Burke for C++ support; to Kent Williams
-and Tom Epperly for C++ class support; to Ove Ewerlid for support of
-NUL's; and to Eric Hughes for support of multiple buffers.
-
-This work was primarily done when I was with the Real Time Systems
-Group at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks
-to all there for the support I received.
-
-Send comments to @samp{vern@@ee.lbl.gov}.
-
-@c @node Index, , Top, Top
-@c @unnumbered Index
-@c
-@c @printindex cp
-
-@contents
-@bye
-
-@c Local variables:
-@c texinfo-column-for-description: 32
-@c End: