| .\" Copyright (c) 2007 by Ian Piumarta
 | |
| .\" All rights reserved.
 | |
| .\" 
 | |
| .\" Permission is hereby granted, free of charge, to any person obtaining a
 | |
| .\" copy of this software and associated documentation files (the 'Software'),
 | |
| .\" to deal in the Software without restriction, including without limitation
 | |
| .\" the rights to use, copy, modify, merge, publish, distribute, and/or sell
 | |
| .\" copies of the Software, and to permit persons to whom the Software is
 | |
| .\" furnished to do so, provided that the above copyright notice(s) and this
 | |
| .\" permission notice appear in all copies of the Software.  Acknowledgement
 | |
| .\" of the use of this Software in supporting documentation would be
 | |
| .\" appreciated but is not required.
 | |
| .\" 
 | |
| .\" THE SOFTWARE IS PROVIDED 'AS IS'.  USE ENTIRELY AT YOUR OWN RISK.
 | |
| .\" 
 | |
| .\" Last edited: 2012-04-29 16:58:44 by piumarta on emilia
 | |
| .\"
 | |
| .TH PEG 1 "April 2012" "Version 0.1"
 | |
| .SH NAME
 | |
| peg, leg \- parser generators
 | |
| .SH SYNOPSIS
 | |
| .B peg
 | |
| .B [\-hvV \-ooutput]
 | |
| .I [filename ...]
 | |
| .sp 0
 | |
| .B leg
 | |
| .B [\-hvV \-ooutput]
 | |
| .I [filename ...]
 | |
| .SH DESCRIPTION
 | |
| .I peg
 | |
| and
 | |
| .I leg
 | |
| are tools for generating recursive-descent parsers: programs that
 | |
| perform pattern matching on text.  They process a Parsing Expression
 | |
| Grammar (PEG) [Ford 2004] to produce a program that recognises legal
 | |
| sentences of that grammar.
 | |
| .I peg
 | |
| processes PEGs written using the original syntax described by Ford;
 | |
| .I leg
 | |
| processes PEGs written using slightly different syntax and conventions
 | |
| that are intended to make it an attractive replacement for parsers
 | |
| built with
 | |
| .IR lex (1)
 | |
| and
 | |
| .IR yacc (1).
 | |
| Unlike
 | |
| .I lex
 | |
| and
 | |
| .IR yacc ,
 | |
| .I peg
 | |
| and
 | |
| .I leg
 | |
| support unlimited backtracking, provide ordered choice as a means for
 | |
| disambiguation, and can combine scanning (lexical analysis) and
 | |
| parsing (syntactic analysis) into a single activity.
 | |
| .PP
 | |
| .I peg
 | |
| reads the specified
 | |
| .IR filename s,
 | |
| or standard input if no
 | |
| .IR filename s
 | |
| are given, for a grammar describing the parser to generate.
 | |
| .I peg
 | |
| then generates a C source file that defines a function
 | |
| .IR yyparse ().
 | |
| This C source file can be included in, or compiled and then linked
 | |
| with, a client program.  Each time the client program calls
 | |
| .IR yyparse ()
 | |
| the parser consumes input text according to the parsing rules,
 | |
| starting from the first rule in the grammar.
 | |
| .IR yyparse ()
 | |
| returns non-zero if the input could be parsed according to the
 | |
| grammar; it returns zero if the input could not be parsed.
 | |
| .PP
 | |
| The prefix 'yy' or 'YY' is prepended to all externally-visible symbols
 | |
| in the generated parser.  This is intended to reduce the risk of
 | |
| namespace pollution in client programs.  (The choice of 'yy' is
 | |
| historical; see
 | |
| .IR lex (1)
 | |
| and
 | |
| .IR yacc (1),
 | |
| for example.)
 | |
| .SH OPTIONS
 | |
| .I peg
 | |
| and 
 | |
| .I leg
 | |
| provide the following options:
 | |
| .TP
 | |
| .B \-h
 | |
| prints a summary of available options and then exits.
 | |
| .TP
 | |
| .B \-ooutput
 | |
| writes the generated parser to the file
 | |
| .B output
 | |
| instead of the standard output.
 | |
| .TP
 | |
| .B \-v
 | |
| writes verbose information to standard error while working.
 | |
| .TP
 | |
| .B \-V
 | |
| writes version information to standard error then exits.
 | |
| .SH A SIMPLE EXAMPLE
 | |
| The following
 | |
| .I peg
 | |
| input specifies a grammar with a single rule (called 'start') that is
 | |
| satisfied when the input contains the string "username".
 | |
| .nf
 | |
| 
 | |
|     start <- "username"
 | |
| 
 | |
| .fi
 | |
| (The quotation marks are
 | |
| .I not
 | |
| part of the matched text; they serve to indicate a literal string to
 | |
| be matched.)  In other words,
 | |
| .IR  yyparse ()
 | |
| in the generated C source will return non-zero only if the next eight
 | |
| characters read from the input spell the word "username".  If the
 | |
| input contains anything else,
 | |
| .IR yyparse ()
 | |
| returns zero and no input will have been consumed.  (Subsequent calls
 | |
| to
 | |
| .IR yyparse ()
 | |
| will also return zero, since the parser is effectively blocked looking
 | |
| for the string "username".)  To ensure progress we can add an
 | |
| alternative clause to the 'start' rule that will match any single
 | |
| character if "username" is not found.
 | |
| .nf
 | |
| 
 | |
|     start <- "username"
 | |
|            / .
 | |
| 
 | |
| .fi
 | |
| .IR yyparse ()
 | |
| now always returns non-zero (except at the very end of the input).  To
 | |
| do something useful we can add actions to the rules.  These actions
 | |
| are performed after a complete match is found (starting from the first
 | |
| rule) and are chosen according to the 'path' taken through the grammar
 | |
| to match the input.  (Linguists would call this path a 'phrase
 | |
| marker'.)
 | |
| .nf
 | |
| 
 | |
|     start <- "username"    { printf("%s\\n", getlogin()); }
 | |
|            / < . >         { putchar(yytext[0]); }
 | |
| 
 | |
| .fi
 | |
| The first line instructs the parser to print the user's login name
 | |
| whenever it sees "username" in the input.  If that match fails, the
 | |
| second line tells the parser to echo the next character on the input
 | |
| to the standard output.  Our parser is now performing useful work: it
 | |
| will copy the input to the output, replacing all occurrences of
 | |
| "username" with the user's account name.
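To make the behaviour of this grammar concrete, here is a hand-written C sketch of the same logic. It is not the code that peg generates. The function name expand_username and its explicit login and out parameters are assumptions made for illustration (the grammar's action calls getlogin() and writes to the standard output instead).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of peg: copy `in` to `out`, replacing each
 * occurrence of "username" with `login`.  Returns the number of characters
 * written (excluding the terminating NUL), or -1 if `out` is too small. */
static long expand_username(const char *in, const char *login,
                            char *out, size_t outsize)
{
  static const char word[] = "username";
  const size_t wordlen  = sizeof(word) - 1;
  const size_t loginlen = strlen(login);
  size_t n = 0;

  while (*in) {
    if (strncmp(in, word, wordlen) == 0) {  /* first alternative: "username" */
      if (n + loginlen >= outsize) return -1;
      memcpy(out + n, login, loginlen);
      n  += loginlen;
      in += wordlen;
    } else {                                /* second alternative: < . > */
      if (n + 1 >= outsize) return -1;
      out[n++] = *in++;
    }
  }
  if (n >= outsize) return -1;
  out[n] = '\0';
  return (long)n;
}
```

Each pass through the loop plays the role of one call to yyparse(): the first alternative is tried, and only if it fails does the second alternative consume a single character.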
 | |
| .PP
 | |
| Note the angle brackets ('<' and '>') that were added to the second
 | |
| alternative.  These have no effect on the meaning of the rule, but
 | |
| serve to delimit the text made available to the following action in
 | |
| the variable
 | |
| .IR yytext .
 | |
| .PP
 | |
| If the above grammar is placed in the file
 | |
| .BR username.peg ,
 | |
| running the command
 | |
| .nf
 | |
| 
 | |
|     peg -o username.c username.peg
 | |
| 
 | |
| .fi
 | |
| will save the corresponding parser in the file
 | |
| .BR username.c .
 | |
| To create a complete program this parser could be included by a C
 | |
| program as follows.
 | |
| .nf
 | |
| 
 | |
|     #include <stdio.h>      /* printf(), putchar() */
 | |
|     #include <unistd.h>     /* getlogin() */
 | |
| 
 | |
|     #include "username.c"   /* yyparse() */
 | |
| 
 | |
|     int main()
 | |
|     {
 | |
|       while (yyparse())     /* repeat until EOF */
 | |
|         ;
 | |
|       return 0;
 | |
|     }
 | |
| .fi
 | |
| .SH PEG GRAMMARS
 | |
| A grammar consists of a set of named rules.
 | |
| .nf
 | |
| 
 | |
|     name <- pattern
 | |
| 
 | |
| .fi
 | |
| The
 | |
| .B pattern
 | |
| contains one or more of the following elements.
 | |
| .TP
 | |
| .B name
 | |
| The element stands for the entire pattern in the rule with the given
 | |
| .BR name .
 | |
| .TP
 | |
| .BR \(dq characters \(dq
 | |
| A character or string enclosed in double quotes is matched literally.
 | |
| The ANSI C escape sequences are recognised within the
 | |
| .IR characters .
 | |
| .TP
 | |
| .BR ' characters '
 | |
| A character or string enclosed in single quotes is matched literally, as above.
 | |
| .TP
 | |
| .BR [ characters ]
 | |
| A set of characters enclosed in square brackets matches any single
 | |
| character from the set, with escape characters recognised as above.
 | |
| If the set begins with an uparrow (^) then the set is negated (the
 | |
| element matches any character
 | |
| .I not
 | |
| in the set).  Any pair of characters separated with a dash (-)
 | |
| represents the range of characters from the first to the second,
 | |
| inclusive.  A single alphabetic character or underscore is matched by
 | |
| the following set.
 | |
| .nf
 | |
| 
 | |
|     [a-zA-Z_]
 | |
| 
 | |
| .fi
 | |
| Similarly, the following matches any single non-digit character.
 | |
| .nf
 | |
| 
 | |
|     [^0-9]
 | |
| 
 | |
| .fi
 | |
| .TP
 | |
| .B .
 | |
| A dot matches any character.  Note that the only time this fails is at
 | |
| the end of file, where there is no character to match.
 | |
| .TP
 | |
| .BR ( \ pattern\  )
 | |
| Parentheses are used for grouping (modifying the precedence of the
 | |
| operators described below).
 | |
| .TP
 | |
| .BR { \ action\  }
 | |
| Curly braces surround actions.  The action is arbitrary C source code
 | |
| to be executed at the end of matching.  Any braces within the action
 | |
| must be properly nested.  Any input text that was matched before the
 | |
| action and delimited by angle brackets (see below) is made available
 | |
| within the action as the contents of the character array
 | |
| .IR yytext .
 | |
| The length of (number of characters in)
 | |
| .I yytext
 | |
| is available in the variable
 | |
| .IR yyleng .
 | |
| (These variable names are historical; see
 | |
| .IR lex (1).)
 | |
| .TP
 | |
| .B <
 | |
| An opening angle bracket always matches (consuming no input) and
 | |
| causes the parser to begin accumulating matched text.  This text will
 | |
| be made available to actions in the variable
 | |
| .IR yytext .
 | |
| .TP
 | |
| .B >
 | |
| A closing angle bracket always matches (consuming no input) and causes
 | |
| the parser to stop accumulating text for
 | |
| .IR yytext .
 | |
| .PP
 | |
| The above
 | |
| .IR element s
 | |
| can be made optional and/or repeatable with the following suffixes:
 | |
| .TP
 | |
| .RB element\  ?
 | |
| The element is optional.  If present on the input, it is consumed and
 | |
| the match succeeds.  If not present on the input, no text is consumed
 | |
| and the match succeeds anyway.
 | |
| .TP
 | |
| .RB element\  +
 | |
| The element is repeatable.  If present on the input, one or more
 | |
| occurrences of
 | |
| .I element
 | |
| are consumed and the match succeeds.  If no occurrences of
 | |
| .I element
 | |
| are present on the input, the match fails.
 | |
| .TP
 | |
| .RB element\  *
 | |
| The element is optional and repeatable.  If present on the input, one or more
 | |
| occurrences of
 | |
| .I element
 | |
| are consumed and the match succeeds.  If no occurrences of
 | |
| .I element
 | |
| are present on the input, the match succeeds anyway.
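The three suffixes can be pictured as small wrappers around a matching function. The following is a minimal hand-written sketch, assuming a hypothetical convention in which a matcher returns 1 and advances a position on success, and returns 0 leaving the position unchanged on failure. None of these names appear in the generated parser.

```c
#include <ctype.h>

/* A matcher returns 1 and advances *pos on success, or returns 0 and
 * leaves *pos unchanged on failure. */
typedef int (*matcher)(const char *s, int *pos);

static int digit(const char *s, int *pos)                  /* [0-9] */
{
  if (!isdigit((unsigned char)s[*pos])) return 0;
  ++*pos;
  return 1;
}

static int optional(matcher m, const char *s, int *pos)    /* element ? */
{
  (void)m(s, pos);      /* consume the element if it is present */
  return 1;             /* succeed either way */
}

static int star(matcher m, const char *s, int *pos)        /* element * */
{
  while (m(s, pos))     /* consume as many occurrences as possible */
    ;
  return 1;             /* zero occurrences is still a success */
}

static int plus(matcher m, const char *s, int *pos)        /* element + */
{
  if (!m(s, pos)) return 0;   /* at least one occurrence is required */
  return star(m, s, pos);
}
```

For example, plus(digit, "123abc", &pos) starting at position 0 succeeds and leaves pos at 3. Note that star() in this sketch would loop forever if given a matcher that succeeds without consuming input.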
 | |
| .PP
 | |
| The above elements and suffixes can be converted into predicates (that
 | |
| match arbitrary input text and subsequently succeed or fail
 | |
| .I without
 | |
| consuming that input) with the following prefixes:
 | |
| .TP
 | |
| .BR & \ element
 | |
| The predicate succeeds only if
 | |
| .I element
 | |
| can be matched.  Input text scanned while matching
 | |
| .I element
 | |
| is not consumed from the input and remains available for subsequent
 | |
| matching.
 | |
| .TP
 | |
| .BR ! \ element
 | |
| The predicate succeeds only if
 | |
| .I element
 | |
| cannot be matched.  Input text scanned while matching
 | |
| .I element
 | |
| is not consumed from the input and remains available for subsequent
 | |
| matching.  A popular idiom is
 | |
| .nf
 | |
| 
 | |
|     !.
 | |
| 
 | |
| .fi
 | |
| which matches the end of file, after the last character of the input
 | |
| has already been consumed.
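The two predicates amount to running a matcher and then restoring the input position regardless of the outcome. The sketch below is hand-written for illustration, assuming a hypothetical matcher convention (return 1 and advance on success, return 0 on failure). The names and_pred, not_pred and end_of_file are not part of peg.

```c
/* A matcher returns 1 and advances *pos on success, 0 on failure. */
typedef int (*matcher)(const char *s, int *pos);

static int any_char(const char *s, int *pos)               /* . */
{
  if (s[*pos] == '\0') return 0;    /* fails only at the end of the input */
  ++*pos;
  return 1;
}

static int and_pred(matcher m, const char *s, int *pos)    /* & element */
{
  int saved = *pos;
  int ok = m(s, pos);
  *pos = saved;                     /* never consume, whatever the outcome */
  return ok;
}

static int not_pred(matcher m, const char *s, int *pos)    /* ! element */
{
  return !and_pred(m, s, pos);
}

static int end_of_file(const char *s, int *pos)            /* !. */
{
  return not_pred(any_char, s, pos);
}
```

With the input "a", end_of_file() fails at position 0 and succeeds at position 1, and in neither case does the position change.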
 | |
| .PP
 | |
| A special form of the '&' predicate is provided:
 | |
| .TP
 | |
| .BR & {\ expression\ }
 | |
| In this predicate the simple C
 | |
| .I expression
 | |
| .RB ( not
 | |
| statement) is evaluated immediately when the parser reaches the
 | |
| predicate.  If the
 | |
| .I expression
 | |
| yields non-zero (true) the 'match' succeeds and the parser continues
 | |
| with the next element in the pattern.  If the
 | |
| .I expression
 | |
| yields zero (false) the 'match' fails and the parser backs up to look
 | |
| for an alternative parse of the input.
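As an illustration of a predicate that evaluates a C expression, consider a rule that accepts a run of digits only if its value is below 256. The following hand-written sketch shows the control flow. The rule name octet, the variable value and the condition itself are assumptions chosen for the example, and the code is not what peg generates.

```c
#include <ctype.h>

/* octet <- [0-9]+ &{ value < 256 }
 * where `value` stands for the number just scanned (hypothetical). */
static int octet(const char *s, int *pos, int *value)
{
  int saved = *pos, v = 0;

  if (!isdigit((unsigned char)s[*pos])) return 0;
  while (isdigit((unsigned char)s[*pos])) {
    /* stop accumulating once clearly out of range, to avoid overflow */
    if (v < 100000) v = v * 10 + (s[*pos] - '0');
    ++*pos;
  }
  if (!(v < 256)) {     /* the predicate, evaluated immediately */
    *pos = saved;       /* the 'match' fails: back up */
    return 0;
  }
  *value = v;
  return 1;
}
```

The input "255." matches three characters, whereas "256" fails and leaves the position where it started, so that an alternative could be tried.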
 | |
| .PP
 | |
| Several elements (with or without prefixes and suffixes) can be
 | |
| combined into a
 | |
| .I sequence
 | |
| by writing them one after the other.  The entire sequence matches only
 | |
| if each individual element within it matches, from left to right.
 | |
| .PP
 | |
| Sequences can be separated into disjoint alternatives by the
 | |
| alternation operator '/'.
 | |
| .TP
 | |
| .RB sequence-1\  / \ sequence-2\  / \ ...\  / \ sequence-N
 | |
| Each sequence is tried in turn until one of them matches, at which
 | |
| time matching for the overall pattern succeeds.  If none of the
 | |
| sequences matches then the match of the overall pattern fails.
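Ordered choice and backtracking can be sketched by hand for a small rule. The code below is an illustration under assumed names (literal, start) and is not generated by peg. It implements the rule shown in its comment and reports which alternative matched.

```c
#include <string.h>

/* Match the literal `text` at *pos; advance *pos only on success. */
static int literal(const char *s, int *pos, const char *text)
{
  size_t n = strlen(text);
  if (strncmp(s + *pos, text, n) != 0) return 0;
  *pos += (int)n;
  return 1;
}

/* start <- "ab" "c" / "a"
 * Returns the number of the alternative that matched (1 or 2),
 * or 0 if neither did. */
static int start(const char *s, int *pos)
{
  int saved = *pos;

  if (literal(s, pos, "ab") && literal(s, pos, "c")) return 1;
  *pos = saved;         /* back up: "ab" may already have been consumed */
  if (literal(s, pos, "a")) return 2;
  *pos = saved;
  return 0;
}
```

On the input "abd" the first sequence consumes "ab", fails on "c", and the position is rewound before the second alternative matches "a". Because alternatives are tried strictly in order, a rule such as "a" / "ab" can never select its second alternative.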
 | |
| .PP
 | |
| Finally, the pound sign (#) introduces a comment (discarded) that
 | |
| continues until the end of the line.
 | |
| .PP
 | |
| To summarise the above, the parser tries to match the input text
 | |
| against a pattern containing literals, names (representing other
 | |
| rules), and various operators (written as prefixes, suffixes,
 | |
| juxtaposition for sequencing and an infix alternation operator) that
 | |
| modify how the elements within the pattern are matched.  Matches are
 | |
| made from left to right, 'descending' into named sub-rules as they are
 | |
| encountered.  If the matching process fails, the parser 'backtracks'
 | |
| ('rewinding' the input appropriately in the process) to find the
 | |
| nearest alternative 'path' through the grammar.  In other words the
 | |
| parser performs a depth-first, left-to-right search for the first
 | |
| successfully-matching path through the rules.  If found, the actions
 | |
| along the successful path are executed (in the order they were
 | |
| encountered).
 | |
| .PP
 | |
| Note that predicates are evaluated
 | |
| .I immediately
 | |
| during the search for a successful match, since they contribute to the
 | |
| success or failure of the search.  Actions, however, are evaluated
 | |
| only after a successful match has been found.
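One way to picture deferred actions is a queue: while matching, the parser records each action together with the text it applies to, discards the queued entries when it backtracks, and runs whatever remains once the whole match has succeeded. The sketch below is a hand-written illustration of that idea. The structure names, the fixed queue size and the trace buffer are assumptions, and the internals of the generated parser may differ.

```c
#include <ctype.h>
#include <string.h>

#define MAX_THUNKS 8

typedef void (*action)(const char *text, int len, char *trace);

struct thunk  { action fn; int begin, end; };       /* one deferred action */
struct parser { const char *in; int pos; struct thunk q[MAX_THUNKS]; int nq; };

static void exclaimed(const char *text, int len, char *trace)
{ strcat(trace, "exclaimed:"); strncat(trace, text, (size_t)len); }

static void asked(const char *text, int len, char *trace)
{ strcat(trace, "asked:"); strncat(trace, text, (size_t)len); }

/* < [a-z]+ > { fn } : match a word and queue fn instead of calling it */
static int word(struct parser *p, action fn)
{
  int begin = p->pos;
  while (islower((unsigned char)p->in[p->pos])) ++p->pos;
  if (p->pos == begin || p->nq == MAX_THUNKS) return 0;
  p->q[p->nq].fn    = fn;
  p->q[p->nq].begin = begin;
  p->q[p->nq].end   = p->pos;
  ++p->nq;
  return 1;
}

static int ch(struct parser *p, char c)
{
  if (p->in[p->pos] != c) return 0;
  ++p->pos;
  return 1;
}

/* start <- < [a-z]+ > { exclaimed } '!'
 *        / < [a-z]+ > { asked }     '?'
 * Appends a record of the actions performed to trace (assumed big enough). */
static int parse(const char *input, char *trace)
{
  struct parser p;
  int i, ok;

  p.in = input; p.pos = 0; p.nq = 0;
  trace[0] = '\0';
  ok = word(&p, exclaimed) && ch(&p, '!');
  if (!ok) {
    p.pos = 0; p.nq = 0;    /* backtrack: rewind input, drop queued actions */
    ok = word(&p, asked) && ch(&p, '?');
  }
  if (!ok) return 0;
  for (i = 0; i < p.nq; ++i)    /* match complete: now perform the actions */
    p.q[i].fn(input + p.q[i].begin, p.q[i].end - p.q[i].begin, trace);
  return 1;
}
```

For the input "hi?" the first alternative queues its action and then fails on '!', so only the action of the second alternative is ever performed and trace ends up holding "asked:hi".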
 | |
| .SH PEG GRAMMAR FOR PEG GRAMMARS
 | |
| The grammar for
 | |
| .I peg
 | |
| grammars is shown below.  This will both illustrate and formalise
 | |
| the above description.
 | |
| .nf
 | |
| 
 | |
|     Grammar         <- Spacing Definition+ EndOfFile
 | |
|     
 | |
|     Definition      <- Identifier LEFTARROW Expression
 | |
|     Expression      <- Sequence ( SLASH Sequence )*
 | |
|     Sequence        <- Prefix*
 | |
|     Prefix          <- AND Action
 | |
|                      / ( AND / NOT )? Suffix
 | |
|     Suffix          <- Primary ( QUERY / STAR / PLUS )?
 | |
|     Primary         <- Identifier !LEFTARROW
 | |
|                      / OPEN Expression CLOSE
 | |
|                      / Literal
 | |
|                      / Class
 | |
|                      / DOT
 | |
|                      / Action
 | |
|                      / BEGIN
 | |
|                      / END
 | |
|     
 | |
|     Identifier      <- < IdentStart IdentCont* > Spacing
 | |
|     IdentStart      <- [a-zA-Z_]
 | |
|     IdentCont       <- IdentStart / [0-9]
 | |
|     Literal         <- ['] < ( !['] Char  )* > ['] Spacing
 | |
|                      / ["] < ( !["] Char  )* > ["] Spacing
 | |
|     Class           <- '[' < ( !']' Range )* > ']' Spacing
 | |
|     Range           <- Char '-' Char / Char
 | |
|     Char            <- '\\\\' [abefnrtv'"\\[\\]\\\\]
 | |
|                      / '\\\\' [0-3][0-7][0-7]
 | |
|                      / '\\\\' [0-7][0-7]?
 | |
|                      / '\\\\' '-'
 | |
|                      / !'\\\\' .
 | |
|     LEFTARROW       <- '<-' Spacing
 | |
|     SLASH           <- '/' Spacing
 | |
|     AND             <- '&' Spacing
 | |
|     NOT             <- '!' Spacing
 | |
|     QUERY           <- '?' Spacing
 | |
|     STAR            <- '*' Spacing
 | |
|     PLUS            <- '+' Spacing
 | |
|     OPEN            <- '(' Spacing
 | |
|     CLOSE           <- ')' Spacing
 | |
|     DOT             <- '.' Spacing
 | |
|     Spacing         <- ( Space / Comment )*
 | |
|     Comment         <- '#' ( !EndOfLine . )* EndOfLine
 | |
|     Space           <- ' ' / '\\t' / EndOfLine
 | |
|     EndOfLine       <- '\\r\\n' / '\\n' / '\\r'
 | |
|     EndOfFile       <- !.
 | |
|     Action          <- '{' < [^}]* > '}' Spacing
 | |
|     BEGIN           <- '<' Spacing
 | |
|     END             <- '>' Spacing
 | |
| 
 | |
| .fi
 | |
| .SH LEG GRAMMARS
 | |
| .I leg
 | |
| is a variant of
 | |
| .I peg
 | |
| that adds some features of
 | |
| .IR lex (1)
 | |
| and
 | |
| .IR yacc (1).
 | |
| It differs from
 | |
| .I peg
 | |
| in the following ways.
 | |
| .TP
 | |
| .BI %{\  text... \ %}
 | |
| A declaration section can appear anywhere that a rule definition is
 | |
| expected.  The
 | |
| .I text
 | |
| between the delimiters '%{' and '%}' is copied verbatim to the
 | |
| generated C parser code
 | |
| .I before
 | |
| the code that implements the parser itself.
 | |
| .TP
 | |
| .IB name\  = \ pattern
 | |
| The 'assignment' operator replaces the left arrow operator '<-'.
 | |
| .TP
 | |
| .B rule-name
 | |
| Hyphens can appear as letters in the names of rules.  Each hyphen is
 | |
| converted into an underscore in the generated C source code.  A single
 | |
| hyphen '-' is a legal rule name.
 | |
| .nf
 | |
| 
 | |
|     -       = [ \\t\\n\\r]*
 | |
|     number  = [0-9]+                 -
 | |
|     name    = [a-zA-Z_][a-zA-Z_0-9]* -
 | |
|     l-paren = '('                    -
 | |
|     r-paren = ')'                    -
 | |
|     
 | |
| .fi
 | |
| This example shows how ignored whitespace can be obvious when reading
 | |
| the grammar and yet unobtrusive when placed liberally at the end of
 | |
| every rule associated with a lexical element.
 | |
| .TP
 | |
| .IB seq-1\  | \ seq-2
 | |
| The alternation operator is vertical bar '|' rather than forward
 | |
| slash '/'.  The
 | |
| .I peg
 | |
| rule
 | |
| .nf
 | |
| 
 | |
|     name <- sequence-1
 | |
|           / sequence-2
 | |
|           / sequence-3
 | |
| 
 | |
| .fi
 | |
| is therefore written
 | |
| .nf
 | |
| 
 | |
|     name = sequence-1
 | |
|          | sequence-2
 | |
|          | sequence-3
 | |
|          ;
 | |
| 
 | |
| .fi
 | |
| in
 | |
| .I leg
 | |
| (with the final semicolon being optional, as described next).
 | |
| .TP
 | |
| .IB pattern\  ;
 | |
| A semicolon punctuator can optionally terminate a
 | |
| .IR pattern .
 | |
| .TP
 | |
| .BI %% \ text...
 | |
| A double percent '%%' terminates the rules (and declarations) section of
 | |
| the grammar.  All
 | |
| .I text
 | |
| following '%%' is copied verbatim to the generated C parser code
 | |
| .I after
 | |
| the parser implementation code.
 | |
| .TP
 | |
| .BI $$\ = \ value
 | |
| A sub-rule can return a semantic
 | |
| .I value
 | |
| from an action by assigning it to the pseudo-variable '$$'.  All
 | |
| semantic values must have the same type (which defaults to 'int').
 | |
| This type can be changed by defining YYSTYPE in a declaration section.
 | |
| .TP
 | |
| .IB identifier : name
 | |
| The semantic value returned (by assigning to '$$') from the sub-rule
 | |
| .I name
 | |
| is associated with the
 | |
| .I identifier
 | |
| and can be referred to in subsequent actions.
 | |
| .PP
 | |
| The desk calculator example below illustrates the use of '$$' and ':'.
 | |
| .SH LEG EXAMPLE: A DESK CALCULATOR
 | |
| The extensions in
 | |
| .I leg
 | |
| described above allow useful parsers and evaluators (including
 | |
| declarations, grammar rules, and supporting C functions such
 | |
| as 'main') to be kept within a single source file.  To illustrate this
 | |
| we show a simple desk calculator supporting the four common arithmetic
 | |
| operators and named variables.  The intermediate results of arithmetic
 | |
| evaluation will be accumulated on an implicit stack by returning them
 | |
| as semantic values from sub-rules.
 | |
| .nf
 | |
| 
 | |
|     %{
 | |
|     #include <stdio.h>     /* printf() */
 | |
|     #include <stdlib.h>    /* atoi() */
 | |
|     int vars[26];
 | |
|     %}
 | |
|     
 | |
|     Stmt    = - e:Expr EOL                  { printf("%d\\n", e); }
 | |
|             | ( !EOL . )* EOL               { printf("error\\n"); }
 | |
|     
 | |
|     Expr    = i:ID ASSIGN s:Sum             { $$ = vars[i] = s; }
 | |
|             | s:Sum                         { $$ = s; }
 | |
|     
 | |
|     Sum     = l:Product
 | |
|                     ( PLUS  r:Product       { l += r; }
 | |
|                     | MINUS r:Product       { l -= r; }
 | |
|                     )*                      { $$ = l; }
 | |
|     
 | |
|     Product = l:Value
 | |
|                     ( TIMES  r:Value        { l *= r; }
 | |
|                     | DIVIDE r:Value        { l /= r; }
 | |
|                     )*                      { $$ = l; }
 | |
|     
 | |
|     Value   = i:NUMBER                      { $$ = atoi(yytext); }
 | |
|             | i:ID !ASSIGN                  { $$ = vars[i]; }
 | |
|             | OPEN i:Expr CLOSE             { $$ = i; }
 | |
|     
 | |
|     NUMBER  = < [0-9]+ >    -               { $$ = atoi(yytext); }
 | |
|     ID      = < [a-z]  >    -               { $$ = yytext[0] - 'a'; }
 | |
|     ASSIGN  = '='           -
 | |
|     PLUS    = '+'           -
 | |
|     MINUS   = '-'           -
 | |
|     TIMES   = '*'           -
 | |
|     DIVIDE  = '/'           -
 | |
|     OPEN    = '('           -
 | |
|     CLOSE   = ')'           -
 | |
|     
 | |
|     -       = [ \\t]*
 | |
|     EOL     = '\\n' | '\\r\\n' | '\\r' | ';'
 | |
|     
 | |
|     %%
 | |
|     
 | |
|     int main()
 | |
|     {
 | |
|       while (yyparse())
 | |
|         ;
 | |
|       return 0;
 | |
|     }
 | |
| 
 | |
| .fi
 | |
| .SH LEG GRAMMAR FOR LEG GRAMMARS
 | |
| The grammar for
 | |
| .I leg
 | |
| grammars is shown below.  This will both illustrate and formalise the
 | |
| above description.
 | |
| .nf
 | |
| 
 | |
|     grammar =       -
 | |
|                     ( declaration | definition )+
 | |
|                     trailer? end-of-file
 | |
|     
 | |
|     declaration =   '%{' < ( !'%}' . )* > RPERCENT
 | |
|     
 | |
|     trailer =       '%%' < .* >
 | |
|     
 | |
|     definition =    identifier EQUAL expression SEMICOLON?
 | |
|     
 | |
|     expression =    sequence ( BAR sequence )*
 | |
|     
 | |
|     sequence =      prefix+
 | |
|     
 | |
|     prefix =        AND action
 | |
|     |               ( AND | NOT )? suffix
 | |
|     
 | |
|     suffix =        primary ( QUERY | STAR | PLUS )?
 | |
|     
 | |
|     primary =       identifier COLON identifier !EQUAL
 | |
|     |               identifier !EQUAL
 | |
|     |               OPEN expression CLOSE
 | |
|     |               literal
 | |
|     |               class
 | |
|     |               DOT
 | |
|     |               action
 | |
|     |               BEGIN
 | |
|     |               END
 | |
|     
 | |
|     identifier =    < [-a-zA-Z_][-a-zA-Z_0-9]* > -
 | |
|     
 | |
|     literal =       ['] < ( !['] char )* > ['] -
 | |
|     |               ["] < ( !["] char )* > ["] -
 | |
|     
 | |
|     class =         '[' < ( !']' range )* > ']' -
 | |
|     
 | |
|     range =         char '-' char | char
 | |
|     
 | |
|     char =          '\\\\' [abefnrtv'"\\[\\]\\\\]
 | |
|     |               '\\\\' [0-3][0-7][0-7]
 | |
|     |               '\\\\' [0-7][0-7]?
 | |
|     |               !'\\\\' .
 | |
|     
 | |
|     action =        '{' < [^}]* > '}' -
 | |
|     
 | |
|     EQUAL =         '=' -
 | |
|     COLON =         ':' -
 | |
|     SEMICOLON =     ';' -
 | |
|     BAR =           '|' -
 | |
|     AND =           '&' -
 | |
|     NOT =           '!' -
 | |
|     QUERY =         '?' -
 | |
|     STAR =          '*' -
 | |
|     PLUS =          '+' -
 | |
|     OPEN =          '(' -
 | |
|     CLOSE =         ')' -
 | |
|     DOT =           '.' -
 | |
|     BEGIN =         '<' -
 | |
|     END =           '>' -
 | |
|     RPERCENT =      '%}' -
 | |
|     
 | |
|     - =             ( space | comment )*
 | |
|     space =         ' ' | '\\t' | end-of-line
 | |
|     comment =       '#' ( !end-of-line . )* end-of-line
 | |
|     end-of-line =   '\\r\\n' | '\\n' | '\\r'
 | |
|     end-of-file =   !.
 | |
| 
 | |
| .fi
 | |
| .SH CUSTOMISING THE PARSER
 | |
| The following symbols can be redefined in declaration sections to
 | |
| modify the generated parser code.
 | |
| .TP
 | |
| .B YYSTYPE
 | |
| The semantic value type.  The pseudo-variable '$$' and the
 | |
| identifiers 'bound' to rule results with the colon operator ':' should
 | |
| all be considered as being declared to have this type.  The default
 | |
| value is 'int'.
 | |
| .TP
 | |
| .B YYPARSE
 | |
| The name of the main entry point to the parser.  The default value
 | |
| is 'yyparse'.
 | |
| .TP
 | |
| .B YYPARSEFROM
 | |
| The name of an alternative entry point to the parser.  This function
 | |
| expects one argument: the function corresponding to the rule from
 | |
| which the search for a match should begin.  The default
 | |
| is 'yyparsefrom'.  Note that yyparse() is defined as
 | |
| .nf
 | |
| 
 | |
|     int yyparse() { return yyparsefrom(yy_foo); }
 | |
| 
 | |
| .fi
 | |
| where 'foo' is the name of the first rule in the grammar.
 | |
| .TP
 | |
| .BI YY_INPUT( buf , \ result , \ max_size )
 | |
| This macro is invoked by the parser to obtain more input text.
 | |
| .I buf
 | |
| points to an area of memory that can hold at most
 | |
| .I max_size
 | |
| characters.  The macro should copy input text to
 | |
| .I buf
 | |
| and then assign the integer variable
 | |
| .I result
 | |
| to indicate the number of characters copied.  If no more input is available,
 | |
| the macro should assign 0 to
 | |
| .IR result .
 | |
| By default, the YY_INPUT macro is defined as follows.
 | |
| .nf
 | |
| 
 | |
|     #define YY_INPUT(buf, result, max_size)        \\
 | |
|     {                                              \\
 | |
|       int yyc= getchar();                          \\
 | |
|       result= (EOF == yyc) ? 0 : (*(buf)= yyc, 1); \\
 | |
|     }
 | |
| 
 | |
| .fi
 | |
| .TP
 | |
| .B YY_DEBUG
 | |
| If this symbol is defined then additional code will be included in
 | |
| the parser that prints vast quantities of arcane information to the
 | |
| standard error while the parser is running.
 | |
| .TP
 | |
| .B YY_BEGIN
 | |
| This macro is invoked to mark the start of input text that will be
 | |
| made available in actions as 'yytext'.  This corresponds to
 | |
| occurrences of '<' in the grammar.  These are converted into
 | |
| predicates that are expected to succeed.  The default definition
 | |
| .nf
 | |
| 
 | |
|     #define YY_BEGIN (yybegin= yypos, 1)
 | |
| 
 | |
| .fi
 | |
| therefore saves the current input position and returns 1 ('true') as
 | |
| the result of the predicate.
 | |
| .TP
 | |
| .B YY_END
 | |
| This macro corresponds to '>' in the grammar.  Again, it is a
 | |
| predicate so the default definition saves the input position
 | |
| before 'succeeding'.
 | |
| .nf
 | |
| 
 | |
|     #define YY_END (yyend= yypos, 1)
 | |
| 
 | |
| .fi
 | |
| .TP
 | |
| .BI YY_PARSE( T )
 | |
| This macro declares the parser entry points (yyparse and yyparsefrom)
 | |
| to be of type
 | |
| .IR T .
 | |
| The default definition
 | |
| .nf
 | |
| 
 | |
|     #define YY_PARSE(T) T
 | |
| 
 | |
| .fi
 | |
| leaves yyparse() and yyparsefrom() with global visibility.  If they
 | |
| should not be externally visible in other source files, this macro can
 | |
| be redefined to declare them 'static'.
 | |
| .nf
 | |
| 
 | |
|     #define YY_PARSE(T) static T
 | |
| 
 | |
| .fi
 | |
| .TP
 | |
| .BI YY_CTX_LOCAL
 | |
| If this symbol is defined during compilation of a generated parser
 | |
| then global parser state will be kept in a structure of
 | |
| type 'yycontext' which can be declared as a local variable.  This
 | |
| allows multiple instances of parsers to coexist and to be thread-safe.
 | |
| The parsing function
 | |
| .IR yyparse ()
 | |
| will be declared to expect a first argument of type 'yycontext *', an
 | |
| instance of the structure holding the global state for the parser.
 | |
| This instance must be allocated and initialised to zero by the client.
 | |
| A trivial but complete example is as follows.
 | |
| .nf
 | |
| 
 | |
|     #include <stdio.h>
 | |
|     #include <string.h>     /* memset() */
 | |
| 
 | |
|     #define YY_CTX_LOCAL
 | |
| 
 | |
|     #include "the-generated-parser.peg.c"
 | |
| 
 | |
|     int main()
 | |
|     {
 | |
|       yycontext ctx;
 | |
|       memset(&ctx, 0, sizeof(yycontext));
 | |
|       while (yyparse(&ctx));
 | |
|       return 0;
 | |
|     }
 | |
| 
 | |
| .fi
 | |
| Note that if this symbol is undefined then the compiled parser will
 | |
| statically allocate its global state and will be neither reentrant nor
 | |
| thread-safe.
 | |
| .TP
 | |
| .BI YY_CTX_MEMBERS
 | |
| If YY_CTX_LOCAL is defined (see above) then the macro YY_CTX_MEMBERS
 | |
| can be defined to expand to any additional member field declarations
 | |
| that the client would like included in the declaration of
 | |
| the 'yycontext' structure type.  These additional members are
 | |
| otherwise ignored by the generated parser.  The instance
 | |
| of 'yycontext' associated with the currently-active parser is
 | |
| available in actions through the pointer variable
 | |
| .IR yyctx .
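A common reason to redefine YY_INPUT, as described above, is to parse text held in memory rather than read from the standard input. The following is a minimal sketch of such a redefinition using the three-argument form documented in this page. The variables input_text and input_pos and the function drain() are hypothetical. drain() merely stands in for the parser's refill step so that the macro can be exercised on its own.

```c
#include <string.h>

/* Hypothetical input source: a NUL-terminated string and a read cursor. */
static const char *input_text = "";
static size_t      input_pos  = 0;

/* Replacement for the default YY_INPUT: copy up to max_size characters from
 * input_text into buf and report how many were copied (0 at end of input). */
#define YY_INPUT(buf, result, max_size)                       \
  {                                                           \
    size_t yyn = strlen(input_text + input_pos);              \
    if (yyn > (size_t)(max_size)) yyn = (size_t)(max_size);   \
    memcpy((buf), input_text + input_pos, yyn);               \
    input_pos += yyn;                                         \
    (result) = (int)yyn;                                      \
  }

/* Demonstration only: pull the whole input through YY_INPUT in chunks of
 * at most `chunk` characters, as a parser might when refilling its buffer.
 * `out` is assumed to be large enough for the text plus a terminating NUL. */
static int drain(const char *text, char *out, int chunk)
{
  int total = 0, n;

  input_text = text;
  input_pos  = 0;
  for (;;) {
    YY_INPUT(out + total, n, chunk);
    if (n == 0) break;          /* no more input available */
    total += n;
  }
  out[total] = '\0';
  return total;
}
```

In a leg grammar the two variables and the macro definition would be placed in a declaration section so that they precede the parser code.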
 | |
| .PP
 | |
| The following variables can be referred to within actions.
 | |
| .TP
 | |
| .B char *yybuf
 | |
| This variable points to the parser's input buffer used to store input
 | |
| text that has not yet been matched.
 | |
| .TP
 | |
| .B int yypos
 | |
| This is the offset (in yybuf) of the next character to be matched and
 | |
| consumed.
 | |
| .TP
 | |
| .B char *yytext
 | |
| The most recently matched text delimited by '<' and '>' is stored in this variable.
 | |
| .TP
 | |
| .B int yyleng
 | |
| This variable indicates the number of characters in 'yytext'.
 | |
| .TP
 | |
| .B yycontext *yyctx
 | |
| This variable points to the instance of 'yycontext' associated with
 | |
| the currently-active parser.
 | |
| .SH DIAGNOSTICS
 | |
| .I peg
 | |
| and
 | |
| .I leg
 | |
| warn about the following conditions while converting a grammar into a parser.
 | |
| .TP
 | |
| .B syntax error
 | |
| The input grammar was malformed in some way.  The error message will
 | |
| include the text about to be matched (often backed up a huge amount
 | |
| from the actual location of the error) and the line number of the most
 | |
| recently considered character (which is often the real location of the
 | |
| problem).
 | |
| .TP
 | |
| .B rule 'foo' used but not defined
 | |
| The grammar referred to a rule named 'foo' but no definition for it
 | |
| was given.  Attempting to use the generated parser will likely result
 | |
| in errors from the linker due to undefined symbols associated with the
 | |
| missing rule.
 | |
| .TP
 | |
| .B rule 'foo' defined but not used
 | |
| The grammar defined a rule named 'foo' and then ignored it.  The code
 | |
| associated with the rule is included in the generated parser which
 | |
| will in all other respects be healthy.
 | |
| .TP
 | |
| .B possible infinite left recursion in rule 'foo'
 | |
| There exists at least one path through the grammar that leads from the
 | |
| rule 'foo' back to (a recursive invocation of) the same rule without
 | |
| consuming any input.
 | |
| .PP
 | |
| Left recursion, especially that found in standards documents, is
 | |
| often 'direct' and implies trivial repetition.
 | |
| .nf
 | |
| 
 | |
|     # (6.7.6)
 | |
|     direct-abstract-declarator =
 | |
|         LPAREN abstract-declarator RPAREN
 | |
|     |   direct-abstract-declarator? LBRACKET assign-expr? RBRACKET
 | |
|     |   direct-abstract-declarator? LBRACKET STAR RBRACKET
 | |
|     |   direct-abstract-declarator? LPAREN param-type-list? RPAREN
 | |
| 
 | |
| .fi
 | |
| The recursion can easily be eliminated by converting the parts of the
 | |
| pattern following the recursion into a repeatable suffix.
 | |
| .nf
 | |
|     
 | |
|     # (6.7.6)
 | |
|     direct-abstract-declarator =
 | |
|         direct-abstract-declarator-head?
 | |
|         direct-abstract-declarator-tail*
 | |
|     
 | |
|     direct-abstract-declarator-head =
 | |
|         LPAREN abstract-declarator RPAREN
 | |
|     
 | |
|     direct-abstract-declarator-tail =
 | |
|         LBRACKET assign-expr? RBRACKET
 | |
|     |   LBRACKET STAR RBRACKET
 | |
|     |   LPAREN param-type-list? RPAREN
 | |
| 
 | |
| .fi
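The same head-and-tail transformation can be seen in a smaller, hand-written example. A left-recursive rule for subtraction would call itself forever in a recursive-descent parser, whereas the rewritten form is a simple loop that still evaluates from left to right. The rule and function names below are assumptions chosen for illustration.

```c
#include <ctype.h>

/* digit = [0-9] ; stores the digit's value in *value */
static int digit(const char *s, int *pos, int *value)
{
  if (!isdigit((unsigned char)s[*pos])) return 0;
  *value = s[(*pos)++] - '0';
  return 1;
}

/* Left-recursive form (unusable in a recursive-descent parser):
 *     sum = sum '-' digit | digit
 * Equivalent head/tail form implemented here:
 *     sum = digit ( '-' digit )*
 */
static int sum(const char *s, int *pos, int *value)
{
  int l, r;

  if (!digit(s, pos, &l)) return 0;       /* head */
  for (;;) {                              /* tail, repeated */
    int saved = *pos;
    if (s[*pos] != '-') break;
    ++*pos;
    if (!digit(s, pos, &r)) { *pos = saved; break; }
    l -= r;                               /* accumulate left to right */
  }
  *value = l;
  return 1;
}
```

Applied to "9-3-2" this yields 4, that is (9-3)-2, which is the left-associative result the original left-recursive rule was meant to express.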
 | |
| .SH BUGS
 | |
| The 'yy' and 'YY' prefixes cannot be changed.
 | |
| .PP
 | |
| Left recursion is detected in the input grammar but is not handled
 | |
| correctly in the generated parser.
 | |
| .PP
 | |
| Diagnostics for errors in the input grammar are obscure and not
 | |
| particularly helpful.
 | |
| .PP
 | |
| Several commonly-used
 | |
| .IR lex (1)
 | |
| features (yywrap(), yyin, etc.) are completely absent.
 | |
| .PP
 | |
| The generated parser does not contain '#line' directives to direct C
 | |
| compiler errors back to the grammar description when appropriate.
 | |
| .SH SEE ALSO
 | |
| D. Val Schorre,
 | |
| .I META II, a syntax-oriented compiler writing language,
 | |
| 19th ACM National Conference, 1964, pp.\ 41.301--41.311.  Describes a
 | |
| self-implementing parser generator for analytic grammars with no
 | |
| backtracking.
 | |
| .PP
 | |
| Alexander Birman,
 | |
| .I The TMG Recognition Schema,
 | |
| Ph.D. dissertation, Princeton, 1970.  A mathematical treatment of the
 | |
| power and complexity of recursive-descent parsing with backtracking.
 | |
| .PP
 | |
| Bryan Ford,
 | |
| .I Parsing Expression Grammars: A Recognition-Based Syntactic Foundation,
 | |
| ACM SIGPLAN Symposium on Principles of Programming Languages, 2004.
 | |
| Defines PEGs and analyses them in relation to context-free and regular
 | |
| grammars.  Introduces the syntax adopted in
 | |
| .IR peg .
 | |
| .PP
 | |
| The standard Unix utilities
 | |
| .IR lex (1)
 | |
| and
 | |
| .IR yacc (1)
 | |
| which influenced the syntax and features of
 | |
| .IR leg .
 | |
| .PP
 | |
| The source code for
 | |
| .I peg
 | |
| and
 | |
| .I leg
 | |
| whose grammar parsers are written using themselves.
 | |
| .PP
 | |
| The latest version of this software and documentation:
 | |
| .nf
 | |
| 
 | |
|     http://piumarta.com/software/peg
 | |
| 
 | |
| .fi
 | |
| .SH AUTHOR
 | |
| .IR peg ,
 | |
| .I leg
 | |
| and this manual page were written by Ian Piumarta (first-name at
 | |
| last-name dot com) while investigating the viability of regular- and
 | |
| parsing-expression grammars for efficiently extracting type and
 | |
| signature information from C header files.
 | |
| .PP
 | |
| Please send bug reports and suggestions for improvements to the author
 | |
| at the above address.
 |