emacs/doc/misc/bovine.texi

480 lines
14 KiB
Plaintext

\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/bovine.info
@set TITLE Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}
@include docstyle.texi
@c *************************************************************************
@c @ Header
@c *************************************************************************
@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp
@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header
@copying
Copyright @copyright{} 1999--2004, 2012--2024 Free Software Foundation,
Inc.
@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below. A copy of the license
is included in the section entitled ``GNU Free Documentation License''.
(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying
@dircategory Emacs misc features
@direntry
* Bovine: (bovine). Semantic bovine parser development.
@end direntry
@iftex
@finalout
@end iftex
@c @setchapternewpage odd
@c @setchapternewpage off
@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page
@macro semantic{}
@i{Semantic}
@end macro
@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents
@node top
@top @value{TITLE}
The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser. It is good for simple
languages. It has many conveniences making grammar writing easy. The
conveniences make it less powerful than a Bison-like @acronym{LALR}
parser. For more information, @pxref{Top,, Wisent Parser Development,
wisent}.
Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension. When compiled, the contents is converted into a file of
the form @file{NAME-by.el}. This, in turn is byte compiled.
@xref{top,, Grammar Framework Manual, grammar-fw}.
@ifnottex
@insertcopying
@end ifnottex
@menu
* Starting Rules:: The starting rules for the grammar.
* Bovine Grammar Rules:: Rules used to parse a language.
* Optional Lambda Expression:: Actions to take when a rule is matched.
* Bovine Examples:: Simple Samples.
* GNU Free Documentation License:: The license for this documentation.
@c * Index::
@end menu
@node Starting Rules
@chapter Starting Rules
In Bison, one and only one nonterminal is designated as the ``start''
symbol. In @semantic{}, one or more nonterminals can be designated as
the ``start'' symbol. They are declared following the @code{%start}
keyword separated by spaces. @xref{start Decl,, Grammar Framework
Manual, grammar-fw}.
If no @code{%start} keyword is used in a grammar, then the very first
is used. Internally the first start nonterminal is targeted by the
reserved symbol @code{bovine-toplevel}, so it can be found by the
parser harness.
To find locally defined variables, the local context handler needs to
parse the body of functional code. The @code{scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context, @pxref{scopestart Decl,, Grammar Framework Manual,
grammar-fw}. Internally the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.
@node Bovine Grammar Rules
@chapter Bovine Grammar Rules
The rules are what allow the compiler to create tags from a language
file. Once the setup is done in the prologue, you can start writing
rules. @xref{Grammar Rules,, Grammar Framework Manual, grammar-fw}.
@example
@var{result} : @var{components1} @var{optional-semantic-action1})
| @var{components2} @var{optional-semantic-action2}
;
@end example
@var{result} is a nonterminal, that is a symbol synthesized in your grammar.
@var{components} is a list of elements that are to be matched if @var{result}
is to be made. @var{optional-semantic-action} is an optional sequence
of simplified Emacs Lisp expressions for concocting the parse tree.
In bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack. (The stack of matched elements.)
When all @var{components}' elements have been matched, it is
@dfn{reduced} to @var{result}. @xref{Algorithm,,, bison, The GNU Bison Manual}.
A particular @var{result} written into your grammar becomes
the parser's goal. It is designated by a @code{%start} statement
(@pxref{Starting Rules}). The value returned by the associated
@var{optional-semantic-action} is the parser's result. It should be
a tree of @semantic{} @dfn{tags}, @pxref{Semantic Tags,, Semantic
Application Development, semantic-appdev}.
@var{components} is made up of symbols. A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.
@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu
@node How Lexical Tokens Match
@section How Lexical Tokens Match
A lexical rule must be used to define how to match a lexical token.
For instance:
@example
%keyword FOO "foo"
@end example
Means that @code{FOO} is a reserved language keyword, matched as such
by looking up into a keyword table, @pxref{keyword Decl,, Grammar
Framework Manual, grammar-fw}. This is because @code{"foo"} will be
converted to
@code{FOO} in the lexical analysis stage. Thus the symbol @code{FOO}
won't be available any other way.
If we specify our token in this way:
@example
%token <symbol> FOO "foo"
@end example
then @code{FOO} will match the string @code{"foo"} explicitly, but it
won't do so at the lexical level, allowing use of the text
@code{"foo"} in other forms of regular expressions.
In that case, @code{FOO} is a @code{symbol}-type token. To match, a
@code{symbol} must first be encountered, and then it must
@code{string-match "foo"}.
@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the %token's match-value string, is a regular expression!
@end table
Non symbol tokens are also allowed. For example:
@example
%token <punctuation> PERIOD "[.]"
filename : symbol PERIOD symbol
;
@end example
@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.
@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table
@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details
For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-lisp converter encounters a lexical token
declaration of the form:
@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example
It substitutes every occurrences of @var{token-name} in rules, by its
expanded form:
@example
@var{type} @var{match-value}
@end example
For example:
@example
%token <symbol> MOOSE "moose"
find_a_moose: MOOSE
;
@end example
Will generate this pseudo equivalent-rule:
@example
find_a_moose: symbol "moose" ;; invalid syntax!
;
@end example
Thus, from the bovinator point of view, the @var{components} part of a
rule is made up of symbols and strings. A string in the mix means
that the previous symbol must have the additional constraint of
exactly matching it, as described in @ref{How Lexical Tokens Match}.
@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table
@node Order of components in rules
@section Order of components in rules
If a rule has multiple components, order is important, for example
@example
headerfile : symbol PERIOD symbol
| symbol
;
@end example
would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form. If they were in reverse order, then the long form
would never be tested.
@c @xref{Default syntactic tokens}.
@node Optional Lambda Expression
@chapter Optional Lambda Expressions
The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda. This lambda has special short-cuts to simplify
reading the semantic action definition. An @acronym{OLE} like this:
@example
( $1 )
@end example
results in a lambda return which consists entirely of the string
or object found by matching the first (zeroth) element of match.
An @acronym{OLE} like this:
@example
( ,(foo $1) )
@end example
executes @code{foo} on the first argument, and then splices its return
into the return list whereas:
@example
( (foo $1) )
@end example
executes @code{foo}, and that is placed in the return list.
Here are other things that can appear inline:
@table @code
@item $1
The first object matched.
@item ,$1
The first object spliced into the list (assuming it is a list from a
non-terminal).
@item '$1
The first object matched, placed in a list. I.e., @code{( $1 )}.
@item foo
The symbol @code{foo} (exactly as displayed).
@item (foo)
A function call to foo which is stuck into the return list.
@item ,(foo)
A function call to foo which is spliced into the return list.
@item '(foo)
A function call to foo which is stuck into the return list in a list.
@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above.) The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list. The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.
@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches. (The same way the
parser iterates over the starting rule (@pxref{Starting Rules}). This
lets you have much simpler rules in this specific case, and also lets
you have positional information in the returned tokens, and error
skipping.
@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list. Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:
@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example
@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.
@table @var
@item name
Is a string that represents the tag in the language.
@item class
Is the kind of tag being create, such as @code{function}, or
@code{variable}, though any symbol will work.
@item attributes
Is an optional set of labeled values such as @code{:constant-flag t :parent
"parenttype"}.
@end table
@item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag with @var{name} of respectively the class
@code{variable}, @code{function}, @code{type}, @code{include},
@code{package}, and @code{code}.
See @ref{Creating Tags,, Semantic Application Development,
semantic-appdev}, for the lisp functions these translate into.
@end table
If the symbol @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.
@node Bovine Examples
@chapter Examples
The rule:
@example
any-symbol: symbol
;
@end example
is equivalent to
@example
any-symbol: symbol
( $1 )
;
@end example
which, if it matched the string @samp{"A"}, would return
@example
( "A" )
@end example
If this rule were used like this:
@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
( $1 $3 )
;
@end example
it would match @samp{"A=B"}, and return
@example
( ("A") ("B") )
@end example
The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.
To get a better result with nonterminals, use @asis{,} to splice lists
in like this:
@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
( ,$1 ,$3 )
;
@end example
which would return
@example
( "A" "B" )
@end example
@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include doclicense.texi
@c There is nothing to index at the moment.
@ignore
@node Index
@unnumbered Index
@printindex cp
@end ignore
@iftex
@contents
@summarycontents
@end iftex
@bye
@c Following comments are for the benefit of ispell.
@c LocalWords: bovinator inlined