@c -*- mode: texinfo -*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 2021--2024 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@node Parsing Program Source
@chapter Parsing Program Source
@cindex parsing program source

@cindex syntax tree, from parsing program source
Emacs provides various ways to parse program source text and produce a
@dfn{syntax tree}. In a syntax tree, text is no longer considered a
one-dimensional stream of characters, but a structured tree of nodes,
where each node represents a piece of text. Thus, a syntax tree can
enable interesting features like precise fontification, indentation,
navigation, structured editing, etc.

Emacs has a simple facility for parsing balanced expressions
(@pxref{Parsing Expressions}). There is also the SMIE library for
generic navigation and indentation (@pxref{SMIE}).

In addition to those, Emacs also provides integration with
@uref{https://tree-sitter.github.io/tree-sitter, the tree-sitter
library} if support for it was compiled in. The tree-sitter library
implements an incremental parser and has support for a wide range of
programming languages.

@defun treesit-available-p
This function returns non-@code{nil} if tree-sitter features are
available for the current Emacs session.
@end defun

To be able to parse the program source using the tree-sitter library
and access the syntax tree of the program, a Lisp program needs to
load a language grammar library, and create a parser for that
language and the current buffer. After that, the Lisp program can
query the parser about specific nodes of the syntax tree. Then, it
can access various kinds of information about each node, and search
for nodes using a powerful pattern-matching syntax. This chapter
explains how to do all this, and also how a Lisp program can work with
source files that mix multiple programming languages.

@menu
* Language Grammar::            Loading tree-sitter language grammar.
* Using Parser::                Introduction to parsers.
* Retrieving Nodes::            Retrieving nodes from a syntax tree.
* Accessing Node Information::  Accessing node information.
* Pattern Matching::            Pattern matching with query patterns.
* User-defined Things::         User-defined ``Things'' and Navigation.
* Multiple Languages::          Parse text written in multiple languages.
* Tree-sitter Major Modes::     Develop major modes using tree-sitter.
* Tree-sitter C API::           Compare the C API and the ELisp API.
@end menu

@node Language Grammar
@section Tree-sitter Language Grammar
@cindex language grammar, for tree-sitter

@heading Loading a language grammar
@cindex loading language grammar for tree-sitter

@cindex language argument, for tree-sitter
Tree-sitter relies on a language grammar to parse text in that
language. In Emacs, a language grammar is represented by a symbol.
For example, the C language grammar is represented as the symbol
@code{c}, and @code{c} can be passed to tree-sitter functions as the
@var{language} argument.

@vindex treesit-extra-load-path
@vindex treesit-load-language-error
Tree-sitter language grammars are distributed as dynamic libraries.
In order to use a language grammar in Emacs, you need to make sure
that the dynamic library is installed on the system. Emacs looks for
language grammars in several places, in the following order:

@itemize @bullet
@item
first, in the list of directories specified by the variable
@code{treesit-extra-load-path};
@item
then, in the @file{tree-sitter} subdirectory of the directory
specified by @code{user-emacs-directory} (@pxref{Init File});
@item
and finally, in the system's default locations for dynamic libraries.
@end itemize

In each of these directories, Emacs looks for a file whose file-name
extension is one of those specified by the variable
@code{dynamic-library-suffixes}.

If Emacs cannot find the library or has problems loading it, Emacs
signals the @code{treesit-load-language-error} error. The data of
that signal could be one of the following:

@table @code
@item (not-found @var{error-msg} @dots{})
This means that Emacs could not find the language grammar library.
@item (symbol-error @var{error-msg})
This means that Emacs could not find in the library the expected function
that every language grammar library should export.
@item (version-mismatch @var{error-msg})
This means that the version of the language grammar library is
incompatible with that of the tree-sitter library.
@end table

@noindent
In all of these cases, @var{error-msg} might provide additional
details about the failure.

@defun treesit-language-available-p language &optional detail
This function returns non-@code{nil} if the language grammar for
@var{language} exists and can be loaded.

If @var{detail} is non-@code{nil}, return @code{(t . nil)} when
@var{language} is available, and @code{(nil . @var{data})} when it's
unavailable. @var{data} is the signal data of
@code{treesit-load-language-error}.
@end defun
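
The two availability checks described so far can be combined as in the
following minimal sketch. It assumes the grammar of interest is the
one for C, and the messages are purely illustrative.

@example
@group
;; Check for tree-sitter support first: when this returns nil,
;; the other treesit functions may not be usable.
(if (not (treesit-available-p))
    (message "Tree-sitter is not available in this Emacs")
  ;; With a non-nil DETAIL argument, the result is a cons cell.
  (let ((result (treesit-language-available-p 'c t)))
    (if (car result)
        (message "The C grammar can be loaded")
      (message "Cannot load the C grammar: %S" (cdr result)))))
@end group
@end example

@noindent
Asking for the detailed result lets a program report why loading
failed, rather than only the fact that it did.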

@vindex treesit-load-name-override-list
By convention, the file name of the dynamic library for @var{language} is
@file{libtree-sitter-@var{language}.@var{ext}}, where @var{ext} is the
system-specific extension for dynamic libraries. Also by convention,
the function provided by that library is named
@code{tree_sitter_@var{language}}. If a language grammar library
doesn't follow this convention, you should add an entry

@example
(@var{language} @var{library-base-name} @var{function-name})
@end example

to the list in the variable @code{treesit-load-name-override-list}, where
@var{library-base-name} is the basename of the dynamic library's file name
(usually, @file{libtree-sitter-@var{language}}), and
@var{function-name} is the function provided by the library
(usually, @code{tree_sitter_@var{language}}). For example,

@example
(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")
@end example

@noindent
for a language that considers itself too ``cool'' to abide by
conventions.
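
A setup along these lines might be sketched as follows. The directory
@file{~/.local/lib/tree-sitter} is a hypothetical location chosen for
illustration, and @code{cool-lang} is the made-up grammar from the
example above, not a real one. The sketch assumes an Emacs built with
tree-sitter support, so that both variables are defined.

@example
@group
(when (treesit-available-p)
  ;; Hypothetical extra directory, searched before the others.
  (add-to-list 'treesit-extra-load-path
               (expand-file-name "~/.local/lib/tree-sitter"))
  ;; Override entry for a grammar with unconventional names.
  (add-to-list 'treesit-load-name-override-list
               '(cool-lang "libtree-sitter-coool"
                           "tree_sitter_cooool")))
@end group
@end example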

@cindex language grammar version, compatibility
@defun treesit-library-abi-version &optional min-compatible
This function returns the version of the language grammar
Application Binary Interface (@acronym{ABI}) supported by the
tree-sitter library. By default, it returns the latest ABI version
supported by the library, but if @var{min-compatible} is
non-@code{nil}, it returns the oldest ABI version which the library
still can support. Language grammar libraries must be built for
ABI versions between the oldest and the latest versions supported by
the tree-sitter library, otherwise the library will be unable to load
them.
@end defun

@defun treesit-language-abi-version language
This function returns the @acronym{ABI} version of the language
grammar library loaded by Emacs for @var{language}. If @var{language}
is unavailable, this function returns @code{nil}.
@end defun
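
The compatibility rule described above can be expressed as a small
check. This is only a sketch: the helper name
@code{my-treesit-grammar-abi-ok-p} is hypothetical and not part of
Emacs.

@example
@group
(defun my-treesit-grammar-abi-ok-p (language)
  "Return non-nil if LANGUAGE's grammar ABI is in the supported range."
  (let ((abi (treesit-language-abi-version language)))
    ;; ABI is nil when the grammar for LANGUAGE is unavailable.
    (and abi
         (<= (treesit-library-abi-version t)
             abi
             (treesit-library-abi-version)))))
@end group
@end example

@noindent
For instance, @code{(my-treesit-grammar-abi-ok-p 'c)} would return
@code{nil} when no C grammar can be loaded.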

@heading Concrete syntax tree
@cindex syntax tree, concrete

A syntax tree is what a parser generates. In a syntax tree, each node
represents a piece of text, and nodes are connected to each other by
parent-child relationships. For example, if the source text is

@example
1 + 2
@end example

@noindent
its syntax tree could be

@example
@group
               +--------------+
               | root "1 + 2" |
               +--------------+
                      |
      +--------------------------------+
      |       expression "1 + 2"       |
      +--------------------------------+
       |              |               |
+------------+ +--------------+ +------------+
| number "1" | | operator "+" | | number "2" |
+------------+ +--------------+ +------------+
@end group
@end example

We can also represent it as an s-expression:

@example
(root (expression (number) (operator) (number)))
@end example

@subheading Node types
@cindex node types, in a syntax tree

@cindex type of node, tree-sitter
@anchor{tree-sitter node type}
@cindex named node, tree-sitter
@anchor{tree-sitter named node}
@cindex anonymous node, tree-sitter
Names like @code{root}, @code{expression}, @code{number}, and
@code{operator} specify the @dfn{type} of the nodes. However, not all
nodes in a syntax tree have a type. Nodes that don't have a type are
known as @dfn{anonymous nodes}, and nodes with a type are @dfn{named
nodes}. Anonymous nodes are tokens with fixed spellings, including
punctuation characters like bracket @samp{]}, and keywords like
@code{return}.

@subheading Field names

@cindex field name, tree-sitter
@cindex tree-sitter node field name
@anchor{tree-sitter node field name}
To make the syntax tree easier to analyze, many language grammars
assign @dfn{field names} to child nodes. For example, a
@code{function_definition} node could have a @code{declarator} and a
@code{body}:

@example
@group
(function_definition
 declarator: (declaration)
 body: (compound_statement))
@end group
@end example

@heading Exploring the syntax tree
@cindex explore tree-sitter syntax tree
@cindex inspection of tree-sitter parse tree nodes

To aid in understanding the syntax of a language and in debugging Lisp
programs that use the syntax tree, Emacs provides an ``explore'' mode,
which displays the syntax tree of the source in the current buffer in
real time. Emacs also comes with an ``inspect'' mode, which displays
information about the nodes at point in the mode-line.

@deffn Command treesit-explore-mode
This mode pops up a window displaying the syntax tree of the source in
the current buffer. Selecting text in the source buffer highlights
the corresponding nodes in the syntax tree display. Clicking
on nodes in the syntax tree highlights the corresponding text in the
source buffer.
@end deffn

@deffn Command treesit-inspect-mode
This minor mode displays on the mode-line the node that @emph{starts}
at point. For example, the mode-line can display

@example
@var{parent} @var{field}: (@var{node} (@var{child} (@dots{})))
@end example

@noindent
where @var{node}, @var{child}, etc., are nodes which begin at point.
@var{parent} is the parent of @var{node}. @var{node} is displayed in
a bold typeface. @var{field} stands for the field names of @var{node}
and of @var{child}, etc.

If no node starts at point, i.e., point is in the middle of a node,
then the mode line displays the earliest node that spans point, and
its immediate parent.

This minor mode doesn't create parsers on its own. It uses the first
parser in @code{(treesit-parser-list)} (@pxref{Using Parser}).
@end deffn

@heading Reading the grammar definition
@cindex reading grammar definition, tree-sitter

Authors of language grammars define the @dfn{grammar} of a
programming language, which determines how a parser constructs a
concrete syntax tree out of the program text. In order to use the
syntax tree effectively, you need to consult the @dfn{grammar file}.

The grammar file is usually @file{grammar.js} in a language
grammar's project repository. The link to a language grammar's
home page can be found on
@uref{https://tree-sitter.github.io/tree-sitter, tree-sitter's
homepage}.

The grammar definition is written in JavaScript. For example, the
rule matching a @code{function_definition} node may look like

@example
@group
function_definition: $ => seq(
  $.declaration_specifiers,
  field('declarator', $.declaration),
  field('body', $.compound_statement)
)
@end group
@end example

@noindent
The rules are represented by functions that take a single argument
@var{$}, representing the whole grammar. The function itself is
constructed by other functions: the @code{seq} function puts together
a sequence of children; the @code{field} function annotates a child
with a field name. If we write the above definition in the so-called
@dfn{Backus-Naur Form} (@acronym{BNF}) syntax, it would look like

@example
@group
function_definition :=
  <declaration_specifiers> <declaration> <compound_statement>
@end group
@end example

@noindent
and the node returned by the parser would look like

@example
@group
(function_definition
 (declaration_specifiers)
 declarator: (declaration)
 body: (compound_statement))
@end group
@end example

Below is a list of functions that one can see in a grammar definition.
Each function takes other rules as arguments and returns a new rule.

@table @code
@item seq(@var{rule1}, @var{rule2}, @dots{})
matches each rule one after another.
@item choice(@var{rule1}, @var{rule2}, @dots{})
matches one of the rules in its arguments.
@item repeat(@var{rule})
matches @var{rule} @emph{zero or more} times.
This is like the @samp{*} operator in regular expressions.
@item repeat1(@var{rule})
matches @var{rule} @emph{one or more} times.
This is like the @samp{+} operator in regular expressions.
@item optional(@var{rule})
matches @var{rule} @emph{zero or one} times.
This is like the @samp{?} operator in regular expressions.
@item field(@var{name}, @var{rule})
assigns field name @var{name} to the child node matched by @var{rule}.
@item alias(@var{rule}, @var{alias})
makes nodes matched by @var{rule} appear as @var{alias} in the syntax
tree generated by the parser. For example,

@example
alias($.preprocessor_call_exp, $.call_expression)
@end example

@noindent
makes any node matched by @code{preprocessor_call_exp} appear as
@code{call_expression}.
@end table

Below are grammar functions of lesser importance for reading a
language grammar.

@table @code
@item token(@var{rule})
marks @var{rule} to produce a single leaf node. That is, instead of
generating a parent node with individual child nodes under it,
everything is combined into a single leaf node. @xref{Retrieving
Nodes}.
@item token.immediate(@var{rule})
Normally, grammar rules ignore preceding whitespace; this
changes @var{rule} to match only when there is no preceding
whitespace.
@item prec(@var{n}, @var{rule})
gives @var{rule} the level-@var{n} precedence.
@item prec.left([@var{n},] @var{rule})
marks @var{rule} as left-associative, optionally with level @var{n}.
@item prec.right([@var{n},] @var{rule})
marks @var{rule} as right-associative, optionally with level @var{n}.
@item prec.dynamic(@var{n}, @var{rule})
this is like @code{prec}, but the precedence is applied at runtime
instead.
@end table

The documentation of the tree-sitter project has
@uref{https://tree-sitter.github.io/tree-sitter/creating-parsers, more
about writing a grammar}. See especially the section ``The Grammar
DSL''.

@node Using Parser
@section Using Tree-sitter Parser
@cindex tree-sitter parser, using

This section describes how to create and configure a tree-sitter
parser. In Emacs, each tree-sitter parser is associated with a
buffer. As the user edits the buffer, the associated parser and
syntax tree are automatically kept up-to-date.

@defvar treesit-max-buffer-size
This variable contains the maximum size of buffers in which
tree-sitter can be activated. Major modes should check this value
when deciding whether to enable tree-sitter features.
@end defvar

@cindex creating tree-sitter parsers
@cindex tree-sitter parser, creating
@defun treesit-parser-create language &optional buffer no-reuse tag
Create a parser for the specified @var{buffer} and @var{language}
(@pxref{Language Grammar}), with @var{tag}. If @var{buffer} is
omitted or @code{nil}, it stands for the current buffer.

By default, this function reuses a parser if one already exists for
@var{language} with @var{tag} in @var{buffer}, but if @var{no-reuse}
is non-@code{nil}, this function always creates a new parser.

@var{tag} can be any symbol except @code{t}, and defaults to
@code{nil}. Different parsers can have the same tag.
@end defun

Given a parser, we can query information about it.

@defun treesit-parser-buffer parser
This function returns the buffer associated with @var{parser}.
@end defun

@defun treesit-parser-language parser
This function returns the language used by @var{parser}.
@end defun

@defun treesit-parser-p object
This function checks if @var{object} is a tree-sitter parser, and
returns non-@code{nil} if it is, and @code{nil} otherwise.
@end defun
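
As a minimal sketch of these functions working together, the following
creates a parser in a temporary buffer and queries it. It assumes that
a grammar for C is installed; the buffer text is arbitrary.

@example
@group
(when (and (treesit-available-p)
           (treesit-language-available-p 'c))
  (with-temp-buffer
    (insert "int x = 1;\n")
    ;; BUFFER is omitted, so the parser is for the current buffer.
    (let ((parser (treesit-parser-create 'c)))
      (list (treesit-parser-p parser)
            (treesit-parser-language parser)
            (eq (treesit-parser-buffer parser)
                (current-buffer))))))
@end group
@end example

@noindent
Calling @code{treesit-parser-create} again with the same arguments in
that buffer would reuse the parser, unless @var{no-reuse} is
non-@code{nil}.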

There is no need to explicitly parse a buffer, because parsing is done
automatically and lazily. A parser only parses when a Lisp program
queries for a node in its syntax tree. Therefore, when a parser is
first created, it doesn't parse the buffer; it waits until the Lisp
program queries for a node for the first time. Similarly, when some
change is made in the buffer, a parser doesn't re-parse immediately.

@vindex treesit-buffer-too-large
When a parser does parse, it checks for the size of the buffer.
Tree-sitter can only handle buffers no larger than about 4GB@. If the
size exceeds that, Emacs signals the @code{treesit-buffer-too-large}
error with signal data being the buffer size.

Once a parser is created, Emacs automatically adds it to the
internal parser list. Every time a change is made to the buffer,
Emacs updates parsers in this list so they can update their syntax
tree incrementally.

@defun treesit-parser-list &optional buffer language tag
This function returns the parser list of @var{buffer}, filtered by
@var{language} and @var{tag}. If @var{buffer} is @code{nil} or
omitted, it defaults to the current buffer.

If @var{language} is non-@code{nil}, only include parsers for that
language. Only parsers with @var{tag} are included; @var{tag}
defaults to @code{nil}. If @var{tag} is @code{t}, include parsers in
the returned list regardless of their tag.
@end defun

@defun treesit-parser-delete parser
This function deletes @var{parser}.
@end defun

@cindex tree-sitter narrowing
@anchor{tree-sitter narrowing}
Normally, a parser ``sees'' the whole buffer, but when the buffer is
narrowed (@pxref{Narrowing}), the parser will only see the accessible
portion of the buffer. As far as the parser can tell, the hidden
region was deleted. When the buffer is later widened, the parser
thinks text is inserted at the beginning and at the end. Although
parsers respect narrowing, modes should not use narrowing as a means
to handle a multi-language buffer; instead, set the ranges in which the
parser should operate. @xref{Multiple Languages}.

Because a parser parses lazily, when the user or a Lisp program
narrows the buffer, the parser is not affected immediately; as long as
the mode doesn't query for a node while the buffer is narrowed, the
parser is oblivious of the narrowing.

@cindex tree-sitter parse string
@cindex parse string, tree-sitter
Besides creating a parser for a buffer, a Lisp program can also parse a
string. Unlike a buffer, parsing a string is a one-off operation, and
there is no way to update the result.

@defun treesit-parse-string string language
This function parses @var{string} using @var{language}, and returns the
root node of the generated syntax tree. @emph{Do not} use this function
in a loop: this is a convenience function intended for one-off use, and
it isn't optimized; for heavy workload, use a temporary buffer instead.
@end defun
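
For a one-off parse, a sketch like the following may be enough. It
assumes that a grammar for C is installed, and it uses
@code{treesit-node-string}, which returns the s-expression form of a
node, to show the result.

@example
@group
(when (and (treesit-available-p)
           (treesit-language-available-p 'c))
  (let ((root (treesit-parse-string "int x = 1;" 'c)))
    ;; Return the parsed tree as an s-expression string.
    (treesit-node-string root)))
@end group
@end example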

@heading Be notified of changes to the parse tree
@cindex update callback, for tree-sitter parse-tree
@cindex after-change notifier, for tree-sitter parse-tree
@cindex tree-sitter parse-tree, update and after-change callback
@cindex notifiers, tree-sitter

A Lisp program might want to be notified of text affected by
incremental parsing. For example, inserting a comment-closing token
converts text before that token into a comment. Even
though the text is not directly edited, it is deemed to be ``changed''
nevertheless.

Emacs lets a Lisp program register callback functions (a.k.a.@:
@dfn{notifiers}) for these kinds of changes. A notifier function
takes two arguments: @var{ranges} and @var{parser}. @var{ranges} is a
list of cons cells of the form @w{@code{(@var{start} . @var{end})}},
where @var{start} and @var{end} mark the start and the end positions
of a range. @var{parser} is the parser issuing the notification.

Every time a parser reparses a buffer, it compares the old and new
parse trees, computes the ranges in which nodes have changed, and
passes the ranges to notifier functions. Note that the initial parse
is also considered a ``change'', so notifier functions are called on
the initial parse, with the range being the whole buffer.

@defun treesit-parser-add-notifier parser function
This function adds @var{function} to @var{parser}'s list of
after-change notifier functions. @var{function} must be a function
symbol, not a lambda function (@pxref{Anonymous Functions}).
@end defun

@defun treesit-parser-remove-notifier parser function
This function removes @var{function} from the list of @var{parser}'s
after-change notifier functions. @var{function} must be a function
symbol, rather than a lambda function.
@end defun

@defun treesit-parser-notifiers parser
This function returns the list of @var{parser}'s notifier functions.
@end defun
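
A notifier might be written and registered as in the following sketch.
The function name @code{my-treesit-log-changes} is hypothetical, and
the example assumes that a grammar for C is installed.

@example
@group
(defun my-treesit-log-changes (ranges parser)
  "Report each of RANGES whose nodes changed when PARSER reparsed."
  (dolist (range ranges)
    (message "%s: nodes changed between %d and %d"
             (treesit-parser-language parser)
             (car range) (cdr range))))
@end group

@group
(when (and (treesit-available-p)
           (treesit-language-available-p 'c))
  (let ((parser (treesit-parser-create 'c)))
    ;; Pass the function symbol, not a lambda.
    (treesit-parser-add-notifier parser #'my-treesit-log-changes)
    (treesit-parser-notifiers parser)))
@end group
@end example

@noindent
Because parsing is lazy, the notifier runs only once the parser
actually reparses, for example when a node is next requested.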

@heading Substitute parser for another language
@cindex remap language grammar, tree-sitter
@cindex replace language grammar, tree-sitter
@cindex replace parser language, tree-sitter
@cindex extended grammar, tree-sitter

Sometimes, a grammar for language B is a strict superset of the grammar
of another language A. Then it makes sense to reuse configurations
(font-lock rules, indentation rules, etc.) of language A for language B.
For that purpose, @code{treesit-language-remap-alist} allows users to
remap language A into language B.

@defvar treesit-language-remap-alist
The value of this variable should be an alist of
@w{@code{(@var{language-a} . @var{language-b})}}. When such a pair exists
in the alist, creating a parser for @var{language-a} actually creates a
parser for @var{language-b}. By extension, anything that creates a node
or makes a query of @var{language-a} will be redirected to use
@var{language-b} instead.

Note that calling @code{treesit-parser-language} on a parser for
@var{language-a} still returns @var{language-a}.
@end defvar

@node Retrieving Nodes
@section Retrieving Nodes
@cindex retrieve node, tree-sitter
@cindex tree-sitter, find node
@cindex get node, tree-sitter

@cindex terminology, for tree-sitter functions
Here are some terms and conventions we use when documenting
tree-sitter functions.

A node in a syntax tree spans some portion of the program text in the
buffer. We say that a node is ``smaller'' or ``larger'' than another
if it spans, respectively, a smaller or larger portion of buffer text
than the other node. Since nodes that are deeper (``lower'') in the
tree are children of the nodes that are ``higher'' in the tree, it
follows that a lower node will always be smaller than a node that is
higher in the node hierarchy. A node that is higher up in the syntax
tree contains one or more smaller nodes as its children, and therefore
spans a larger portion of buffer text.

When a function cannot find a node, it returns @code{nil}. For
convenience, all functions that take a node as argument and return
a node, also accept the node argument of @code{nil} and in that case
just return @code{nil}.

@vindex treesit-node-outdated
Nodes are not automatically updated when the associated buffer is
modified, and there is no way to update a node once it is retrieved.
Using an outdated node signals the @code{treesit-node-outdated} error.

@heading Retrieving nodes from syntax tree
@cindex retrieving tree-sitter nodes
@cindex syntax tree, retrieving nodes

@cindex leaf node, of tree-sitter parse tree
@cindex tree-sitter parse tree, leaf node
@defun treesit-node-at pos &optional parser-or-lang named
This function returns a @dfn{leaf} node at buffer position @var{pos}.
A leaf node is a node that doesn't have any child nodes.

This function tries to return a node whose span covers @var{pos}: the
node's beginning position is less than or equal to @var{pos}, and the
node's end position is greater than or equal to @var{pos}.

If no leaf node's span covers @var{pos} (e.g., @var{pos} is in the
whitespace between two leaf nodes), this function returns the first
leaf node after @var{pos}.

Finally, if there is no leaf node after @var{pos}, return the first
leaf node before @var{pos}.

If @var{parser-or-lang} is a parser object, this function uses that
parser; if @var{parser-or-lang} is a language, this function uses the
first parser for that language in the current buffer, or creates one
if none exists; if @var{parser-or-lang} is @code{nil}, this function
tries to guess the language at @var{pos} by calling
@code{treesit-language-at} (@pxref{Multiple Languages}).

If this function cannot find a suitable node to return, it returns
@code{nil}.

If @var{named} is non-@code{nil}, this function looks only for named
nodes (@pxref{tree-sitter named node, named node}).

Example:

@example
@group
;; Find the node at point in a C parser's syntax tree.
(treesit-node-at (point) 'c)
  @result{} #<treesit-node (primitive_type) in 23-27>
@end group
@end example
@end defun

@defun treesit-node-on beg end &optional parser-or-lang named
This function returns the @emph{smallest} node that covers the region
of buffer text between @var{beg} and @var{end}. In other words, the
start of the node is before or at @var{beg}, and the end of the node
is at or after @var{end}.

@emph{Beware:} calling this function on an empty line that is not
inside any top-level construct (function definition, etc.@:) most
probably will give you the root node, because the root node is the
smallest node that covers that empty line. Most of the time, you want
to use @code{treesit-node-at} instead.

If @var{parser-or-lang} is a parser object, this function uses that
parser; if @var{parser-or-lang} is a language, this function uses the
first parser for that language in the current buffer, or creates one
if none exists; if @var{parser-or-lang} is @code{nil}, this function
tries to guess the language at @var{beg} by calling
@code{treesit-language-at}.

If @var{named} is non-@code{nil}, this function looks for a named node
only (@pxref{tree-sitter named node, named node}).
@end defun

@defun treesit-parser-root-node parser
This function returns the root node of the syntax tree generated by
@var{parser}.
@end defun

@defun treesit-buffer-root-node &optional language
This function finds the first parser for @var{language} in the current
buffer, or creates one if none exists, and returns the root node
generated by that parser. If @var{language} is omitted, it uses the
first parser in the parser list. If it cannot find an appropriate
parser, it returns @code{nil}.
@end defun
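
The difference between these entry points can be seen in a sketch like
the following, which assumes that a grammar for C is installed. It
uses @code{treesit-node-type}, which returns a node's type as a
string, only to make the results readable.

@example
@group
(when (and (treesit-available-p)
           (treesit-language-available-p 'c))
  (with-temp-buffer
    (insert "int x = 1;\n")
    (let ((root (treesit-buffer-root-node 'c))
          ;; A leaf node at the start of the buffer.
          (leaf (treesit-node-at (point-min) 'c))
          ;; The smallest node covering the whole buffer.
          (cover (treesit-node-on (point-min) (point-max) 'c)))
      (list (treesit-node-type root)
            (treesit-node-type leaf)
            (treesit-node-type cover)))))
@end group
@end example

@noindent
Here the first and last elements are expected to name the same type,
since only the root node covers the entire buffer, while the second
names a much smaller leaf node.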

Given a node, a Lisp program can retrieve other nodes starting from
it, or query for information about this node.

@heading Retrieving nodes from other nodes
@cindex syntax tree nodes, retrieving from other nodes

@subheading By kinship
@cindex kinship, syntax tree nodes
@cindex nodes, by kinship
@cindex syntax tree nodes, by kinship

@defun treesit-node-parent node
This function returns the immediate parent of @var{node}.

If @var{node} is more than 1000 levels deep in a parse tree, the
return value is undefined. Currently it returns @code{nil}, but that
could change in the future.
@end defun

@defun treesit-node-child node n &optional named
This function returns the @var{n}'th child of @var{node}. If
@var{named} is non-@code{nil}, it counts only named nodes
(@pxref{tree-sitter named node, named node}).

For example, in a node that represents a string @code{"text"}, there
are three child nodes: the opening quote @code{"}, the string text
@code{text}, and the closing quote @code{"}. Among these nodes, the
first child is the opening quote @code{"}, and the first named child
is the string text.

This function returns @code{nil} if there is no @var{n}'th child.
@var{n} could be negative, e.g., @minus{}1 represents the last child.
@end defun

@defun treesit-node-children node &optional named
This function returns all of @var{node}'s children as a list. If
@var{named} is non-@code{nil}, it retrieves only named nodes.
@end defun

@defun treesit-node-next-sibling node &optional named
This function finds the next sibling of @var{node}. If @var{named} is
non-@code{nil}, it finds the next named sibling.
@end defun

@defun treesit-node-prev-sibling node &optional named
This function finds the previous sibling of @var{node}. If
@var{named} is non-@code{nil}, it finds the previous named sibling.
@end defun
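
As an illustration of walking the tree by kinship, the following
sketch collects the types of a node and all of its ancestors. The
helper name @code{my-treesit-ancestor-types} is hypothetical, and
@code{treesit-node-type} is used to obtain each node's type.

@example
@group
(defun my-treesit-ancestor-types (node)
  "Return the types of NODE and its ancestors, innermost first."
  (let (types)
    ;; `treesit-node-parent' returns nil at the root, ending the loop.
    (while node
      (push (treesit-node-type node) types)
      (setq node (treesit-node-parent node)))
    (nreverse types)))
@end group
@end example

@noindent
It could be called as
@code{(my-treesit-ancestor-types (treesit-node-at (point)))} in a
buffer that has a parser. When given @code{nil}, it returns
@code{nil}.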
|
|
|
|
@subheading By field name
|
|
@cindex nodes, by field name
|
|
@cindex syntax tree nodes, by field name
|
|
|
|
To make the syntax tree easier to analyze, many language grammars
|
|
assign @dfn{field names} to child nodes (@pxref{tree-sitter node field
|
|
name, field name}). For example, a @code{function_definition} node
|
|
could have a @code{declarator} child and a @code{body} child.
|
|
|
|
@defun treesit-node-child-by-field-name node field-name
|
|
This function finds the child of @var{node} whose field name is
|
|
@var{field-name}, a string.
|
|
|
|
@example
|
|
@group
|
|
;; Get the child that has "body" as its field name.
|
|
(treesit-node-child-by-field-name node "body")
|
|
@result{} #<treesit-node (compound_statement) in 45-89>
|
|
@end group
|
|
@end example
|
|
@end defun
|
|
|
|
@subheading By position
|
|
@cindex nodes, by position
|
|
@cindex syntax tree nodes, by position
|
|
|
|
@defun treesit-node-first-child-for-pos node pos &optional named
|
|
This function finds the first child of @var{node} that extends beyond
|
|
buffer position @var{pos}. ``Extends beyond'' means the end of the
|
|
child node is greater or equal to @var{pos}. This function only looks
|
|
for immediate children of @var{node}, and doesn't look in its
|
|
grandchildren. If @var{named} is non-@code{nil}, it looks for the
|
|
first named child (@pxref{tree-sitter named node, named node}).
|
|
@end defun
|
|
|
|
@defun treesit-node-descendant-for-range node beg end &optional named
|
|
This function finds the @emph{smallest} descendant node of @var{node}
|
|
that spans the region of text between positions @var{beg} and
|
|
@var{end}. It is similar to @code{treesit-node-at}. If @var{named}
|
|
is non-@code{nil}, it looks for the smallest named descendant.
|
|
@end defun
|
|
|
|
@heading Searching for node
|
|
|
|
@defun treesit-search-subtree node predicate &optional backward all depth
|
|
This function traverses the subtree of @var{node} (including @var{node}
|
|
itself), looking for a node for which @var{predicate} returns
|
|
non-@code{nil}. @var{predicate} is a regexp that is matched against
|
|
each node's type, or a predicate function that takes a node and returns
|
|
non-@code{nil} if the node matches. @var{predicate} can also be a thing
|
|
symbol or thing definition (@pxref{User-defined Things}). Using an
|
|
undefined thing doesn't raise an error; the function simply returns
|
|
@code{nil}.
|
|
|
|
This function returns the first node that matches, or @code{nil} if none
|
|
matches @var{predicate}.
|
|
|
|
By default, this function only traverses named nodes, but if @var{all}
|
|
is non-@code{nil}, it traverses all the nodes. If @var{backward} is
|
|
non-@code{nil}, it traverses backwards (i.e., it visits the last child
|
|
first when traversing down the tree). If @var{depth} is
|
|
non-@code{nil}, it must be a number that limits the tree traversal to
|
|
that many levels down the tree. If @var{depth} is @code{nil}, it
|
|
defaults to 1000.
|
|
@end defun
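As a sketch of how the optional arguments combine, consider the two
helpers below. Their names are hypothetical, and the node type
@code{"string_literal"} is an assumption about the grammar in use.

@example
@group
;; Hypothetical helper: first named node in NODE's subtree
;; (NODE included) whose type matches "string_literal".
(defun my-first-string-literal (node)
  (treesit-search-subtree node "string_literal"))
@end group

@group
;; Hypothetical helper: first match found when traversing
;; backwards, looking for a node spanning more than 80 characters.
(defun my-last-long-node (node)
  (treesit-search-subtree
   node
   (lambda (n)
     (> (- (treesit-node-end n) (treesit-node-start n)) 80))
   t     ; BACKWARD: visit the last child first
   t     ; ALL: also consider anonymous nodes
   3))   ; DEPTH: descend at most three levels
@end group
@end example

Both helpers return @code{nil} when nothing in the subtree matches.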
|
|
|
|
@defun treesit-search-forward start predicate &optional backward all
|
|
Like @code{treesit-search-subtree}, this function also traverses the
|
|
parse tree and matches each node with @var{predicate} (except for
|
|
@var{start}), where @var{predicate} can be a regexp or a predicate
|
|
function. @var{predicate} can also be a thing symbol or thing
|
|
definition (@pxref{User-defined Things}). Using an undefined thing
|
|
doesn't raise an error; the function simply returns @code{nil}.
|
|
|
|
For a tree like the one below where @var{start} is marked @samp{S}, this
|
|
function traverses as numbered from 1 to 12:
|
|
|
|
@example
|
|
@group
|
|
              12
              |
     S--------3----------11
     |        |          |
o--o-+--o  1--+--2    6--+-----10
|  |                  |        |
o  o                +-+-+   +--+--+
                    |   |   |  |  |
                    4   5   7  8  9
|
|
@end group
|
|
@end example
|
|
|
|
Note that this function doesn't traverse the subtree of @var{start},
|
|
and it always traverses leaf nodes first, before moving upwards.
|
|
|
|
Like @code{treesit-search-subtree}, this function only searches for
|
|
named nodes by default, but if @var{all} is non-@code{nil}, it
|
|
searches for all nodes. If @var{backward} is non-@code{nil}, it
|
|
searches backwards.
|
|
|
|
While @code{treesit-search-subtree} traverses the subtree of a node,
|
|
this function starts with node @var{start} and traverses every node
|
|
that comes after it in the buffer position order, i.e., nodes with
|
|
start positions greater than the end position of @var{start}.
|
|
|
|
In the tree shown above, @code{treesit-search-subtree} traverses node
|
|
@samp{S} (@var{start}) and nodes marked with @code{o}, whereas this
|
|
function traverses the nodes marked with numbers. This function is
|
|
useful for answering questions like ``what is the first node after
|
|
@var{start} in the buffer that satisfies some condition?''
|
|
@end defun
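For instance, code that wants the next comment after point might look
like the following sketch. It assumes the current buffer already has
a parser and that the grammar names comment nodes @code{comment}; the
function name is hypothetical.

@example
@group
(defun my-next-comment-after-point ()
  "Return the first comment node after the node at point, or nil."
  (let ((start (treesit-node-at (point))))
    (and start
         (treesit-search-forward start "comment"))))
@end group
@end example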
|
|
|
|
@defun treesit-search-forward-goto node predicate &optional start backward all
|
|
This function moves point to the start or end of the next node after
|
|
@var{node} in the buffer that matches @var{predicate}. If @var{start}
|
|
is non-@code{nil}, stop at the beginning rather than the end of a node.
|
|
|
|
This function guarantees that the matched node it returns makes
|
|
progress in terms of buffer position: the start/end position of the
|
|
returned node is always greater than that of @var{node}.
|
|
|
|
Arguments @var{predicate}, @var{backward}, and @var{all} are the same
|
|
as in @code{treesit-search-forward}.
|
|
@end defun
|
|
|
|
@defun treesit-induce-sparse-tree root predicate &optional process-fn depth
|
|
This function creates a sparse tree from @var{root}'s subtree.
|
|
|
|
It takes the subtree under @var{root}, and combs it so only the nodes
|
|
that match @var{predicate} are left. Like previous functions, the
|
|
@var{predicate} can be a regexp string that matches against each node's
|
|
type, or a function that takes a node and returns non-@code{nil} if it
|
|
matches. @var{predicate} can also be a thing symbol or thing definition
|
|
(@pxref{User-defined Things}). Using an undefined thing doesn't raise
|
|
an error; the function simply returns @code{nil}.
|
|
|
|
For example, given the subtree on the left that consists of both
|
|
numbers and letters, if @var{predicate} is ``letter only'', the
|
|
returned tree is the one on the right.
|
|
|
|
@example
|
|
@group
|
|
    a                 a                    a
    |                 |                    |
+---+---+         +---+---+            +---+---+
|   |   |         |   |   |            |   |   |
b   1   2         b   |   |            b   c   d
    |   |     =>      |   |      =>        |
    c   +--+          c   +                e
    |   |  |          |   |
 +--+   d  4       +--+   d
 |  |              |
 e  5              e
|
|
@end group
|
|
@end example
|
|
|
|
If @var{process-fn} is non-@code{nil}, instead of returning the
|
|
matched nodes, this function passes each node to @var{process-fn} and
|
|
uses the returned value instead. If non-@code{nil}, @var{depth}
|
|
limits the number of levels to go down from @var{root}. If
|
|
@var{depth} is @code{nil}, it defaults to 1000.
|
|
|
|
Each node in the returned tree looks like
|
|
@w{@code{(@var{tree-sitter-node} . (@var{child} @dots{}))}}. The
|
|
@var{tree-sitter-node} of the root of this tree will be @code{nil} if
|
|
@var{root} doesn't match @var{predicate}. If no node matches
|
|
@var{predicate}, the function returns @code{nil}.
|
|
@end defun
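One possible use is building an outline of a buffer. The sketch below
assumes the grammar uses the node type @code{function_definition}, and
the helper name is hypothetical. It keeps only function definitions
and, by way of @var{process-fn}, replaces each matched node with a
cons of its type and start position.

@example
@group
(defun my-function-outline ()
  "Return a sparse tree of the function definitions in this buffer."
  (treesit-induce-sparse-tree
   (treesit-buffer-root-node)
   "function_definition"
   ;; PROCESS-FN: store (TYPE . START) instead of the node itself.
   (lambda (node)
     (cons (treesit-node-type node)
           (treesit-node-start node)))))
@end group
@end example

Since the root node of a buffer usually doesn't match such a
predicate, the @code{car} of the returned tree is typically
@code{nil}.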
|
|
|
|
@heading More convenience functions
|
|
|
|
@defun treesit-node-get node instructions
|
|
This is a convenience function that chains together multiple node
|
|
accessor functions. For example, to get @var{node}'s
|
|
parent's next sibling's second child's text:
|
|
|
|
@example
|
|
@group
|
|
(treesit-node-get node
  '((parent 1)
    (sibling 1 nil)
    (child 1 nil)
    (text nil)))
|
|
@end group
|
|
@end example
|
|
|
|
@var{instructions} is a list of instructions, each of the form
|
|
@w{@code{(@var{fn} @var{arg}@dots{})}}. The following @var{fn}s are
|
|
supported:
|
|
|
|
@table @code
|
|
@item (child @var{idx} @var{named})
|
|
Get the @var{idx}'th child.
|
|
|
|
@item (parent @var{n})
|
|
Go to parent @var{n} times.
|
|
|
|
@item (field-name)
|
|
Get the field name of the current node.
|
|
|
|
@item (type)
|
|
Get the type of the current node.
|
|
|
|
@item (text @var{no-property})
|
|
Get the text of the current node.
|
|
|
|
@item (children @var{named})
|
|
Get a list of children.
|
|
|
|
@item (sibling @var{step} @var{named})
|
|
Get the @var{step}'th sibling: a negative @var{step} means a previous
sibling, a positive @var{step} means a next sibling.
|
|
@end table
|
|
|
|
Note that arguments like @var{named} and @var{no-property} can't be
|
|
omitted, unlike in their original functions.
|
|
@end defun
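Written out by hand, the instruction list in the example above
corresponds roughly to the following nested calls. This is only a
sketch, and the wrapper function name is hypothetical.

@example
@group
(defun my-parent-sibling-child-text (node)
  "Return the text of NODE's parent's next sibling's second child."
  (treesit-node-text
   (treesit-node-child
    (treesit-node-next-sibling
     (treesit-node-parent node)   ; (parent 1)
     nil)                         ; (sibling 1 nil)
    1 nil)                        ; (child 1 nil)
   nil))                          ; (text nil)
@end group
@end example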
|
|
|
|
@defun treesit-filter-child node predicate &optional named
|
|
This function finds immediate children of @var{node} that satisfy
|
|
@var{predicate}.
|
|
|
|
The @var{predicate} function takes a node as argument and should
|
|
return non-@code{nil} to indicate that the node should be kept. If
|
|
@var{named} is non-@code{nil}, this function only examines named
|
|
nodes.
|
|
@end defun
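A minimal sketch, assuming the grammar gives comments the node type
@code{comment} (the helper name is hypothetical):

@example
@group
(defun my-comment-children (node)
  "Return the immediate named children of NODE that are comments."
  (treesit-filter-child
   node
   (lambda (child)
     (equal (treesit-node-type child) "comment"))
   t))
@end group
@end example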
|
|
|
|
@defun treesit-parent-until node predicate &optional include-node
|
|
This function repeatedly finds the parents of @var{node}, and returns
|
|
the closest parent that satisfies @var{predicate}. @var{predicate} can be
|
|
either a function that takes a node as argument and returns @code{t}
|
|
or @code{nil}, or a regexp matching node type names, or other valid
|
|
predicates described in @code{treesit-thing-settings}. If no parent
|
|
satisfies @var{predicate}, this function returns @code{nil}.
|
|
|
|
Normally this function only looks at the parents of @var{node} but not
|
|
@var{node} itself. But if @var{include-node} is non-@code{nil}, this
|
|
function returns @var{node} if @var{node} satisfies @var{predicate}.
|
|
@end defun
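For example, finding the function definition that encloses a node
could be sketched as follows. The node type
@code{function_definition} is an assumption about the grammar, and the
helper name is hypothetical.

@example
@group
(defun my-enclosing-function (node)
  "Return the closest function definition at or above NODE, or nil."
  (treesit-parent-until
   node
   (lambda (n)
     (equal (treesit-node-type n) "function_definition"))
   t))   ; INCLUDE-NODE: NODE itself may be the answer
@end group
@end example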
|
|
|
|
@defun treesit-parent-while node predicate
|
|
This function goes up the tree starting from @var{node}, and keeps
|
|
doing so as long as the nodes satisfy @var{predicate}, a function that
|
|
takes a node as argument. That is, this function returns the highest
|
|
parent of @var{node} that still satisfies @var{predicate}. Note that if
|
|
@var{node} satisfies @var{predicate} but its immediate parent doesn't,
|
|
@var{node} itself is returned.
|
|
@end defun
|
|
|
|
@defun treesit-node-top-level node &optional predicate include-node
|
|
This function returns the highest parent of @var{node} that has the
|
|
same type as @var{node}. If no such parent exists, it returns
|
|
@code{nil}. Therefore this function is also useful for testing
|
|
whether @var{node} is top-level.
|
|
|
|
If @var{predicate} is @code{nil}, this function uses @var{node}'s type
|
|
to find the parent. If @var{predicate} is non-@code{nil}, this
|
|
function searches for the parent that satisfies @var{predicate}. If
|
|
@var{include-node} is non-@code{nil}, this function returns @var{node}
|
|
if @var{node} satisfies @var{predicate}.
|
|
@end defun
|
|
|
|
@node Accessing Node Information
|
|
@section Accessing Node Information
|
|
@cindex information of node, syntax trees
|
|
@cindex syntax trees, node information
|
|
|
|
@heading Basic information of Node
|
|
|
|
Every node is associated with a parser, and that parser is associated
|
|
with a buffer. The following functions retrieve them.
|
|
|
|
@defun treesit-node-parser node
|
|
This function returns @var{node}'s associated parser.
|
|
@end defun
|
|
|
|
@defun treesit-node-buffer node
|
|
This function returns @var{node}'s parser's associated buffer.
|
|
@end defun
|
|
|
|
@defun treesit-node-language node
|
|
This function returns @var{node}'s parser's associated language.
|
|
@end defun
|
|
|
|
Each node represents a portion of text in the buffer. Functions below
|
|
find relevant information about that text.
|
|
|
|
@defun treesit-node-start node
|
|
Return the start position of @var{node}.
|
|
@end defun
|
|
|
|
@defun treesit-node-end node
|
|
Return the end position of @var{node}.
|
|
@end defun
|
|
|
|
@defun treesit-node-text node &optional no-property
Return the buffer text that @var{node} represents, as a string. (If
@var{node} is retrieved from parsing a string, it will be the text
from that string.) If @var{no-property} is non-@code{nil}, the
returned string has no text properties.
|
|
@end defun
|
|
|
|
@cindex predicates for syntax tree nodes
|
|
Here are some predicates on tree-sitter nodes:
|
|
|
|
@defun treesit-node-p object
|
|
Checks if @var{object} is a tree-sitter syntax node.
|
|
@end defun
|
|
|
|
@cindex compare tree-sitter syntax nodes
|
|
@cindex tree-sitter nodes, comparing
|
|
@defun treesit-node-eq node1 node2
|
|
Checks if @var{node1} and @var{node2} refer to the same node in a
|
|
tree-sitter syntax tree. This function uses the same equivalence
|
|
metric as @code{equal}. You can also compare nodes using @code{equal}
|
|
(@pxref{Equality Predicates}).
|
|
@end defun
|
|
|
|
@heading Property information
|
|
|
|
In general, nodes in a concrete syntax tree fall into two categories:
|
|
@dfn{named nodes} and @dfn{anonymous nodes}. Whether a node is named
|
|
or anonymous is determined by the language grammar
|
|
(@pxref{tree-sitter named node, named node}).
|
|
|
|
@cindex tree-sitter missing node
|
|
@cindex missing node, tree-sitter
|
|
Apart from being named or anonymous, a node can have other properties.
|
|
A node can be ``missing'': such nodes are inserted by the parser in
|
|
order to recover from certain kinds of syntax errors, i.e., something
|
|
should probably be there according to the grammar, but is not there.
|
|
This can happen during editing of the program source, when the source
|
|
is not yet in its final form.
|
|
|
|
@cindex tree-sitter extra node
|
|
@cindex extra node, tree-sitter
|
|
A node can be ``extra'': such nodes represent things like comments,
|
|
which can appear anywhere in the text.
|
|
|
|
@cindex tree-sitter outdated node
|
|
@cindex outdated node, tree-sitter
|
|
A node can be ``outdated'', if its parser has reparsed at least once
|
|
after the node was created.
|
|
|
|
@cindex tree-sitter node that has error
|
|
@cindex has error, tree-sitter node
|
|
A node ``has error'' if the text it spans contains a syntax error. It
|
|
can be that the node itself has an error, or one of its descendants
|
|
has an error.
|
|
|
|
@cindex tree-sitter, live parsing node
|
|
@cindex live node, tree-sitter
|
|
A node is considered @dfn{live} if its parser is not deleted, and the
|
|
buffer to which it belongs is a live buffer (@pxref{Killing Buffers}).
|
|
|
|
@defun treesit-node-check node property
|
|
This function returns non-@code{nil} if @var{node} has the specified
|
|
@var{property}. @var{property} can be @code{named}, @code{missing},
|
|
@code{extra}, @code{outdated}, @code{has-error}, or @code{live}.
|
|
@end defun
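As a small sketch, the hypothetical helper below uses the
@code{has-error} property on the root node to tell whether the buffer
currently contains a syntax error anywhere. It assumes the current
buffer already has a parser.

@example
@group
(defun my-buffer-has-syntax-error-p ()
  "Return non-nil if the parse tree of this buffer contains an error."
  (treesit-node-check (treesit-buffer-root-node) 'has-error))
@end group
@end example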
|
|
|
|
@defun treesit-node-type node
|
|
Named nodes have ``types'' (@pxref{tree-sitter node type, node type}).
|
|
For example, a named node can be a @code{string_literal} node, where
|
|
@code{string_literal} is its type. The type of an anonymous node is
|
|
just the text that the node represents; e.g., the type of a @samp{,}
|
|
node is just @samp{,}.
|
|
|
|
This function returns @var{node}'s type as a string.
|
|
@end defun
|
|
|
|
@heading Information as a child or parent
|
|
|
|
@defun treesit-node-index node &optional named
|
|
This function returns the index of @var{node} as a child node of its
|
|
parent. If @var{named} is non-@code{nil}, it only counts named nodes
|
|
(@pxref{tree-sitter named node, named node}).
|
|
@end defun
|
|
|
|
@defun treesit-node-field-name node
|
|
A child of a parent node could have a field name (@pxref{tree-sitter
|
|
node field name, field name}). This function returns the field name
|
|
of @var{node} as a child of its parent.
|
|
@end defun
|
|
|
|
@defun treesit-node-field-name-for-child node n
|
|
This function returns the field name of the @var{n}'th child of
|
|
@var{node}. It returns @code{nil} if there is no @var{n}'th child, or
|
|
the @var{n}'th child doesn't have a field name.
|
|
|
|
Note that @var{n} counts both named and anonymous children, and
|
|
@var{n} can be negative, e.g., @minus{}1 represents the last child.
|
|
@end defun
|
|
|
|
@defun treesit-node-child-count node &optional named
|
|
This function returns the number of children of @var{node}. If
|
|
@var{named} is non-@code{nil}, it only counts named children
|
|
(@pxref{tree-sitter named node, named node}).
|
|
@end defun
|
|
|
|
@heading Convenience functions
|
|
|
|
@defun treesit-node-enclosed-p smaller larger &optional strict
|
|
This function returns non-@code{nil} if @var{smaller} is enclosed in
|
|
@var{larger}. @var{smaller} and @var{larger} can be either a cons
|
|
@code{(@var{beg} . @var{end})} or a node.
|
|
|
|
Return non-@code{nil} if @var{larger}'s start <= @var{smaller}'s start
|
|
and @var{smaller}'s end <= @var{larger}'s end.
|
|
|
|
If @var{strict} is @code{t}, compare with < rather than <=.
|
|
|
|
If @var{strict} is @code{partial}, consider @var{larger} to enclose
|
|
@var{smaller} when at least one side is strictly enclosing.
|
|
@end defun
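Because both arguments may be plain conses, the three modes can be
tried without a parser. In the sketch below, the comments show the
results that follow from the usual meaning of enclosure
(@var{smaller} lies within @var{larger}); they are illustrative rather
than output copied from a session.

@example
@group
(treesit-node-enclosed-p '(5 . 10) '(1 . 20))
     ;; non-nil
(treesit-node-enclosed-p '(1 . 10) '(1 . 20) t)
     ;; nil, because the start positions coincide
(treesit-node-enclosed-p '(1 . 10) '(1 . 20) 'partial)
     ;; non-nil, because the end positions differ
(treesit-node-enclosed-p '(1 . 20) '(1 . 20) 'partial)
     ;; nil, because the two ranges are identical
@end group
@end example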
|
|
|
|
@node Pattern Matching
|
|
@section Pattern Matching Tree-sitter Nodes
|
|
@cindex pattern matching with tree-sitter nodes
|
|
|
|
@cindex capturing, tree-sitter node
|
|
Tree-sitter lets Lisp programs match patterns using a small
|
|
declarative language. This pattern matching consists of two steps:
|
|
first tree-sitter matches a @dfn{pattern} against nodes in the syntax
|
|
tree, then it @dfn{captures} specific nodes that matched the pattern
|
|
and returns the captured nodes.
|
|
|
|
We describe first how to write the most basic query pattern and how to
|
|
capture nodes in a pattern, then the pattern-matching function, and
|
|
finally the more advanced pattern syntax.
|
|
|
|
@heading Basic query syntax
|
|
|
|
@cindex tree-sitter query pattern syntax
|
|
@cindex pattern syntax, tree-sitter query
|
|
@cindex query, tree-sitter
|
|
A @dfn{query} consists of multiple @dfn{patterns}. Each pattern is an
|
|
s-expression that matches a certain node in the syntax tree. A
|
|
pattern has the form @w{@code{(@var{type} (@var{child}@dots{}))}}.
|
|
|
|
For example, a pattern that matches a @code{binary_expression} node that
|
|
contains @code{number_literal} child nodes would look like
|
|
|
|
@example
|
|
(binary_expression (number_literal))
|
|
@end example
|
|
|
|
To @dfn{capture} a node using the query pattern above, append
|
|
@code{@@@var{capture-name}} after the node pattern you want to
|
|
capture. For example,
|
|
|
|
@example
|
|
(binary_expression (number_literal) @@number-in-exp)
|
|
@end example
|
|
|
|
@noindent
|
|
captures @code{number_literal} nodes that are inside a
|
|
@code{binary_expression} node with the capture name
|
|
@code{number-in-exp}.
|
|
|
|
We can capture the @code{binary_expression} node as well, with, for
|
|
example, the capture name @code{biexp}:
|
|
|
|
@example
|
|
(binary_expression
|
|
(number_literal) @@number-in-exp) @@biexp
|
|
@end example
|
|
|
|
@heading Query function
|
|
|
|
@cindex query functions, tree-sitter
|
|
Now we can introduce the @dfn{query functions}.
|
|
|
|
@defun treesit-query-capture node query &optional beg end node-only
|
|
This function matches patterns in @var{query} within @var{node}. The
|
|
argument @var{query} can be either an s-expression, a string, or a
|
|
compiled query object. For now, we focus on the s-expression syntax;
|
|
string syntax and compiled queries are described at the end of
|
|
the section.
|
|
|
|
The argument @var{node} can also be a parser or a language symbol. A
|
|
parser means use its root node, a language symbol means find or create
|
|
a parser for that language in the current buffer, and use the root
|
|
node.
|
|
|
|
The function returns all the captured nodes in an alist with elements
|
|
of the form @w{@code{(@var{capture_name} . @var{node})}}. If
|
|
@var{node-only} is non-@code{nil}, it returns the list of @var{node}s
|
|
instead. By default the entire text of @var{node} is searched, but if
|
|
@var{beg} and @var{end} are both non-@code{nil}, they specify the
|
|
region of buffer text where this function should match nodes. Any
|
|
matching node whose span overlaps with the region between @var{beg}
|
|
and @var{end} is captured; it doesn't have to be completely contained
|
|
in the region.
|
|
|
|
@vindex treesit-query-error
|
|
@findex treesit-query-validate
|
|
This function raises the @code{treesit-query-error} error if
|
|
@var{query} is malformed. The signal data contains a description of
|
|
the specific error. You can use @code{treesit-query-validate} to
|
|
validate and debug the query.
|
|
@end defun
|
|
|
|
For example, suppose @var{node}'s text is @code{1 + 2}, and
|
|
@var{query} is
|
|
|
|
@example
|
|
@group
|
|
(setq query
      '((binary_expression
         (number_literal) @@number-in-exp) @@biexp))
|
|
@end group
|
|
@end example
|
|
|
|
Matching that query would return
|
|
|
|
@example
|
|
@group
|
|
(treesit-query-capture node query)
|
|
@result{} ((biexp . @var{<node for "1 + 2">})
|
|
(number-in-exp . @var{<node for "1">})
|
|
(number-in-exp . @var{<node for "2">}))
|
|
@end group
|
|
@end example
|
|
|
|
As mentioned earlier, @var{query} could contain multiple patterns.
|
|
For example, it could have two top-level patterns:
|
|
|
|
@example
|
|
@group
|
|
(setq query
      '((binary_expression) @@biexp
        (number_literal) @@number))
|
|
@end group
|
|
@end example
|
|
|
|
@defun treesit-query-string string query language
|
|
This function parses @var{string} as @var{language}, matches its root
|
|
node with @var{query}, and returns the result.
|
|
@end defun
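As a sketch, the call below parses a short C fragment and captures its
number literals. It assumes that a C grammar is installed; the
capture name @code{num} is arbitrary.

@example
@group
(treesit-query-string
 "int x = 1 + 2;"
 '((number_literal) @@num)
 'c)
@end group
@end example

Under those assumptions, the result is an alist with one
@w{@code{(num . @var{node})}} element per number literal in the
string.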
|
|
|
|
@heading More query syntax
|
|
|
|
Besides node type and capture name, tree-sitter's pattern syntax can
|
|
express anonymous node, field name, wildcard, quantification,
|
|
grouping, alternation, anchor, and predicate.
|
|
|
|
@subheading Anonymous node
|
|
|
|
An anonymous node is written verbatim, surrounded by quotes. A
|
|
pattern matching (and capturing) the keyword @code{return} would be
|
|
|
|
@example
|
|
"return" @@keyword
|
|
@end example
|
|
|
|
@subheading Wild card
|
|
|
|
In a pattern, @samp{(_)} matches any named node, and @samp{_} matches
|
|
any named or anonymous node. For example, to capture any named child
|
|
of a @code{binary_expression} node, the pattern would be
|
|
|
|
@example
|
|
(binary_expression (_) @@in-biexp)
|
|
@end example
|
|
|
|
@subheading Field name
|
|
|
|
It is possible to capture child nodes that have specific field names.
|
|
In the pattern below, @code{declarator} and @code{body} are field
|
|
names, indicated by the colon following them.
|
|
|
|
@example
|
|
@group
|
|
(function_definition
|
|
declarator: (_) @@func-declarator
|
|
body: (_) @@func-body)
|
|
@end group
|
|
@end example
|
|
|
|
It is also possible to capture a node that doesn't have a certain
|
|
field, say, a @code{function_definition} without a @code{body} field:
|
|
|
|
@example
|
|
(function_definition !body) @@func-no-body
|
|
@end example
|
|
|
|
@subheading Quantify node
|
|
|
|
@cindex quantify node, tree-sitter
|
|
Tree-sitter recognizes quantification operators @samp{:*}, @samp{:+},
|
|
and @samp{:?}. Their meanings are the same as in regular expressions:
|
|
@samp{:*} matches the preceding pattern zero or more times, @samp{:+}
|
|
matches one or more times, and @samp{:?} matches zero or one times.
|
|
|
|
For example, the following pattern matches @code{type_declaration}
|
|
nodes that have @emph{zero or more} @code{long} keywords.
|
|
|
|
@example
|
|
(type_declaration "long" :*) @@long-type
|
|
@end example
|
|
|
|
The following pattern matches a type declaration that may or may not
|
|
have a @code{long} keyword:
|
|
|
|
@example
|
|
(type_declaration "long" :?) @@long-type
|
|
@end example
|
|
|
|
@subheading Grouping
|
|
|
|
Similar to groups in regular expressions, we can bundle patterns into
|
|
groups and apply quantification operators to them. For example, to
|
|
express a comma-separated list of identifiers, one could write
|
|
|
|
@example
|
|
(identifier) ("," (identifier)) :*
|
|
@end example
|
|
|
|
@subheading Alternation
|
|
|
|
Again, similar to regular expressions, we can express ``match any one
|
|
of these patterns'' in a pattern. The syntax is a vector of patterns.
|
|
For example, to capture some keywords in C, the pattern would be
|
|
|
|
@example
|
|
@group
|
|
[
|
|
"return"
|
|
"break"
|
|
"if"
|
|
"else"
|
|
] @@keyword
|
|
@end group
|
|
@end example
|
|
|
|
@subheading Anchor
|
|
|
|
The anchor operator @code{:anchor} can be used to enforce juxtaposition,
|
|
i.e., to enforce two things to be directly next to each other. The
|
|
two ``things'' can be two nodes, or a child and the beginning or end of its parent.
|
|
For example, to capture the first child, the last child, or two
|
|
adjacent children:
|
|
|
|
@example
|
|
@group
|
|
;; Anchor the child with the end of its parent.
|
|
(compound_expression (_) @@last-child :anchor)
|
|
@end group
|
|
|
|
@group
|
|
;; Anchor the child with the beginning of its parent.
|
|
(compound_expression :anchor (_) @@first-child)
|
|
@end group
|
|
|
|
@group
|
|
;; Anchor two adjacent children.
|
|
(compound_expression
|
|
(_) @@prev-child
|
|
:anchor
|
|
(_) @@next-child)
|
|
@end group
|
|
@end example
|
|
|
|
Note that the enforcement of juxtaposition ignores any anonymous
|
|
nodes.
|
|
|
|
@subheading Predicate
|
|
|
|
It is possible to add predicate constraints to a pattern. For
|
|
example, with the following pattern:
|
|
|
|
@example
|
|
@group
|
|
(
|
|
(array :anchor (_) @@first (_) @@last :anchor)
|
|
(:equal @@first @@last)
|
|
)
|
|
@end group
|
|
@end example
|
|
|
|
@noindent
|
|
tree-sitter only matches arrays where the first element is equal to
|
|
the last element. To attach a predicate to a pattern, we need to
|
|
group them together. Currently there are three predicates:
|
|
@code{:equal}, @code{:match}, and @code{:pred}.
|
|
|
|
@deffn Predicate :equal arg1 arg2
|
|
Matches if @var{arg1} is equal to @var{arg2}. Arguments can be either
|
|
strings or capture names. Capture names represent the text that the
|
|
captured node spans in the buffer.
|
|
@end deffn
|
|
|
|
@deffn Predicate :match regexp capture-name
|
|
Matches if the text that @var{capture-name}'s node spans in the buffer
|
|
matches regular expression @var{regexp}, given as a string literal.
|
|
Matching is case-sensitive.
|
|
@end deffn
|
|
|
|
@deffn Predicate :pred fn &rest nodes
|
|
Matches if function @var{fn} returns non-@code{nil} when passed each
|
|
node in @var{nodes} as arguments. The function runs with the current
|
|
buffer set to the buffer of the node being queried.
|
|
@end deffn
|
|
|
|
Note that a predicate can only refer to capture names that appear in
|
|
the same pattern. Indeed, it makes little sense to refer to capture
|
|
names in other patterns.
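For instance, a pattern that captures only identifiers written
entirely in upper case could be sketched as follows. The node type
@code{identifier} and the capture name @code{constant} are assumptions
chosen for illustration.

@example
@group
((identifier) @@constant
 (:match "^[A-Z_]+$" @@constant))
@end group
@end example

@noindent
Here the predicate and the capture it refers to sit in the same
group, so the predicate constrains that pattern only.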
|
|
|
|
@heading String patterns
|
|
|
|
@cindex tree-sitter patterns as strings
|
|
@cindex patterns, tree-sitter, in string form
|
|
Besides s-expressions, Emacs also accepts tree-sitter's native query
syntax, written as a string. The string syntax largely resembles
the s-expression syntax. For example, the following query
|
|
|
|
@example
|
|
@group
|
|
(treesit-query-capture
 node '((addition_expression
         left: (_) @@left
         "+" @@plus-sign
         right: (_) @@right) @@addition

        ["return" "break"] @@keyword))
|
|
@end group
|
|
@end example
|
|
|
|
@noindent
|
|
is equivalent to
|
|
|
|
@example
|
|
@group
|
|
(treesit-query-capture
 node "(addition_expression
        left: (_) @@left
        \"+\" @@plus-sign
        right: (_) @@right) @@addition

       [\"return\" \"break\"] @@keyword")
|
|
@end group
|
|
@end example
|
|
|
|
Most patterns can be written directly as s-expressions inside a string.
|
|
Only a few of them need modification:
|
|
|
|
@itemize
|
|
@item
|
|
Anchor @code{:anchor} is written as @samp{.}.
|
|
@item
|
|
@samp{:?} is written as @samp{?}.
|
|
@item
|
|
@samp{:*} is written as @samp{*}.
|
|
@item
|
|
@samp{:+} is written as @samp{+}.
|
|
@item
|
|
@code{:equal}, @code{:match} and @code{:pred} are written as
|
|
@code{#equal}, @code{#match} and @code{#pred}, respectively.
|
|
In general, predicates change their @samp{:} to @samp{#}.
|
|
@end itemize
|
|
|
|
For example,
|
|
|
|
@example
|
|
@group
|
|
'((
|
|
(compound_expression :anchor (_) @@first (_) :* @@rest)
|
|
(:match "love" @@first)
|
|
))
|
|
@end group
|
|
@end example
|
|
|
|
@noindent
|
|
is written in string form as
|
|
|
|
@example
|
|
@group
|
|
"(
|
|
(compound_expression . (_) @@first (_)* @@rest)
|
|
(#match \"love\" @@first)
|
|
)"
|
|
@end group
|
|
@end example
|
|
|
|
@heading Compiling queries
|
|
|
|
@cindex compiling tree-sitter queries
|
|
@cindex queries, compiling
|
|
If a query is intended to be used repeatedly, especially in tight
|
|
loops, it is important to compile that query, because a compiled query
|
|
is much faster than an uncompiled one. A compiled query can be used
|
|
anywhere a query is accepted.
|
|
|
|
@defun treesit-query-compile language query
|
|
This function compiles @var{query} for @var{language} into a compiled
|
|
query object and returns it.
|
|
|
|
This function raises the @code{treesit-query-error} error if
|
|
@var{query} is malformed. The signal data contains a description of
|
|
the specific error. You can use @code{treesit-query-validate} to
|
|
validate and debug the query.
|
|
@end defun
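A common arrangement is to compile a query once and reuse the compiled
object on every call. The sketch below assumes a C grammar with a
@code{number_literal} node type; the variable and function names are
hypothetical.

@example
@group
(defvar my-c-number-query nil
  "Cached compiled query; filled in on first use.")
@end group

@group
(defun my-c-number-literals (node)
  "Return the number literal nodes inside NODE."
  (unless my-c-number-query
    (setq my-c-number-query
          (treesit-query-compile
           'c '((number_literal) @@num))))
  ;; BEG and END are nil, NODE-ONLY is t.
  (treesit-query-capture node my-c-number-query nil nil t))
@end group
@end example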
|
|
|
|
@defun treesit-query-language query
|
|
This function returns the language of @var{query}.
|
|
@end defun
|
|
|
|
@defun treesit-query-expand query
|
|
This function converts the s-expression @var{query} into the string
|
|
format.
|
|
@end defun
|
|
|
|
@defun treesit-pattern-expand pattern
|
|
This function converts the s-expression @var{pattern} into the string
|
|
format.
|
|
@end defun
|
|
|
|
For more details, read the tree-sitter project's documentation about
|
|
pattern-matching, which can be found at
|
|
@uref{https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries}.
|
|
|
|
@node User-defined Things
|
|
@section User-defined ``Things'' and Navigation
|
|
@cindex user-defined things, with tree-sitter parsing
|
|
|
|
It's often useful to be able to identify and find certain @dfn{things} in
|
|
a buffer, like function and class definitions, statements, code blocks,
|
|
strings, comments, etc. Emacs allows users to define what kind of
|
|
tree-sitter node corresponds to a ``thing''. This enables handy
|
|
features like jumping to the next function, marking the code block at
|
|
point, or transposing two function arguments.
|
|
|
|
The ``things'' feature in Emacs is independent of the pattern matching
|
|
feature of tree-sitter, and comparatively less powerful, but more
|
|
suitable for navigation and traversing the parse tree.
|
|
|
|
You can define things with @code{treesit-thing-settings}.
|
|
|
|
@defvar treesit-thing-settings
|
|
This is an alist of thing definitions for each language. The key of
|
|
each entry is a language symbol, and the value is a list of thing
|
|
definitions of the form @w{@code{(@var{thing} @var{pred})}}, where
|
|
@var{thing} is a symbol representing the thing, like @code{defun},
|
|
@code{sexp}, or @code{sentence}; and @var{pred} specifies what kind of
|
|
tree-sitter node is this @var{thing}.
|
|
|
|
@var{pred} can be a regexp string that matches the type of the node; it
|
|
can be a function that takes a node as the argument and returns a
|
|
boolean that indicates whether the node qualifies as the thing; or it can
|
|
be a cons @w{@code{(@var{regexp} . @var{fn})}}, which is a combination
|
|
of a regular expression @var{regexp} and a function @var{fn}---the node
|
|
has to match both the @var{regexp} and to satisfy @var{fn} to qualify as
|
|
the thing.
|
|
|
|
@var{pred} can also be recursively defined. It can be @w{@code{(or
|
|
@var{pred}@dots{})}}, meaning that satisfying any one of the @var{pred}s
|
|
qualifies the node as the thing. It can be @w{@code{(not @var{pred})}},
|
|
meaning that not satisfying @var{pred} qualifies the node.
|
|
|
|
Finally, @var{pred} can refer to other @var{thing}s defined in this
|
|
list. For example, @w{@code{(or sexp sentence)}} defines something
|
|
that's either a @code{sexp} thing or a @code{sentence} thing, as defined
|
|
by some other rule in the alist.
|
|
|
|
Here's an example @code{treesit-thing-settings} for C and C@t{++}:
|
|
|
|
@example
|
|
@group
|
|
((c
  (defun "function_definition")
  (sexp (not "[](),[@{@}]"))
  (comment "comment")
  (string "raw_string_literal")
  (text (or comment string)))
 (cpp
  (defun ("function_definition" . cpp-ts-mode-defun-valid-p))
  (defclass "class_specifier")
  (comment "comment")))
|
|
@end group
|
|
@end example
|
|
|
|
@noindent
|
|
Note that this example is modified for didactic purposes, and isn't
|
|
exactly how C and C@t{++} modes define things.
|
|
@end defvar
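A major mode could install its definitions buffer-locally from its
setup code. The sketch below is hypothetical: the function name, the
language symbol @code{python}, and the node types are assumptions
chosen for illustration, and aren't how any existing mode defines its
things.

@example
@group
(defun my-python-things-setup ()
  "Install example thing definitions in the current buffer."
  (setq-local treesit-thing-settings
              '((python
                 (defun "function_definition")
                 (comment "comment")
                 ;; `text' refers to the `comment' thing defined
                 ;; above, or to any node whose type matches "string".
                 (text (or comment "string"))))))
@end group
@end example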
|
|
|
|
The rest of this section lists a few functions that take advantage of
|
|
the thing definitions. Besides the functions below, some other
|
|
functions listed elsewhere also utilize the thing feature, e.g.,
|
|
tree-traversing functions like @code{treesit-search-forward},
|
|
@code{treesit-induce-sparse-tree}, etc. @xref{Retrieving Nodes}.
|
|
|
|
@defun treesit-thing-prev position thing
|
|
This function returns the first node before @var{position} that is the
|
|
specified @var{thing}. If no such node exists, it returns @code{nil}.
|
|
It's guaranteed that, if a node is returned, the node's end position is
|
|
less than or equal to @var{position}. In other words, this function never
|
|
returns a node that encloses @var{position}.
|
|
|
|
@var{thing} can be either a thing symbol like @code{defun}, or simply a
|
|
thing definition like @code{"function_definition"}.
|
|
@end defun
|
|
|
|
@defun treesit-thing-next position thing
|
|
This function is similar to @code{treesit-thing-prev}, only it returns
|
|
the first node @emph{after} @var{position} that's the @var{thing}. It
|
|
also guarantees that if a node is returned, the node's start position is
|
|
greater than or equal to @var{position}.
|
|
@end defun
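
For instance, the two functions can be combined to find the defuns on
either side of point. This is a minimal sketch that assumes the
current buffer has a parser and that @code{defun} is defined in
@code{treesit-thing-settings} for its language; the function name
@code{my-defuns-around-point} is made up for the example.

@example
@group
(defun my-defuns-around-point ()
  "Return a cons (PREV . NEXT) of the defun nodes around point.
Either element is nil if there is no such node."
  (cons (treesit-thing-prev (point) 'defun)
        (treesit-thing-next (point) 'defun)))
@end group
@end example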
|
|
|
|
@defun treesit-navigate-thing position arg side thing &optional tactic
|
|
This function builds upon @code{treesit-thing-prev} and
|
|
@code{treesit-thing-next} and provides functionality that a navigation
|
|
command would find useful. It returns the position after moving across
|
|
@var{arg} instances of @var{thing} from @var{position}. If
|
|
there aren't enough things to navigate across, it returns @code{nil}. The
|
|
function doesn't move point.
|
|
|
|
A positive @var{arg} means moving forward that many instances of
|
|
@var{thing}; negative @var{arg} means moving backward. If @var{side} is
|
|
@code{beg}, this function stops at the beginning of @var{thing}; if
|
|
@code{end}, it stops at the end of @var{thing}.
|
|
|
|
Like in @code{treesit-thing-prev}, @var{thing} can be a thing symbol
|
|
defined in @code{treesit-thing-settings}, or a thing definition.
|
|
|
|
@var{tactic} determines how this function moves between things. It can
|
|
be @code{nested}, @code{top-level}, @code{restricted}, or @code{nil}.
|
|
@code{nested} or @code{nil} means normal nested navigation: first try to
|
|
move across siblings; if there aren't any siblings left in the current
|
|
level, move to the parent, then its siblings, and so on.
|
|
@code{top-level} means only navigate across top-level things and ignore
|
|
nested things. @code{restricted} means movement is restricted within
|
|
the thing that encloses @var{position}, if there is such a thing. This
|
|
tactic is useful for commands that want to stop at the current nesting
|
|
level and not move up.
|
|
@end defun
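
Because @code{treesit-navigate-thing} only computes a position, a
command has to move point itself. The following sketch shows one way
to do that; the command name @code{my-forward-defun} is hypothetical,
and the code assumes that @code{defun} is defined in
@code{treesit-thing-settings} for the buffer's language.

@example
@group
(defun my-forward-defun (&optional arg)
  "Move point across ARG defuns, stopping at the end of a defun.
Do nothing if there are not enough defuns to move across."
  (interactive "p")
  (let ((dest (treesit-navigate-thing
               (point) (or arg 1) 'end 'defun 'nested)))
    (when dest
      (goto-char dest))))
@end group
@end example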
|
|
|
|
@defun treesit-thing-at position thing &optional strict
|
|
This function returns the smallest node that's the @var{thing} and
|
|
encloses @var{position}; if there's no such node, it returns @code{nil}.
|
|
|
|
The returned node must enclose @var{position}, i.e., its start position is
|
|
less than or equal to @var{position}, and its end position is greater than or equal to
|
|
@var{position}.
|
|
|
|
If @var{strict} is non-@code{nil}, this function uses strict comparison,
|
|
i.e., start position must be strictly less than @var{position}, and end
position must be strictly greater than @var{position}.
|
|
|
|
@var{thing} can be either a thing symbol defined in
|
|
@code{treesit-thing-settings}, or a thing definition.
|
|
@end defun
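
As a small illustration, the node returned by @code{treesit-thing-at}
can be passed to other node functions. The sketch below assumes that
@code{defun} is defined in @code{treesit-thing-settings}; the function
name @code{my-defun-text-at-point} is made up for the example.

@example
@group
(defun my-defun-text-at-point ()
  "Return the text of the smallest defun enclosing point, or nil."
  (let ((node (treesit-thing-at (point) 'defun)))
    (and node (treesit-node-text node t))))
@end group
@end example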
|
|
|
|
@findex treesit-beginning-of-thing
|
|
@findex treesit-end-of-thing
|
|
@findex treesit-thing-at-point
|
|
There are also some convenient wrapper functions.
|
|
@code{treesit-beginning-of-thing} moves point to the beginning of a
|
|
thing, @code{treesit-end-of-thing} moves to the end of a thing, and
|
|
@code{treesit-thing-at-point} returns the thing at point.
|
|
|
|
There are also defun commands that specifically use the @code{defun}
|
|
definition (as a fallback for @code{treesit-defun-type-regexp}), like
|
|
@code{treesit-beginning-of-defun}, @code{treesit-end-of-defun}, and
|
|
@code{treesit-defun-at-point}. In addition, these functions use
|
|
@code{treesit-defun-tactic} as the navigation tactic. They are
|
|
described in more detail in other sections (@pxref{Tree-sitter Major
|
|
Modes}).
|
|
|
|
@node Multiple Languages
|
|
@section Parsing Text in Multiple Languages
|
|
@cindex multiple languages, parsing with tree-sitter
|
|
@cindex parsing multiple languages with tree-sitter
|
|
Sometimes, source code in one programming language contains snippets
|
|
of other languages; @acronym{HTML} + @acronym{CSS} + JavaScript is one
|
|
example. In that case, text segments written in different languages
|
|
need to be assigned different parsers. Traditionally, this is
|
|
achieved by using narrowing. While tree-sitter works with narrowing
|
|
(@pxref{tree-sitter narrowing, narrowing}), the recommended way is
|
|
instead to specify regions of buffer text (i.e., ranges) in which a
|
|
parser will operate. This section describes functions for setting and
|
|
getting ranges for a parser.
|
|
|
|
@cindex primary parser
|
|
Generally, when there are multiple languages at play, there is a
|
|
``primary'', or ``host'' language. The parser for this language, the
@dfn{primary parser}, parses the entire buffer. Parsers for other
|
|
languages are ``embedded'' or ``guest'' parsers, which only work on part
|
|
of the buffer. The parse tree of the primary parser is usually used to
|
|
determine the ranges in which the embedded parsers operate.
|
|
|
|
@vindex treesit-primary-parser
|
|
Major modes should set @code{treesit-primary-parser} to the primary
|
|
parser before calling @code{treesit-major-mode-setup}, so that Emacs can
|
|
configure the primary parser correctly for font-lock and other features.
|
|
|
|
Lisp programs should call @code{treesit-update-ranges} to make sure the
|
|
ranges for each parser are correct before using parsers in a buffer, and
|
|
call @code{treesit-language-at} to figure out the language responsible
|
|
for the text at some position. These two functions don't work by
|
|
themselves; they need major modes to set @code{treesit-range-settings}
|
|
and @code{treesit-language-at-point-function}, which do the actual work.
|
|
These functions and variables are explained in more detail towards the
|
|
end of the section.
|
|
|
|
In short, multi-language major modes should set
|
|
@code{treesit-primary-parser}, @code{treesit-range-settings}, and
|
|
@code{treesit-language-at-point-function} before calling
|
|
@code{treesit-major-mode-setup}.
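
Put together, the setup might look like the following sketch for
@acronym{HTML} with embedded @acronym{CSS}. The mode name
@code{my-html-ts-mode} and the helper @code{my-html--language-at} are
hypothetical, and the node types are assumed to match the
@acronym{HTML} grammar in use.

@example
@group
(defun my-html--language-at (pos)
  "Return the language of the text at POS (hypothetical helper)."
  (let* ((node (treesit-node-at pos 'html))
         (parent (and node (treesit-node-parent node))))
    (if (and parent
             (equal (treesit-node-type node) "raw_text")
             (equal (treesit-node-type parent) "style_element"))
        'css
      'html)))
@end group

@group
(define-derived-mode my-html-ts-mode prog-mode "HTML+CSS"
  "Hypothetical mode for HTML with embedded CSS."
  (when (and (treesit-ready-p 'html) (treesit-ready-p 'css))
    ;; The host language's parser is the primary parser.
    (setq-local treesit-primary-parser
                (treesit-parser-create 'html))
    (setq-local treesit-range-settings
                (treesit-range-rules
                 :embed 'css
                 :host 'html
                 '((style_element (raw_text) @@capture))))
    (setq-local treesit-language-at-point-function
                #'my-html--language-at)
    (treesit-major-mode-setup)))
@end group
@end example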
|
|
|
|
@heading Getting and setting ranges
|
|
|
|
@defun treesit-parser-set-included-ranges parser ranges
|
|
This function sets up @var{parser} to operate on @var{ranges}. The
|
|
@var{parser} will only read the text of the specified ranges. Each
|
|
range in @var{ranges} is a pair of the form @w{@code{(@var{beg}
|
|
. @var{end})}}.
|
|
|
|
The ranges in @var{ranges} must come in order and must not overlap.
|
|
That is, in pseudo code:
|
|
|
|
@example
|
|
@group
|
|
(cl-loop for idx from 1 to (1- (length ranges))
         for prev = (nth (1- idx) ranges)
         for next = (nth idx ranges)
         always (<= (car prev) (cdr prev)
                    (car next) (cdr next)))
|
|
@end group
|
|
@end example
|
|
|
|
@vindex treesit-range-invalid
|
|
If @var{ranges} violates this constraint, or something else went
|
|
wrong, this function signals the @code{treesit-range-invalid} error.
|
|
The signal data contains a specific error message and the ranges that
the caller tried to set.
|
|
|
|
This function can also be used for disabling ranges. If @var{ranges}
|
|
is @code{nil}, the parser is set to parse the whole buffer.
|
|
|
|
Example:
|
|
|
|
@example
|
|
@group
|
|
(treesit-parser-set-included-ranges
|
|
parser '((1 . 9) (16 . 24) (24 . 25)))
|
|
@end group
|
|
@end example
|
|
@end defun
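
A caller that computes ranges dynamically may want to handle the
@code{treesit-range-invalid} error rather than let it propagate. The
following is a minimal sketch; it assumes the variable @code{parser}
holds a parser, and it passes a deliberately invalid range whose end
precedes its beginning.

@example
@group
(condition-case err
    ;; This range is invalid because it ends before it begins.
    (treesit-parser-set-included-ranges parser '((10 . 5)))
  (treesit-range-invalid
   (message "Could not set ranges: %S" (cdr err))))
@end group
@end example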
|
|
|
|
@defun treesit-parser-included-ranges parser
|
|
This function returns the ranges set for @var{parser}. The return
|
|
value is the same as the @var{ranges} argument of
|
|
@code{treesit-parser-set-included-ranges}: a list of cons cells of the form
|
|
@w{@code{(@var{beg} . @var{end})}}. If @var{parser} doesn't have any
|
|
ranges, the return value is @code{nil}.
|
|
|
|
@example
|
|
@group
|
|
(treesit-parser-included-ranges parser)
|
|
@result{} ((1 . 9) (16 . 24) (24 . 25))
|
|
@end group
|
|
@end example
|
|
@end defun
|
|
|
|
@defun treesit-query-range source query &optional beg end
|
|
This function matches @var{source} with @var{query} and returns the
|
|
ranges of captured nodes. The return value is a list of cons cells of
|
|
the form @w{@code{(@var{beg} . @var{end})}}, where @var{beg} and
|
|
@var{end} specify the beginning and the end of a region of text.
|
|
|
|
For convenience, @var{source} can be a language symbol, a parser, or a
|
|
node. If it's a language symbol, this function matches in the root
|
|
node of the first parser using that language; if a parser, this
|
|
function matches in the root node of that parser; if a node, this
|
|
function matches in that node.
|
|
|
|
The argument @var{query} is the query used to capture nodes
|
|
(@pxref{Pattern Matching}). The capture names don't matter. The
|
|
arguments @var{beg} and @var{end}, if both non-@code{nil}, limit the
|
|
range in which this function queries.
|
|
|
|
Like other query functions, this function signals the
|
|
@code{treesit-query-error} error if @var{query} is malformed.
|
|
@end defun
|
|
|
|
@heading Supporting multiple languages in Lisp programs
|
|
|
|
It should suffice for general Lisp programs to call the following two
|
|
functions in order to support program sources that mix multiple
|
|
languages.
|
|
|
|
@defun treesit-update-ranges &optional beg end
|
|
This function updates ranges for parsers in the buffer. It makes sure
|
|
the parsers' ranges are set correctly between @var{beg} and @var{end},
|
|
according to @code{treesit-range-settings}. If omitted, @var{beg}
|
|
defaults to the beginning of the buffer, and @var{end} defaults to the
|
|
end of the buffer.
|
|
|
|
For example, fontification functions use this function before querying
|
|
for nodes in a region.
|
|
@end defun
|
|
|
|
@defun treesit-language-at pos
|
|
This function returns the language of the text at buffer position
|
|
@var{pos}. Under the hood it calls
|
|
@code{treesit-language-at-point-function} and returns its return
|
|
value. If @code{treesit-language-at-point-function} is @code{nil},
|
|
this function returns the language of the first parser in the return
|
|
value of @code{treesit-parser-list}. If there is no parser in the
|
|
buffer, it returns @code{nil}.
|
|
@end defun
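
The two functions are typically used together, as in this sketch. The
command name @code{my-show-language-at-point} is hypothetical, and the
result is only meaningful if the major mode has set
@code{treesit-range-settings} and
@code{treesit-language-at-point-function}.

@example
@group
(defun my-show-language-at-point ()
  "Show the language of the text at point in the echo area."
  (interactive)
  ;; Make sure the parsers' ranges are current around point.
  (treesit-update-ranges (line-beginning-position)
                         (line-end-position))
  (message "Language at point: %s"
           (treesit-language-at (point))))
@end group
@end example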
|
|
|
|
@heading Supporting multiple languages in major modes
|
|
|
|
@cindex host language, tree-sitter
|
|
@cindex tree-sitter host and embedded languages
|
|
@cindex embedded language, tree-sitter
|
|
Normally, in a set of languages that can be mixed together, there is a
|
|
@dfn{host language} and one or more @dfn{embedded languages}. A Lisp
|
|
program usually first parses the whole document with the host
|
|
language's parser, retrieves some information, sets ranges for the
|
|
embedded languages with that information, and then parses the embedded
|
|
languages.
|
|
|
|
Take a buffer containing @acronym{HTML}, @acronym{CSS}, and JavaScript
|
|
as an example. A Lisp program will first parse the whole buffer with
|
|
an @acronym{HTML} parser, then query the parser for
|
|
@code{style_element} and @code{script_element} nodes, which correspond
|
|
to @acronym{CSS} and JavaScript text, respectively. Then it sets the
|
|
range of the @acronym{CSS} and JavaScript parsers to the range which
|
|
their corresponding nodes span.
|
|
|
|
Given a simple @acronym{HTML} document:
|
|
|
|
@example
|
|
@group
|
|
<html>
|
|
<script>1 + 2</script>
|
|
<style>body @{ color: "blue"; @}</style>
|
|
</html>
|
|
@end group
|
|
@end example
|
|
|
|
@noindent
|
|
a Lisp program will first parse with an @acronym{HTML} parser, then set
|
|
ranges for @acronym{CSS} and JavaScript parsers:
|
|
|
|
@example
|
|
@group
|
|
;; Create parsers.
|
|
(setq html (treesit-parser-create 'html))
|
|
(setq css (treesit-parser-create 'css))
|
|
(setq js (treesit-parser-create 'javascript))
|
|
@end group
|
|
|
|
@group
|
|
;; Set CSS ranges.
|
|
(setq css-range
      (treesit-query-range
       'html
       '((style_element (raw_text) @@capture))))
(treesit-parser-set-included-ranges css css-range)
|
|
@end group
|
|
|
|
@group
|
|
;; Set JavaScript ranges.
|
|
(setq js-range
      (treesit-query-range
       'html
       '((script_element (raw_text) @@capture))))
(treesit-parser-set-included-ranges js js-range)
|
|
@end group
|
|
@end example
|
|
|
|
Emacs automates this process in @code{treesit-update-ranges}. A
|
|
multi-language major mode should set @code{treesit-range-settings} so
|
|
that @code{treesit-update-ranges} knows how to perform this process
|
|
automatically. Major modes should use the helper function
|
|
@code{treesit-range-rules} to generate a value that can be assigned to
|
|
@code{treesit-range-settings}. The settings in the following example
|
|
directly translate into operations shown above.
|
|
|
|
@example
|
|
@group
|
|
(setq treesit-range-settings
      (treesit-range-rules
       :embed 'javascript
       :host 'html
       '((script_element (raw_text) @@capture))
|
|
@end group
|
|
@group
|
|
       :embed 'css
       :host 'html
       '((style_element (raw_text) @@capture))))
|
|
@end group
|
|
|
|
@group
|
|
;; Major modes with multiple languages should always set
;; `treesit-language-at-point-function' (which see).
(setq treesit-language-at-point-function
      (lambda (pos)
        (let* ((node (treesit-node-at pos 'html))
               (parent (treesit-node-parent node)))
          (cond
           ((and node parent
                 (equal (treesit-node-type node) "raw_text")
                 (equal (treesit-node-type parent) "script_element"))
            'javascript)
           ((and node parent
                 (equal (treesit-node-type node) "raw_text")
                 (equal (treesit-node-type parent) "style_element"))
            'css)
           (t 'html)))))
|
|
@end group
|
|
@end example
|
|
|
|
@defun treesit-range-rules &rest query-specs
|
|
This function is used to set @code{treesit-range-settings}. It takes
|
|
care of compiling queries and other post-processing, and outputs a
|
|
value that @code{treesit-range-settings} can have.
|
|
|
|
It takes a series of @var{query-spec}s, where each @var{query-spec} is
|
|
a @var{query} preceded by zero or more @var{keyword}/@var{value}
|
|
pairs. Each @var{query} is a tree-sitter query in either the string,
|
|
s-expression, or compiled form, or a function.
|
|
|
|
If @var{query} is a tree-sitter query, it should be preceded by two
|
|
@var{keyword}/@var{value} pairs, where the @code{:embed} keyword
|
|
specifies the embedded language, and the @code{:host} keyword
|
|
specifies the host language.
|
|
|
|
@cindex local parser
|
|
If the query is given the @code{:local} keyword whose value is
|
|
@code{t}, the range set by this query has a dedicated local parser;
|
|
otherwise the range shares a parser with other ranges for the same
|
|
language.
|
|
|
|
By default, a parser sees its ranges as a continuum, rather than
|
|
treating them as separate independent segments. Therefore, if the
|
|
embedded ranges are semantically independent segments, they should be
|
|
processed by local parsers, described above.
|
|
|
|
The local parsers assigned to a range can be retrieved with
|
|
@code{treesit-local-parsers-at} and @code{treesit-local-parsers-on}.
|
|
|
|
@code{treesit-update-ranges} uses @var{query} to figure out how to set
|
|
the ranges for parsers for the embedded language. It queries
|
|
@var{query} in a host language parser, computes the ranges which the
|
|
captured nodes span, and applies these ranges to embedded language
|
|
parsers.
|
|
|
|
If @var{query} is a function, it doesn't need any @var{keyword} and
|
|
@var{value} pair. It should be a function that takes 2 arguments,
|
|
@var{start} and @var{end}, and sets the ranges for parsers in the
|
|
current buffer in the region between @var{start} and @var{end}. It is
|
|
fine for this function to set ranges in a larger region that
|
|
encompasses the region between @var{start} and @var{end}.
|
|
@end defun
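
As an illustration of the @code{:local} keyword, the sketch below gives
every @code{script_element} its own JavaScript parser, so that each
script is parsed independently of the others. It reuses the
@acronym{HTML} node types from the earlier example, which are assumed
to match the grammar in use.

@example
@group
(setq-local treesit-range-settings
            (treesit-range-rules
             :embed 'javascript
             :host 'html
             ;; Give each matched range its own parser.
             :local t
             '((script_element (raw_text) @@capture))))
@end group
@end example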
|
|
|
|
@defvar treesit-range-settings
|
|
This variable helps @code{treesit-update-ranges} in updating the
|
|
ranges for parsers in the buffer. It is a list of @var{setting}s
|
|
where the exact format of a @var{setting} is considered internal. You
|
|
should use @code{treesit-range-rules} to generate a value that this
|
|
variable can have.
|
|
|
|
@c Because the format is internal, we don't document them here. Though
|
|
@c we do have it explained in the docstring. We also expose the fact
|
|
@c that it is a list of settings, so one could combine two of them with
|
|
@c append.
|
|
@end defvar
|
|
|
|
|
|
@defvar treesit-language-at-point-function
|
|
This variable's value should be a function that takes a single
|
|
argument, @var{pos}, which is a buffer position, and returns the
|
|
language of the buffer text at @var{pos}. This variable is used by
|
|
@code{treesit-language-at}.
|
|
@end defvar
|
|
|
|
@defun treesit-local-parsers-at &optional pos language
|
|
This function returns all the local parsers at @var{pos} in the
|
|
current buffer. @var{pos} defaults to point.
|
|
|
|
Local parsers are those which only parse a limited region marked by an
|
|
overlay with a non-@code{nil} @code{treesit-parser} property. If
|
|
@var{language} is non-@code{nil}, only return parsers for that
|
|
language.
|
|
@end defun
|
|
|
|
@defun treesit-local-parsers-on &optional beg end language
|
|
This function is the same as @code{treesit-local-parsers-at}, but it
|
|
returns the local parsers in the range between @var{beg} and @var{end}
|
|
instead of at point.
|
|
|
|
@var{beg} and @var{end} default to the entire accessible portion of
|
|
the buffer.
|
|
@end defun
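
For example, a Lisp program could inspect the ranges covered by the
local parsers at a position. This is only a sketch: the function name
@code{my-local-parser-ranges} is hypothetical, and it assumes the major
mode created local JavaScript parsers via the @code{:local} keyword of
@code{treesit-range-rules}.

@example
@group
(defun my-local-parser-ranges (&optional pos)
  "Return the ranges of each local JavaScript parser at POS.
POS defaults to point."
  (mapcar #'treesit-parser-included-ranges
          (treesit-local-parsers-at pos 'javascript)))
@end group
@end example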
|
|
|
|
@node Tree-sitter Major Modes
|
|
@section Developing major modes with tree-sitter
|
|
@cindex major mode, developing with tree-sitter
|
|
|
|
This section covers some general guidelines on developing tree-sitter
|
|
integration for a major mode.
|
|
|
|
A major mode supporting tree-sitter features should roughly follow
|
|
this pattern:
|
|
|
|
@example
|
|
@group
|
|
(define-derived-mode woomy-mode prog-mode "Woomy"
  "A mode for Woomy programming language."
  (when (treesit-ready-p 'woomy)
    (setq-local treesit-variables ...)
    ...
    (treesit-major-mode-setup)))
|
|
@end group
|
|
@end example
|
|
|
|
@code{treesit-ready-p} automatically emits a warning if conditions for
|
|
enabling tree-sitter aren't met.
|
|
|
|
If a tree-sitter major mode shares setup with its ``native''
|
|
counterpart, one can create a ``base mode'' that contains the common
|
|
setup, like this:
|
|
|
|
@example
|
|
@group
|
|
(define-derived-mode woomy--base-mode prog-mode "Woomy"
  "An internal mode for Woomy programming language."
  (common-setup)
  ...)
|
|
@end group
|
|
|
|
@group
|
|
(define-derived-mode woomy-mode woomy--base-mode "Woomy"
  "A mode for Woomy programming language."
  (native-setup)
  ...)
|
|
@end group
|
|
|
|
@group
|
|
(define-derived-mode woomy-ts-mode woomy--base-mode "Woomy"
  "A mode for Woomy programming language."
  (when (treesit-ready-p 'woomy)
    (setq-local treesit-variables ...)
    ...
    (treesit-major-mode-setup)))
|
|
@end group
|
|
@end example
|
|
|
|
@defun treesit-ready-p language &optional quiet
|
|
This function checks for conditions for activating tree-sitter. It
|
|
checks whether Emacs was built with tree-sitter, whether the buffer's
|
|
size is not too large for tree-sitter to handle, and whether the
|
|
grammar for @var{language} is available on the system (@pxref{Language
|
|
Grammar}).
|
|
|
|
This function emits a warning if tree-sitter cannot be activated. If
|
|
@var{quiet} is @code{message}, the warning is turned into a message;
|
|
if @var{quiet} is @code{t}, no warning or message is displayed.
|
|
|
|
If all the necessary conditions are met, this function returns
|
|
non-@code{nil}; otherwise it returns @code{nil}.
|
|
@end defun
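
The @var{quiet} argument is useful when a program merely probes for
tree-sitter support and has a fallback. The sketch below reuses the
hypothetical @code{woomy-mode} and @code{woomy-ts-mode} from the
example above; the function name @code{my-woomy-mode-for-buffer} is
likewise made up.

@example
@group
(defun my-woomy-mode-for-buffer ()
  "Enable `woomy-ts-mode' if possible, else `woomy-mode'."
  ;; Passing t suppresses both the warning and the message.
  (if (treesit-ready-p 'woomy t)
      (woomy-ts-mode)
    (woomy-mode)))
@end group
@end example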
|
|
|
|
@defun treesit-major-mode-setup
|
|
This function activates some tree-sitter features for a major mode.
|
|
|
|
Currently, it sets up the following features:
|
|
@itemize
|
|
@item
|
|
If @code{treesit-font-lock-settings} (@pxref{Parser-based Font Lock})
|
|
is non-@code{nil}, it sets up fontification.
|
|
|
|
@item
|
|
If either @code{treesit-simple-indent-rules} or
|
|
@code{treesit-indent-function} (@pxref{Parser-based Indentation}) is
|
|
non-@code{nil}, it sets up indentation.
|
|
|
|
@item
|
|
If @code{treesit-defun-type-regexp} is non-@code{nil}, it sets up
|
|
navigation functions for @code{beginning-of-defun} and
|
|
@code{end-of-defun}.
|
|
|
|
@item
|
|
If @code{treesit-defun-name-function} is non-@code{nil}, it sets up
|
|
add-log functions used by @code{add-log-current-defun}.
|
|
|
|
@item
|
|
If @code{treesit-simple-imenu-settings} (@pxref{Imenu}) is
|
|
non-@code{nil}, it sets up Imenu.
|
|
|
|
@item
|
|
If @code{treesit-outline-predicate} (@pxref{Outline Minor Mode}) is
|
|
non-@code{nil}, it sets up Outline minor mode.
|
|
|
|
@item
|
|
If @code{sexp} and/or @code{sentence} are defined in
|
|
@code{treesit-thing-settings} (@pxref{User-defined Things}), it enables
|
|
navigation commands that move, respectively, by sexps and sentences by
|
|
defining variables such as @code{forward-sexp-function} and
|
|
@code{forward-sentence-function}.
|
|
@end itemize
|
|
|
|
@c TODO: Add treesit-thing-settings stuff once we finalize it.
|
|
@end defun
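
To make this concrete, here is a sketch of a mode that sets two of the
variables above before calling @code{treesit-major-mode-setup}. The
@code{woomy} language and its node types @code{comment} and
@code{function_definition} are hypothetical; a real mode would use the
node types of its grammar.

@example
@group
(define-derived-mode woomy-ts-mode prog-mode "Woomy"
  "Hypothetical tree-sitter mode for the Woomy language."
  (when (treesit-ready-p 'woomy)
    (treesit-parser-create 'woomy)
    ;; Fontification: a single feature that highlights comments.
    (setq-local treesit-font-lock-settings
                (treesit-font-lock-rules
                 :language 'woomy
                 :feature 'comment
                 '((comment) @@font-lock-comment-face)))
    (setq-local treesit-font-lock-feature-list '((comment)))
    ;; Navigation with `beginning-of-defun' and `end-of-defun'.
    (setq-local treesit-defun-type-regexp "function_definition")
    (treesit-major-mode-setup)))
@end group
@end example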
|
|
|
|
For more information on these built-in tree-sitter features, see
@ref{Parser-based Font Lock}, @ref{Parser-based Indentation}, and
@ref{List Motion}.
|
|
|
|
To support mixing multiple languages in a major mode, see
@ref{Multiple Languages}.
|
|
|
|
Besides @code{beginning-of-defun} and @code{end-of-defun}, Emacs
|
|
provides some additional functions for working with defuns:
|
|
@code{treesit-defun-at-point} returns the defun node at point, and
|
|
@code{treesit-defun-name} returns the name of a defun node.
|
|
|
|
@c FIXME: Cross-reference to treesit-defun-tactic once we have it in
|
|
@c the user manual.
|
|
@defun treesit-defun-at-point
|
|
This function returns the defun node at point, or @code{nil} if none
|
|
is found. It respects @code{treesit-defun-tactic}: if its value is
|
|
@code{top-level}, this function returns the top-level defun, and if
|
|
its value is @code{nested}, it returns the immediately enclosing defun.
|
|
|
|
This function requires @code{treesit-defun-type-regexp} to work. If
|
|
it is @code{nil}, this function simply returns @code{nil}.
|
|
@end defun
|
|
|
|
@defun treesit-defun-name node
|
|
This function returns the defun name of @var{node}. It returns
|
|
@code{nil} if there is no defun name for @var{node}, or if @var{node}
|
|
is not a defun node, or if @var{node} is @code{nil}.
|
|
|
|
Depending on the language and major mode, the defun names are names
|
|
like function name, class name, struct name, etc.
|
|
|
|
If @code{treesit-defun-name-function} is @code{nil}, this function
|
|
always returns @code{nil}.
|
|
@end defun
|
|
|
|
@defvar treesit-defun-name-function
|
|
If non-@code{nil}, this variable's value should be a function that is
|
|
called with a node as its argument, and returns the defun name of the
|
|
node. The function should have the same semantics as
|
|
@code{treesit-defun-name}: if the node is not a defun node, or the
|
|
node is a defun node but doesn't have a name, or the node is
|
|
@code{nil}, it should return @code{nil}.
|
|
@end defvar
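
A function suitable for this variable might look like the following
sketch. It assumes a hypothetical grammar in which defuns are
@code{function_definition} nodes whose name is stored in a field
called @code{name}; the function name @code{woomy-ts--defun-name} is
also made up.

@example
@group
(defun woomy-ts--defun-name (node)
  "Return the defun name of NODE, or nil if it has none."
  (when (and node
             (equal (treesit-node-type node)
                    "function_definition"))
    (let ((name (treesit-node-child-by-field-name node "name")))
      (and name (treesit-node-text name t)))))
@end group

@group
(setq-local treesit-defun-name-function #'woomy-ts--defun-name)
@end group
@end example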
|
|
|
|
@node Tree-sitter C API
|
|
@section Tree-sitter C API Correspondence
|
|
|
|
Emacs's tree-sitter integration doesn't expose every feature
|
|
provided by tree-sitter's C API@. Missing features include:
|
|
|
|
@itemize
|
|
@item
|
|
Creating a tree cursor and navigating the syntax tree with it.
|
|
@item
|
|
Setting a timeout and a cancellation flag for a parser.
|
|
@item
|
|
Setting the logger for a parser.
|
|
@item
|
|
Printing a @acronym{DOT} graph of the syntax tree to a file.
|
|
@item
|
|
Copying and modifying a syntax tree. (Emacs doesn't expose a tree
|
|
object.)
|
|
@item
|
|
Using (row, column) coordinates as position.
|
|
@item
|
|
Updating a node with changes. (In Emacs, retrieve a new node instead
|
|
of updating the existing one.)
|
|
@item
|
|
Querying statistics of a language grammar.
|
|
@end itemize
|
|
|
|
In addition, Emacs makes some changes to the C API to make the API more
|
|
convenient and idiomatic:
|
|
|
|
@itemize
|
|
@item
|
|
Instead of using byte positions, the Emacs Lisp API uses character
|
|
positions.
|
|
@item
|
|
Null nodes are converted to @code{nil}.
|
|
@end itemize
|
|
|
|
Below is the correspondence between all C API functions and their
|
|
ELisp counterparts. Sometimes one ELisp function corresponds to
|
|
multiple C functions, and many C functions don't have an ELisp
|
|
counterpart.
|
|
|
|
@example
|
|
ts_parser_new                             treesit-parser-create
ts_parser_delete
ts_parser_set_language
ts_parser_language                        treesit-parser-language
ts_parser_set_included_ranges             treesit-parser-set-included-ranges
ts_parser_included_ranges                 treesit-parser-included-ranges
ts_parser_parse
ts_parser_parse_string                    treesit-parse-string
ts_parser_parse_string_encoding
ts_parser_reset
ts_parser_set_timeout_micros
ts_parser_timeout_micros
ts_parser_set_cancellation_flag
ts_parser_cancellation_flag
ts_parser_set_logger
ts_parser_logger
ts_parser_print_dot_graphs
ts_tree_copy
ts_tree_delete
ts_tree_root_node
ts_tree_language
ts_tree_edit
ts_tree_get_changed_ranges
ts_tree_print_dot_graph
ts_node_type                              treesit-node-type
ts_node_symbol
ts_node_start_byte                        treesit-node-start
ts_node_start_point
ts_node_end_byte                          treesit-node-end
ts_node_end_point
ts_node_string                            treesit-node-string
ts_node_is_null
ts_node_is_named                          treesit-node-check
ts_node_is_missing                        treesit-node-check
ts_node_is_extra                          treesit-node-check
ts_node_has_changes
ts_node_has_error                         treesit-node-check
ts_node_parent                            treesit-node-parent
ts_node_child                             treesit-node-child
ts_node_field_name_for_child              treesit-node-field-name-for-child
ts_node_child_count                       treesit-node-child-count
ts_node_named_child                       treesit-node-child
ts_node_named_child_count                 treesit-node-child-count
ts_node_child_by_field_name               treesit-node-child-by-field-name
ts_node_child_by_field_id
ts_node_next_sibling                      treesit-node-next-sibling
ts_node_prev_sibling                      treesit-node-prev-sibling
ts_node_next_named_sibling                treesit-node-next-sibling
ts_node_prev_named_sibling                treesit-node-prev-sibling
ts_node_first_child_for_byte              treesit-node-first-child-for-pos
ts_node_first_named_child_for_byte        treesit-node-first-child-for-pos
ts_node_descendant_for_byte_range         treesit-node-descendant-for-range
ts_node_descendant_for_point_range
ts_node_named_descendant_for_byte_range   treesit-node-descendant-for-range
ts_node_named_descendant_for_point_range
ts_node_edit
ts_node_eq                                treesit-node-eq
ts_tree_cursor_new
ts_tree_cursor_delete
ts_tree_cursor_reset
ts_tree_cursor_current_node
ts_tree_cursor_current_field_name
ts_tree_cursor_current_field_id
ts_tree_cursor_goto_parent
ts_tree_cursor_goto_next_sibling
ts_tree_cursor_goto_first_child
ts_tree_cursor_goto_first_child_for_byte
ts_tree_cursor_goto_first_child_for_point
ts_tree_cursor_copy
ts_query_new
ts_query_delete
ts_query_pattern_count
ts_query_capture_count
ts_query_string_count
ts_query_start_byte_for_pattern
ts_query_predicates_for_pattern
ts_query_step_is_definite
ts_query_capture_name_for_id
ts_query_string_value_for_id
ts_query_disable_capture
ts_query_disable_pattern
ts_query_cursor_new
ts_query_cursor_delete
ts_query_cursor_exec                      treesit-query-capture
ts_query_cursor_did_exceed_match_limit
ts_query_cursor_match_limit
ts_query_cursor_set_match_limit
ts_query_cursor_set_byte_range
ts_query_cursor_set_point_range
ts_query_cursor_next_match
ts_query_cursor_remove_match
ts_query_cursor_next_capture
ts_language_symbol_count
ts_language_symbol_name
ts_language_symbol_for_name
ts_language_field_count
ts_language_field_name_for_id
ts_language_field_id_for_name
ts_language_symbol_type
ts_language_version
|
|
@end example
|