2017-06-26 21:46:34 +00:00
|
|
|
// Copyright 2016-2017, Pulumi Corporation. All rights reserved.
|
Begin overhauling semantic phases
This change further merges the new AST and MuPack/MuIL formats and
abstractions into the core of the compiler. A good amount of the old
code is gone now; I decided against ripping it all out in one fell
swoop so that I can methodically check that we are preserving all
relevant decisions and/or functionality we had in the old model.
The changes are too numerous to outline in full in this commit message;
here are the noteworthy ones:
* Split up the notion of symbols and tokens, resulting in:
- pkg/symbols for true compiler symbols (bound nodes)
- pkg/tokens for name-based tokens, identifiers, constants
* Several packages move underneath pkg/compiler:
- pkg/ast becomes pkg/compiler/ast
- pkg/errors becomes pkg/compiler/errors
- pkg/symbols becomes pkg/compiler/symbols
* pkg/ast/... becomes pkg/compiler/legacy/ast/...
* pkg/pack/ast becomes pkg/compiler/ast.
* pkg/options goes away, merged back into pkg/compiler.
* All binding functionality moves underneath a dedicated
package, pkg/compiler/binder. The legacy.go file contains
cruft that will eventually go away, while the other files
represent a halfway point between new and old, but are
expected to stay roughly in the current shape.
* All parsing functionality is moved underneath a new
pkg/compiler/metadata namespace, and we adopt new terminology
"metadata reading" since real parsing happens in the MetaMu
compilers. Hence, Parser has become metadata.Reader.
* In general, phases of the compiler no longer share access to
the actual compiler.Compiler object. Instead, shared state is
moved to the core.Context object underneath pkg/compiler/core.
* Dependency resolution during binding has been rewritten to
the new model, including stashing bound package symbols in the
context object, and detecting import cycles.
* Compiler construction does not take a workspace object. Instead,
creation of a workspace is entirely hidden inside of the compiler's
constructor logic.
* There are three Compile* functions on the Compiler interface, to
support different styles of invoking compilation: Compile() auto-
detects a Mu package, based on the workspace; CompilePath(string)
loads the target as a Mu package and compiles it, regardless of
the workspace settings; and, CompilePackage(*pack.Package) will
compile a pre-loaded package AST, again regardless of workspace.
* Delete the _fe, _sema, and parsetree phases. They are no longer
relevant and the functionality is largely subsumed by the above.
...and so very much more. I'm surprised I ever got this to compile again!
2017-01-18 20:18:37 +00:00
|
|
|
|
2017-05-18 18:38:28 +00:00
|
|
|
// Package tokens contains the core LumiIL symbol and token types.
|
2017-01-18 20:18:37 +00:00
|
|
|
package tokens
|
|
|
|
|
2017-01-21 17:08:35 +00:00
|
|
|
import (
|
|
|
|
"strings"
|
|
|
|
|
2017-08-31 21:31:33 +00:00
|
|
|
"github.com/pkg/errors"
|
|
|
|
|
2017-09-22 02:18:21 +00:00
|
|
|
"github.com/pulumi/pulumi/pkg/util/contract"
|
2017-01-21 17:08:35 +00:00
|
|
|
)
|
|
|
|
|
Overhaul names versus tokens
I was sloppy in my use of names versus tokens in the original AST.
Now that we're actually binding things to concrete symbols, etc., we
need to be more precise. In particular, names are just identifiers
that must be "interpreted" in a given lexical context for them to
make any sense; whereas, tokens stand alone and can be resolved without
context other than the set of imported packages, modules, and overall
module structure. As such, names are much simpler than tokens.
As explained in the comments, tokens.Names are simple identifiers:
Name = [A-Za-z_][A-Za-z0-9_]*
and tokens.QNames are fully qualified identifiers delimited by "/":
QName = [ <Name> "/" ]* <Name>
The legal grammar for a token depends on the subset of symbols that
token is meant to represent. However, the most general case, that
accepts all specializations of tokens, is roughly as follows:
    Token = <Name> |
            <PackageName>
                [ ":" <ModuleName>
                    [ "/" <ModuleMemberName>
                        [ "." <ClassMemberName> ]
                    ]
                ]
where:
    PackageName = <QName>
    ModuleName = <QName>
    ModuleMemberName = <Name>
    ClassMemberName = <Name>
Please refer to the comments in pkg/tokens/tokens.go for more details.
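The Name and QName grammars quoted in this commit message can be checked mechanically. The following is a minimal sketch that transcribes them into regular expressions; the identifiers isName and isQName are hypothetical and are not claimed to match the package's own helpers.

```go
package main

import "regexp"

// Both patterns transcribe the grammar from the commit message.
// The variable and function names here are illustrative assumptions.
var (
	// Name = [A-Za-z_][A-Za-z0-9_]*
	nameRegexp = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)
	// QName = [ <Name> "/" ]* <Name>
	qnameRegexp = regexp.MustCompile(`^([A-Za-z_][A-Za-z0-9_]*/)*[A-Za-z_][A-Za-z0-9_]*$`)
)

// isName reports whether s is a simple identifier.
func isName(s string) bool { return nameRegexp.MatchString(s) }

// isQName reports whether s is one or more Names joined by "/".
func isQName(s string) bool { return qnameRegexp.MatchString(s) }
```

Under this sketch, "foo_1" is a Name, "a/b/c" is a QName but not a Name, and strings such as "a//b" or "a/" are rejected by both checks.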
2017-01-20 01:57:20 +00:00
|
|
|
// Token is a qualified name that is capable of resolving to a symbol entirely on its own. Most uses of tokens are
|
Tidy up more lint
This change fixes a few things:
* Most importantly, we need to place a leading "." in the paths
to Gometalinter, otherwise some sub-linters just silently skip
the directory altogether. errcheck is one such linter, which
is a very important one!
* Use an explicit Gometalinter.json file to configure the various
settings. This flips on a few additional linters that aren't
on by default (like line length checking). Sadly, a few that
I'd like to enable take waaaay too much time, so in the future
we may consider a nightly job (this includes code similarity,
unused parameters, unused functions, and others that generally
require global analysis).
* Now that we're running more, however, linting takes a while!
The core Lumi project now takes 26 seconds to lint on my laptop.
That's not terrible, but it's long enough that we don't want to
do the silly "run them twice" thing our Makefiles were previously
doing. Instead, we shall deploy some $$($${PIPESTATUS[1]}-1))-fu
to rely on the fact that grep returns 1 on "zero lines".
* Finally, fix the many issues that this turned up.
I think(?) we are done, except, of course, for needing to drive
down some of the cyclomatic complexity issues (which I'm possibly
going to punt on; see pulumi/lumi#259 for more details).
2017-06-22 19:09:46 +00:00
|
|
|
// typed based on the context, so that a subset of the token syntax is permissible (see the various typedefs below).
|
2017-01-20 01:57:20 +00:00
|
|
|
// However, in its full generality, a token can have a package part, a module part, a module-member part, and a
|
|
|
|
// class-member part. Obviously tokens that are meant to address just a module won't have the module-member part, and
|
|
|
|
// tokens addressing module members won't have the class-member part, etc.
|
|
|
|
//
|
|
|
|
// Token's grammar is as follows:
|
|
|
|
//
|
2017-01-23 22:48:55 +00:00
|
|
|
// Token = <Identifier> |
|
|
|
|
// <QualifiedToken> |
|
|
|
|
// <DecoratedType>
|
|
|
|
// Identifier = <Name>
|
Implement structured token binding
This change fixes a whole host of issues with our current token binding
logic. There are two primary aspects of this change:
First, the prior token syntax was ambiguous, due to our choice of
delimiter characters. For instance, "/" could be used both as a module
member delimiter and as a valid character within sub-module names.
The result is that we could not look at a token and know for certain
which kind it is. There was also some annoyance with "." being the
delimiter for class members in addition to being the leading character
for special names like ".this", ".super", and ".ctor". Now, we just use
":" as the delimiter character for everything. The result is unambiguous.
Second, the simplistic token table lookup really doesn't work. This is
for three reasons: 1) decorated types like arrays, maps, pointers, and
functions shouldn't need token lookup in the classical sense; 2) largely
because of decorated naming, the mapping of token pieces to symbolic
information isn't straightforward and requires parsing; 3) default modules
need to be expanded and the old method only worked for simple cases and,
in particular, would not work when combined with decorated names.
2017-02-08 22:10:16 +00:00
|
|
|
// QualifiedToken = <PackageName> [ ":" <ModuleName> [ ":" <ModuleMemberName> [ ":" <ClassMemberName> ] ] ]
|
2017-04-19 17:53:14 +00:00
|
|
|
// PackageName = ... similar to <QName>, except dashes permitted ...
|
2017-01-20 01:57:20 +00:00
|
|
|
// ModuleName = <QName>
|
|
|
|
// ModuleMemberName = <Name>
|
|
|
|
// ClassMemberName = <Name>
|
|
|
|
//
|
2017-01-23 22:48:55 +00:00
|
|
|
// A token may be a simple identifier in the case that it refers to a built-in symbol, like a primitive type, or a
|
|
|
|
// variable in scope, rather than a qualified token that is to be bound to a symbol through package/module resolution.
|
2017-01-20 01:57:20 +00:00
|
|
|
//
|
2017-02-08 22:10:16 +00:00
|
|
|
// Notice that both package and module names may be qualified names (meaning they can have "/"s in them; see QName's
|
2017-01-20 01:57:20 +00:00
|
|
|
// comments), and that module and class members must use unqualified, simple names (meaning they have no delimiters).
|
|
|
|
// The specialized token kinds differ only in what elements they require as part of the token string.
|
2017-01-23 22:48:55 +00:00
|
|
|
//
|
|
|
|
// Finally, a token may also be a decorated type. This is for built-in array, map, pointer, and function types:
|
|
|
|
//
|
|
|
|
// DecoratedType = "*" <Token> |
|
|
|
|
// "[]" <Token> |
|
|
|
|
// "map[" <Token> "]" <Token> |
|
|
|
|
// "(" [ <Token> [ "," <Token> ]* ] ")" <Token>?
|
|
|
|
//
|
|
|
|
// Notice that a recursive parsing process is required to extract elements from a <DecoratedType> token.
|
2017-01-20 01:57:20 +00:00
|
|
|
type Token string
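The DecoratedType grammar documented above the Token type is recursive, since each decoration wraps another token. As a minimal sketch of that idea, the function below unwraps only the pointer ("*") and array ("[]") prefixes; map and function types are deliberately left out, and the name unwrap and its return shape are assumptions made for illustration rather than part of this package.

```go
package main

import "strings"

// unwrap strips leading "*" (pointer) and "[]" (array) decorations from tok.
// It returns the decorations from outermost to innermost, plus the element
// token that remains once no supported decoration is left.
// Map and function decorations are not handled in this sketch.
func unwrap(tok string) (decorations []string, element string) {
	switch {
	case strings.HasPrefix(tok, "*"):
		inner, el := unwrap(tok[1:]) // recurse on the pointee token
		return append([]string{"pointer"}, inner...), el
	case strings.HasPrefix(tok, "[]"):
		inner, el := unwrap(tok[2:]) // recurse on the array element token
		return append([]string{"array"}, inner...), el
	default:
		return nil, tok // not decorated (as far as this sketch knows)
	}
}
```

For example, unwrap("[]*pkg:mod:Member") yields the decorations "array" then "pointer" and the element "pkg:mod:Member", which could then be handled as an ordinary qualified token.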
|
|
|
|
|
2017-02-08 22:10:16 +00:00
|
|
|
const TokenDelimiter string = ":" // the character delimiting portions of a qualified token.
|
|
|
|
|
|
|
|
// Delimiters returns the number of delimiters in the token.
func (tok Token) Delimiters() int { return strings.Count(string(tok), TokenDelimiter) }
|
|
|
|
func (tok Token) HasModule() bool { return tok.Delimiters() > 0 }
|
|
|
|
func (tok Token) HasModuleMember() bool { return tok.Delimiters() > 1 }
|
|
|
|
func (tok Token) HasClassMember() bool { return tok.Delimiters() > 2 }
|
|
|
|
func (tok Token) Simple() bool { return tok.Delimiters() == 0 }
|
|
|
|
func (tok Token) String() string { return string(tok) }
|
|
|
|
|
|
|
|
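// delimiter returns the index of the nth occurrence of the delimiter (n is 1-based). If the token has fewer than n
// delimiters, the result is either -1 or the index of an earlier delimiter, so callers should check first (for
// example, with HasModule or HasModuleMember).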
|
|
|
|
func (tok Token) delimiter(n int) int {
|
|
|
|
ix := -1
|
|
|
|
for n > 0 {
|
|
|
|
// Make sure we still have space.
|
|
|
|
if ix+1 >= len(tok) {
|
|
|
|
ix = -1
|
|
|
|
break
|
|
|
|
}
|
|
|
|
|
|
|
|
// If we do, keep looking for the next delimiter.
|
|
|
|
nix := strings.Index(string(tok[ix+1:]), TokenDelimiter)
|
|
|
|
if nix == -1 {
|
|
|
|
break
|
|
|
|
}
|
|
|
|
ix += 1 + nix
|
|
|
|
|
|
|
|
n--
|
|
|
|
}
|
|
|
|
return ix
|
2017-01-22 17:45:58 +00:00
|
|
|
}
|
|
|
|
|
2017-01-25 18:51:04 +00:00
|
|
|
// Name returns the Token as a Name (and assumes it is a legal one).
|
|
|
|
func (tok Token) Name() Name {
|
|
|
|
contract.Requiref(tok.Simple(), "tok", "Simple")
|
2017-02-11 23:45:37 +00:00
|
|
|
contract.Requiref(IsName(tok.String()), "tok", "IsName(%v)", tok)
|
2017-01-25 18:51:04 +00:00
|
|
|
return Name(tok.String())
|
|
|
|
}
|
|
|
|
|
2017-02-08 22:10:16 +00:00
|
|
|
// Package extracts the package from the token, assuming one exists.
|
|
|
|
func (tok Token) Package() Package {
|
|
|
|
if t := Type(tok); t.Decorated() || t.Primitive() {
|
|
|
|
return "" // decorated and primitive types are built-in (and hence have no package).
|
|
|
|
}
|
|
|
|
if tok.HasModule() {
|
|
|
|
return Package(tok[:tok.delimiter(1)])
|
|
|
|
}
|
|
|
|
return Package(tok)
|
|
|
|
}
|
|
|
|
|
|
|
|
// Module extracts the module portion from the token, assuming one exists.
|
|
|
|
func (tok Token) Module() Module {
|
|
|
|
if tok.HasModule() {
|
|
|
|
if tok.HasModuleMember() {
|
|
|
|
return Module(tok[:tok.delimiter(2)])
|
|
|
|
}
|
|
|
|
return Module(tok)
|
|
|
|
}
|
|
|
|
return Module("")
|
|
|
|
}
|
|
|
|
|
|
|
|
// ModuleMember extracts the module member portion from the token, assuming one exists.
|
|
|
|
func (tok Token) ModuleMember() ModuleMember {
|
|
|
|
if tok.HasModuleMember() {
|
|
|
|
if tok.HasClassMember() {
|
|
|
|
return ModuleMember(tok[:tok.delimiter(3)])
|
|
|
|
}
|
|
|
|
return ModuleMember(tok)
|
2017-01-27 00:49:38 +00:00
|
|
|
}
|
2017-02-08 22:10:16 +00:00
|
|
|
return ModuleMember("")
|
2017-01-27 00:49:38 +00:00
|
|
|
}
|
|
|
|
|
2017-02-08 22:10:16 +00:00
|
|
|
// ClassMember extracts the class member portion from the token, assuming one exists.
|
|
|
|
func (tok Token) ClassMember() ClassMember {
|
2017-01-27 00:49:38 +00:00
|
|
|
if tok.HasClassMember() {
|
2017-02-08 22:10:16 +00:00
|
|
|
return ClassMember(tok)
|
2017-01-25 17:29:34 +00:00
|
|
|
}
|
2017-02-08 22:10:16 +00:00
|
|
|
return ClassMember("")
|
2017-01-25 17:29:34 +00:00
|
|
|
}
|
|
|
|
|
2017-01-20 01:57:20 +00:00
|
|
|
// Package is a token representing just a package. It uses a much simpler grammar:
|
|
|
|
// Package = <PackageName>
|
|
|
|
// Note that a package name of "." means "current package", to simplify emission and lookups.
|
|
|
|
type Package Token
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
// NewPackageToken creates a package token from a package name, asserting that the name is legal.
func NewPackageToken(nm PackageName) Package {
|
2017-04-19 17:53:14 +00:00
|
|
|
contract.Assertf(IsPackageName(string(nm)), "Package name '%v' is not a legal qualified name", nm)
|
2017-01-21 19:04:03 +00:00
|
|
|
return Package(nm)
|
|
|
|
}
|
|
|
|
|
2017-01-21 17:08:35 +00:00
|
|
|
// Name returns the package token as a PackageName.
func (tok Package) Name() PackageName {
|
|
|
|
return PackageName(tok)
|
|
|
|
}
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func (tok Package) String() string { return string(tok) }
|
|
|
|
|
Overhaul names versus tokens
I was sloppy in my use of names versus tokens in the original AST.
Now that we're actually binding things to concrete symbols, etc., we
need to be more precise. In particular, names are just identifiers
that must be "interpreted" in a given lexical context for them to
make any sense; whereas, tokens stand alone and can be resolved without
context other than the set of imported packages, modules, and overall
module structure. As such, names are much simpler than tokens.
As explained in the comments, tokens.Names are simple identifiers:
Name = [A-Za-z_][A-Za-z0-9_]*
and tokens.QNames are fully qualified identifiers delimited by "/":
QName = [ <Name> "/" ]* <Name>
The legal grammar for a token depends on the subset of symbols that
token is meant to represent. However, the most general case, that
accepts all specializations of tokens, is roughly as follows:
Token = <Name> |
<PackageName>
[ ":" <ModuleName>
[ "/" <ModuleMemberName>
[ "." <ClassMemberName> ]
]
]
where:
PackageName = <QName>
ModuleName = <QName>
ModuleMemberName = <Name>
ClassMemberName = <Name>
Please refer to the comments in pkg/tokens/tokens.go for more details.
2017-01-20 01:57:20 +00:00
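The Name and QName grammars quoted in the commit message above can be checked mechanically. What follows is only a minimal sketch that transcribes those two productions into regular expressions; the helper names isSimpleName and isQualifiedName are hypothetical, and the package's real IsName and IsQName functions may accept a different character set.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRegexp transcribes the Name production from the commit message:
//     Name = [A-Za-z_][A-Za-z0-9_]*
var nameRegexp = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// qnameRegexp transcribes the QName production: zero or more "<Name>/"
// prefixes followed by a final Name.
var qnameRegexp = regexp.MustCompile(`^([A-Za-z_][A-Za-z0-9_]*/)*[A-Za-z_][A-Za-z0-9_]*$`)

// isSimpleName is a hypothetical stand-in for the package's IsName.
func isSimpleName(s string) bool { return nameRegexp.MatchString(s) }

// isQualifiedName is a hypothetical stand-in for the package's IsQName.
func isQualifiedName(s string) bool { return qnameRegexp.MatchString(s) }

func main() {
	for _, s := range []string{"foo", "aws/ec2", "9lives", "a//b", ""} {
		fmt.Printf("%q name=%v qname=%v\n", s, isSimpleName(s), isQualifiedName(s))
	}
}
```

Every simple name is also a legal qualified name under this grammar, which is why a package name may be either form.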
|
|
|
// Module is a token representing a module. It uses the following subset of the token grammar:
|
|
|
|
// Module = <Package> ":" <ModuleName>
|
|
|
|
// Note that a module name of "." means "current module", to simplify emission and lookups.
|
|
|
|
type Module Token
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func NewModuleToken(pkg Package, nm ModuleName) Module {
|
2017-02-08 17:12:09 +00:00
|
|
|
contract.Assertf(IsQName(string(nm)), "Package '%v' module name '%v' is not a legal qualified name", pkg, nm)
|
Implement structured token binding
This change fixes a whole host of issues with our current token binding
logic. There are two primary aspects of this change:
First, the prior token syntax was ambiguous, due to our choice of
delimiter characters. For instance, "/" could be used both as a module
member delimiter, in addition to being a valid character for sub-modules.
The result is that we could not look at a token and know for certain
which kind it is. There was also some annoyance with "." being the
delimiter for class members in addition to being the leading character
for special names like ".this", ".super", and ".ctor". Now, we just use
":" as the delimiter character for everything. The result is unambiguous.
Second, the simplistic token table lookup really doesn't work. This is
for three reasons: 1) decorated types like arrays, maps, pointers, and
functions shouldn't need token lookup in the classical sense; 2) largely
because of decorated naming, the mapping of token pieces to symbolic
information isn't straightforward and requires parsing; 3) default modules
need to be expanded and the old method only worked for simple cases and,
in particular, would not work when combined with decorated names.
2017-02-08 22:10:16 +00:00
|
|
|
return Module(string(pkg) + TokenDelimiter + string(nm))
|
2017-01-21 19:04:03 +00:00
|
|
|
}
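NewModuleToken above builds a token by joining its parts with TokenDelimiter. The sketch below illustrates why a single delimiter makes the kind of a token recoverable from its shape alone. It assumes the delimiter is ":" (as the "Implement structured token binding" commit message states), ignores decorated types entirely, and uses hypothetical helper names (join, split, kind) that are not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// delimiter is the assumed value of TokenDelimiter, per the commit message.
const delimiter = ":"

// join builds a token string from its parts (package, module, member, ...).
func join(parts ...string) string { return strings.Join(parts, delimiter) }

// split breaks a token into at most four parts:
// package, module, module member, class member.
func split(tok string) []string { return strings.SplitN(tok, delimiter, 4) }

// kind names the most specific entity a token with this many parts can denote.
// Decorated types (arrays, maps, pointers, functions) are not handled here.
func kind(tok string) string {
	switch len(split(tok)) {
	case 1:
		return "package or simple name"
	case 2:
		return "module"
	case 3:
		return "module member"
	default:
		return "class member"
	}
}

func main() {
	tok := join("aws", "ec2/instance", "Instance")
	fmt.Println(tok, "->", kind(tok))
}
```

Because "/" can only appear inside a qualified package or module name and never acts as a token delimiter, counting ":" separators is enough to classify a non-decorated token.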
|
|
|
|
|
2017-01-21 17:08:35 +00:00
|
|
|
func (tok Module) Package() Package {
|
2017-02-08 22:10:16 +00:00
|
|
|
t := Token(tok)
|
|
|
|
contract.Assertf(t.HasModule(), "Module token '%v' missing module delimiter", tok)
|
|
|
|
return Package(tok[:t.delimiter(1)])
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
func (tok Module) Name() ModuleName {
|
2017-02-08 22:10:16 +00:00
|
|
|
t := Token(tok)
|
|
|
|
contract.Assertf(t.HasModule(), "Module token '%v' missing module delimiter", tok)
|
|
|
|
return ModuleName(tok[t.delimiter(1)+1:])
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func (tok Module) String() string { return string(tok) }
|
|
|
|
|
2017-01-20 01:57:20 +00:00
|
|
|
// ModuleMember is a token representing a module's member. It uses the following grammar. Note that this is not
|
|
|
|
// ambiguous because member names cannot contain slashes, and so the "last" slash in a name delimits the member:
|
|
|
|
// ModuleMember = <Module> "/" <ModuleMemberName>
|
|
|
|
type ModuleMember Token
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func NewModuleMemberToken(mod Module, nm ModuleMemberName) ModuleMember {
|
2017-02-08 17:12:09 +00:00
|
|
|
contract.Assertf(IsName(string(nm)), "Module '%v' member name '%v' is not a legal name", mod, nm)
|
2017-02-08 22:10:16 +00:00
|
|
|
return ModuleMember(string(mod) + TokenDelimiter + string(nm))
|
2017-01-21 19:04:03 +00:00
|
|
|
}
|
|
|
|
|
2017-08-31 21:31:33 +00:00
|
|
|
// ParseModuleMember attempts to turn the string s into a module member, returning an error if it isn't a valid one.
|
|
|
|
func ParseModuleMember(s string) (ModuleMember, error) {
|
|
|
|
if !Token(s).HasModuleMember() {
|
|
|
|
return "", errors.Errorf("String '%v' is not a valid module member", s)
|
|
|
|
}
|
|
|
|
return ModuleMember(s), nil
|
|
|
|
}
|
|
|
|
|
2017-01-21 17:08:35 +00:00
|
|
|
func (tok ModuleMember) Package() Package {
|
|
|
|
return tok.Module().Package()
|
|
|
|
}
|
|
|
|
|
|
|
|
func (tok ModuleMember) Module() Module {
|
2017-02-08 22:10:16 +00:00
|
|
|
t := Token(tok)
|
|
|
|
contract.Assertf(t.HasModuleMember(), "Module member token '%v' missing module member delimiter", tok)
|
|
|
|
return Module(tok[:t.delimiter(2)])
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
func (tok ModuleMember) Name() ModuleMemberName {
|
2017-02-08 22:10:16 +00:00
|
|
|
t := Token(tok)
|
|
|
|
contract.Assertf(t.HasModuleMember(), "Module member token '%v' missing module member delimiter", tok)
|
|
|
|
return ModuleMemberName(tok[t.delimiter(2)+1:])
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func (tok ModuleMember) String() string { return string(tok) }
|
|
|
|
|
2017-01-20 01:57:20 +00:00
|
|
|
// ClassMember is a token representing a class's member. It uses the following grammar. Unlike ModuleMember, this
|
Tidy up more lint
This change fixes a few things:
* Most importantly, we need to place a leading "." in the paths
to Gometalinter, otherwise some sub-linters just silently skip
the directory altogether. errcheck is one such linter, which
is a very important one!
* Use an explicit Gometalinter.json file to configure the various
settings. This flips on a few additional linters that aren't
on by default (like line length checking). Sadly, a few that
I'd like to enable take waaaay too much time, so in the future
we may consider a nightly job (this includes code similarity,
unused parameters, unused functions, and others that generally
require global analysis).
* Now that we're running more, however, linting takes a while!
The core Lumi project now takes 26 seconds to lint on my laptop.
That's not terrible, but it's long enough that we don't want to
do the silly "run them twice" thing our Makefiles were previously
doing. Instead, we shall deploy some $$($${PIPESTATUS[1]}-1))-fu
to rely on the fact that grep returns 1 on "zero lines".
* Finally, fix the many issues that this turned up.
I think(?) we are done, except, of course, for needing to drive
down some of the cyclomatic complexity issues (which I'm possibly
going to punt on; see pulumi/lumi#259 for more details).
2017-06-22 19:09:46 +00:00
|
|
|
// cannot use a slash for delimiting names, because we often use ClassMember and ModuleMember interchangeably:
|
2017-01-21 17:08:35 +00:00
|
|
|
// ClassMember = <ModuleMember> "." <ClassMemberName>
|
2017-01-20 01:57:20 +00:00
|
|
|
type ClassMember Token
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func NewClassMemberToken(class Type, nm ClassMemberName) ClassMember {
|
2017-02-08 17:12:09 +00:00
|
|
|
contract.Assertf(IsName(string(nm)), "Class '%v' member name '%v' is not a legal name", class, nm)
|
2017-02-08 22:10:16 +00:00
|
|
|
return ClassMember(string(class) + TokenDelimiter + string(nm))
|
2017-01-21 19:04:03 +00:00
|
|
|
}
|
|
|
|
|
2017-01-21 17:08:35 +00:00
|
|
|
func (tok ClassMember) Package() Package {
|
|
|
|
return tok.Module().Package()
|
|
|
|
}
|
|
|
|
|
|
|
|
func (tok ClassMember) Module() Module {
|
|
|
|
return tok.Class().Module()
|
|
|
|
}
|
|
|
|
|
|
|
|
func (tok ClassMember) Class() Type {
|
2017-02-08 22:10:16 +00:00
|
|
|
t := Token(tok)
|
|
|
|
contract.Assertf(t.HasClassMember(), "Class member token '%v' missing class member delimiter", tok)
|
|
|
|
return Type(tok[:t.delimiter(3)])
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
func (tok ClassMember) Name() ClassMemberName {
|
2017-02-08 22:10:16 +00:00
|
|
|
t := Token(tok)
|
|
|
|
contract.Assertf(t.HasClassMember(), "Class member token '%v' missing class member delimiter", tok)
|
|
|
|
return ClassMemberName(tok[t.delimiter(3)+1:])
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func (tok ClassMember) String() string { return string(tok) }
|
|
|
|
|
2017-01-23 22:48:55 +00:00
|
|
|
// Type is a token representing a type. It is either a primitive type name, reference to a module class, or decorated:
|
|
|
|
// Type = <Name> | <ModuleMember> | <DecoratedType>
|
2017-01-20 01:57:20 +00:00
|
|
|
type Type Token
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func NewTypeToken(mod Module, nm TypeName) Type {
|
2017-02-08 17:12:09 +00:00
|
|
|
contract.Assertf(IsName(string(nm)), "Module '%v' type name '%v' is not a legal name", mod, nm)
|
2017-02-08 22:10:16 +00:00
|
|
|
return Type(string(mod) + TokenDelimiter + string(nm))
|
2017-01-21 19:04:03 +00:00
|
|
|
}
|
|
|
|
|
2017-01-21 17:08:35 +00:00
|
|
|
func (tok Type) Package() Package {
|
2017-01-23 22:48:55 +00:00
|
|
|
if tok.Primitive() || tok.Decorated() {
|
2017-01-21 17:08:35 +00:00
|
|
|
return Package("")
|
|
|
|
}
|
2017-01-27 23:42:39 +00:00
|
|
|
return ModuleMember(tok).Package()
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
func (tok Type) Module() Module {
|
2017-01-23 22:48:55 +00:00
|
|
|
if tok.Primitive() || tok.Decorated() {
|
2017-01-21 17:08:35 +00:00
|
|
|
return Module("")
|
|
|
|
}
|
2017-01-27 23:42:39 +00:00
|
|
|
return ModuleMember(tok).Module()
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
func (tok Type) Name() TypeName {
|
2017-01-23 22:48:55 +00:00
|
|
|
if tok.Primitive() || tok.Decorated() {
|
2017-01-21 17:08:35 +00:00
|
|
|
return TypeName(tok)
|
|
|
|
}
|
2017-01-27 23:42:39 +00:00
|
|
|
return TypeName(ModuleMember(tok).Name())
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
func (tok Type) Member() ModuleMember {
|
|
|
|
return ModuleMember(tok)
|
|
|
|
}
|
|
|
|
|
2017-01-23 22:48:55 +00:00
|
|
|
// Decorated indicates whether this token represents a decorated type.
|
|
|
|
func (tok Type) Decorated() bool {
|
|
|
|
return tok.Pointer() || tok.Array() || tok.Map() || tok.Function()
|
|
|
|
}
|
|
|
|
|
|
|
|
func (tok Type) Pointer() bool { return IsPointerType(tok) }
|
|
|
|
func (tok Type) Array() bool { return IsArrayType(tok) }
|
|
|
|
func (tok Type) Map() bool { return IsMapType(tok) }
|
|
|
|
func (tok Type) Function() bool { return IsFunctionType(tok) }
|
|
|
|
|
2017-01-21 17:08:35 +00:00
|
|
|
// Primitive indicates whether this type is a primitive type name (i.e., not qualified with a module, etc.).
|
|
|
|
func (tok Type) Primitive() bool {
|
2017-01-23 22:48:55 +00:00
|
|
|
return !tok.Decorated() && !Token(tok).HasModule()
|
2017-01-21 17:08:35 +00:00
|
|
|
}
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func (tok Type) String() string { return string(tok) }
|
|
|
|
|
2017-01-20 01:57:20 +00:00
|
|
|
// Variable is a token representing a variable (module property, class property, or local variable (including
|
|
|
|
// parameters)). It can be a simple name for the local cases, or a true token for others:
|
|
|
|
// Variable = <Name> | <ModuleMember> | <ClassMember>
|
|
|
|
type Variable Token
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func (tok Variable) String() string { return string(tok) }
|
|
|
|
|
2017-01-20 01:57:20 +00:00
|
|
|
// Function is a token representing a function (module method or class method). Its grammar is as follows:
|
|
|
|
// Function = <ModuleMember> | <ClassMember>
|
|
|
|
type Function Token
|
|
|
|
|
2017-01-21 20:25:59 +00:00
|
|
|
func (tok Function) String() string { return string(tok) }
|
2017-07-14 06:03:28 +00:00
|
|
|
|
|
|
|
// ByName implements sort.Interface to allow an array of tokens to be
|
|
|
|
// sorted based on string order.
|
|
|
|
type ByName []Token
|
|
|
|
|
|
|
|
func (ts ByName) Len() int { return len(ts) }
|
|
|
|
func (ts ByName) Less(i int, j int) bool { return ts[i] < ts[j] }
|
|
|
|
func (ts ByName) Swap(i int, j int) { ts[i], ts[j] = ts[j], ts[i] }
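As a usage illustration for ByName, here is a small self-contained sketch. It redeclares Token as a plain string type, which is an assumption made so the example compiles on its own (it is consistent with the string conversions and the "<" comparison used above); the helper sortedTokens is hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// Token is assumed to be a string-based type, as the conversions above suggest.
type Token string

// ByName is repeated from the source so that the example is self-contained.
type ByName []Token

func (ts ByName) Len() int               { return len(ts) }
func (ts ByName) Less(i int, j int) bool { return ts[i] < ts[j] }
func (ts ByName) Swap(i int, j int)      { ts[i], ts[j] = ts[j], ts[i] }

// sortedTokens is a hypothetical helper that returns a sorted copy of its
// input, leaving the caller's slice untouched.
func sortedTokens(in []Token) []Token {
	out := append([]Token(nil), in...)
	sort.Sort(ByName(out))
	return out
}

func main() {
	fmt.Println(sortedTokens([]Token{"b:mod:Z", "a:mod:Y", "a:mod:X"}))
}
```

The ordering is a plain byte-wise string comparison, so tokens group by package first, then by module, then by member name.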
|