pulumi/pkg/tokens/tokens.go

307 lines
11 KiB
Go
Raw Permalink Normal View History

2017-06-26 21:46:34 +00:00
// Copyright 2016-2017, Pulumi Corporation. All rights reserved.
Begin overhauling semantic phases This change further merges the new AST and MuPack/MuIL formats and abstractions into the core of the compiler. A good amount of the old code is gone now; I decided against ripping it all out in one fell swoop so that I can methodically check that we are preserving all relevant decisions and/or functionality we had in the old model. The changes are too numerous to outline in this commit message, however, here are the noteworthy ones: * Split up the notion of symbols and tokens, resulting in: - pkg/symbols for true compiler symbols (bound nodes) - pkg/tokens for name-based tokens, identifiers, constants * Several packages move underneath pkg/compiler: - pkg/ast becomes pkg/compiler/ast - pkg/errors becomes pkg/compiler/errors - pkg/symbols becomes pkg/compiler/symbols * pkg/ast/... becomes pkg/compiler/legacy/ast/... * pkg/pack/ast becomes pkg/compiler/ast. * pkg/options goes away, merged back into pkg/compiler. * All binding functionality moves underneath a dedicated package, pkg/compiler/binder. The legacy.go file contains cruft that will eventually go away, while the other files represent a halfway point between new and old, but are expected to stay roughly in the current shape. * All parsing functionality is moved underneath a new pkg/compiler/metadata namespace, and we adopt new terminology "metadata reading" since real parsing happens in the MetaMu compilers. Hence, Parser has become metadata.Reader. * In general phases of the compiler no longer share access to the actual compiler.Compiler object. Instead, shared state is moved to the core.Context object underneath pkg/compiler/core. * Dependency resolution during binding has been rewritten to the new model, including stashing bound package symbols in the context object, and detecting import cycles. * Compiler construction does not take a workspace object. Instead, creation of a workspace is entirely hidden inside of the compiler's constructor logic. * There are three Compile* functions on the Compiler interface, to support different styles of invoking compilation: Compile() auto- detects a Mu package, based on the workspace; CompilePath(string) loads the target as a Mu package and compiles it, regardless of the workspace settings; and, CompilePackage(*pack.Package) will compile a pre-loaded package AST, again regardless of workspace. * Delete the _fe, _sema, and parsetree phases. They are no longer relevant and the functionality is largely subsumed by the above. ...and so very much more. I'm surprised I ever got this to compile again!
2017-01-18 20:18:37 +00:00
// Package tokens contains the core LumiIL symbol and token types.
Begin overhauling semantic phases This change further merges the new AST and MuPack/MuIL formats and abstractions into the core of the compiler. A good amount of the old code is gone now; I decided against ripping it all out in one fell swoop so that I can methodically check that we are preserving all relevant decisions and/or functionality we had in the old model. The changes are too numerous to outline in this commit message, however, here are the noteworthy ones: * Split up the notion of symbols and tokens, resulting in: - pkg/symbols for true compiler symbols (bound nodes) - pkg/tokens for name-based tokens, identifiers, constants * Several packages move underneath pkg/compiler: - pkg/ast becomes pkg/compiler/ast - pkg/errors becomes pkg/compiler/errors - pkg/symbols becomes pkg/compiler/symbols * pkg/ast/... becomes pkg/compiler/legacy/ast/... * pkg/pack/ast becomes pkg/compiler/ast. * pkg/options goes away, merged back into pkg/compiler. * All binding functionality moves underneath a dedicated package, pkg/compiler/binder. The legacy.go file contains cruft that will eventually go away, while the other files represent a halfway point between new and old, but are expected to stay roughly in the current shape. * All parsing functionality is moved underneath a new pkg/compiler/metadata namespace, and we adopt new terminology "metadata reading" since real parsing happens in the MetaMu compilers. Hence, Parser has become metadata.Reader. * In general phases of the compiler no longer share access to the actual compiler.Compiler object. Instead, shared state is moved to the core.Context object underneath pkg/compiler/core. * Dependency resolution during binding has been rewritten to the new model, including stashing bound package symbols in the context object, and detecting import cycles. * Compiler construction does not take a workspace object. Instead, creation of a workspace is entirely hidden inside of the compiler's constructor logic. * There are three Compile* functions on the Compiler interface, to support different styles of invoking compilation: Compile() auto- detects a Mu package, based on the workspace; CompilePath(string) loads the target as a Mu package and compiles it, regardless of the workspace settings; and, CompilePackage(*pack.Package) will compile a pre-loaded package AST, again regardless of workspace. * Delete the _fe, _sema, and parsetree phases. They are no longer relevant and the functionality is largely subsumed by the above. ...and so very much more. I'm surprised I ever got this to compile again!
2017-01-18 20:18:37 +00:00
package tokens
import (
"strings"
"github.com/pkg/errors"
"github.com/pulumi/pulumi/pkg/util/contract"
)
// Token is a qualified name that is capable of resolving to a symbol entirely on its own. Most uses of tokens are
// typed based on the context, so that a subset of the token syntax is permissible (see the various typedefs below).
// However, in its full generality, a token can have a package part, a module part, a module-member part, and a
// class-member part. Obviously tokens that are meant to address just a module won't have the module-member part, and
// tokens addressing module members won't have the class-member part, etc.
//
// Token's grammar is as follows:
//
// Token = <Identifier> |
// <QualifiedToken> |
// <DecoratedType>
// Identifier = <Name>
// QualifiedToken = <PackageName> [ ":" <ModuleName> [ ":" <ModuleMemberName> [ ":" <ClassMemberName> ] ] ]
// PackageName = ... similar to <QName>, except dashes permitted ...
// ModuleName = <QName>
// ModuleMemberName = <Name>
// ClassMemberName = <Name>
//
// A token may be a simple identifier in the case that it refers to a built-in symbol, like a primitive type, or a
// variable in scope, rather than a qualified token that is to be bound to a symbol through package/module resolution.
//
// Notice that both package and module names may be qualified names (meaning they can have "/"s in them; see QName's
// comments), and that module and class members must use unqualified, simple names (meaning they have no delimiters).
// The specialized token kinds differ only in what elements they require as part of the token string.
//
// Finally, a token may also be a decorated type. This is for built-in array, map, pointer, and function types:
//
// DecoratedType = "*" <Token> |
// "[]" <Token> |
// "map[" <Token> "]" <Token> |
// "(" [ <Token> [ "," <Token> ]* ] ")" <Token>?
//
// Notice that a recursive parsing process is required to extract elements from a <DecoratedType> token.
type Token string
const TokenDelimiter string = ":" // the character delimiting portions of a qualified token.
func (tok Token) Delimiters() int { return strings.Count(string(tok), TokenDelimiter) }
func (tok Token) HasModule() bool { return tok.Delimiters() > 0 }
func (tok Token) HasModuleMember() bool { return tok.Delimiters() > 1 }
func (tok Token) HasClassMember() bool { return tok.Delimiters() > 2 }
func (tok Token) Simple() bool { return tok.Delimiters() == 0 }
func (tok Token) String() string { return string(tok) }
// delimiter returns the Nth index of a delimiter, as specified by the argument.
func (tok Token) delimiter(n int) int {
ix := -1
for n > 0 {
// Make sure we still have space.
if ix+1 >= len(tok) {
ix = -1
break
}
// If we do, keep looking for the next delimiter.
nix := strings.Index(string(tok[ix+1:]), TokenDelimiter)
if nix == -1 {
break
}
ix += 1 + nix
n--
}
return ix
2017-01-22 17:45:58 +00:00
}
// Name returns the Token as a Name (and assumes it is a legal one).
func (tok Token) Name() Name {
contract.Requiref(tok.Simple(), "tok", "Simple")
contract.Requiref(IsName(tok.String()), "tok", "IsName(%v)", tok)
return Name(tok.String())
}
// Package extracts the package from the token, assuming one exists.
func (tok Token) Package() Package {
if t := Type(tok); t.Decorated() || t.Primitive() {
return "" // decorated and primitive types are built-in (and hence have no package).
}
if tok.HasModule() {
return Package(tok[:tok.delimiter(1)])
}
return Package(tok)
}
// Module extracts the module portion from the token, assuming one exists.
func (tok Token) Module() Module {
if tok.HasModule() {
if tok.HasModuleMember() {
return Module(tok[:tok.delimiter(2)])
}
return Module(tok)
}
return Module("")
}
// ModuleMember extracts the module member portion from the token, assuming one exists.
func (tok Token) ModuleMember() ModuleMember {
if tok.HasModuleMember() {
if tok.HasClassMember() {
return ModuleMember(tok[:tok.delimiter(3)])
}
return ModuleMember(tok)
}
return ModuleMember("")
}
// ClassMember extracts the class member portion from the token, assuming one exists.
func (tok Token) ClassMember() ClassMember {
if tok.HasClassMember() {
return ClassMember(tok)
}
return ClassMember("")
}
// Package is a token representing just a package. It uses a much simpler grammar:
// Package = <PackageName>
// Note that a package name of "." means "current package", to simplify emission and lookups.
type Package Token
2017-01-21 20:25:59 +00:00
func NewPackageToken(nm PackageName) Package {
contract.Assertf(IsPackageName(string(nm)), "Package name '%v' is not a legal qualified name", nm)
return Package(nm)
}
func (tok Package) Name() PackageName {
return PackageName(tok)
}
2017-01-21 20:25:59 +00:00
func (tok Package) String() string { return string(tok) }
// Module is a token representing a module. It uses the following subset of the token grammar:
// Module = <Package> ":" <ModuleName>
// Note that a module name of "." means "current module", to simplify emission and lookups.
type Module Token
2017-01-21 20:25:59 +00:00
func NewModuleToken(pkg Package, nm ModuleName) Module {
contract.Assertf(IsQName(string(nm)), "Package '%v' module name '%v' is not a legal qualified name", pkg, nm)
return Module(string(pkg) + TokenDelimiter + string(nm))
}
func (tok Module) Package() Package {
t := Token(tok)
contract.Assertf(t.HasModule(), "Module token '%v' missing module delimiter", tok)
return Package(tok[:t.delimiter(1)])
}
func (tok Module) Name() ModuleName {
t := Token(tok)
contract.Assertf(t.HasModule(), "Module token '%v' missing module delimiter", tok)
return ModuleName(tok[t.delimiter(1)+1:])
}
2017-01-21 20:25:59 +00:00
func (tok Module) String() string { return string(tok) }
// ModuleMember is a token representing a module's member. It uses the following grammar. Note that this is not
// ambiguous because member names cannot contain slashes, and so the "last" slash in a name delimits the member:
// ModuleMember = <Module> "/" <ModuleMemberName>
type ModuleMember Token
2017-01-21 20:25:59 +00:00
func NewModuleMemberToken(mod Module, nm ModuleMemberName) ModuleMember {
contract.Assertf(IsName(string(nm)), "Module '%v' member name '%v' is not a legal name", mod, nm)
return ModuleMember(string(mod) + TokenDelimiter + string(nm))
}
// ParseModuleMember attempts to turn the string s into a module member, returning an error if it isn't a valid one.
func ParseModuleMember(s string) (ModuleMember, error) {
if !Token(s).HasModuleMember() {
return "", errors.Errorf("String '%v' is not a valid module member", s)
}
return ModuleMember(s), nil
}
func (tok ModuleMember) Package() Package {
return tok.Module().Package()
}
func (tok ModuleMember) Module() Module {
t := Token(tok)
contract.Assertf(t.HasModuleMember(), "Module member token '%v' missing module member delimiter", tok)
return Module(tok[:t.delimiter(2)])
}
func (tok ModuleMember) Name() ModuleMemberName {
t := Token(tok)
contract.Assertf(t.HasModuleMember(), "Module member token '%v' missing module member delimiter", tok)
return ModuleMemberName(tok[t.delimiter(2)+1:])
}
2017-01-21 20:25:59 +00:00
func (tok ModuleMember) String() string { return string(tok) }
// ClassMember is a token representing a class's member. It uses the following grammar. Unlike ModuleMember, this
// cannot use a slash for delimiting names, because we use often ClassMember and ModuleMember interchangeably:
// ClassMember = <ModuleMember> "." <ClassMemberName>
type ClassMember Token
2017-01-21 20:25:59 +00:00
func NewClassMemberToken(class Type, nm ClassMemberName) ClassMember {
contract.Assertf(IsName(string(nm)), "Class '%v' member name '%v' is not a legal name", class, nm)
return ClassMember(string(class) + TokenDelimiter + string(nm))
}
func (tok ClassMember) Package() Package {
return tok.Module().Package()
}
func (tok ClassMember) Module() Module {
return tok.Class().Module()
}
func (tok ClassMember) Class() Type {
t := Token(tok)
contract.Assertf(t.HasClassMember(), "Class member token '%v' missing class member delimiter", tok)
return Type(tok[:t.delimiter(3)])
}
func (tok ClassMember) Name() ClassMemberName {
t := Token(tok)
contract.Assertf(t.HasClassMember(), "Class member token '%v' missing class member delimiter", tok)
return ClassMemberName(tok[t.delimiter(3)+1:])
}
2017-01-21 20:25:59 +00:00
func (tok ClassMember) String() string { return string(tok) }
// Type is a token representing a type. It is either a primitive type name, reference to a module class, or decorated:
// Type = <Name> | <ModuleMember> | <DecoratedType>
type Type Token
2017-01-21 20:25:59 +00:00
func NewTypeToken(mod Module, nm TypeName) Type {
contract.Assertf(IsName(string(nm)), "Module '%v' type name '%v' is not a legal name", mod, nm)
return Type(string(mod) + TokenDelimiter + string(nm))
}
func (tok Type) Package() Package {
if tok.Primitive() || tok.Decorated() {
return Package("")
}
2017-01-27 23:42:39 +00:00
return ModuleMember(tok).Package()
}
func (tok Type) Module() Module {
if tok.Primitive() || tok.Decorated() {
return Module("")
}
2017-01-27 23:42:39 +00:00
return ModuleMember(tok).Module()
}
func (tok Type) Name() TypeName {
if tok.Primitive() || tok.Decorated() {
return TypeName(tok)
}
2017-01-27 23:42:39 +00:00
return TypeName(ModuleMember(tok).Name())
}
func (tok Type) Member() ModuleMember {
return ModuleMember(tok)
}
// Decorated indicates whether this token represents a decorated type.
func (tok Type) Decorated() bool {
return tok.Pointer() || tok.Array() || tok.Map() || tok.Function()
}
func (tok Type) Pointer() bool { return IsPointerType(tok) }
func (tok Type) Array() bool { return IsArrayType(tok) }
func (tok Type) Map() bool { return IsMapType(tok) }
func (tok Type) Function() bool { return IsFunctionType(tok) }
// Primitive indicates whether this type is a primitive type name (i.e., not qualified with a module, etc).
func (tok Type) Primitive() bool {
return !tok.Decorated() && !Token(tok).HasModule()
}
2017-01-21 20:25:59 +00:00
func (tok Type) String() string { return string(tok) }
// Variable is a token representing a variable (module property, class property, or local variable (including
// parameters)). It can be a simple name for the local cases, or a true token for others:
// Variable = <Name> | <ModuleMember> | <ClassMember>
type Variable Token
2017-01-21 20:25:59 +00:00
func (tok Variable) String() string { return string(tok) }
// Function is a token representing a variable (module method or class method). Its grammar is as follows:
// Variable = <ModuleMember> | <ClassMember>
type Function Token
2017-01-21 20:25:59 +00:00
func (tok Function) String() string { return string(tok) }
// ByName implements sort.Interface to allow an array of tokens to be
// sorted based on string order.
type ByName []Token
func (ts ByName) Len() int { return len(ts) }
func (ts ByName) Less(i int, j int) bool { return ts[i] < ts[j] }
func (ts ByName) Swap(i int, j int) { ts[i], ts[j] = ts[j], ts[i] }