Terbium Language Specification
This page goes over basic common language specifications for the Terbium Programming Language.
Pre-release
Terbium's syntax is most "accurately" highlighted by existing TypeScript highlighters. Until syntax highlighting is commonly supported for this language (or wherever it has not been added), ts can be used as a replacement. Many keywords and identifiers, e.g. func, won't be highlighted.
Runtime
The runtime execution of a Terbium program should go in the following order:
execute static blocks.
execute the entrypoint function.
execute cleanup tasks.
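As a rough mental model only, this order, together with the rule (described under Static Blocks) that a static block's scope is dropped before the entrypoint runs, can be sketched in Python. All names here (STATICS, static_block, main, cleanup, run_program) are hypothetical, and a module-level dict stands in for static variables; this is not Terbium syntax and not how a Terbium runtime is implemented.

```python
# Hypothetical Python model of the runtime order (not Terbium syntax).
STATICS = {}  # stands in for static variables, which live for the whole program
trace = []    # records the order in which the phases run


def static_block():
    counter = 41                     # like a let-defined local: gone when this scope ends
    STATICS["answer"] = counter + 1  # like a static variable: still visible later
    trace.append("static")


def main():
    trace.append("entrypoint")
    return STATICS["answer"]         # 'counter' is not reachable from here


def cleanup():
    trace.append("cleanup")


def run_program(static_blocks, entrypoint, cleanup_tasks):
    """Run every static block, then the entrypoint (if any), then cleanup tasks."""
    for block in static_blocks:
        block()
    result = entrypoint() if entrypoint is not None else None
    for task in cleanup_tasks:
        task()
    return result


result = run_program([static_block], main, [cleanup])
print(result, trace)  # 42 ['static', 'entrypoint', 'cleanup']
```

When a module is only imported, the same model would simply pass None as the entrypoint, so only its static blocks run.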
Static Blocks
A static block is a block of code run at the beginning of every imported module, including the source file. All top-level dynamic code is considered a static block. A static block may also be a block labelled with static:
Lifetime of Static Blocks
A big difference with code executed in static blocks is that, although it is executed statically, the scope of a static block is dropped before the entrypoint is called. That is, you will not have access to any of its local variables (e.g. let-defined variables) within your main code:
Think of top-level code as being implicitly wrapped with static {}:
Note that this is different from static variables, which exist throughout the runtime of the program:
Entrypoint
The entrypoint of your program is where the runtime logic of your code generally begins. Technically, code in static blocks runs before the entrypoint; however, static blocks run regardless of whether the module is executed directly as a binary or imported as a library or module. The entrypoint, in contrast, is never run when a module or library is imported, only when the program is executed directly as a binary.
The entrypoint is specified as a function and is resolved in the following priority:
1. If there is a top-level function decorated with the @entrypoint decorator, use that function.
2. If there is a top-level function named main, use that function.
3. If strategies 1 and 2 fail, do not resolve an entrypoint function.
If an entrypoint function exists, its signature will be checked against the following type:
That is:
the function must take either zero parameters or one parameter of type [string], preferably named args
the function must have a return type of either:
void (i.e. void throws E where E = never, since void throws never flattens to just void), or
void throws E (shortened to just throws E), when you want error handling and propagation in the entrypoint function
the function must be contained, meaning it takes no captures. Since the entrypoint function is assumed to be at the top level, capturing is impossible anyway.
The following lints/errors regarding entrypoints should be issued:
no_entrypoint: when there is no entrypoint function for a binary (warn by default)
main_not_func: when there is no @entrypoint but there exists a top-level item main that is not a function (warn by default)
misplaced_entrypoint: when a function decorated with @entrypoint is not at the top level (always error)
invalid_entrypoint: when an entrypoint function has captures or has an invalid signature (always error, see below)
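The resolution priority and the lints above can be summarised as a small Python function. Item, valid_signature, and resolve_entrypoint are hypothetical names, types are modelled as plain strings such as "[string]", and the sketch assumes the module is being built as a binary. It reports both main_not_func and no_entrypoint when main is not a function, since both conditions hold; an actual compiler may choose to report only one.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, simplified view of one declaration in a module."""
    name: str
    is_function: bool = True
    top_level: bool = True
    decorators: list = field(default_factory=list)
    params: list = field(default_factory=list)  # parameter types, e.g. ["[string]"]
    returns: str = "void"
    captures: list = field(default_factory=list)


def valid_signature(fn):
    # Zero parameters, or exactly one parameter of type [string].
    params_ok = fn.params in ([], ["[string]"])
    # void, void throws E, or the shorthand throws E.
    returns_ok = fn.returns == "void" or fn.returns.startswith(("throws ", "void throws "))
    # The entrypoint must be contained: no captures.
    return params_ok and returns_ok and not fn.captures


def resolve_entrypoint(items):
    """Return (entrypoint or None, lint names) for a module built as a binary."""
    lints = []
    for item in items:
        if "entrypoint" in item.decorators and not item.top_level:
            lints.append("misplaced_entrypoint")  # always an error

    top = [i for i in items if i.top_level]
    # Priority 1: a top-level function decorated with @entrypoint.
    chosen = next((i for i in top if i.is_function and "entrypoint" in i.decorators), None)
    if chosen is None:
        # Priority 2: a top-level item named main, if it is a function.
        main_item = next((i for i in top if i.name == "main"), None)
        if main_item is not None and not main_item.is_function:
            lints.append("main_not_func")  # warn by default
        elif main_item is not None:
            chosen = main_item
    # Priority 3: give up and report that there is no entrypoint.
    if chosen is None:
        lints.append("no_entrypoint")  # warn by default
    elif not valid_signature(chosen):
        lints.append("invalid_entrypoint")  # always an error
    return chosen, lints


chosen, lints = resolve_entrypoint([
    Item("main", params=["[string]"]),
    Item("start", decorators=["entrypoint"]),
])
print(chosen.name, lints)  # start []
```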
The following will resolve as entrypoint functions:
The following demonstrates the priority @entrypoint has over func main():
The following will not resolve as entrypoints:
The args parameter
The entrypoint function can optionally take a single parameter, args. It is a [string] of the arguments passed to the command used to run the binary.
Primitive types
Terbium comes with many primitive types. A type is the classification of an object.
Number types
Numbers can be represented in memory in many different formats in Terbium.
uint: Unsigned integer, bit-width inferred
int: Signed integer, bit-width inferred
uintN [1]: Unsigned N-bit integer
intN [1]: Signed N-bit integer
char: Equivalent to a uint32 that represents a single Unicode character [2]
float128: IEEE 754 floating-point number, quadruple precision (128 bits)
float, float64: IEEE 754 floating-point number, double precision (64 bits)
float32: IEEE 754 floating-point number, single precision (32 bits)
[1] Valid values for N: 8, 16, 32, 64, 128.
[2] See the character literals section for more information.
Integer literals
An integer can be defined by simply writing the integer. By default, integer literals are defined as int (signed integers). Suffix the integer with u to define it as uint (unsigned).
The cast syntax can be used to cast to specific bit-widths:
If casting fails, an error is raised:
If you want integers to wrap or coerce silently, use the .wrapping_cast method:
Integer literals can also be defined using radix specifiers:
0b: Binary, e.g. 0b1101
0o: Octal, e.g. 0o167
0x: Hexadecimal, e.g. 0x41ce3a
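A minimal Python model of the two casting behaviours, assuming that a failed cast corresponds to raising an error and that .wrapping_cast keeps the low N bits, reinterpreted as two's complement for signed targets (the text above does not pin down the exact wrapping rule). The helpers checked_cast and wrapping_cast are hypothetical. Conveniently, Python shares the 0b, 0o, and 0x prefixes, so the radix examples above work unchanged.

```python
def int_bounds(bits, signed):
    """Inclusive (min, max) for an N-bit integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def checked_cast(value, bits, signed=True):
    # Models a plain cast: a value outside the target range is an error.
    lo, hi = int_bounds(bits, signed)
    if not lo <= value <= hi:
        kind = "int" if signed else "uint"
        raise OverflowError(f"{value} does not fit in {kind}{bits}")
    return value


def wrapping_cast(value, bits, signed=True):
    # Models .wrapping_cast: keep only the low N bits, then reinterpret them
    # as two's complement when the target type is signed.
    wrapped = value & ((1 << bits) - 1)
    if signed and wrapped > (1 << (bits - 1)) - 1:
        wrapped -= 1 << bits
    return wrapped


# Python uses the same radix prefixes, so the literals above work unchanged.
print(0b1101, 0o167, 0x41ce3a)                   # 13 119 4312634
print(checked_cast(0b1101, 8, signed=False))     # 13
print(wrapping_cast(0x41ce3a, 8, signed=False))  # 58 (only the low byte 0x3a survives)
print(wrapping_cast(200, 8))                     # -56
```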
Floating-point number literals
A floating-point number can be defined by simply writing the number with a decimal point. By default, float literals are defined as float (float64).
Whether a leading or trailing dot (e.g. .5 or 5.) is allowed may vary between implementations. Similarly, multiple leading zeros for floats (e.g. 00005.1) and also integers (e.g. 00005) may be allowed or disallowed depending on the implementation.
Floating-point numbers have their bit-widths specified through casting, similar to integers:
Booleans
A boolean is represented as the bool type. It can either be true or false. One can be created literally by writing true or false.
Arrays
An array is a statically-sized, stack-allocated, and ordered collection of objects. Arrays cannot grow or shrink and their size must be known at compile-time.
The type of an array is T[N], where T is the type of an element in the array and N is the size of the array. For example, int[10] is a 10-element array of integers. Its size, given that int is resolved as an int32, is 32 * 10 = 320 bits, or 40 bytes.
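The size arithmetic can be double-checked with Python's struct module, which packs values without padding when the format starts with "<". Here "10i" (ten standard-size 4-byte signed integers) stands in for int[10], under the same assumption that int resolves to a 32-bit integer.

```python
import struct

# "<" selects standard sizes with no alignment padding; "i" is a 4-byte signed int.
# "10i" therefore describes ten int32 values laid out contiguously, like int[10].
size_in_bytes = struct.calcsize("<10i")
size_in_bits = size_in_bytes * 8
print(size_in_bytes, size_in_bits)  # 40 320
```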
An array literal is defined by surrounding the elements with []:
Slices
A slice is a view of an array in which the length of the slice may not be known at compile-time. The source array must exist and all values in the slice are borrowed values from the source array.
The type of a slice is T[], where T is the type of an element in the slice.
A slice cannot be defined literally, since its data is borrowed from a source array. However, a slice may be retrieved by slicing an array:
Byte-slices
A slice of uint8 (uint8[]) is better known as a byte-slice.
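Python's memoryview is a close analogue of a byte-slice: slicing it creates a view that borrows from the source buffer instead of copying it. This is only an illustration of the borrowing idea, not Terbium's slicing syntax.

```python
# The bytearray plays the role of the source array of uint8 values.
source = bytearray([10, 20, 30, 40, 50])

# Slicing the memoryview borrows elements 1..3; no bytes are copied.
view = memoryview(source)[1:4]
print(len(view), list(view))  # 3 [20, 30, 40]

# Writes to the source are visible through the view.
source[2] = 99
print(list(view))             # [20, 99, 40]

# While the view exists, Python refuses to resize the source buffer.
try:
    source.append(60)
    resized = True
except BufferError as err:
    resized = False
    print("cannot resize:", err)
```

The refusal to resize loosely mirrors the rule that the source array must exist for as long as the slice borrows from it.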
Tuples
A tuple is an ordered collection of objects of varying (but known) types, packed into one object.
The type of a tuple is (T1, T2, ...), i.e. a comma-separated list of types surrounded by (). For example, the tuple (1, 2) has the type (int, int).
A tuple literal is defined by comma-separating its elements, then surrounding them with ():
Tuples may not grow or shrink, and the types of their elements cannot change.
Lists
A List is a built-in collection type; however, it is not considered a primitive type.
Lists are similar to arrays in the sense that they store ordered collections of objects; however, lists are heap-allocated and can grow and shrink. Their true size is only known at runtime.
A list is defined as follows:
A list can be created literally by casting an array literal to a List:
It can also be created by directly using its constructor:
Range
A Range<Idx> represents a range, which can store lower and upper bounds. These can represent ranges of numbers, characters, etc.
A range literal can be one of the following:
Strings
A string should be stored as a raw uint8[] with a specified encoding. A string cannot be left without an encoding. By default, all strings will be in utf-8 encoding. Strings are immutable and can neither grow nor shrink.
Since a string is stored as a slice, there must be a source List<uint8> that the string is sliced from, or it must come from a string literal or a statically-created string. Usually, this is taken care of internally.
String Literals
A string can be defined by surrounding the contents of the string with either ' or ". By default, string literals will be encoded in utf-8.
Multi-line string literals
Multiline strings can be formed by surrounding the contents of the string with #" and its mirror counterpart ("#).
You may use any number N of pound symbols, in which case the string is terminated by a quote followed by the same N pound symbols.
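The termination rule can be illustrated with a small Python scanner. The function name read_hash_string is hypothetical and this is not a reference lexer: it assumes only the double-quote form shown above with N of at least 1, and it does not process backslash escapes, since the text does not say whether they apply inside these literals.

```python
def read_hash_string(src, start=0):
    """Read a pound-delimited literal such as ##"..."## beginning at src[start].

    Returns (contents, index just past the closing delimiter).
    """
    n = 0
    while start + n < len(src) and src[start + n] == "#":
        n += 1                                # count the opening pound symbols
    if n == 0 or start + n >= len(src) or src[start + n] != '"':
        raise ValueError("not a pound-delimited string literal")
    body_start = start + n + 1
    closer = '"' + "#" * n                    # a quote followed by the same N pounds
    end = src.find(closer, body_start)
    if end == -1:
        raise ValueError("unterminated string literal")
    return src[body_start:end], end + len(closer)


text, end = read_hash_string('##"say "#hi"# twice"##')
print(text)  # say "#hi"# twice
```

Because the literal above opened with two pound symbols, the inner "# does not end it; only "## does.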
Backslash escapes
In a normal string, backslash escape sequences provide characters that would otherwise be inconvenient or impossible to type out.
\n: Newline
\r: Carriage return
\t: Tab
\b: Backspace
\f: Form feed
\\: Literal backslash
\': Literal '
\": Literal "
\0 [1]: Null byte
\x12 [2]: Character by hex codepoint (2 digits)
\u1234 [2]: Character by hex codepoint (4 digits)
\U12345678 [2]: Character by hex codepoint (8 digits)
[1] Because strings always have an encoding, null bytes can only be placed in byte-string literals.
[2] The digits are placeholders for a valid hex value of the specified length, e.g. \u200b.
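The escape table can be read as a small decoding routine. The Python sketch below, with the hypothetical name unescape, decodes the body of a normal (non-raw) string literal; a raw string would simply skip this step. As a simplification, it accepts \0 without enforcing footnote [1]'s byte-string restriction and does not reject surrogate code points.

```python
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
    "\\": "\\", "'": "'", '"': '"',
    "0": "\0",  # per footnote [1], only meaningful in byte strings; not enforced here
}
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}  # escape letter -> required hex digits
HEX_DIGITS = set("0123456789abcdefABCDEF")


def unescape(body):
    """Decode the backslash escapes from the table in a string literal's body."""
    out, i = [], 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("dangling backslash at end of literal")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind in HEX_ESCAPES:
            width = HEX_ESCAPES[kind]
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width or not set(digits) <= HEX_DIGITS:
                raise ValueError(f"\\{kind} needs exactly {width} hex digits")
            out.append(chr(int(digits, 16)))  # chr rejects values above 0x10FFFF
            i += 2 + width
        else:
            raise ValueError(f"unknown escape \\{kind}")
    return "".join(out)


print(repr(unescape(r"tab:\t quote:\" zwsp:\u200b")))
```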
Raw strings
Prefix a string literal with ~ to make it a raw string. A raw string will not take into account backslash escapes.
Interpolated strings
Prefix a string literal with $ to add string-interpolation support to it.
In reality, string-interpolation is syntax sugar. The above roughly desugars to:
See [String Formatting] for more information.
Byte Strings
Strings without encodings are simply represented as byte slices (uint8[]), i.e. a plain string of bytes.
A literal string can be defined as a byte slice by adding b before the string:
Character-literals
A char represents a single Unicode character. It is a "Unicode scalar value", and represented as a uint32.
A character-literal is represented by prefixing a single-character string-literal with a c:
You can also cast a one-character string to a char:
Since char is internally represented as a uint32, you can also cast integers to chars:
Conditional expressions
A conditional expression is an expression that runs code depending on whether a boolean condition is either true or false.
There are many conditional operators:
== (Equals): op func eq(self, other: Rhs) -> bool, trait Eq<Rhs = Self>
!= (Not equals): op func ne(self, other: Rhs) -> bool, trait Ne<Rhs = Self>
< (Less than): op func lt(self, other: Rhs) -> bool, trait Lt<Rhs = Self>
<= (Less than or equals): op func le(self, other: Rhs) -> bool, trait Le<Rhs = Self>
> (Greater than): op func gt(self, other: Rhs) -> bool, trait Gt<Rhs = Self>
>= (Greater than or equals): op func ge(self, other: Rhs) -> bool, trait Ge<Rhs = Self>
x is T (Check if the type of x is compatible with T): no operator function or trait
x in y (Contains): op func contains(self, value: V) -> bool, trait Contains<V>
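Python resolves its comparison and membership operators through special methods in much the same way, which makes it a handy analogy for the op func list above. The Interval class is hypothetical. Two differences are worth noting: Python derives != from __eq__ when __ne__ is not defined, whereas a separate Ne trait is listed above, and Python's own is operator tests identity, so it has no counterpart here.

```python
class Interval:
    """Hypothetical closed interval used to show operator-method dispatch."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __eq__(self, other):        # ==, compare op func eq
        return (self.lo, self.hi) == (other.lo, other.hi)

    def __lt__(self, other):        # <, compare op func lt
        return self.hi < other.lo

    def __contains__(self, value):  # x in y calls y.__contains__(x), compare op func contains
        return self.lo <= value <= self.hi


a, b = Interval(1, 3), Interval(5, 9)
print(a == Interval(1, 3), a < b, 2 in a, 4 in a)  # True True True False
```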
There are also three logical operators:
|| (infix): Logical OR
&& (infix): Logical AND
! (prefix): Logical NOT
💡 When an object obj is truthy, it means that obj to bool == true.
These logical operators also work with non-boolean values:
a || b: Return a if a is truthy, else return b
a && b: Return b if a is truthy, else return a
!a: Performs op func not(self) -> Output (the trait equivalent is Not<Output = Self>).
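Python's or and and behave like a || b and a && b here: they return one of their operands rather than a plain bool, and they short-circuit. In this sketch the hypothetical helper loud records which operands were actually evaluated. Python's bool() truthiness stands in for obj to bool == true, and Python does not enforce the rule that both sides have compatible types.

```python
calls = []


def loud(name, value):
    """Record that an operand was evaluated, then return it unchanged."""
    calls.append(name)
    return value


first = loud("a", 0) or loud("b", 5)   # a is falsy, so b is evaluated: 5
second = loud("c", 3) or loud("d", 5)  # c is truthy, so d is skipped: 3
third = loud("e", 0) and loud("f", 5)  # e is falsy, so f is skipped: 0
print(first, second, third, calls)     # 5 3 0 ['a', 'b', 'c', 'e']
```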
The operators a || b and a && b short-circuit: a is always evaluated as the main condition check, while b is only evaluated when the result has to resolve to b. For example:
It should be noted that the logical operators || and && only work if the left-hand value implements a cast function to a boolean. See the casting section for more information. Additionally, the types of both sides of the operator must be compatible, and the resulting type will be the broader type of the two values. For example:
If-statements
The standard if statement can be used to run code if a condition is true:
The condition must be a bool and will not be implicitly cast to one:
Use else to run code if a condition is false:
The else if construct is also provided:
If-statements as expressions
If-statements that converge can be used as expressions. An if-statement is convergent if it will always evaluate to something, i.e. if it has an else. The implicit-return syntax can be used to specify the output of if-statements:
The then keyword can make your code cleaner by removing the need for curly brackets:
else if also works:
Note that the then style of writing if-expressions must have an else block (i.e. it must converge).
While-loops
A loop runs a block of code over and over again until it is told to exit. One type of loop is the while-loop, which, given a condition, repeatedly runs its body until the condition is false.
The while-loop construct is found in most other programming languages and the concept is exactly the same:
Control flow with continue and break is also provided, to skip to the next iteration or to exit out of loops early:
break if, continue if
These can be used as a shorthand for a traditional if-statement, avoiding another block and another level of indentation. The above code can be rewritten as:
while-else
An else block can be added to a while-loop, which will be run if the while loop was exited without a break:
While-else expressions, breaking with values
While-else loops can also be expressions. Since normal while-loops are not guaranteed to converge, they are not considered expressions; while-else loops, however, are always convergent.
Specify a value after break to break out of the while-loop with the value. For example:
The break if grammar can be extended to break [expression] if <condition> to return values:
If you ever break with a value, the types of all break-values in the same loop must be compatible with each other! The type of the value returned from the while-loop will be the broadest of all of them, and this list of break-values includes the value of the else block.
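Python has the same while-else rule (the else block runs only when the loop ends without break), but no break-with-value, so this sketch assigns the would-be break value to a variable just before breaking. find_index is a hypothetical example, with -1 playing the role of the else block's value; both results are ints, echoing the requirement that break-values and the else value be compatible.

```python
def find_index(items, target):
    """Index of target in items, or -1, written with Python's while-else."""
    i = 0
    while i < len(items):
        if items[i] == target:
            result = i   # the value that breaking with i would produce
            break        # leaving via break skips the else block
        i += 1
    else:
        result = -1      # runs only when the condition became false
    return result


print(find_index([3, 1, 4], 4), find_index([3, 1, 4], 9))  # 2 -1
```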
Loop statements
A while true loop can either run infinitely or break. If a while true loop is ever exited, it has converged through a break statement. In this manner, while true loops do not need an else block, since it will never be called.
Special-casing this scenario would be inconsistent: should it be resolved syntactically? Analytically? At runtime? Because of this, a more explicit type of loop, inspired by the Rust Programming Language, is provided: the loop statement:
A loop-statement:
Logically the same as a while true loop
Cannot take an else block
Can always be an expression
when and match statements
Chaining many else if branches together can be repetitive and make your code look bloated. Terbium provides the when statement for this purpose.
A when statement maps conditions to the expected result:
The above is equivalent to:
Convergent when-statements can also be used as expressions. When used as an expression, a when-statement must always have an else clause.
You can also use blocks of code instead of just an expression in each arm:
match statements
The when statement is a powerful tool when dealing with many conditions in your code. However, there are times when a match-statement can help simplify your code even further.
A match statement is similar to a when statement; however, it maps patterns rather than conditions to corresponding values. The patterns are matched against a subject value.
Take this when-statement, for example:
Even with a when-statement, this code still seems repetitive. This is why pattern-matching with if-statements is supported:
Functions
You can abstract a procedure into a function, which can be called over and over again.
A function may be defined with the func keyword:
Then, it may be called by referencing the function (by its name in this case), followed by a call to the function using parentheses:
Functions may take parameters. Parameters are comma-separated and must have their type specified with a colon followed by the type:
When calling functions that take parameters, all parameters without a default must be provided; otherwise, a compile-time error is thrown. All parameter types must also be satisfied.
Functions may also return values. The return keyword can be used, or an implicit return can be issued by removing the semicolon from the last expression of the function. Return types must be explicitly specified (for block-style functions) by specifying the type after ->.
The return type for returning nothing is void, i.e. func x() -> void { ... }. If no return type is specified in the signature, the void type is used instead.
Functions may also be defined with the expression-style = shorthand, like so:
In expression-style functions, the return type can be inferred and does not have to be explicitly specified.
The type of functions, and of any callable, is written as (parameters) -> return_type. For example, the function defined above has the type (uint, uint) -> uint.
Closures
A closure is a function that captures values from its outside scope. For example,
In the above function, values were implicitly captured. Values may also be explicitly captured with the captures keyword. Closures with explicit captures cannot use values that were not explicitly captured:
A function can be declared as contained to explicitly specify that it captures nothing from its outside scope. This means only variables created within the function, and static or constant variables, can be used in the function:
Anonymous functions
An anonymous function is a function declared without a name. The body of an anonymous function is specified after the do keyword, and if the anonymous function takes parameters, do is preceded by a backslash (\) followed by the parameters:
All parameters and the return type of anonymous functions can be inferred. All anonymous functions have inferred capturing of outside variables.
Default values
If a required parameter is not passed to a function, Terbium will throw a compile-time error. However, if the parameter has a default value, that value is used instead and no error is thrown:
Default values are lazily-evaluated expressions, so they can reference previous parameters:
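Python is a useful contrast here: it evaluates each default expression once, when the def statement runs, so it cannot write a default such as stop = start + 10 directly. The common sentinel pattern recovers the lazy, per-call behaviour described above. make_range and _MISSING are hypothetical names used only for illustration.

```python
_MISSING = object()  # sentinel meaning "no argument was passed"


def make_range(start, stop=_MISSING, step=_MISSING):
    # Each default is computed at call time, so it can use earlier parameters.
    if stop is _MISSING:
        stop = start + 10
    if step is _MISSING:
        step = 1 if stop >= start else -1
    return list(range(start, stop, step))


print(make_range(0, 3), make_range(5, 2))  # [0, 1, 2] [5, 4, 3]
```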
The default operation is also implemented for many types. Placing a ? after the parameter name makes the parameter default to the value returned by the default operation:
Mutable parameters
Add mut before any parameter to make the parameter itself mutable:
Keyword arguments
You may specify arguments to a function call by name using keyword arguments:
Positional arguments may not come after keyword arguments:
The Type System
The Terbium type system is a comprehensive type system which will be outlined in this section.
Concrete types
Concrete types are types that are usable, constructible, and non-abstract: that is, structs, enums, and classes, but not traits.
What is a struct?
A struct is the most basic way to define a concrete type. A struct has fields whose values are packed together in memory. For example:
Then, to create an instance of Point, use a struct literal:
You can use the extend keyword to add methods on Point. Let's declare the construct operation so we can create the struct via a constructor:
Note that types that implement the construct operation cannot be constructed using the struct-literal syntax. That means Point { ... } can no longer be used to construct a Point, since we declared a constructor for Point; doing so will throw an error.
What is a class?
A class is a higher-level way to define a struct and its methods through a more familiar syntax. Here is the same Point struct implemented as a class:
Classes with fields must implement the construct operation in order to have a way to construct the class. No matter what, classes cannot be constructed with struct-literal syntax. Classes that have fields but do not have constructor implementations are called stale classes since they cannot be instantiated.
Desugaring classes
class is essentially syntax sugar around struct and extend:
Here is the same point class from before (with an extra method for completeness):
What is an enum?
An enum is an enumeration of many variants of an object represented as one value.
For example, an enumeration of colors:
The Color enum has three variants, and a Color value can only ever be one of those three variants. If it is represented otherwise in memory, the behavior is undefined.
The discriminant of an enum variant is the value stored in memory that determines which variant an enum value represents; each variant's discriminant is fixed at compile time. Discriminants are determined automatically; however, they can also be specified manually:
By default, the bit-width of the discriminant is determined automatically, based on the largest discriminant of all variants. The smallest possible (unsigned) bit-width is used; however, you can manually specify the representation with the by keyword:
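A minimal Python sketch of how automatic discriminants and the default bit-width could be computed. Two assumptions are not stated in the text above: implicit discriminants count up by one from the previous variant (starting at 0, as in C and Rust), and candidate widths are limited to the uintN sizes from the number-types list. assign_discriminants and discriminant_width are hypothetical names.

```python
VALID_WIDTHS = (8, 16, 32, 64, 128)  # assumption: the uintN sizes from the number types


def assign_discriminants(variants):
    """variants: list of (name, explicit discriminant or None) pairs.

    Assumption: an implicit discriminant is one more than the previous
    variant's, starting at 0, as in C and Rust.
    """
    result, next_value = {}, 0
    for name, explicit in variants:
        value = next_value if explicit is None else explicit
        result[name] = value
        next_value = value + 1
    return result


def discriminant_width(discriminants):
    """Smallest unsigned bit-width that can hold the largest discriminant."""
    largest = max(discriminants.values())
    for bits in VALID_WIDTHS:
        if largest < 1 << bits:
            return bits
    raise OverflowError("discriminant does not fit in 128 bits")


colors = assign_discriminants([("Red", None), ("Green", None), ("Blue", None)])
print(colors, discriminant_width(colors))  # {'Red': 0, 'Green': 1, 'Blue': 2} 8

codes = assign_discriminants([("Ok", None), ("Moved", 301), ("Found", None)])
print(codes, discriminant_width(codes))    # {'Ok': 0, 'Moved': 301, 'Found': 302} 16
```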
You can also inherit variants from other enums with a colon:
Composition with traits
Traits are abstractions over classes that perform common behaviors. A particular type can have these behaviors and can share them with other types. Traits can be used to define shared behavior in an abstract way:
Types can implement traits with the extend keyword:
Classes have another way of implementing traits through the with keyword, mixing in the trait implementation with the general class declaration:
Inheritance
Classes can inherit from parent concrete types. In this way, they inherit both their fields and their methods.
For example,
Union (sum) types
A union of types in Terbium represents a type that may represent one of many given types. A simple union type can be written as A | B.
Standard union types are safely represented as enums. This means they take up extra space in memory to store their enum discriminant:
These are also typically referred to as safe unions.
💡 Narrow and wide types
When a type is narrowed, it means that a broader type is turned into one that is more specific. When a type is widened, it means that a specific type is turned into one that is more broad.
For example, a coercion from string -> uint8[] is a type widening, since string is more specific than uint8[]. Similarly, a specification of a uint as a uint8 is a type narrowing, since uint, which was broad over all unsigned integer types, has been narrowed into a more specific uint8.
When we union types together, we are widening the types that are compatible with it. However, this narrows the specificity of the value within the function.
For example, take the following scenario:
We can pass values that meet either A or B as x, since they are compatible with the union. However, when we want to use x, the type has been narrowed by merging properties of both A and B into a single type under the hood.
Unsafe unions: RawUnion
The RawUnion type provides a way to specify a union type that is represented without the enum discriminant. At runtime, this makes the value unsafe to access, since we cannot guarantee that the dynamically generated value truly has the type we think it has.
For example, in A | B, we can check at runtime if the value is A or B, since we can access the enum discriminant of the value. However, in RawUnion<A, B>, we cannot, since there is no discriminant known.
Runtime union type coercion
For a given union type A | B, how would we check whether a value is an A or a B at runtime? How could we narrow the type into a more specific A or B?
The check can be done with the is operator, which for x is U, given x: T, checks whether T is compatible with and more specific than U:
The coercion can be done with a cast:
This is checked for safe unions, and the cast will throw a runtime error if it cannot be performed. With RawUnions, the cast will always succeed via a simple transmute, which could be undefined behavior!
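The difference between a safe union and a RawUnion can be modelled in Python: the safe version keeps a tag next to the payload, so both the is check and the cast can be verified at runtime, while the raw version keeps only bytes and reinterprets them on every cast. SafeUnionModel and RawUnionModel are hypothetical Python classes (not the Terbium types), and struct is used to mimic a transmute between float64 and int64.

```python
import struct
from dataclasses import dataclass


@dataclass
class SafeUnionModel:
    """Model of a safe union A | B: payload plus a discriminant naming its type."""
    tag: type
    value: object

    def is_(self, ty):
        # The runtime `is` check works because the discriminant is stored.
        return self.tag is ty

    def cast(self, ty):
        # Checked narrowing: a mismatch is a runtime error.
        if self.tag is not ty:
            raise TypeError(f"value is {self.tag.__name__}, not {ty.__name__}")
        return self.value


class RawUnionModel:
    """Model of a RawUnion of float64 and int64: 8 raw bytes, no discriminant."""

    def __init__(self, raw):
        self.raw = raw

    def cast(self, fmt):
        # Unchecked: reinterpret the bytes (a transmute); any 8-byte format succeeds.
        return struct.unpack(fmt, self.raw)[0]


safe = SafeUnionModel(int, 7)
print(safe.is_(int), safe.is_(str), safe.cast(int))  # True False 7

raw = RawUnionModel(struct.pack("<d", 1.0))  # store a float64
print(raw.cast("<q"))  # 4607182418800017408: the float's bits read as an int64
```

Casting the raw value to an integer "succeeds" but yields a meaningless number, which is the practical face of the undefined behavior mentioned above.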
Product (and) types
Union types are types that require a type to meet at least one of the type constraints. The opposite would be And types, which require a type to meet all type constraints, written as A & B.
Take the following function:
Since we can guarantee that x is both A and B, we can use methods from both traits. & is useful for making sure parameters meet multiple type bounds or traits, not just one.
Generics and Type bounds
When a type is generic, it means that the application could be generalized over any type. For example, the identity function (takes a parameter and returns it) does not have to be limited to one type:
Here, we are saying "for any type T, this function will take a parameter of this type, T, and return a value of the same type T." For example, T could be substituted with int, making the signature identity(val: int) -> int.
💡 Monomorphization
What was just described in the previous sentence is monomorphization: it turns all uncertain generic types into actual types. By looking at which concrete types the function is called with, Terbium can generate a separate function for every type that is used.
For example:
Generics on types
Types which may contain data may be generic over the data they contain. One good example is the List type, in which the type of the elements in the List varies. This is why List takes a type parameter (e.g. List<int32>), which specifies the type of the elements in the list.
Here is a struct which is also generic over some type T:
Type bounds
When being generic over any type, the type will be extremely narrow and we probably won't be able to do much with it. This is why we may want to only be generic over types that meet a specific bound. Then, the type can be widened so that fields and methods that apply to that bound are usable.
Here, T is bound by uint. This means any type compatible with uint can be used in place of T, but nothing else:
Type bounds are commonly traits, for example:
Join multiple trait bounds with &:
The where clause
Additional type bounds can be added with the where clause, resembling that of the Rust Programming Language.
For example,
This is the exact same as <T: A & B>.
Where clauses are not solely a different way to specify type bounds, although specifying bounds this way can make them more expressive and readable: where clauses can also bound types that are not direct type parameters of the immediate declaration.
For example:
It can also be used to bound the class itself:
Metaprogramming: Macros and decorators
Metaprogramming is the practice of writing code that generates code. In this way, boilerplate and repetitive code can be reduced, and your code can be made more readable or provide a more elegant interface.
Declarative macros
Declarative macros are substitution-like macros which match a token signature against provided tokens and substitute them into the given substitution. Declarative macros closely resemble declarative macros in the Rust Programming Language.
For example, replacing repetitive code:
Token types
token: Any token
ident: Any identifier, including soft keywords
string_literal: String literal
int_literal: Integer literal
float_literal: Float literal
bool_literal: true or false
literal: String, int, float, or bool literals
expr: Expression
stmt: Statement
block: Block of statements
type: Type
vis: Visibility specifier
pattern: Match pattern
deco: Decorator, @pt:, or @!pt: specifier
decl: Declaration of a function or type-like
path: Import path, e.g. a.b.{c, d}
You can also use * to match 0 or more of a token, + to match 1 or more, and ? to match 0 or 1:
What's this unhygienic keyword? Let's talk about macro hygiene.
Macro hygiene
Macros are hygienic by default, in that all variables created inside of the macro are only accessible within the macro scope. For example,
Therefore, if we want to leak the functions a, b, and c we defined above, we will have to use the unhygienic keyword to make the macro able to leak variables defined within it. This essentially "removes" the block scope from its expansion:
Decorators
Decorators are annotations prefixed with @, placed on top of item declarations to modify their behavior.
Simple function decorators
A function-only decorator can be declared like a decorator in Python: it is simply a function that takes the decorated function as an argument and returns a new function. The decorator function is called at compile-time, so you will only have access to build-dependencies:
Simple-function decorators can take parameters:
The decorator is used without parentheses if it accepts no parameters after the initial function. If it takes any parameters after the function, optional or not, the decorator must be called with parentheses.
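Because simple function decorators are described as working like Python's, a Python sketch shows the same shape. Two differences apply: Python runs decorators when the def statement executes rather than at compile time, and a parameterised Python decorator is written as a factory (repeat(times) returning the real decorator) instead of a single function that takes extra parameters after the decorated function. logged and repeat are hypothetical examples.

```python
import functools


def logged(func):
    """Decorator with no extra parameters, applied as @logged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1  # count how often the decorated function runs
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def repeat(times):
    """Decorator factory: Python needs parentheses here, e.g. @repeat(3)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorate


@logged
def square(x):
    return x * x


@repeat(3)
def greet(name):
    return f"hi {name}"


print(square(4), square.calls)  # 16 1
print(greet("tb"))              # ['hi tb', 'hi tb', 'hi tb']
```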
Procedural decorators
A procedural decorator generates code using pure Terbium code. It takes the AST (Abstract Syntax Tree) of the function or item being decorated, which you can transform to return a new AST made from the source AST.
Use the decorator keyword to create a procedural decorator (not decorator func):
Procedural decorators can also take parameters: