docs: Extended documentation and updated wiki

Signed-off-by: erick-alcachofa <erick@artichoke.dev>
erick-alcachofa 2025-12-30 03:10:06 +00:00
parent 8f15650a42
commit caa6d5d0d2
Signed by: me
GPG Key ID: 6FA5F8643444BAFA
10 changed files with 1046 additions and 286 deletions

40
docs/Architecture.md Normal file

@ -0,0 +1,40 @@
# Compiler Architecture
## Components
- **Tokenizer:** Coroutine-driven lexer that emits `Token` values lazily,
enabling lookahead and precise diagnostics for keywords, operators, literals,
and comments. A trie-based keyword map plus demangling and string utilities
keep error messages readable.
- **Hybrid Parser:** Combines a handwritten recursive-descent parser for
high-level structure (imports, modules, aliases, declarations, statements) with
a Pratt (precedence-climbing) engine for expressions. Recent merges added
optional slice bounds (`[:end]`, `[start:]`), type-initiated expressions
(`[]Type { ... }`), turbofish disambiguation in generics, and precedence capping
so `->` works for both pointer member access and `match`/`switch` cases.
- **AST:** Hierarchical node definitions under `lib/include/artichoke/Parser/AST`
model compilation units, declarations, statements, expressions, and types.
Visitors such as `toString` (Markdown) and `toDot` (Graphviz) support
visualization and debugging.
- **Frontend CLI:** `frontend/src/main.cpp` normalizes file paths, invokes the
parser, and prints either the structured AST or descriptive diagnostics.
- **Support Utilities:** Shared helpers (`Expected`, trie map, string helpers,
coroutine scaffolding, demangling) provide robust error propagation and
ergonomics throughout the compiler.
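
The trie-based keyword map mentioned above can be pictured with a small sketch. This is not the project's actual implementation: the `Keyword` enum, the `KeywordTrie` class, and `make_keyword_trie` are hypothetical names, and the sketch assumes ASCII-only keywords.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <optional>
#include <string_view>

// Hypothetical keyword kinds; the real token enum is not shown in these docs.
enum class Keyword { Import, If, Else, Defer, Errdefer };

class KeywordTrie {
    struct Node {
        std::array<std::unique_ptr<Node>, 128> next{};  // one slot per ASCII byte
        std::optional<Keyword> value;                   // set when a keyword ends here
    };
    Node root_;

public:
    void insert(std::string_view word, Keyword kw) {
        Node* node = &root_;
        for (unsigned char c : word) {
            if (c >= 128) return;  // sketch: non-ASCII keywords are ignored
            auto& slot = node->next[c];
            if (!slot) slot = std::make_unique<Node>();
            node = slot.get();
        }
        node->value = kw;
    }

    // Returns a keyword only when the whole word matches exactly.
    std::optional<Keyword> lookup(std::string_view word) const {
        const Node* node = &root_;
        for (unsigned char c : word) {
            if (c >= 128 || !node->next[c]) return std::nullopt;
            node = node->next[c].get();
        }
        return node->value;
    }
};

inline KeywordTrie make_keyword_trie() {
    KeywordTrie trie;
    trie.insert("import", Keyword::Import);
    trie.insert("if", Keyword::If);
    trie.insert("else", Keyword::Else);
    trie.insert("defer", Keyword::Defer);
    trie.insert("errdefer", Keyword::Errdefer);
    return trie;
}
```

A lexer built this way can walk the trie one character at a time while scanning an identifier, so a prefix such as `imp` or a longer name such as `def_x` falls through to the plain identifier path.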
## Workflow
1. Tokenizer lazily produces tokens via coroutines, supporting lookahead and
rich diagnostics.
2. Recursive-descent routines process declarations and statements, delegating to
the Pratt engine for expressions. The parser constructs ASTs aligned with the
formal grammar.
3. Frontend emits ASTs (`ast::toString`) or clear error messages when parsing
fails.
## Future Work
- Semantic analysis (type checking, symbol resolution) building on the expanded
expression and type features already integrated.
- Intermediate representation and code generation backend.
- Tooling support: formatter, language server, extended automated tests.


@ -0,0 +1,19 @@
# Sample Program Overview
This section highlights the language features exercised by the canonical
overview program distributed with the project.
- Imports: module wildcards, specific symbols, and module aliases.
- Type aliases with `using` for types and functions.
- Generics: struct definitions with `<typename T>`, turbofish instantiations (e.g., `Point::<i32>`).
- Functions: regular functions and methods (`this` parameter syntax), return types via `->`.
- Enums: tagged unions with `Result::<T, E>` usage and variant initialization (`Err{ -1 }`, `Ok{}`).
- Variables: `let`/`def` with type inference, complex pointer/optional qualifiers (`*$?` combinations).
- Slices: literals `[]Type { ... }`, slicing syntax `[start:end]`, specialized suffixes (`.*`, `.#`, `.[len]`).
- Control flow: if/else with unwrapping, while loops (condition-based and iterator-style), do/while, C-style for loops, range for loops, labeled loops.
- Pattern matching: `match` with bindings, `_` default; `switch` for value cases.
- Resource management: `defer`, `errdefer` for cleanup semantics.
- Reflection: `.@` operator to fetch metadata (`.@`, `.@alignment`, `.@size`).
These features can be explored by running the parser CLI against any `artichoke`
source file to inspect the resulting AST or diagnostics.

55
docs/GettingStarted.md Normal file

@ -0,0 +1,55 @@
# Getting Started
Build and run the `artichoke` parser frontend to experiment with the language
features described in this documentation.
## Prerequisites
- C++23 compiler (tested with Clang 17/GCC 13).
- CMake 3.26+.
- Ninja or Make.
- Optional: `ctest` for tokenizer tests.
## Build the Toolchain
```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
```
The executable `build/frontend/artichoke-c` reads a source file, parses it, and
prints a Markdown AST or diagnostics.
## Run the Parser
```bash
./build/frontend/artichoke-c path/to/program.arti
```
The CLI prints either a Markdown AST or descriptive diagnostics.
## Run Tests
Tokenizer tests live under `tests/Tokenizer/`. Testing must be enabled at
configure time with `-DENABLE_TESTING=ON`; after that, build and run the tests:
```bash
cmake --build build --target tests
ctest --test-dir build/tests --output-on-failure
```
## Repository Layout
- `frontend/`: CLI entry point.
- `lib/`: Tokenizer, parser, AST, and utilities.
- `tests/Tokenizer/`: Tokenization coverage.
- `docs/`: Reference programs and supporting materials.
## Next Steps
- Review the [Sample Programs](Examples-SamplePrograms.md) and overview guides
to understand the language.
- Dive into [Language Overview](Language-Overview.md) and
[Control Flow](Language-ControlFlow.md) for targeted explanations.
- Use [Architecture](Architecture.md) if you plan to extend the compiler.


@ -1,286 +1,81 @@
# The `artichoke` Programming Language: A Technical Overview
## 1. Introduction
`artichoke` is a statically-typed, general-purpose programming language designed
with an emphasis on performance, safety, and expressive syntax. It combines
low-level control over memory with modern, high-level features like generics,
algebraic data types, and integrated error handling. This document provides an
overview of the language's features as defined by its core grammar.
It is heavily inspired by C, C++, Rust, and, above all, Zig.
## 2. Basic Syntax & Structure
### Modules, Imports, and Aliases
`artichoke` code is organized into modules. The `import` statement is used to bring
symbols from other modules into the current scope.
* **Importing a specific element:** `import my_module::some_function;`
* **Importing all direct elements of a module:** `import std::*;`
* **Importing an entire submodule:** `import std::memory;`
The `using` keyword creates a local, more convenient alias for a type, function,
or module name.
```
using mem = std::memory;
using FileHandle = std::fs::File;
```
### Comments
The language uses C-style block comments.
```
/* This is a multi-line
comment. */
```
## 3. The Type System
`artichoke`'s type system is strong and static, with a rich set of features for
defining complex data structures.
### Type Qualifiers
Qualifiers modify the type to their immediate right, allowing for precise and
complex type definitions.
* **`*` (Pointer):** Creates a pointer to a type. Pointers cannot be `null`.
* **`$` (Mutable):** Marks a type as mutable. This is used for function
parameters, local variables, and struct fields to allow modification.
* **`?` (Optional):** Marks a type as nullable. An optional type can hold either a
value of its underlying type or `null`.
* **`[]` (Slice):** A "fat pointer" representing a view into a contiguous
sequence of elements. It contains both a pointer to the data and a length.
These qualifiers can be combined. For example, `*$?int` defines a **pointer to a
mutable optional integer**.
### Generics
Generics allow for writing flexible, reusable code that can operate on multiple
types. They are defined using `<typename T>`.
```
/* A generic struct */
struct Point<typename T> {
x: T,
y: T
}
/* A generic function */
fn scale<typename T>(lhs: *Point<T>, rhs: T) -> Point<T> {
/* ... */
}
```
## 4. Declarations
### Variables
Variables are declared using the `let` (mutable) and `def` (immutable/constant)
keywords.
* **Type inference** is supported when the type can be determined from the initializer.
* Every declaration must provide a type annotation, an initializer, or both.
```
/* Mutable variable with explicit type */
let x: i32 = 10;
/* Immutable variable with type inference */
def do_you_get_it = meaning_of_life();
```
### Structs
Structs are composite data types that group together variables under one name.
They support generics.
```
struct Rectangle {
top: Point<i32>,
bot: Point<i32>
}
```
**Initialization:** Structs can be initialized using positional or named fields,
but not a mix of both.
```
/* Positional initialization */
def top_left = Point<i32>{ 0, 10 };
/* Named-field initialization */
def top_right = Point<i32>{ x: 10, y: 10 };
```
### Enums (Tagged Unions)
Enums define a type that can be one of several different variants. Variants can
optionally hold data.
```
enum AssetType {
Texture,
Model,
Sound,
}
enum Result<typename T, typename E> {
Ok(T),
Err(E)
}
```
**Initialization:** Enum variants are accessed using scope resolution (`::`).
```
def my_asset = AssetType::Texture;
def success = Result<i32, string>::Ok(100);
```
### Functions
Functions are defined with the `fn` keyword. The return type is specified after
the parameter list with `->`.
```
fn meaning_of_life() -> i32 {
return 42;
}
```
#### Member Functions (`this` parameter)
If the first parameter of a function is declared with the `this` keyword, it can
be called using "member function" syntax.
```
/* Definition */
fn add<typename T>(this *$Point<T>, other: *Point<T>) {
this->x += other->x;
this->y += other->y;
}
/* Can be called in two ways: */
/* Member function syntax */
my_point.add(&other_point);
/* Normal function syntax */
add(&my_point, &other_point);
```
## 5. Control Flow
### `if`/`else` Statements
`artichoke` supports C-style `if`/`else` and `else if` chains. It also integrates a
powerful unwrapping feature for handling `Result` and optional (`?`) types.
```
/* Standard if/else */
if (argc < 2) {
return Result::Err(-1);
}
/* Unwrapping a Result */
if (foo()) |ok| {
/* `ok` holds the success value */
}
else |err| {
/* `err` holds the error value */
}
```
### Loops
The language provides a comprehensive set of looping constructs.
* **C-Style `for`:** `for (let i = 0; i < 10; i += 1) { ... }`
* **Range-based `for`:** `for (let e := arrSlice) { ... }`
* **`while` Loop:** Can optionally have an `else` block that executes when the loop
condition is no longer met.
* **Iterator `while`:** Supports unwrapping `Result`/optional types, executing as
long as the value is valid.
* **`do-while` Loop:** Guarantees the body executes at least once.
* **Infinite `loop`:** `loop { ... }`
#### Loop Labels and Control
Loops can be labeled. The `break` and `continue` statements can optionally specify a
label to control nested loops.
```
outer_loop := while (condition) {
inner_loop := for (...) {
break outer_loop;
}
}
```
## 6. Expressions and Operators
### Pointer and Member Access
* **`&` (Address-of):** Gets a pointer to a variable.
* **`*` (Dereference):** Accesses the value a pointer points to.
* **`.` (Member Access):** Accesses a member of a struct value.
* **`->` (Pointer Member Access):** Dereferences a pointer and accesses a member
(`p->x` is shorthand for `(*p).x`).
### Slice Operators
Slices have a dedicated set of operators for manipulation.
* **`[start:end]` (Slicing):** Creates a new slice from an existing one.
* **`.*` (Pointer Access):** Gets the underlying raw pointer of the slice.
* **`.#` (Length Access):** Gets the number of elements in the slice.
* **`.[length]` (Slice from Pointer):** Creates a slice from a raw pointer and a length.
### Assignment
The language supports simple (`=`) and compound assignment (`+=`, `*=`, etc.)
operators.
## 7. Advanced Features
### Resource Management (`defer` and `errdefer`)
`artichoke` uses `defer` for deterministic resource management.
* **`defer`:** Schedules an expression or code block to be executed when the
current scope is exited. Deferred calls are executed in Last-In, First-Out
(LIFO) order.
* **`errdefer`:** Similar to `defer`, but the code is only executed if the scope is
exited due to a function returning an error (an `Err` variant of a `Result`).
```
defer call_cleanup();
errdefer {
log("An error occurred!");
}
```
### Reflection (`.@`)
The language provides a compile-time reflection mechanism via the `.@` operator.
It can be applied to values, types, and static members to query metadata.
* **On values:** `my_variable.@type`
* **On types:** `Point<u32>.@size, Point<u32>.@alignment`
* **On static members:** `Point<u32>::x.@offset`
```
/* Gets size in bytes */
def size_bytes = Point<u32>.@size;
/* Gets string representation of the type */
def point_name = Point<u32>.@typename;
```
# `artichoke` Language Wiki
`artichoke` is a modern, statically-typed programming language designed to
satisfy my personal preferences and requirements for programming. It combines
low-level control with powerful modern features such as a robust type system,
generics, integrated error handling, and a clean, ergonomic syntax.
The goal of `artichoke` is to provide a language that is simple, safe, and
productive for programming, eliminating common pitfalls without sacrificing
performance or control.
## Using This Wiki
Start with [Getting Started](GettingStarted.md) to build and run the parser.
Continue with the language guide and control-flow chapters for deeper dives into
syntax and semantics. The reference section contains the formal grammar and
token catalog, while the sample programs illustrate how features fit together.
Report any gaps or inconsistencies via issues or patches.
## Quick Links
- **Getting Started:** [Getting Started](GettingStarted.md)
- **Language Guide:** [Language Overview](Language-Overview.md)
- **Control Flow:** [Control Flow](Language-ControlFlow.md)
- **Expressions & Operators:** [Expressions & Operators](Language-Expressions.md)
- **Pattern Unwrapping:** [Patterns](Language-Patterns.md)
- **Grammar Reference:** [Grammar Reference](Reference-Grammar.md)
- **Token Reference:** [Token Reference](Reference-Tokens.md)
- **Architecture Overview:** [Architecture](Architecture.md)
- **Sample Programs:** [Sample Programs](Examples-SamplePrograms.md)
## Core Philosophy & Features
`artichoke` is built around a few core principles to create a safer, more
productive programming experience:
* **Explicitness:** Type conversions and error handling are explicit.
* **Safety:** Non-nullable pointers, a robust type system, and deterministic
resource management are prioritized.
* **Unambiguous Design:** A grammar designed for fast, single-pass parsing and
clear error reporting.
* **Modern Ergonomics:** Features like generics, defer, and a clean module
system reduce boilerplate and improve readability.
The language includes a powerful **generic type system**, first-class **error
handling**, a full suite of **control flow** statements (including match), a
true **module system**, and **compile-time reflection**.
## Project Status
`artichoke` is currently in the **early implementation phase**. The front-end
infrastructure is not yet fully defined, but it already contains a simple
program for printing and visualizing the generated AST. Development has now
shifted toward semantic validation.
- [x] **Lexical Analysis:** Full tokenizer implementation.
- [x] **Syntactic Analysis:** Handwritten Recursive Descent + Pratt Expression
Parser.
- [x] **AST Infrastructure:** Complete Abstract Syntax Tree with Graphviz and
String-Graph based visualization support.
- [ ] **Semantic Analysis (In Progress):** Multi-pass symbol table generation
and type checking.
- [ ] **Backend:** Code generation and optimization.
## Contributing
The `artichoke` project is hosted on a personal, self-hosted Gitea instance. If
you are interested in contributing, you have two options:
1. **Request an Account:** Please contact support@artichoke.dev to request an
account on the Gitea instance.
2. **Submit Patches:** Alternatively, you can send patches or diffs directly to
the same email address.
In all cases, proper attribution will be given for your contributions in the
source files and/or the project wiki.
## License
This project is licensed under the **GNU Affero General Public License v3.0**.
The full license text can be found in the LICENSE file in this repository.


@ -0,0 +1,123 @@
# Control Flow
This section outlines the control-flow constructs currently supported by the
`artichoke` parser, including variable declarations, loops, pattern unwrapping,
and resource management.
## Variable Declarations
```arti
let x: i32 = 10;
let answer = meaning_of_life();
def PI: f64 = 3.14159265358979;
def ptr: *i32 = &answer;
let mutable_pointer: *$i32 = &x;
let complex_pointer: *$*$i32 = &mutable_pointer;
let null_int: ?i32 = null;
```
- `let` declares mutable bindings; `def` declares immutable ones.
- Pointer/mutability/optional qualifiers (`*`, `$`, `?`) attach immediately to
the type on their right.
## `if` / `else`
```arti
if (foo()) |ok_val| {
/* success path */
} else |err_val| {
/* error path */
}
if (condition) {
/* then */
} else if (other_condition) {
/* else-if */
} else {
/* final branch */
}
```
- Unwrap clauses (`|name|`) bind `Result` or optional values for the block.
- Parentheses around conditions are required.
## `match`
```arti
match (foo()) {
Result::<i32, []u8>::Ok |v| -> {
std::io::print("Success!");
}
_ -> {
/* default */
}
}
```
- Patterns accept type expressions and optional bindings.
- `_` handles unmatched cases.
## `switch`
```arti
switch (value) {
0 -> { /* ... */ }
(1 + 2) -> { /* ... */ }
_ -> { /* ... */ }
}
```
- Branches on the value of an expression; each case label is itself an
expression, and `_` is the default case.
## Loops
```arti
while (foo()) |ok_val| {
/* loop while Ok */
} else |err_val| {
/* handles Err */
}
while (foo.next()) |item| {
/* iterator-style loop */
}
do {
/* body */
} while (true);
for (let i = 0; i < 10; i += 1) {
/* C-style loop */
}
for (let element := returns_range_function()) {
/* range loop */
}
outer_loop := while (condition) {
inner_loop := for (let i = 0; i < 10; i += 1) {
if (i == 5) { break outer_loop; }
}
}
```
- Range loops require `:=` and bind the element name using `let` or `def`.
- Labels (`outer_loop :=`) allow `break`/`continue` to target outer loops.
## Defer & errdefer
```arti
defer cleanup();
errdefer { log_failure(); }
```
- `defer` runs at scope exit in reverse order.
- `errdefer` runs only if the function returns an error variant.
## Return and Expressions
- `return expr;` or `return;` (when void-like).
- Any expression followed by `;` forms a statement.
See `docs/example.arti` for the full program showcasing these constructs.


@ -0,0 +1,107 @@
# Expressions & Operators
`artichoke` uses a Pratt-style expression parser supporting rich infix, prefix,
and postfix syntax. This section summarizes the key behaviors currently
implemented.
## Literals
- Numeric literals: `42`, `3.14159`, `10`.
- Character/boolean/null: `'a'`, `true`, `false`, `null`.
- Strings follow double-quoted C-style syntax with escapes.
All literal tokens map to dedicated AST nodes (`CharLiteral`, `NullLiteral`,
`StringLiteral`, `FloatLiteral`, `IntegerLiteral`, `BooleanLiteral`).
## Identifiers and Module Access
- Simple identifiers refer to variables or functions: `x`, `meaning_of_life`.
- Namespaced access uses `::`: `Result::<void, i32>::Err`, `std::memory`.
## Function Calls and Methods
```arti
meaning_of_life();
scale(&point, 2);
block.initialize(2048);
```
- Turbofish syntax applies at call sites when generics are involved.
- Methods (declared with `this`) can be invoked as member calls (`expr.method`)
or as regular functions (`method(expr, ...)`).
## Operators and Precedence
`artichoke` uses Pratt parsing with the following precedence (lowest to highest):
1. Assignment: `=`, `+=`, `-=`, `*=`, `/=`, `%=`
2. Boolean OR: `or`, `||`
3. Boolean AND: `and`, `&&`
4. Comparisons: `==`, `!=`, `<`, `>`, `<=`, `>=`
5. Bitwise OR/XOR/AND: `|`, `^`, `&`
6. Shifts: `<<`, `>>`
7. Addition/Subtraction: `+`, `-`
8. Multiplication/Division/Modulo: `*`, `/`, `%`
9. Prefix: `!`, `-`, `~`, `&`, `*`
10. Postfix and suffix operators
The sample program demonstrates complex precedence:
```arti
let calculation = ~5 + 10 * 2 / (length - 1) % 4 << 2 >> 1;
let logic_check = (calculation >= 100 or !true) and (length != 0);
x = y = length += 10;
```
- Assignment chains associate right-to-left.
- Parentheses override precedence as expected.
- Boolean aliases (`or`, `and`) behave like `||`, `&&`.
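
To make the precedence table concrete, here is a minimal Pratt-style evaluator written in C++. It is a sketch, not the project's parser: it covers only levels 5 to 9 of the table (bitwise, shift, additive, multiplicative, and the prefix operators `-` and `~`) over integer literals, and it evaluates directly instead of building AST nodes. `PrattEvaluator` and `evaluate` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string_view>

class PrattEvaluator {
    std::string_view src_;
    std::size_t pos_ = 0;

    void skip_spaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // Peeks the next binary operator without consuming it; empty if none.
    std::string_view peek_operator() {
        skip_spaces();
        if (pos_ >= src_.size()) return {};
        std::string_view two = src_.substr(pos_, 2);
        if (two == "<<" || two == ">>") return two;
        std::string_view one = src_.substr(pos_, 1);
        if (std::string_view("|^&+-*/%").find(one) != std::string_view::npos) return one;
        return {};
    }

    // Binding power follows the documented levels; 0 means "not a binary operator".
    static int binding_power(std::string_view op) {
        if (op == "|" || op == "^" || op == "&") return 5;
        if (op == "<<" || op == ">>") return 6;
        if (op == "+" || op == "-") return 7;
        if (op == "*" || op == "/" || op == "%") return 8;
        return 0;
    }

    // Sketch only: no overflow, shift-range, or division-by-zero diagnostics.
    static std::int64_t apply(std::string_view op, std::int64_t lhs, std::int64_t rhs) {
        if (op == "|") return lhs | rhs;
        if (op == "^") return lhs ^ rhs;
        if (op == "&") return lhs & rhs;
        if (op == "<<") return lhs << rhs;
        if (op == ">>") return lhs >> rhs;
        if (op == "+") return lhs + rhs;
        if (op == "-") return lhs - rhs;
        if (op == "*") return lhs * rhs;
        if (op == "/") return rhs != 0 ? lhs / rhs : 0;
        if (op == "%") return rhs != 0 ? lhs % rhs : 0;
        return 0;
    }

    // Prefix operators, parentheses, and integer literals.
    std::int64_t parse_prefix() {
        skip_spaces();
        if (pos_ >= src_.size()) return 0;
        char c = src_[pos_];
        if (c == '-') { ++pos_; return -parse_prefix(); }
        if (c == '~') { ++pos_; return ~parse_prefix(); }
        if (c == '(') {
            ++pos_;
            std::int64_t value = parse_expression(0);
            skip_spaces();
            if (pos_ < src_.size() && src_[pos_] == ')') ++pos_;
            return value;
        }
        std::int64_t value = 0;
        while (pos_ < src_.size() &&
               std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            value = value * 10 + (src_[pos_] - '0');
            ++pos_;
        }
        return value;
    }

public:
    explicit PrattEvaluator(std::string_view src) : src_(src) {}

    // Core Pratt loop: fold every operator that binds tighter than min_power.
    std::int64_t parse_expression(int min_power) {
        std::int64_t lhs = parse_prefix();
        for (;;) {
            std::string_view op = peek_operator();
            int power = binding_power(op);
            if (power == 0 || power <= min_power) return lhs;
            pos_ += op.size();
            // Left-associative: the right operand absorbs only tighter operators.
            std::int64_t rhs = parse_expression(power);
            lhs = apply(op, lhs, rhs);
        }
    }
};

inline std::int64_t evaluate(std::string_view src) {
    return PrattEvaluator(src).parse_expression(0);
}
```

Passing the operator's own binding power to the recursive call is what makes the binary operators left-associative; a right-associative operator such as assignment would instead recurse with a slightly lower power.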
## Postfix Operators
- `slice[index]` and `slice[start:end]` for indexing and slicing.
- `slice.*` to retrieve the raw pointer.
- `slice.#` to obtain length.
- `ptr.[len]` to form a slice from pointer + length.
- `value.member`, `value->member` for object and pointer member access.
- `value.@`, `Type::member.@attribute` for reflection.
- `Type::<T>{ ... }` for object literals (named initializers).
These suffixes can be chained, e.g.,
`optional_ptr->slice[other.# - 1].member_func(list.*, 2).data[0]`.
## Object Literals
```arti
Point::<T> {
.x = lhs->x * rhs,
.y = lhs->y * rhs
}
```
- Named initializer syntax `.field = expr` is used consistently to emphasize
readability and order independence.
## Reflection
```arti
foo.@;
Point::<u32>::x.@alignment;
Point::<u32>.@size;
```
- Reflection works on values, types, and struct members, returning metadata used
by introspection tools.
## Error Handling Expressions
- `Result` values are constructed with variant initializers (`Result::<void, i32>::Err{ -1 }`).
- Unwrapping happens in control-flow statements.
## AST Rendering
- `ast::toString` produces the Markdown AST dumps emitted by the CLI; these
align with the structures implied by the example program.
These behaviors are reflected in the AST output produced by the parser.

142
docs/Language-Overview.md Normal file

@ -0,0 +1,142 @@
# Language Overview
Summarizes the core syntax and semantics supported in the current
parser-focused phase of the language.
## Imports and Aliases
```arti
import std::memory;
import std::*;
import my_module::some_function;
import my_module::some_typename;
using mem = std::memory;
using malloc = mem::mem_alloc;
using my_type = my_module::some_typename;
using my_func = my_module::some_function;
```
- `import module::symbol;` brings a specific symbol into scope.
- `import module::*;` imports all direct children of `module` (not recursive).
- `using` introduces aliases for modules, types, or functions.
## Structs and Generics
```arti
struct Point<typename T> {
x: T,
y: T
}
struct Rectangle {
top: Point::<i32>,
bot: Point::<i32>
}
```
- Generic definitions use `<typename T>`.
- Instantiations require `::<>` (turbofish) to disambiguate from comparisons.
- Fields use `name: Type` syntax.
## Functions and Methods
```arti
fn meaning_of_life() -> i32 {
return 42;
}
fn scale<typename T>(lhs: *Point::<T>, rhs: T) -> Point::<T> {
return Point::<T> {
.x = lhs->x * rhs,
.y = lhs->y * rhs
};
}
fn add<typename T>(this *$Point::<T>, other: *Point::<T>) {
this->x += other->x;
this->y += other->y;
}
```
- Return types follow the parameter list via `->`.
- Methods use `this <type>` as the first parameter, enabling both member and
free-function call styles.
## Enums and Variants
```arti
enum Result<typename T, typename E> {
Ok(T),
Err(E)
}
return Result::<void, i32>::Err{ -1 };
return Result::<void, i32>::Ok{};
```
- `Result` demonstrates tagged unions with data payloads.
- Variants initialize with braces, optionally containing payloads.
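
As a rough mental model, a tagged union with payloads behaves much like `std::variant` in C++. The sketch below is an analogy only: the `Ok`, `Err`, and `Result` templates and the `parse_digit` and `digit_or` helpers are hypothetical, and the sketch does not model a `void` payload.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical C++ mirror of `enum Result<typename T, typename E> { Ok(T), Err(E) }`.
template <typename T> struct Ok { T value; };
template <typename E> struct Err { E error; };

template <typename T, typename E>
using Result = std::variant<Ok<T>, Err<E>>;

// Stand-in for a fallible function: converts a single decimal digit.
inline Result<int, std::string> parse_digit(char c) {
    if (c >= '0' && c <= '9') return Ok<int>{c - '0'};
    return Err<std::string>{"not a digit"};
}

// Checks the active variant first, then reads its payload, as a `match` would.
inline int digit_or(char c, int fallback) {
    Result<int, std::string> r = parse_digit(c);
    if (const auto* ok = std::get_if<Ok<int>>(&r)) return ok->value;
    return fallback;
}
```

Here `digit_or('7', -1)` reads the `Ok` payload, while `digit_or('x', -1)` falls back because the `Err` variant is active.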
## Variables, Pointers, Qualifiers
```arti
let x: i32 = 10;
let answer = meaning_of_life();
def PI: f64 = 3.14159265358979;
def ptr: *i32 = &answer;
let mutable_pointer: *$i32 = &x;
let complex_pointer: *$*$i32 = &mutable_pointer;
let null_int: ?i32 = null;
```
- `let` for mutable, `def` for immutable bindings.
- Qualifiers `*`, `$`, `?` apply to the immediate type to the right and can be
combined to express rich pointer semantics.
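
Because each qualifier applies to whatever stands to its right, a qualified type can be read left to right. The hypothetical helper below (`describe_type` is not part of the project) spells out that reading in C++. It rejects a directly repeated `$` or `?`, which is how the grammar reference constrains qualifier chains.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Turns a qualified type such as "*$?i32" into an English description.
// Returns std::nullopt for "$$", "??", or a chain with no type name.
inline std::optional<std::string> describe_type(std::string_view type) {
    std::string out;
    char previous = '\0';
    std::size_t i = 0;
    while (i < type.size()) {
        char c = type[i];
        if (c == '*') {
            out += "pointer to ";
            ++i;
        } else if (c == '$') {
            if (previous == '$') return std::nullopt;
            out += "mutable ";
            ++i;
        } else if (c == '?') {
            if (previous == '?') return std::nullopt;
            out += "optional ";
            ++i;
        } else if (type.substr(i, 2) == "[]") {
            out += "slice of ";
            i += 2;
        } else {
            break;  // reached the type name
        }
        previous = c;
    }
    if (i == type.size()) return std::nullopt;  // qualifiers without a type name
    out += std::string(type.substr(i));
    return out;
}
```

For example, `describe_type("*$*$i32")` yields "pointer to mutable pointer to mutable i32", which matches the `complex_pointer` declaration above.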
## Slices and Literals
```arti
let arrSlice: ?[]i32 = []i32 { 2, 4, 6, 8, 10 };
let full = arrSlice[:];
let range = arrSlice[1:3];
let head = arrSlice[:2];
let tail = arrSlice[2:];
let memPtr = arrSlice.*;
let memLength = arrSlice.#;
let newSlice = memPtr.[memLength];
```
- `[]Type { ... }` constructs slice literals.
- Slicing syntax mirrors Python with optional start/end.
- Specialized suffixes:
  - `expr.*`: raw pointer.
  - `expr.#`: length.
  - `ptr.[len]`: slice created from a pointer and a length.
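
A slice can be thought of as a pointer plus a length. The C++ sketch below models the operations listed above under two assumptions that the docs do not state outright: ranges are half-open (as in Python), and out-of-range bounds are clamped. `Slice`, `sub`, `from`, `to`, and `slice_from` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++ model of an artichoke slice: a pointer plus a length.
template <typename T>
struct Slice {
    T* ptr;           // what `slice.*` yields
    std::size_t len;  // what `slice.#` yields

    T& operator[](std::size_t index) const {
        assert(index < len);
        return ptr[index];
    }

    // `slice[start:end]`: half-open range, with bounds clamped in this sketch.
    Slice sub(std::size_t start, std::size_t end) const {
        if (end > len) end = len;
        if (start > end) start = end;
        return Slice{ptr + start, end - start};
    }
    Slice from(std::size_t start) const { return sub(start, len); }  // `slice[start:]`
    Slice to(std::size_t end) const { return sub(0, end); }          // `slice[:end]`
};

// `ptr.[len]`: rebuild a slice from a raw pointer and a length.
template <typename T>
Slice<T> slice_from(T* ptr, std::size_t len) {
    return Slice<T>{ptr, len};
}
```

No operation copies elements; every derived slice is just another view into the same storage.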
## Reflection
```arti
def refl_info = foo.@;
def xalign = Point::<u32>::x.@alignment;
def type_size = Point::<u32>.@size;
```
- `.@` yields metadata for values, types, or struct members.
- Attributes include `@alignment`, `@size`, `@typename`, `@offset`.
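
For orientation, the closest C++ counterparts to these attributes are `sizeof`, `alignof`, and `offsetof`. The sketch below is only an analogy: `PointU32` is a hypothetical stand-in for `Point::<u32>`, and the exact semantics of the `artichoke` attributes may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for `Point::<u32>`.
struct PointU32 {
    std::uint32_t x;
    std::uint32_t y;
};

constexpr std::size_t point_size = sizeof(PointU32);        // compare: Point::<u32>.@size
constexpr std::size_t point_alignment = alignof(PointU32);  // compare: Point::<u32>.@alignment
constexpr std::size_t y_offset = offsetof(PointU32, y);     // compare: Point::<u32>::y.@offset
```

All three values are compile-time constants in C++, which parallels the compile-time nature of the `.@` operator.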
## Resource Management
```arti
defer cleanup();
errdefer { log_failure(); }
```
- `defer` schedules work at scope exit (LIFO order).
- `errdefer` runs only when the function returns an error variant.
These constructs appear throughout idiomatic `artichoke` code and are supported by
the current parser.

80
docs/Language-Patterns.md Normal file

@ -0,0 +1,80 @@
# Pattern Unwrapping & Binding
`artichoke` supports unwrapping `Result` and optional values directly within
control-flow constructs. This section describes the available patterns.
## `if` / `else`
```arti
if (foo()) |ok_val| {
/* Ok branch */
} else |err_val| {
/* Err branch */
}
```
- Using `|name|` after the condition binds the success value (or error value in
the `else` branch) for the scope of that block.
- Works with any expression that evaluates to a `Result` or optional (`?`) value.
## `while` Patterns
```arti
while (foo()) |ok_val| {
/* Loop continues while Ok */
} else |err_val| {
/* Executes on Err */
}
while (foo.next()) |item| {
/* Iterator-style loop until optional becomes empty */
}
```
- The first form keeps looping while the expression yields `Ok`.
- The iterator-style variant continues while the optional contains a value.
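
The iterator-style loop has a close analogue in C++, where a `while` condition can declare a `std::optional` and the body runs only while it holds a value. The `Counter` class and `collect` function below are hypothetical and serve purely as illustration.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical iterator whose next() returns an empty optional when exhausted.
class Counter {
    int current_;
    int end_;
public:
    Counter(int begin, int end) : current_(begin), end_(end) {}
    std::optional<int> next() {
        if (current_ >= end_) return std::nullopt;
        return current_++;
    }
};

inline std::vector<int> collect(Counter it) {
    std::vector<int> items;
    // Similar in spirit to: while (it.next()) |item| { ... }
    while (std::optional<int> item = it.next()) {
        items.push_back(*item);
    }
    return items;
}
```

The loop ends the first time `next()` returns an empty optional, so `collect(Counter(0, 3))` gathers three items and `collect(Counter(5, 5))` gathers none.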
## `match` Cases
```arti
match (foo()) {
Result::<i32, []u8>::Ok |v| -> {
std::io::print("Success!");
}
_ -> { /* Default */ }
}
```
- Cases accept type expressions and optional bindings (`|v|`).
- `_` handles the default/remaining patterns.
## Range Loop Binding
```arti
for (let element := returns_range_function()) {
/* element is bound for each iteration */
}
```
- Range loops bind the element name chosen in the header.
## Labels
```arti
outer_loop := while (condition) {
inner_loop := for (let i = 0; i < 10; i += 1) {
if (i == 5) { break outer_loop; }
}
}
```
- Labels let you control nested loops using `break label;` or `continue label;`.
## Error Reporting
- When an unwrap clause is malformed (missing pipes, invalid identifier) the
parser emits diagnostics indicating the expected syntax, helping align code
with the documented patterns.
These patterns appear throughout typical `artichoke` code and are supported by the
current parser.

368
docs/Reference-Grammar.md Normal file

@ -0,0 +1,368 @@
# Grammar Reference
Formal grammar aligned with the current parser implementation.
```
/*
================================================================================
| |
| The Artichoke Programming Language |
| Official EBNF Grammar |
| |
================================================================================
*/
/* --- Program Structure --- */
/* A program is a sequence of top-level declarations and statements. */
<program> =
<declaration>*
<eof>
<declaration> =
"export" <exportable_declaration>
| <non_exportable_declaration>
<exportable_declaration> =
<module_statement>
| <struct_declaration>
| <enum_declaration>
| <function_declaration>
<non_exportable_declaration> =
<import_statement>
| <alias_statement>
| <module_statement>
| <struct_declaration>
| <enum_declaration>
| <function_declaration>
<module_statement> =
"module" <namespaced_identifier> "{"
( <module_statement>
| <alias_statement>
| <struct_declaration>
| <enum_declaration>
| <function_declaration> )*
"}"
<import_statement> =
"import" <import_target> ";"
<import_target> =
<namespaced_identifier> ( "::" "*" )?
<alias_statement> =
"using" <identifier> "=" <type> ";"
/* --- Declarations --- */
/* Rules for defining functions, structs, enums, and their components. */
<function_declaration> =
"fn" <identifier> <generic_params>? "(" <fn_params_list>? ")" ( "->" <type> )? <code_block>
<fn_params_list> =
"this" <type> ("," <fn_param> ( "," <fn_param> )* )?
| <fn_param> ( "," <fn_param> )*
<fn_param> =
<identifier> ":" <type>
<struct_declaration> =
"struct" <identifier> <generic_params>? "{" <struct_members> "}"
<struct_members> =
<struct_member> ( "," <struct_member> )*
<struct_member> =
<identifier> ":" <type>
<enum_declaration> =
"enum" <identifier> <generic_params>? "{" <enum_members> "}"
<enum_members> =
<enum_member> ( "," <enum_member> )*
<enum_member> =
<identifier> ( "(" <type> ")" )?
<generic_params> =
"<" <generic_params_list> ">"
<generic_params_list> =
<generic_param> ( "," <generic_param> )*
<generic_param> =
"typename" <identifier>
/* --- Statements & Control Flow --- */
/* Rules for code blocks, variable declarations, and control structures. */
<code_block> =
"{" <statement>* "}"
<statement> =
<variable_declaration> ";"
| <if_statement>
| <defer_statement> ";"
| <errdefer_statement> ";"
| <return_statement> ";"
| <break_statement> ";"
| <continue_statement> ";"
| <match_statement>
| <switch_statement>
| <loop_statement>
| <expression> ";"
<variable_declaration> =
<variable_declarator> <identifier> <variable_declaration_tail>
<variable_declaration_tail> =
":" <type> ( "=" <expression> )?
| "=" <expression>
<variable_declarator> =
"let"
| "def"
<if_statement> =
"if" "(" <expression> ")" <variable_unwrapper>? <code_block>
<else_statement>?
<else_statement> =
"else" <else_statement_tail>
<else_statement_tail> =
<if_statement>
| <variable_unwrapper>? <code_block>
<variable_unwrapper> =
"|" <identifier> "|"
<loop_statement> =
(<identifier> ":=")? (
<c_for_statement>
| <range_for_statement>
| <while_statement>
| <do_while_statement>
| <inf_loop_statement>
)
<c_for_statement> =
"for" "(" ( <variable_declaration> | <expression> )? ";" <expression> ";" <expression>? ")"
<code_block>
<range_for_statement> =
"for" "(" <variable_declarator> <identifier> ":=" <expression> ")"
<code_block>
<while_statement> =
"while" "(" <expression> ")" <variable_unwrapper>? <code_block>
<else_statement>?
<do_while_statement> =
"do" <code_block> "while" "(" <expression> ")"
<inf_loop_statement> =
"loop" <code_block>
<match_statement> =
"match" "(" <expression> ")" "{" <match_case>* <default_case>? "}"
<switch_statement> =
"switch" "(" <expression> ")" "{" <switch_case>* <default_case>? "}"
<match_case> =
<type_name> ( "|" <identifier> "|" )? "->" <code_block>
<switch_case> =
<expression> "->" <code_block>
<default_case> =
"_" "->" <code_block>
<break_statement> =
"break" <identifier>?
<continue_statement> =
"continue" <identifier>?
<defer_statement> =
"defer" ( <expression> | <code_block> )
<errdefer_statement> =
"errdefer" ( <expression> | <code_block> )
<return_statement> =
"return" <expression>?
/* --- Expressions & Operator Precedence --- */
/* The full expression hierarchy, from lowest to highest precedence. */
<expression> =
<bool_or_expression> ( ( <assign_op> | <compound_assign_op> ) <expression> )?
<bool_or_expression> =
<bool_and_expression> ( ( "||" | "or" ) <bool_and_expression> )*
<bool_and_expression> =
<compare_expression> ( ( "&&" | "and" ) <compare_expression> )*
<compare_expression> =
<bitwise_expression> ( <compare_op> <bitwise_expression> )?
<bitwise_expression> =
<bitwise_shift_expression> ( <bitwise_op> <bitwise_shift_expression> )*
<bitwise_shift_expression> =
<addition_expression> ( <bitshift_op> <addition_expression> )*
<addition_expression> =
<multiply_expression> ( <addition_op> <multiply_expression> )*
<multiply_expression> =
<prefix_expression> ( <multiply_op> <prefix_expression> )*
<prefix_expression> =
<prefix_op>* <postfix_expression>
<postfix_expression> =
<primary_expression> ( <suffix_op> | <fn_call_arguments> )*
/* --- Primary Expressions & Literals --- */
/* The highest-precedence expressions, including literals and grouped expressions. */
<primary_expression> =
<grouped_expression>
| <literal>
| <type_initiated_literal>
| <access_expression> ( "{" <struct_literal_body> "}" )?
<access_expression> =
<identifier> ( "::" "<" <types_list> ">" )?
<type_initiated_literal> =
<type> "{" <struct_literal_body> "}"
<literal> =
<char_literal>
| <null_literal>
| <string_literal>
| <number_literal>
| <boolean_literal>
<grouped_expression> =
"(" <expression> ")"
<fn_call_arguments> =
"(" <expression_list> ")"
<expression_list> =
(<expression> ",")* <expression>?
<struct_literal_body> =
( <named_field_list> | <positional_field_list> )? ","?
<named_field_list> =
<named_field_init> ( "," <named_field_init> )*
<named_field_init> =
"." <identifier> "=" <expression>
<positional_field_list> =
<expression> ( "," <expression> )*
<null_literal> =
"null"
<boolean_literal> =
"true"
| "false"
<number_literal> = /* Assumed to be defined by the tokenizer */
<string_literal> = /* Assumed to be defined by the tokenizer */
<char_literal> = /* Assumed to be defined by the tokenizer */
/* --- Operators --- */
/* Definitions for all operator token sets. */
<assign_op> = "="
<compound_assign_op> = "+=" | "-=" | "*=" | "/=" | "%=" | "&=" | "|=" | "<<=" | ">>=" | "||=" | "&&="
<compare_op> = "==" | "!=" | ">" | "<" | ">=" | "<="
<bitwise_op> = "&" | "^" | "|"
<bitshift_op> = "<<" | ">>"
<addition_op> = "+" | "-"
<multiply_op> = "*" | "/" | "%"
<prefix_op> = "!" | "-" | "~" | "&" | "*"
<suffix_op> =
"[" <array_access_tail>
| "." <identifier>
| "::" <identifier> ( "::" "<" <types_list> ">" )?
| "->" <identifier>
| ".@" <identifier>?
| ".[" <expression> "]"
| ".#"
| ".*"
<array_access_tail> =
<expression>? <slice_or_index_tail>
| ":" <expression>? "]"
<slice_or_index_tail> =
"]"
| ":" <expression>? "]"
/* --- Type System --- */
/* Rules for defining types, type names, and type qualifiers. */
<type> =
<type_qualifier_chain>? <type_name>
<type_qualifier_chain> =
( "*" | "[]" ) <type_qualifier_chain>?
| "$" <type_qualifier_chain_after_mutable>?
| "?" <type_qualifier_chain_after_optional>?
<type_qualifier_chain_after_optional> =
( "*" | "[]" ) <type_qualifier_chain>?
| "$" <type_qualifier_chain_after_mutable>?
<type_qualifier_chain_after_mutable> =
( "*" | "[]" ) <type_qualifier_chain>?
| "?" <type_qualifier_chain_after_optional>?
<type_name> =
<access_expression> ( "::" <identifier> ( "::" "<" <types_list> ">" )? )*
<namespaced_identifier> =
<identifier> ( "::" <identifier> )*
<types_list> =
<type> ( "," <type> )*
/* --- Lexical Tokens & Base Definitions --- */
/* The lowest-level building blocks of the language. */
<identifier> =
<nondigit> <identifier_tail>
<identifier_tail> =
<empty>
| <nondigit> <identifier_tail>
| <digit> <identifier_tail>
<nondigit> = "_" | [a-z] | [A-Z]
<digit> = <zero> | <nonzero_digit>
<zero> = "0"
<nonzero_digit> = [1-9]
<empty> = E /* Represents an empty terminal string */
<eof> = /* End Of File */
```
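
The lexical rules at the end of the grammar translate almost directly into code. The following C++ sketch checks a string against the `<identifier>` rule; the function names are hypothetical, and the check does not exclude reserved keywords.

```cpp
#include <cassert>
#include <string_view>

// <nondigit> = "_" | [a-z] | [A-Z]
inline bool is_nondigit(char c) {
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// <digit> = <zero> | <nonzero_digit>
inline bool is_digit(char c) { return c >= '0' && c <= '9'; }

// <identifier> = <nondigit> <identifier_tail>
inline bool is_identifier(std::string_view text) {
    if (text.empty() || !is_nondigit(text.front())) return false;
    for (char c : text.substr(1)) {
        if (!is_nondigit(c) && !is_digit(c)) return false;
    }
    return true;
}
```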

31
docs/Reference-Tokens.md Normal file

@ -0,0 +1,31 @@
# Token Reference
Token definitions used by the tokenizer.
## Literals and Basic Tokens
- `tkInteger` - decimal integers (`10`, `42`).
- `tkDecimal` - floating-point literals (`3.14159265358979`).
- `tkString` - double-quoted strings.
- `tkCharacter` - character literals.
- `tkIdentifier` - identifiers and names.
- `tkEOF` - end-of-file sentinel.
## Operators
Includes all operator tokens such as `::`, `->`, `.[`, `.#`, `.*`, `.@`, and the
compound assignments (`+=`, etc.).
## Keywords
`import`, `using`, `struct`, `enum`, `fn`, `return`, `let`, `def`, `if`, `else`,
`for`, `while`, `do`, `loop`, `match`, `switch`, `defer`, `errdefer`, `true`,
`false`, `null`, `typename`, `this`, `_` (default pattern), and logical aliases
`or`, `and`.
## Notes
- Keyword recognition uses a trie to ensure single-pass tokenization.
- Multi-character operators (`::`, `.[`, `.#`, `.*`, `.@`, `>>=`) rely on
lookahead logic.
- Generic parsing may split `>>` into `>` + `>` to disambiguate template closers.
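
The `>>` splitting described in the last note can be sketched as a pass over a token list. This is an illustration under stated assumptions, not the project's approach: tokens are plain strings, a generic argument list is opened only by `<` directly after `::`, and `split_generic_closers` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Rewrites a token sequence so that a ">>" seen while at least two generic
// argument lists are open becomes two ">" tokens.
inline std::vector<std::string> split_generic_closers(
        const std::vector<std::string>& tokens) {
    std::vector<std::string> out;
    int open_lists = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string& tok = tokens[i];
        if (tok == "<" && i > 0 && tokens[i - 1] == "::") {
            ++open_lists;  // a "<" directly after "::" opens a turbofish list
            out.push_back(tok);
        } else if (tok == ">" && open_lists > 0) {
            --open_lists;
            out.push_back(tok);
        } else if (tok == ">>" && open_lists >= 2) {
            open_lists -= 2;  // closes two nested lists at once
            out.push_back(">");
            out.push_back(">");
        } else {
            out.push_back(tok);  // includes ">>" used as a shift operator
        }
    }
    return out;
}
```

With this rule, the closing `>>` of a nested instantiation such as `Result::<Point::<i32>>` becomes two closers, while `a >> 1` keeps its shift operator.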