diff --git a/docs/Architecture.md b/docs/Architecture.md new file mode 100644 index 0000000..b920e69 --- /dev/null +++ b/docs/Architecture.md @@ -0,0 +1,40 @@ +# Compiler Architecture + +## Components + +- **Tokenizer:** Coroutine-driven lexer that emits `Token` values lazily, + enabling lookahead and precise diagnostics for keywords, operators, literals, + and comments. A trie-based keyword map plus demangling and string utilities + keep error messages readable. +- **Hybrid Parser:** Combines a handwritten recursive-descent parser for + high-level structure (imports, modules, aliases, declarations, statements) with + a Pratt (precedence-climbing) engine for expressions. Recent merges added + optional slice bounds (`[:end]`, `[start:]`), type-initiated expressions + (`[]Type { ... }`), turbofish disambiguation in generics, and precedence capping + so `->` works for both pointer member access and `match`/`switch` cases. +- **AST:** Hierarchical node definitions under `lib/include/artichoke/Parser/AST` + model compilation units, declarations, statements, expressions, and types. + Visitors such as `toString` (Markdown) and `toDot` (Graphviz) support + visualization and debugging. +- **Frontend CLI:** `frontend/src/main.cpp` normalizes file paths, invokes the + parser, and prints either the structured AST or descriptive diagnostics. +- **Support Utilities:** Shared helpers (`Expected`, trie map, string helpers, + coroutine scaffolding, demangling) provide robust error propagation and + ergonomics throughout the compiler. + +## Workflow + +1. Tokenizer lazily produces tokens via coroutines, supporting lookahead and + rich diagnostics. +2. Recursive-descent routines process declarations and statements, delegating to + the Pratt engine for expressions. The parser constructs ASTs aligned with the + formal grammar. +3. Frontend emits ASTs (`ast::toString`) or clear error messages when parsing + fails. 
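
The three workflow steps depend on the tokenizer producing tokens only when the parser asks for them. The sketch below illustrates that idea in Python, with a generator standing in for the C++ coroutine. The `Token` fields, the reduced token set, and the `Lookahead` helper are assumptions made for illustration; none of them are taken from the `artichoke` sources.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # hypothetical kinds: "int", "ident", "op", "eof"
    text: str
    offset: int  # position of the token in the source, for diagnostics


# Hypothetical, heavily reduced token set; two-character operators come first.
_TOKEN_RE = re.compile(
    r"\s*(?:(?P<int>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<op>->|::|[-+*/=;(){}]))"
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens one at a time; nothing is scanned until it is requested."""
    pos = 0
    while pos < len(source):
        match = _TOKEN_RE.match(source, pos)
        if match is None:
            if source[pos:].strip() == "":
                break  # only trailing whitespace is left
            raise SyntaxError(f"unexpected character near offset {pos}")
        kind = match.lastgroup
        yield Token(kind, match.group(kind), match.start(kind))
        pos = match.end()
    yield Token("eof", "", len(source))


class Lookahead:
    """One-token lookahead buffer over a lazy token stream."""

    def __init__(self, tokens: Iterator[Token]):
        self._tokens = tokens
        self._peeked = None

    def peek(self) -> Token:
        # Pull from the generator only when the parser actually needs a token.
        if self._peeked is None:
            self._peeked = next(self._tokens)
        return self._peeked

    def advance(self) -> Token:
        token = self.peek()
        self._peeked = None
        return token
```

A recursive-descent routine would call `peek()` to choose a production and `advance()` to consume the token. Because the stream is lazy, a parse that fails early never scans the rest of the file.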
+ +## Future Work + +- Semantic analysis (type checking, symbol resolution) building on the expanded + expression and type features already integrated. +- Intermediate representation and code generation backend. +- Tooling support: formatter, language server, extended automated tests. diff --git a/docs/Examples-SamplePrograms.md b/docs/Examples-SamplePrograms.md new file mode 100644 index 0000000..c41cc00 --- /dev/null +++ b/docs/Examples-SamplePrograms.md @@ -0,0 +1,19 @@ +# Sample Program Overview + +This section highlights the language features exercised by the canonical +overview program distributed with the project. + +- Imports: module wildcards, specific symbols, and module aliases. +- Type aliases with `using` for types and functions. +- Generics: struct definitions with ``, turbofish instantiations (e.g., `Point::`). +- Functions: regular functions and methods (`this` parameter syntax), return types via `->`. +- Enums: tagged unions with `Result::` usage and variant initialization (`Err{ -1 }`, `Ok{}`). +- Variables: `let`/`def` with type inference, complex pointer/optional qualifiers (`*$?` combinations). +- Slices: literals `[]Type { ... }`, slicing syntax `[start:end]`, specialized suffixes (`.*`, `.#`, `.[len]`). +- Control flow: if/else with unwrapping, while loops (condition-based and iterator-style), do/while, C-style for loops, range for loops, labeled loops. +- Pattern matching: `match` with bindings, `_` default; `switch` for value cases. +- Resource management: `defer`, `errdefer` for cleanup semantics. +- Reflection: `.@` operator to fetch metadata (`.@`, `.@alignment`, `.@size`). + +These features can be explored by running the parser CLI against any `artichoke` +source file to inspect the resulting AST or diagnostics. 
diff --git a/docs/GettingStarted.md b/docs/GettingStarted.md new file mode 100644 index 0000000..3dc818d --- /dev/null +++ b/docs/GettingStarted.md @@ -0,0 +1,55 @@ +# Getting Started + +Build and run the `artichoke` parser frontend to experiment with the language +features described in this documentation. + +## Prerequisites + +- C++23 compiler (tested with Clang 17/GCC 13). +- CMake 3.26+. +- Ninja or Make. +- Optional: `ctest` for tokenizer tests. + +## Build the Toolchain + +```bash +cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug +cmake --build build +``` + +The executable `build/frontend/artichoke-c` reads a source file, parses it, and +prints a Markdown AST or diagnostics. + +## Run the Parser + +```bash +./build/frontend/artichoke-c path/to/program.arti +``` + +The CLI prints either a Markdown AST or descriptive diagnostics. + +## Run Tests + +Tokenizer tests live under `tests/Tokenizer/`. + +```bash +cmake --build build --target tests +ctest --test-dir build/tests --output-on-failure +``` + +Enable testing during configuration with `-DENABLE_TESTING=ON`. + +## Repository Layout + +- `frontend/` CLI entry point. +- `lib/` Tokenizer, parser, AST, and utilities. +- `tests/Tokenizer/` Tokenization coverage. +- `docs/` Reference programs and supporting materials. + +## Next Steps + +- Review the [Sample Programs](Examples-SamplePrograms.md) and overview guides + to understand the language. +- Dive into [Language Overview](Language-Overview.md) and + [Control Flow](Language-ControlFlow.md) for targeted explanations. +- Use [Architecture](Architecture.md) if you plan to extend the compiler. diff --git a/docs/Home.md b/docs/Home.md index a2300fb..69ae017 100644 --- a/docs/Home.md +++ b/docs/Home.md @@ -1,286 +1,81 @@ -# The `artichoke` Programming Language: A Technical Overview - -## 1. Introduction - -`artichoke` is a statically-typed, general-purpose programming language designed -with an emphasis on performance, safety, and expressive syntax. 
It combines -low-level control over memory with modern, high-level features like generics, -algebraic data types, and integrated error handling. This document provides an -overview of the language's features as defined by its core grammar. - -Is highly inspired by C, C++, Rust, and mostly Zig. - -## 2. Basic Syntax & Structure - -### Modules, Imports, and Aliases - -`artichoke` code is organized into modules. The `import` statement is used to bring -symbols from other modules into the current scope. - -* **Importing a specific element:** `import my_module::some_function;` -* **Importing all direct elements of a module:** `import std::*;` -* **Importing an entire submodule:** `import std::memory;` - -The `using` keyword creates a local, more convenient alias for a type, function, -or module name. - -``` -using mem = std::memory; -using FileHandle = std::fs::File; -``` - -### Comments - -The language uses C-style block comments. - -``` -/* This is a multi-line - comment. */ -``` - -## 3. The Type System - -`artichoke`'s type system is strong and static, with a rich set of features for -defining complex data structures. - -### Type Qualifiers - -Qualifiers modify the type to their immediate right, allowing for precise and -complex type definitions. - -* **`*` (Pointer):** Creates a pointer to a type. Pointers cannot be `null`. -* **`$` (Mutable):** Marks a type as mutable. This is used for function - parameters, local variables, and struct fields to allow modification. -* **`?` (Optional):** Marks a type as nullable. An optional type can hold either a - value of its underlying type or `null`. -* **`[]` (Slice):** A "fat pointer" representing a view into a contiguous - sequence of elements. It contains both a pointer to the data and a length. - -These qualifiers can be combined. For example, `*$?int` defines a **pointer to a -mutable optional integer**. - -### Generics - -Generics allow for writing flexible, reusable code that can operate on multiple -types. 
They are defined using ``. - -``` -/* A generic struct */ -struct Point { - x: T, - y: T -} - -/* A generic function */ -fn scale(lhs: *Point, rhs: T) -> Point { - /* ... */ -} -``` - - -## 4. Declarations - -### Variables - -Variables are declared using the `let` (mutable) and `def` (immutable/constant) -keywords. - -* **Type inference** is supported when the type can be determined from the initializer. -* Variables must be initialized with either a type, a value, or both. - -``` -/* Mutable variable with explicit type */ -let x: i32 = 10; - -/* Immutable variable with type inference */ -def do_you_get_it = meaning_of_life(); -``` - -### Structs - -Structs are composite data types that group together variables under one name. -They support generics. - -``` -struct Rectangle { - top: Point, - bot: Point -} -``` - -**Initialization:** Structs can be initialized using positional or named fields, -but not a mix of both. - -``` -/* Positional initialization */ -def top_left = Point{ 0, 10 }; - -/* Named-field initialization */ -def top_right = Point{ x: 10, y: 10 }; -``` - -### Enums (Tagged Unions) - -Enums define a type that can be one of several different variants. Variants can -optionally hold data. - -``` -enum AssetType { - Texture, - Model, - Sound, -} - -enum Result { - Ok(T), - Err(E) -} -``` - -**Initialization:** Enum variants are accessed using scope resolution (`::`). - -``` -def my_asset = AssetType::Texture; -def success = Result::Ok(100); -``` - -### Functions - -Functions are defined with the fn keyword. The return type is specified after -the parameter list with `->`. - -``` -fn meaning_of_life() -> i32 { - return 42; -} -``` - -#### Member Functions (`this` parameter) - -If the first parameter of a function is declared with the `this` keyword, it can -be called using "member function" syntax. 
- -``` -/* Definition */ -fn add(this *$Point, other: *Point) { - this->x += other->x; - this->y += other->y; -} - -/* Can be called in two ways: */ -/* Member function syntax */ -my_point.add(&other_point); - -/* Normal function syntax */ -add(&my_point, &other_point); -``` - -## 5. Control Flow - -### `if`/`else` Statements - -`artichoke` supports C-style `if`/`else` and `else if` chains. It also integrates a -powerful unwrapping feature for handling `Result` and optional (`?`) types. - -``` - -/* Standard if/else */ -if (argc < 2) { - return Result::Err(-1); -} - -/* Unwrapping a Result */ -if (foo()) |ok| { - /* `ok` holds the success value */ -} -else |err| { - /* `err` holds the error value */ -} -``` - -### Loops - -The language provides a comprehensive set of looping constructs. - -* **C-Style `for`:** `for (let i \= 0; i \< 10; i \+= 1\) { ... }` -* **Range-based `for`:** `for (let e := arrSlice) { ... }` -* **`while` Loop:** Can optionally have an `else` block that executes when the loop - condition is no longer met. -* **Iterator `while`:** Supports unwrapping `Result`/optional types, executing as - long as the value is valid. -* **`do-while` Loop:** Guarantees the body executes at least once. -* **Infinite `loop`:** `loop { ... }` - -#### Loop Labels and Control - -Loops can be labeled. The `break` and `continue` statements can optionally specify a -label to control nested loops. - -``` -outer_loop := while (condition) { - inner_loop := for (...) { - break outer_loop; - } -} -``` - -## 6. Expressions and Operators - -### Pointer and Member Access - -* **`&` (Address-of):** Gets a pointer to a variable. -* **`*` (Dereference):** Accesses the value a pointer points to. -* **`.` (Member Access):** Accesses a member of a struct value. -* **`->` (Pointer Member Access):** Dereferences a pointer and accesses a member - (`p->x` is shorthand for `(*p).x`). - -### Slice Operators - -Slices have a dedicated set of operators for manipulation. 
- -* **`[start:end]` (Slicing):** Creates a new slice from an existing one. -* **`.*` (Pointer Access):** Gets the underlying raw pointer of the slice. -* **`.#` (Length Access):** Gets the number of elements in the slice. -* **`.[length]` (Slice from Pointer):** Creates a slice from a raw pointer and a length. - -### Assignment - -The language supports simple (`=`) and compound assignment (`+=`, `*=`, etc.) -operators. - -## 7. Advanced Features - -### Resource Management (`defer` and `errdefer`) - -`artichoke` uses `defer` for deterministic resource management. - -* **`defer`:** Schedules an expression or code block to be executed when the - current scope is exited. Deferred calls are executed in Last-In, First-Out - (LIFO) order. -* **`errdefer`:** Similar to `defer`, but the code is only executed if the scope is - exited due to a function returning an error (an `Err` variant of a `Result`). - -``` -defer call_cleanup(); - -errdefer { - log("An error occurred!"); -} -``` - -### Reflection (`.@`) - -The language provides a compile-time reflection mechanism via the `.@` operator. -It can be applied to values, types, and static members to query metadata. - -* **On values:** `my_variable.@type` -* **On types:** `Point.@size, Point.@alignment` -* **On static members:** `Point::x.@offset` - -``` -/* Gets size in bytes */ -def size_bytes = Point.@size; - -/* Gets string representation of the type */ -def point_name = Point.@typename; -``` +# `artichoke` Language Wiki + +`artichoke` is a modern, statically-typed programming language designed to +satisfy my personal preferences and requirements for programming, combining +low-level control with powerful modern features such as a robust type system, +generics, integrated error handling, and a clean, ergonomic syntax. + +The goal of `artichoke` is to provide a language that is simple, safe, and +productive for programming, eliminating common pitfalls without sacrificing +performance or control. 
+ +## Using This Wiki + +Start with [Getting Started](GettingStarted.md) to build and run the parser. +Continue with the language guide and control-flow chapters for deeper dives into +syntax and semantics. The reference section contains the formal grammar and +token catalog, while the sample programs illustrate how features fit together. +Report any gaps or inconsistencies via issues or patches. + +## Quick Links + +- **Getting Started:** [Getting Started](GettingStarted.md) +- **Language Guide:** [Language Overview](Language-Overview.md) +- **Control Flow:** [Control Flow](Language-ControlFlow.md) +- **Expressions & Operators:** [Expressions & Operators](Language-Expressions.md) +- **Pattern Unwrapping:** [Patterns](Language-Patterns.md) +- **Grammar Reference:** [Grammar Reference](Reference-Grammar.md) +- **Token Reference:** [Token Reference](Reference-Tokens.md) +- **Architecture Overview:** [Architecture](Architecture.md) +- **Sample Programs:** [Sample Programs](Examples-SamplePrograms.md) + +## Core Philosophy & Features + +`artichoke` is built around a few core principles to create a safer, more +productive programming experience: + +* **Explicitness:** Type conversions and error handling are explicit. +* **Safety:** Non-nullable pointers, a robust type system, and deterministic +resource management are prioritized. +* **Unambiguous Design:** A grammar designed for fast, single-pass parsing and +clear error reporting. +* **Modern Ergonomics:** Features like generics, defer, and a clean module +system reduce boilerplate and improve readability. + +The language includes a powerful **generic type system**, first-class **error +handling**, a full suite of **control flow** statements (including match), a +true **module system**, and **compile-time reflection**. + +## Project Status + +`artichoke` is currently in the **early implementation phase**. 
The front-end +infrastructure is not yet defined, but it contains a simple program for printing and +visualizing the generated AST. Development has now shifted toward semantic +validation. + +- [x] **Lexical Analysis:** Full tokenizer implementation. +- [x] **Syntactic Analysis:** Handwritten Recursive Descent + Pratt Expression +Parser. +- [x] **AST Infrastructure:** Complete Abstract Syntax Tree with Graphviz and +String-Graph based visualization support. +- [ ] **Semantic Analysis (In Progress):** Multi-pass symbol table generation +and type checking. +- [ ] **Backend:** Code generation and optimization. + +## Contributing + +The `artichoke` project is hosted on a personal, self-hosted Gitea instance. If +you are interested in contributing, you have two options: + +1. **Request an Account:** Please contact support@artichoke.dev to request an +account on the Gitea instance. +2. **Submit Patches:** Alternatively, you can send patches or diffs directly to +the same email address. + +In all cases, proper attribution will be given for your contributions in the +source files and/or the project wiki. + +## License + +This project is licensed under the **GNU Affero General Public License v3.0**. +The full license text can be found in the LICENSE file in this repository. diff --git a/docs/Language-ControlFlow.md b/docs/Language-ControlFlow.md new file mode 100644 index 0000000..11c8449 --- /dev/null +++ b/docs/Language-ControlFlow.md @@ -0,0 +1,123 @@ +# Control Flow + +This section outlines the control-flow constructs currently supported by the +`artichoke` parser, including variable declarations, loops, pattern unwrapping, +and resource management. 
+ +## Variable Declarations + +```arti +let x: i32 = 10; +let answer = meaning_of_life(); +def PI: f64 = 3.14159265358979; + +def ptr: *i32 = &answer; +let mutable_pointer: *$i32 = &x; +let complex_pointer: *$*$i32 = &mutable_pointer; +let null_int: ?i32 = null; +``` + +- `let` declares mutable bindings; `def` declares immutable ones. +- Pointer/mutability/optional qualifiers (`*`, `$`, `?`) attach immediately to + the type on their right. + +## `if` / `else` + +```arti +if (foo()) |ok_val| { + /* success path */ +} else |err_val| { + /* error path */ +} + +if (condition) { + /* then */ +} else if (other_condition) { + /* else-if */ +} else { + /* final branch */ +} +``` + +- Unwrap clauses (`|name|`) bind `Result` or optional values for the block. +- Parentheses around conditions are required. + +## `match` + +```arti +match (foo()) { + Result::::Ok |v| -> { + std::io::print("Success!"); + } + _ -> { + /* default */ + } +} +``` + +- Patterns accept type expressions and optional bindings. +- `_` handles unmatched cases. + +## `switch` + +```arti +switch (value) { + 0 -> { /* ... */ } + (1 + 2) -> { /* ... */ } + _ -> { /* ... */ } +} +``` + +- Value-based branching for expressions. + +## Loops + +```arti +while (foo()) |ok_val| { + /* loop while Ok */ +} else |err_val| { + /* handles Err */ +} + +while (foo.next()) |item| { + /* iterator-style loop */ +} + +do { + /* body */ +} while (true); + +for (let i = 0; i < 10; i += 1) { + /* C-style loop */ +} + +for (let element := returns_range_function()) { + /* range loop */ +} + +outer_loop := while (condition) { + inner_loop := for (let i = 0; i < 10; i += 1) { + if (i == 5) { break outer_loop; } + } +} +``` + +- Range loops require `:=` and bind the element name using `let` or `def`. +- Labels (`outer_loop :=`) allow `break`/`continue` to target outer loops. + +## Defer & errdefer + +```arti +defer cleanup(); +errdefer { log_failure(); } +``` + +- `defer` runs at scope exit in reverse order. 
+- `errdefer` runs only if the function returns an error variant. + +## Return and Expressions + +- `return expr;` or `return;` (when void-like). +- Any expression followed by `;` forms a statement. + +See `docs/example.arti` for the full program showcasing these constructs. diff --git a/docs/Language-Expressions.md b/docs/Language-Expressions.md new file mode 100644 index 0000000..2674164 --- /dev/null +++ b/docs/Language-Expressions.md @@ -0,0 +1,107 @@ +# Expressions & Operators + +`artichoke` uses a Pratt-style expression parser supporting rich infix, prefix, +and postfix syntax. This section summarizes the key behaviors currently +implemented. + +## Literals + +- Numeric literals: `42`, `3.14159`, `10`. +- Character/boolean/null: `'a'`, `true`, `false`, `null`. +- Strings follow double-quoted C-style syntax with escapes. + +All literal tokens map to dedicated AST nodes (`CharLiteral`, `NullLiteral`, +`StringLiteral`, `FloatLiteral`, `IntegerLiteral`, `BooleanLiteral`). + +## Identifiers and Module Access + +- Simple identifiers refer to variables or functions: `x`, `meaning_of_life`. +- Namespaced access uses `::`: `Result::::Err`, `std::memory`. + +## Function Calls and Methods + +```arti +meaning_of_life(); +scale(&point, 2); +block.initialize(2048); +``` + +- Turbofish syntax applies at call sites when generics are involved. +- Methods (declared with `this`) can be invoked as member calls (`expr.method`) + or as regular functions (`method(expr, ...)`). + +## Operators and Precedence + +`artichoke` uses Pratt parsing with the following precedence (lowest to highest): + +1. Assignment: `=`, `+=`, `-=`, `*=`, `/=`, `%=` … +2. Boolean OR: `or`, `||` +3. Boolean AND: `and`, `&&` +4. Comparisons: `==`, `!=`, `<`, `>`, `<=`, `>=` +5. Bitwise OR/XOR/AND: `|`, `^`, `&` +6. Shifts: `<<`, `>>` +7. Addition/Subtraction: `+`, `-` +8. Multiplication/Division/Modulo: `*`, `/`, `%` +9. Prefix: `!`, `-`, `~`, `&`, `*` +10. 
Postfix and suffix operators + +The sample program demonstrates complex precedence: + +```arti +let calculation = ~5 + 10 * 2 / (length - 1) % 4 << 2 >> 1; +let logic_check = (calculation >= 100 or !true) and (length != 0); + +x = y = length += 10; +``` + +- Assignment chains associate right-to-left. +- Parentheses override precedence as expected. +- Boolean aliases (`or`, `and`) behave like `||`, `&&`. + +## Postfix Operators + +- `slice[index]` and `slice[start:end]` for indexing and slicing. +- `slice.*` to retrieve the raw pointer. +- `slice.#` to obtain length. +- `ptr.[len]` to form a slice from pointer + length. +- `value.member`, `value->member` for object and pointer member access. +- `value.@`, `Type::member.@attribute` for reflection. +- `Type::{ ... }` for object literals (named initializers). + +These suffixes can be chained, e.g., +`optional_ptr->slice[other.# - 1].member_func(list.*, 2).data[0]`. + +## Object Literals + +```arti +Point:: { + .x = lhs->x * rhs, + .y = lhs->y * rhs +} +``` + +- Named initializer syntax `.field = expr` is used consistently to emphasize + readability and order independence. + +## Reflection + +```arti +foo.@; +Point::::x.@alignment; +Point::.@size; +``` + +- Reflection works on values, types, and struct members, returning metadata used + by introspection tools. + +## Error Handling Expressions + +- `Result` values are constructed with variant initializers (`Result::::Err{ -1 }`). +- Unwrapping happens in control-flow statements. + +## AST Rendering + +- `ast::toString` produces the Markdown AST dumps emitted by the CLI; these + align with the structures implied by the example program. + +These behaviors are reflected in the AST output produced by the parser. 
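
The precedence list in "Operators and Precedence" can be read as a table of binding powers driving a Pratt loop. The following is a minimal sketch of that technique in Python, not the project's C++ parser: the numeric powers, the tuple-shaped AST, and the `parse` helper are assumptions chosen for illustration. Postfix operators are omitted, and every binary operator other than assignment is treated as left-associative, which is a simplification.

```python
import re

# Hypothetical binding powers following the documented order (lowest to highest).
BINARY_POWER = {
    "=": 1, "+=": 1,
    "or": 2, "||": 2,
    "and": 3, "&&": 3,
    "==": 4, "!=": 4, "<": 4, ">": 4, "<=": 4, ">=": 4,
    "|": 5, "^": 5, "&": 5,
    "<<": 6, ">>": 6,
    "+": 7, "-": 7,
    "*": 8, "/": 8, "%": 8,
}
PREFIX_POWER = 9
PREFIX_OPS = {"!", "-", "~", "&", "*"}
RIGHT_ASSOC = {"=", "+="}

# Reduced tokenizer for the sketch; unknown characters are silently skipped.
TOKEN_RE = re.compile(r"\d+|\w+|<<|>>|<=|>=|==|!=|\|\||&&|\+=|[-+*/%=<>|^&!~()]")


def parse(source):
    """Parse an expression into nested tuples such as ("+", "1", "2")."""
    tokens = TOKEN_RE.findall(source) + ["<eof>"]
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expression(min_power=0):
        token = advance()
        if token in PREFIX_OPS:
            # Prefix operators bind tighter than every binary operator.
            left = (token, expression(PREFIX_POWER))
        elif token == "(":
            left = expression(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif token in BINARY_POWER or token in (")", "<eof>"):
            raise SyntaxError(f"unexpected token {token!r}")
        else:
            left = token  # literal or identifier
        while True:
            op = peek()
            power = BINARY_POWER.get(op)
            if power is None or power < min_power:
                break
            advance()
            # Right-associative operators re-enter at the same power.
            next_min = power if op in RIGHT_ASSOC else power + 1
            left = (op, left, expression(next_min))
        return left

    tree = expression()
    if peek() != "<eof>":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Raising the minimum power by one for the right-hand operand is what makes `a - b - c` group to the left, while re-entering at the same power lets `x = y = z` group to the right.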
diff --git a/docs/Language-Overview.md b/docs/Language-Overview.md new file mode 100644 index 0000000..1fa471b --- /dev/null +++ b/docs/Language-Overview.md @@ -0,0 +1,142 @@ +# Language Overview + +Summarizes the core syntax and semantics supported in the current +parser-focused phase of the language. + +## Imports and Aliases + +```arti +import std::memory; +import std::*; +import my_module::some_function; +import my_module::some_typename; + +using mem = std::memory; +using malloc = mem::mem_alloc; +using my_type = my_module::some_typename; +using my_func = my_module::some_function; +``` + +- `import module::symbol;` brings a specific symbol into scope. +- `import module::*;` imports all direct children of `module` (not recursive). +- `using` introduces aliases for modules, types, or functions. + +## Structs and Generics + +```arti +struct Point { + x: T, + y: T +} + +struct Rectangle { + top: Point::, + bot: Point:: +} +``` + +- Generic definitions use ``. +- Instantiations require `::<>` (turbofish) to disambiguate from comparisons. +- Fields use `name: Type` syntax. + +## Functions and Methods + +```arti +fn meaning_of_life() -> i32 { + return 42; +} + +fn scale(lhs: *Point::, rhs: T) -> Point:: { + return Point:: { + .x = lhs->x * rhs, + .y = lhs->y * rhs + }; +} + +fn add(this *Point::, other: *Point::) { + this->x += other->x; + this->y += other->y; +} +``` + +- Return types follow the parameter list via `->`. +- Methods use `this ` as the first parameter, enabling both member and + free-function call styles. + +## Enums and Variants + +```arti +enum Result { + Ok(T), + Err(E) +} + +return Result::::Err{ -1 }; +return Result::::Ok{}; +``` + +- `Result` demonstrates tagged unions with data payloads. +- Variants initialize with braces, optionally containing payloads. 
+ +## Variables, Pointers, Qualifiers + +```arti +let x: i32 = 10; +let answer = meaning_of_life(); +def PI: f64 = 3.14159265358979; + +def ptr: *i32 = &answer; +let mutable_pointer: *$i32 = &x; +let complex_pointer: *$*$i32 = &mutable_pointer; +let null_int: ?i32 = null; +``` + +- `let` for mutable, `def` for immutable bindings. +- Qualifiers `*`, `$`, `?` apply to the immediate type to the right and can be + combined to express rich pointer semantics. + +## Slices and Literals + +```arti +let arrSlice: ?[]i32 = []i32 { 2, 4, 6, 8, 10 }; + +let full = arrSlice[:]; +let range = arrSlice[1:3]; +let head = arrSlice[:2]; +let tail = arrSlice[2:]; + +let memPtr = arrSlice.*; +let memLength = arrSlice.#; +let newSlice = memPtr.[memLength]; +``` + +- `[]Type { ... }` constructs slice literals. +- Slicing syntax mirrors Python with optional start/end. +- Specialized suffixes: + - `expr.*` raw pointer; + - `expr.#` length; + - `ptr.[len]` create slice from pointer + length. + +## Reflection + +```arti +def refl_info = foo.@; +def xalign = Point::::x.@alignment; +def type_size = Point::.@size; +``` + +- `.@` yields metadata for values, types, or struct members. +- Attributes include `@alignment`, `@size`, `@typename`, `@offset`. + +## Resource Management + +```arti +defer cleanup(); +errdefer { log_failure(); } +``` + +- `defer` schedules work at scope exit (LIFO order). +- `errdefer` runs only when the function returns an error variant. + +These constructs appear throughout idiomatic `artichoke` code and are supported by +the current parser. diff --git a/docs/Language-Patterns.md b/docs/Language-Patterns.md new file mode 100644 index 0000000..c59da16 --- /dev/null +++ b/docs/Language-Patterns.md @@ -0,0 +1,80 @@ +# Pattern Unwrapping & Binding + +`artichoke` supports unwrapping `Result` and optional values directly within +control-flow constructs. This section describes the available patterns. 
+ +## `if` / `else` + +```arti +if (foo()) |ok_val| { + /* Ok branch */ +} else |err_val| { + /* Err branch */ +} +``` + +- Using `|name|` after the condition binds the success value (or error value in + the `else` branch) for the scope of that block. +- Works with any expression that yields a `Result` or an optional (`?`) value. + +## `while` Patterns + +```arti +while (foo()) |ok_val| { + /* Loop continues while Ok */ +} else |err_val| { + /* Executes on Err */ +} + +while (foo.next()) |item| { + /* Iterator-style loop until optional becomes empty */ +} +``` + +- The first form keeps looping while the expression yields `Ok`. +- The iterator-style variant continues while the optional contains a value. + +## `match` Cases + +```arti +match (foo()) { + Result::::Ok |v| -> { + std::io::print("Success!"); + } + _ -> { /* Default */ } +} +``` + +- Cases accept type expressions and optional bindings (`|v|`). +- `_` handles the default/remaining patterns. + +## Range Loop Binding + +```arti +for (let element := returns_range_function()) { + /* element is bound for each iteration */ +} +``` + +- Range loops bind the element name chosen in the header. + +## Labels + +```arti +outer_loop := while (condition) { + inner_loop := for (let i = 0; i < 10; i += 1) { + if (i == 5) { break outer_loop; } + } +} +``` + +- Labels let you control nested loops using `break label;` or `continue label;`. + +## Error Reporting + +- When an unwrap clause is malformed (missing pipes, invalid identifier), the + parser emits diagnostics indicating the expected syntax, helping align code + with the documented patterns. + +These patterns appear throughout typical `artichoke` code and are supported by the +current parser. diff --git a/docs/Reference-Grammar.md b/docs/Reference-Grammar.md new file mode 100644 index 0000000..80832c9 --- /dev/null +++ b/docs/Reference-Grammar.md @@ -0,0 +1,368 @@ +# Grammar Reference + +Formal grammar aligned with the current parser implementation. 
+ +``` +/* +================================================================================ +| | +| The Artichoke Programming Language | +| Official EBNF Grammar | +| | +================================================================================ +*/ + +/* --- Program Structure --- */ +/* A program is a sequence of top-level declarations and statements. */ + + = + * + + + = + "export" + | + + = + + | + | + | + + = + + | + | + | + | + | + + = + "module" "{" + ( + | + | + | + | )* + "}" + + = + "import" ";" + + = + ( "::" "*" )? + + = + "using" "=" ";" + + +/* --- Declarations --- */ +/* Rules for defining functions, structs, enums, and their components. */ + + = + "fn" ? "(" ? ")" ( "->" )? + + = + "this" ("," ( "," )* )? + | ( "," )* + + = + ":" + + = + "struct" ? "{" "}" + + = + ( "," )* + + = + ":" + + = + "enum" ? "{" "}" + + = + ( "," )* + + = + ( "(" ")" )? + + = + "<" ">" + + = + ( "," )* + + = + "typename" + + +/* --- Statements & Control Flow --- */ +/* Rules for code blocks, variable declarations, and control structures. */ + + = + "{" * "}" + + = + ";" + | + | ";" + | ";" + | ";" + | ";" + | ";" + | + | + | + | ";" + + = + + + = + ":" ( "=" )? + | "=" + + = + "let" + | "def" + + = + "if" "(" ")" ? + ? + + = + "else" + + = + + | ? + + = + "|" "|" + + = + ( ":=")? ( + + | + | + | + | + ) + + = + "for" "(" ( | )? ";" ";" ? ")" + + + = + "for" "(" ":=" ")" + + + = + "while" "(" ")" ? + ? + + = + "do" "while" "(" ")" + + = + "loop" + + = + "match" "(" ")" "{" * ? "}" + + = + "switch" "(" ")" "{" * ? "}" + + = + ( "|" "|" )? "->" + + = + "->" + + = + "_" "->" + + = + "break" ? + + = + "continue" ? + + = + "defer" ( | ) + + = + "errdefer" ( | ) + + = + "return" ? + + +/* --- Expressions & Operator Precedence --- */ +/* The full expression hierarchy, from lowest to highest precedence. */ + + = + ( ( | ) )? + + = + ( ( "||" | "or" ) )* + + = + ( ( "&&" | "and" ) )* + + = + ( )? 
+ + = + ( )* + + = + ( )* + + = + ( )* + + = + ( )* + + = + * + + = + ( | )* + + +/* --- Primary Expressions & Literals --- */ +/* The highest-precedence expressions, including literals and grouped expressions. */ + + = + + | + | + | ( "{" "}" )? + + = + ( "::" "<" ">" )? + + = + "{" "}" + + = + + | + | + | + | + + = + "(" ")" + + = + "(" ")" + + = + ( ",")* ? + + = + ( | )? ","? + + = + ( "," )* + + = + "." "=" + + = + ( "," )* + + = + "null" + + = + "true" + | "false" + + = /* Assumed to be defined by the tokenizer */ + = /* Assumed to be defined by the tokenizer */ + = /* Assumed to be defined by the tokenizer */ + + +/* --- Operators --- */ +/* Definitions for all operator token sets. */ + + = "=" + = "+=" | "-=" | "*=" | "/=" | "%=" | "&=" | "|=" | "<<=" | ">>=" | "||=" | "&&=" + = "==" | "!=" | ">" | "<" | ">=" | "<=" + = "&" | "^" | "|" + = "<<" | ">>" + = "+" | "-" + = "*" | "/" | "%" + = "!" | "-" | "~" | "&" | "*" + + = + "[" + | "." + | "::" ( "::" "<" ">" )? + | "->" + | ".@" ? + | ".[" "]" + | ".#" + | ".*" + + = + ? + | ":" ? "]" + + = + "]" + | ":" ? "]" + +/* --- Type System --- */ +/* Rules for defining types, type names, and type qualifiers. */ + + = + ? + + = + ( "*" | "[]" ) ? + | "$" ? + | "?" ? + + = + ( "*" | "[]" ) ? + | "$" ? + + = + ( "*" | "[]" ) ? + | "?" ? + + = + ( "::" ( "::" "<" ">" )? )* + + = + ( "::" )* + + = + ( "," )* + + +/* --- Lexical Tokens & Base Definitions --- */ +/* The lowest-level building blocks of the language. */ + + = + + + = + + | + | + + = "_" | [a-z] | [A-Z] + = | + = "0" + = [1-9] + + = E /* Represents an empty terminal string */ + = /* End Of File */ +``` diff --git a/docs/Reference-Tokens.md b/docs/Reference-Tokens.md new file mode 100644 index 0000000..fe0d9e2 --- /dev/null +++ b/docs/Reference-Tokens.md @@ -0,0 +1,31 @@ +# Token Reference + +Token definitions used by the tokenizer. + +## Literals + +- `tkInteger` - decimal integers (`10`, `42`). +- `tkDecimal` - floating-point literals (`3.14159265358979`). 
+- `tkString` - double-quoted strings. +- `tkCharacter` - character literals. +- `tkIdentifier` - identifiers and names. +- `tkEOF` - end-of-file sentinel. + +## Operators + +Includes all operator tokens such as `::`, `->`, `.[`, `.#`, `.*`, `.@`, and the +compound assignments (`+=`, etc.). + +## Keywords + +`import`, `using`, `struct`, `enum`, `fn`, `return`, `let`, `def`, `if`, `else`, +`for`, `while`, `do`, `loop`, `match`, `switch`, `defer`, `errdefer`, `true`, +`false`, `null`, `typename`, `this`, `_` (default pattern), and logical aliases +`or`, `and`. + +## Notes + +- Keyword recognition uses a trie to ensure single-pass tokenization. +- Multi-character operators (`::`, `.[`, `.#`, `.*`, `.@`, `>>=`) rely on + lookahead logic. +- Generic parsing may split `>>` into `>` + `>` to disambiguate template closers.
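
The notes above mention a trie for keyword recognition and lookahead for multi-character operators. One way to picture both is a trie that reports the longest spelling starting at a given position (maximal munch). The sketch below assumes that approach; the `Trie` class and the token names in `OPERATORS` are hypothetical and do not mirror the project's C++ trie map or its `tk...` token names.

```python
class Trie:
    """Maps spellings to token names and finds the longest match at a position."""

    def __init__(self, entries):
        self._root = {}
        for spelling, name in entries.items():
            node = self._root
            for ch in spelling:
                node = node.setdefault(ch, {})
            node[""] = name  # the empty key marks a complete spelling

    def longest_match(self, text, start=0):
        """Return (spelling, name) for the longest entry at `start`, or None."""
        node = self._root
        best = None
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break  # no entry continues with this character
            if "" in node:
                best = (text[start:end + 1], node[""])
        return best


# Hypothetical token names for a few of the operators listed in this reference.
OPERATORS = Trie({
    ">": "greater",
    ">=": "greater_equal",
    ">>": "shift_right",
    ">>=": "shift_right_assign",
    ".": "dot",
    ".#": "length",
    ".*": "raw_pointer",
    ".@": "reflect",
    ".[": "slice_from_pointer",
    "::": "scope",
    "-": "minus",
    "->": "arrow",
})
```

Each input character is visited once per token, so no backtracking is needed. A parser that has to close two generic argument lists can still take a `>>` token and treat it as two `>` tokens, as the last note describes.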