Initial Parser Implementation and Feature Completion #1
Pull Request: Initial Parser Implementation and Feature Completion
Description
This PR merges the `parser-dev` branch into `main`, completing the first major stage of the `artichoke` compiler.
This introduces the complete parser for the `artichoke` Programming Language. Moving beyond the existing tokenizer, this implementation provides the full infrastructure required to transform `artichoke` source code into a structured Abstract Syntax Tree (AST).
The parser is built using a hybrid approach: a Handwritten Recursive Descent parser for high-level program structure (modules, functions, statements) and a Pratt (Precedence Climbing) parser for expressions.
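The hybrid approach described above can be illustrated with a minimal sketch. This is not the code from this PR: the `Parser` class, the `tokenize` helper, the toy `let name = expr;` statement, and the binding-power values are all assumptions chosen for illustration. It only shows how a recursive descent routine for statements can hand off to a Pratt loop for expressions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tokenizer: alphanumeric runs and single-character punctuation.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < src.size();) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalnum(c)) {
            std::size_t j = i;
            while (j < src.size() && std::isalnum(static_cast<unsigned char>(src[j]))) ++j;
            out.push_back(src.substr(i, j - i));
            i = j;
        } else {
            out.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return out;
}

class Parser {
public:
    explicit Parser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // Recursive descent layer: statement := "let" ident "=" expression ";"
    std::string parseLet() {
        expect("let");
        std::string name = next();
        expect("=");
        std::string value = parseExpression(0);
        expect(";");
        return "(let " + name + " " + value + ")";
    }

    // Pratt layer: fold infix operators whose left binding power is at least minBp.
    std::string parseExpression(int minBp) {
        assert(minBp >= 0);
        std::string lhs = parsePrefix();
        while (pos_ < toks_.size()) {
            int lbp = leftBp(peek());
            if (lbp == 0 || lbp < minBp) break;           // not an operator, or binds too loosely
            std::string op = next();
            std::string rhs = parseExpression(lbp + 1);  // +1 makes the operator left-associative
            lhs = "(" + op + " " + lhs + " " + rhs + ")";
        }
        return lhs;
    }

private:
    std::string parsePrefix() {
        std::string t = next();
        if (t == "(") {
            std::string inner = parseExpression(0);
            expect(")");
            return inner;
        }
        if (!std::isalnum(static_cast<unsigned char>(t[0])))
            throw std::runtime_error("expected an operand, got '" + t + "'");
        return t;
    }
    // Assumed binding powers for this sketch only.
    static int leftBp(const std::string& t) {
        if (t == "+" || t == "-") return 10;
        if (t == "*" || t == "/") return 20;
        return 0;
    }
    const std::string& peek() const { return toks_[pos_]; }
    std::string next() {
        if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of input");
        return toks_[pos_++];
    }
    void expect(const std::string& want) {
        std::string got = next();
        if (got != want) throw std::runtime_error("expected '" + want + "', got '" + got + "'");
    }
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};

// Convenience wrapper: parse one statement and return it as an S-expression string.
std::string parseLetStatement(const std::string& src) {
    Parser p(tokenize(src));
    return p.parseLet();
}
```

Calling `parseLetStatement("let x = 1 + 2 * 3;")` returns `(let x (+ 1 (* 2 3)))`: the statement shape is fixed by the descent routine, while operator grouping comes entirely from the binding-power table.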
Some changes to the language grammar and definition were needed to keep the parser implementation clean and context-agnostic, which has been the primary goal so far.
Core Architecture
1. Hybrid Parsing Strategy
2. The `artichoke` Type System (Initial Integration)
- […] `TypeNode` logic to handle nested qualifiers including pointers (`*`), mutability (`$`), optionals (`?`), and slices (`[]`).
- […] `TypesList` parser.
3. Syntax Resolution & Disambiguation
- […] `::< >`): Solved the generic-vs-comparison ambiguity in expressions by implementing the `::<>` syntax.
- […] `[]int {1, 2}`) by allowing the parser to transition from "Expression Mode" to "Type Mode" when encountering slice starters.
- […] `->` token to function as both a pointer member access operator and a match-case delimiter without ambiguity.
Key Language Features Supported
- […] `this` parameter support.
- […] `if`/`else` and `while` with variable unwrappers (`|val|`).
- […] `loop`, `for`, `while`, `do-while`).
- […] `match` and `switch` statements with arrow-delimited cases.
- […] `defer` and `errdefer` for resource management.
- […] `[start:end]`), Reflection (`.@`), and Slice conversions (`.#`, `.*`).
Technical Implementation Details
- […] `toDot` (Graphviz) and `toString` visitors for tree debugging.
- […] `Unexpected<>` result types to provide clear error reporting without compiler crashes.

Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Complete the transition from a declarations-only parser to a functional imperative parser. This commit introduces the implementation for all major statement types, loop constructs, and core control flow logic.

- **Match Case Update**: Updated `grammar.ebnf` to use pipe delimiters `|id|` for unwrapped variables in match cases, replacing the previous parenthetical syntax.
- **Labels**: Implemented loop labeling using the `ident := loop` syntax. Labels are validated to ensure they only prefix valid loop constructs.
- **Labels and Ranges**: Standardized the use of the `:=` operator for both loop labels (`label := loop`) and range-for declarations (`let i := range`).
- **Conditional Branches**:
  - Fully implemented `if` and `else` statements.
  - Added support for optional variable unwrapping (e.g., `if (expr) |val|`).
  - Supported `else if` chaining by recursively parsing if-statements within else-branches.
- **Loops**:
  - **C-Style For**: Implemented `for (init; cond; post)` with optional initializers and post-loop expressions.
  - **Range For**: Implemented `for (let i := range)` with mutability controls.
  - **While & Do-While**: Implemented standard condition-based loops.
  - **Infinite Loop**: Added the explicit `loop` keyword for infinite iteration.
- **Loop Dispatch**: Added a lookahead mechanism in `parseForLoopStatement` to differentiate between C-style and Range-style loops based on token positioning.
- **Variables**: Implemented `let`/`def` parsing within local scopes, including type annotations and initializers.
- **Defer Logic**: Implemented `defer` and `errdefer` for scope-guarded execution.
- **Jumps**: Implemented `break`, `continue` (with optional label targets), and `return` (with optional expressions).
- **Match & Switch**: Fully implemented branch parsing, with possible default cases via the `_` (underscore) keyword.
- **Expression Integration**: Stubbed `parseExpression` in a new `Expressions.cpp` to serve as the integration point for value parsing.
- **OverloadSet**: Integrated `OverloadSet` utility in `Statements.cpp` to cleanly handle AST node variant visitation for label injection.
- **Error Handling**: Standardized error reporting across all new paths using `langException`, providing specific "expected" messages for delimiters and keywords.

Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit addresses several critical issues in the recursive descent parser, specifically regarding the handling of empty constructs, statement termination, and AST representation of nested scopes. These changes bring the implementation in line with the Artichoke EBNF specification.

* **CodeBlock as Statement:** Added `CodeBlockStmtNode` to the `StatementNode` variant. This allows a bare `{}` to be treated as a valid statement, enabling manual scoping within functions.
* **Visitor Support:** Updated `toDot.cpp` (Graphviz) and `toString.cpp` (Pretty-print) to support the new `CodeBlockStmtNode` during AST traversal.
* **Empty Member Lists:** Implemented a pre-loop check for the closing brace `}` in `parseStruct` and `parseEnum`. This prevents the parser from attempting to parse members in empty declarations (e.g., `struct Empty {}`).
* **Diagnostic Accuracy:** Enhanced the member-parsing loop to provide better error context. If a member is not followed by a comma or a closing brace, the parser now explicitly suggests `',' or '}'` as the expected tokens.
* **Nested Scopes:** The parser now correctly identifies a `{` at the start of a statement and dispatches to `parseCodeBlock`.
* **Empty Code Blocks:** Added a guard in the block-parsing loop to check for `}` immediately after `{`, allowing functions or nested scopes to be empty.
* **C-Style For-Loops:** Replaced `match` with `matchAndConsume` for the initialization semicolon. This allows the parser to correctly handle loops where the initialization is omitted (e.g., `for (; 1; 1)`).
* **Correctness:** Resolves parser hangs or errors when encountering empty blocks.
* **Compliance:** Fully supports the EBNF definition of zero-or-more members/statements.
* **Visuals:** AST diagrams now accurately reflect nested block structures.

[…] `>>` tokens 09c44f3b67

Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Implement support for object literals using a unified syntax for both struct and slice initialization. Since the parser lacks the semantic context to distinguish between a struct and a slice at this stage, both are represented by the new `ObjectLiteral` AST node.

[…] initialization within curly braces following a type expression:

* **Named Initializers**: Uses the `.field = value` syntax (e.g., `Point { .x = 10, .y = 20 }`).
* **Positional Initializers**: Uses a comma-separated list of expressions (e.g., `[]i32 { 1, 2, 3 }`).
* Renamed `StructLiteral` and `SliceLiteral` nodes to `ObjectLiteral`.
* Refactored initialization helper nodes (e.g., `StructLiteralNamedFieldInit` is now `ObjectLiteralNamedFieldInit`).
* Unified the representation in `Expressions.hpp` and `Literals.hpp` to use a single `ObjectLiteral` struct containing a `type` and an optional `initializer`.
* Integrated the opening brace `{` (`opLSquirly`) as a high-precedence postfix operator (binding power 19).
* Implemented parsing logic in `Expressions.cpp` to handle the transition from a type expression to an object initializer.
* Updated `toDot` and `toString` visitors to handle the unified `ObjectLiteral` nodes and their respective initializer variants.
* Improved robustness in `Declarations.cpp` by ensuring list parsing correctly handles closing braces in specific edge cases.

Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Implement precedence capping in `parseExpression` for switch cases to prevent the parser from misinterpreting the case arrow (`->`) as a pointer member access operator. Additionally, increased the binding power of `ModuleAccess` (`::`) to ensure namespaced identifiers are correctly resolved within case patterns before hitting the precedence limit.

- Use `PointerMemberAccess.right` as the precedence floor for cases.
- Update `ModuleAccess` binding power to {23, 24}.

Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Implement `TypeExpression` AST node to allow types to be used within expressions, enabling the parsing of anonymous slice and array initializers like `[]Type { ... }`.

- Register `[` as a prefix-style token (NUD) in the Pratt parser.
- Add `TypeExpression` node to AST and expression variants.
- Update `toDot` and `toString` visitors for AST visualization.
- Update frontend to open source files directly to fix issues at opening paths.
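
The precedence cap from the switch-case commit above can be sketched roughly as follows. Only the `ModuleAccess` value `{23, 24}` comes from the commit message; the `{21, 22}` pair for `->`, the `+` entry, and the `Cursor`, `kInfix`, and `parseCase` names are assumptions made for this illustration, not the PR's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct BindingPower { int left; int right; };

// "::" uses the {23, 24} pair stated in the commit; the other rows are assumed.
const std::map<std::string, BindingPower> kInfix = {
    {"+",  {11, 12}},
    {"->", {21, 22}},  // hypothetical PointerMemberAccess binding power
    {"::", {23, 24}},  // ModuleAccess
};

// Hypothetical pre-tokenized input with a read position.
struct Cursor {
    std::vector<std::string> toks;
    std::size_t pos = 0;
};

// Pratt loop: stops at any operator whose left binding power is below minBp.
std::string parseExpression(Cursor& c, int minBp) {
    assert(minBp >= 0);
    std::string lhs = c.toks.at(c.pos++);  // single-token operand
    while (c.pos < c.toks.size()) {
        auto it = kInfix.find(c.toks[c.pos]);
        if (it == kInfix.end() || it->second.left < minBp) break;
        std::string op = c.toks[c.pos++];
        std::string rhs = parseExpression(c, it->second.right);
        lhs = "(" + op + " " + lhs + " " + rhs + ")";
    }
    return lhs;
}

// Case pattern parsed with the right binding power of "->" as the floor,
// so "->" is left in the stream as the case delimiter while "::" still binds.
std::string parseCase(std::vector<std::string> toks) {
    Cursor c;
    c.toks = std::move(toks);
    std::string pattern = parseExpression(c, kInfix.at("->").right);
    if (c.pos >= c.toks.size() || c.toks[c.pos] != "->")
        throw std::runtime_error("expected '->' after case pattern");
    ++c.pos;
    std::string body = parseExpression(c, 0);  // no cap inside the case body
    return "(case " + pattern + " " + body + ")";
}
```

With the token list `Color :: Red -> p -> x`, `parseCase` yields `(case (:: Color Red) (-> p x))`. Parsing the pattern with a floor of 0 instead would fold the first `->` into the pattern as a member access, which is the misinterpretation the cap is meant to prevent.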