me/artichoke-lang

Initial Parser Implementation and Feature Completion #1

Merged

me merged 16 commits from parser-dev into main

2025-12-28 11:54:12 -06:00

Author	SHA1	Message	Date
erick-alcachofa	25486fbace	fix(parser): support optional start and end indices in slice ranges Signed-off-by: erick-alcachofa <erick@artichoke.dev> Update the SliceAccess postfix operator logic to handle the full variety of slice range syntaxes. This allows for open-ended slices by making the start and end expressions optional within the brackets. - Add logic to detect a leading colon for `[:end]` and `[:]` forms. - Support trailing colons for `[start:]` forms. - Differentiate between a single index access and a slice range based on the presence of the colon operator. - Update SliceRangeExprNode construction to handle optional boundaries.	2025-12-28 11:26:27 -06:00
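The slice forms listed in this commit can be illustrated with a small sketch. It is not the PR's code: `IndexAccess`, `SliceRange`, `BracketAccess`, and `classifyBracket` are hypothetical names, and the operands are kept as opaque strings where the real parser would presumably call `parseExpression` on each side of the colon.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical stand-ins for the PR's AST nodes.
struct IndexAccess { std::string index; };
struct SliceRange {
    std::optional<std::string> start;  // empty for `[:end]` and `[:]`
    std::optional<std::string> end;    // empty for `[start:]` and `[:]`
};
using BracketAccess = std::variant<IndexAccess, SliceRange>;

// Classifies the text between '[' and ']'. The presence of a colon is what
// separates a slice range from a single index access.
BracketAccess classifyBracket(std::string_view inside) {
    const std::size_t colon = inside.find(':');
    if (colon == std::string_view::npos) {
        assert(!inside.empty() && "an index access needs an expression");
        return IndexAccess{std::string(inside)};
    }
    // An empty side of the colon becomes an absent boundary.
    auto operand = [](std::string_view text) -> std::optional<std::string> {
        if (text.empty()) return std::nullopt;
        return std::string(text);
    };
    return SliceRange{operand(inside.substr(0, colon)),
                      operand(inside.substr(colon + 1))};
}
```

Under these assumptions, `classifyBracket("2:")` yields a `SliceRange` with only `start` set, while `classifyBracket("i")` yields an `IndexAccess`.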
erick-alcachofa	c2f37d5702	feat(parser): add support for type-initiated expressions Signed-off-by: erick-alcachofa <erick@artichoke.dev> Implement TypeExpression AST node to allow types to be used within expressions, enabling the parsing of anonymous slice and array initializers like `[]Type { ... }`. - Register `[` as a prefix-style token (NUD) in the Pratt parser. - Add `TypeExpression` node to AST and expression variants. - Update `toDot` and `toString` visitors for AST visualization. - Update frontend to open source files directly, fixing issues when opening paths.	2025-12-28 10:21:24 -06:00
erick-alcachofa	f024334da5	fix(parser): resolve expression ambiguity in switch/match cases Signed-off-by: erick-alcachofa <erick@artichoke.dev> Implement precedence capping in `parseExpression` for switch cases to prevent the parser from misinterpreting the case arrow (`->`) as a pointer member access operator. Additionally, increased the binding power of `ModuleAccess` (::) to ensure namespaced identifiers are correctly resolved within case patterns before hitting the precedence limit. - Use `PointerMemberAccess.right` as the precedence floor for cases. - Update `ModuleAccess` binding power to {23, 24}.	2025-12-28 00:44:02 -06:00
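The precedence cap described in this commit can be sketched as follows. This is an illustration under stated assumptions: only the `ModuleAccess` binding power {23, 24} comes from the commit message, while the values for `+` and `->`, the `CaseParser` class, and the string-based token list are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct BindingPower { int left; int right; };

std::optional<BindingPower> infixPower(const std::string& op) {
    if (op == "+")  return BindingPower{11, 12};  // hypothetical value
    if (op == "->") return BindingPower{21, 22};  // hypothetical value
    if (op == "::") return BindingPower{23, 24};  // value stated in the commit
    return std::nullopt;
}

class CaseParser {
public:
    explicit CaseParser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // Pratt loop: an infix operator is consumed only if its left binding
    // power is at least minPower. Results are returned fully parenthesized.
    std::string parseExpression(int minPower) {
        assert(pos_ < toks_.size() && "unexpected end of input");
        std::string lhs = toks_[pos_++];
        while (pos_ < toks_.size()) {
            const auto power = infixPower(toks_[pos_]);
            if (!power || power->left < minPower) break;
            const std::string op = toks_[pos_++];
            const std::string rhs = parseExpression(power->right);
            lhs = "(" + lhs + " " + op + " " + rhs + ")";
        }
        return lhs;
    }

    // Case patterns use the right binding power of `->` as the floor, so the
    // case arrow is left unconsumed while `::` (which binds tighter) is not.
    std::string parseCasePattern() {
        return parseExpression(infixPower("->")->right);
    }

    std::string current() const {
        return pos_ < toks_.size() ? toks_[pos_] : std::string();
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};
```

With the tokens `Color :: Red -> x`, `parseCasePattern()` stops before `->`, whereas `parseExpression(0)` would absorb the arrow as a pointer member access. That second behavior is the misparse the commit describes.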
erick-alcachofa	3180ca4662	feat(parser): implement object literals to unify struct and slice syntax Signed-off-by: erick-alcachofa <erick@artichoke.dev> Implement support for object literals using a unified syntax for both struct and slice initialization. Since the parser lacks the semantic context to distinguish between a struct or a slice at this stage, both are represented by the new `ObjectLiteral` AST node. Two forms of initialization are supported within curly braces following a type expression: * Named Initializers: Uses the `.field = value` syntax (e.g., `Point { .x = 10, .y = 20 }`). * Positional Initializers: Uses a comma-separated list of expressions (e.g., `[]i32 { 1, 2, 3 }`). * Renamed `StructLiteral` and `SliceLiteral` nodes to `ObjectLiteral`. * Refactored initialization helper nodes (e.g., `StructLiteralNamedFieldInit` is now `ObjectLiteralNamedFieldInit`). * Unified the representation in `Expressions.hpp` and `Literals.hpp` to use a single `ObjectLiteral` struct containing a `type` and an optional `initializer`. * Integrated the opening brace `{` (`opLSquirly`) as a high-precedence postfix operator (binding power 19). * Implemented parsing logic in `Expressions.cpp` to handle the transition from a type expression to an object initializer. * Updated `toDot` and `toString` visitors to handle the unified `ObjectLiteral` nodes and their respective initializer variants. * Improved robustness in `Declarations.cpp` by ensuring list parsing correctly handles closing braces in specific edge cases.	2025-12-28 00:12:43 -06:00
erick-alcachofa	8dd75e3b8a	feat(parser): support turbofish operator and specialize access expressions Signed-off-by: erick-alcachofa <erick@artichoke.dev> Overhaul the AST and parser logic to support explicit generic instantiation in expressions (e.g., `Result::<u32, u32>::Ok(0)`). This is achieved by implementing the "turbofish" operator (`::<>`) and specializing how member and module access are handled. * Added `GenericExpression` to represent generic instantiations in expressions. * Updated the Pratt parser to look for `<` immediately following a `::` (ModuleAccess) operator. If found, it parses a `GenericExpression` containing the generic arguments. * This change resolves the ambiguity between generic lists and comparison operators in the expression parser. * Renamed `PointerAccessExpression` to `PointerMemberAccessExpression`. * Refactored `MemberAccessExpression` and `PointerMemberAccessExpression` to store the member as an `ExpressionNode`. This allows the right-hand side of a `.` or `->` to be a complex expression (like a generic call). * Simplified `ModuleAccessExpression` to a binary `left`/`right` structure, separating scope resolution from generic instantiation. * Flattened the `Type` AST: replaced recursive `baseType` structures with a `Vector<TypeExpressionNode>` (`typeNodes`) to represent namespaced paths (e.g., `std::collections::Map`) more efficiently. * Removed redundant `NamespacedType` and `NamespacedIdentifier` nodes. * Simplified `GenericType` and `IdentifierType` to use direct `String` type names. * Refactored `parseType` to iterate through namespaced components and populate the new flattened `typeNodes` vector. * Updated the Pratt infix loop to correctly dispatch to `ModuleAccess`, `MemberAccess`, or `GenericExpression` based on the operator and lookahead tokens. * Adjusted `toDot` and `toString` visitors to match the new AST definitions.	2025-12-27 23:08:26 -06:00
erick-alcachofa	09c44f3b67	fix(parser): resolve generic nesting ambiguity by splitting `>>` tokens Signed-off-by: erick-alcachofa <erick@artichoke.dev> Refactor the termination logic for generic parameter lists in type parsing to correctly handle nested generics. By replacing manual peeking with `peekExpect(TokenV::opGt)`, the parser now correctly handles cases where two closing angle brackets appear consecutively (e.g., `List<List<Int>>`). Previously, the parser manually checked for a literal `>` token. If the lexer encountered `>>` (a right-shift operator), the parser would fail to recognize it as two closing brackets. The transition to `peekExpect` allows the tokenizer to "split" the `>>` token into two individual `>` tokens when a single closing bracket is expected, resolving the classic nested template ambiguity. Key changes: - Replaced manual token validation and error reporting with `peekExpect`. - Enabled support for nested generic types without requiring spaces between closing brackets. - Simplified the `keepParsing` loop state in `lib/src/Parser/Types.cpp`.	2025-12-26 23:54:17 -06:00
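One way the `>>` split could work is sketched below. The commit only states that `peekExpect(TokenV::opGt)` lets the tokenizer split the token; the `Tok` enum, the `TokenStream` class, and the in-place rewrite of the token buffer are assumptions made for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token kinds; Shr stands for a lexed `>>`.
enum class Tok { Ident, Lt, Gt, Shr, End };

class TokenStream {
public:
    explicit TokenStream(std::vector<Tok> toks) : toks_(std::move(toks)) {}

    Tok peek() const { return pos_ < toks_.size() ? toks_[pos_] : Tok::End; }
    void advance() { if (pos_ < toks_.size()) ++pos_; }

    // When a single `>` is expected and the next token is `>>`, replace it
    // with two `>` tokens so each nesting level can close on its own.
    bool peekExpect(Tok expected) {
        if (expected == Tok::Gt && peek() == Tok::Shr) {
            toks_[pos_] = Tok::Gt;
            toks_.insert(toks_.begin() + static_cast<std::ptrdiff_t>(pos_) + 1,
                         Tok::Gt);
        }
        return peek() == expected;
    }

private:
    std::vector<Tok> toks_;
    std::size_t pos_ = 0;
};

// Parses `Ident [ '<' Type '>' ]` with a single generic argument and returns
// the generic nesting depth, or -1 on a syntax error.
int parseTypeDepth(TokenStream& ts) {
    if (ts.peek() != Tok::Ident) return -1;
    ts.advance();
    if (ts.peek() != Tok::Lt) return 0;  // plain, non-generic type
    ts.advance();
    const int inner = parseTypeDepth(ts);
    if (inner < 0 || !ts.peekExpect(Tok::Gt)) return -1;
    ts.advance();
    return inner + 1;
}
```

For a stream lexed from `List<List<Int>>` (ending in one `Shr` token), the inner call splits the token and consumes the first `>`, and the outer call consumes the second.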
erick-alcachofa	5762497f56	feat(parser): implement Pratt expression parsing and refactor operator types Signed-off-by: erick-alcachofa <erick@artichoke.dev> Overhaul the expression parsing mechanism to utilize a Pratt (top-down operator precedence) parser. This change provides a more scalable and maintainable way to handle operator precedence and associativity compared to standard recursive descent. As part of this transition, the nomenclature for operators has been refined to reflect their position in the grammar (Prefix, Infix, Postfix) rather than their arity. * Renamed `UnaryOperator` and `UnaryExpression` to `PrefixOperator` and `PrefixExpression`. * Renamed `BinaryOperator` and `BinaryExpression` to `InfixOperator` and `InfixExpression`. * Renamed `ScopeAccessExpression` to `ModuleAccessExpression`. * Introduced `PostfixOperator` enum and associated logic for function calls, slicing, and reflection attributes. * Updated `toDot.cpp` and `toString.cpp` to support the new node types and renamed operators. * Added `Pratt.hpp` and `Pratt.cpp` to define `BindingPower` and map operators to their respective precedence levels. * Added `Operators.cpp` to handle token-to-operator mapping and classification (isPrefix, isInfix, isPostfix). * Refactored `Parser::parseExpression` to implement the core Pratt loop using binding power comparisons. * Moved literal parsing logic into a dedicated `Literals.cpp`. * Implemented explicit parsing methods for `Integer`, `Float`, `Char`, `String`, `Boolean`, and `Null` literals. * Added support for `this` and `_` (underscore) as identifier expressions. * Prefix: `!`, `-`, `~`, `&` (MemPtr), `*` (DerefPtr). * Infix: Arithmetic, Comparison, Bitwise, Logical, and all Compound Assignments. * Postfix: `()` (Call), `[]` (Slice/Access), `.#` (Slice length), `.` (Slice pointer), and `.@` (Reflection). * Missing Literals: Struct literals and Array literals are not yet implemented in the new parsing flow.
* Node Specialization: `MemberAccess`, `PointerMemberAccess`, and `ModuleAccess` currently use generic infix logic and need to be migrated to their specific AST node types. * Error Handling: Literal parsing (specifically `std::stold` and `std::stoul`) needs safety checks to prevent potential exceptions during conversion. * Diagnostics: Refine the error message for unexpected tokens in postfix expressions to explicitly list supported operators. * Generic Ambiguity: Generic type/function instantiation currently causes parsing conflicts with comparison operators (e.g., `Foo<T>`). This is a known issue that will be resolved by transitioning the grammar to a turbofish-style `::<...>` syntax.	2025-12-26 23:32:49 -06:00
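The core Pratt loop mentioned in this commit can be sketched over a toy grammar. This is a minimal illustration, not the code in `Pratt.cpp`: the `BindingPower` struct mirrors the name in the commit, but its fields, the numeric values, the `ToyPratt` class, and the single-character tokens are all assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical shape for BindingPower; left < right gives left associativity.
struct BindingPower { int left; int right; };

std::optional<BindingPower> infixPower(char op) {
    switch (op) {
        case '+': case '-': return BindingPower{1, 2};
        case '*': case '/': return BindingPower{3, 4};
        default:            return std::nullopt;
    }
}

constexpr int kPrefixPower = 5;  // binds tighter than any infix operator here

class ToyPratt {
public:
    explicit ToyPratt(std::string_view src) : src_(src) {}

    // Returns the expression fully parenthesized, which makes the resulting
    // precedence and associativity visible.
    std::string parseExpression(int minPower = 0) {
        skipSpaces();
        assert(pos_ < src_.size() && "unexpected end of input");
        std::string lhs;
        const char c = src_[pos_++];
        if (c == '-') {                       // prefix position (NUD)
            lhs = "(-" + parseExpression(kPrefixPower) + ")";
        } else if (c == '(') {                // parenthesized group
            lhs = parseExpression(0);
            skipSpaces();
            assert(pos_ < src_.size() && src_[pos_] == ')');
            ++pos_;
        } else {                              // single-character operand
            assert(std::isalnum(static_cast<unsigned char>(c)));
            lhs = std::string(1, c);
        }
        for (;;) {                            // infix position (LED)
            skipSpaces();
            if (pos_ >= src_.size()) break;
            const auto power = infixPower(src_[pos_]);
            if (!power || power->left < minPower) break;
            const char op = src_[pos_++];
            const std::string rhs = parseExpression(power->right);
            lhs = "(" + lhs + " " + op + " " + rhs + ")";
        }
        return lhs;
    }

private:
    void skipSpaces() {
        while (pos_ < src_.size() && src_[pos_] == ' ') ++pos_;
    }
    std::string_view src_;
    std::size_t pos_ = 0;
};
```

In this sketch, `ToyPratt("a + b * c").parseExpression()` groups the multiplication first, and `a - b - c` groups to the left because the right binding power of `-` exceeds its left one.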
erick-alcachofa	30d64d9b65	fix(parser): support empty blocks, support nested scoping, and refine loop lookahead Signed-off-by: erick-alcachofa <erick@artichoke.dev> This commit addresses several critical issues in the recursive descent parser, specifically regarding the handling of empty constructs, statement termination, and AST representation of nested scopes. These changes bring the implementation in line with the Artichoke EBNF specification. * CodeBlock as Statement: Added `CodeBlockStmtNode` to the `StatementNode` variant. This allows a bare `{}` to be treated as a valid statement, enabling manual scoping within functions. * Visitor Support: Updated `toDot.cpp` (Graphviz) and `toString.cpp` (Pretty-print) to support the new `CodeBlockStmtNode` during AST traversal. * Empty Member Lists: Implemented a pre-loop check for the closing brace `}` in `parseStruct` and `parseEnum`. This prevents the parser from attempting to parse members in empty declarations (e.g., `struct Empty {}`). * Diagnostic Accuracy: Enhanced the member-parsing loop to provide better error context. If a member is not followed by a comma or a closing brace, the parser now explicitly suggests `',' or '}'` as the expected tokens. * Nested Scopes: The parser now correctly identifies a `{` at the start of a statement and dispatches to `parseCodeBlock`. * Empty Code Blocks: Added a guard in the block-parsing loop to check for `}` immediately after `{`, allowing functions or nested scopes to be empty. * C-Style For-Loops: Replaced `match` with `matchAndConsume` for the initialization semicolon. This allows the parser to correctly handle loops where the initialization is omitted (e.g., `for (; 1; 1)`). * Correctness: Resolves parser hangs or errors when encountering empty blocks. * Compliance: Fully supports the EBNF definition of zero-or-more members/statements. * Visuals: AST diagrams now accurately reflect nested block structures.	2025-12-26 00:28:18 -06:00
erick-alcachofa	a3d5c0ac68	feat(parser): implement full statement parsing and control flow logic Signed-off-by: erick-alcachofa <erick@artichoke.dev> Complete the transition from a declarations-only parser to a functional imperative parser. This commit introduces the implementation for all major statement types, loop constructs, and core control flow logic. - Match Case Update: Updated `grammar.ebnf` to use pipe delimiters `|id|` for unwrapped variables in match cases, replacing the previous parenthetical syntax. - Labels: Implemented loop labeling using the `ident := loop` syntax. Labels are validated to ensure they only prefix valid loop constructs. - Labels and Ranges: Standardized the use of the `:=` operator for both loop labels (`label := loop`) and range-for declarations (`let i := range`). - Conditional Branches: - Fully implemented `if` and `else` statements. - Added support for optional variable unwrapping (e.g., `if (expr) |val|`). - Supported `else if` chaining by recursively parsing if-statements within else-branches. - Loops: - C-Style For: Implemented `for (init; cond; post)` with optional initializers and post-loop expressions. - Range For: Implemented `for (let i := range)` with mutability controls. - While & Do-While: Implemented standard condition-based loops. - Infinite Loop: Added the explicit `loop` keyword for infinite iteration. - Loop Dispatch: Added a lookahead mechanism in `parseForLoopStatement` to differentiate between C-style and Range-style loops based on token positioning. - Variables: Implemented `let`/`def` parsing within local scopes, including type annotations and initializers. - Defer Logic: Implemented `defer` and `errdefer` for scope-guarded execution. - Jumps: Implemented `break`, `continue` (with optional label targets), and `return` (with optional expressions). - Match & Switch: Fully implemented branch parsing, with possible default cases via the `_` (underscore) keyword.
- Expression Integration: Stubbed `parseExpression` in a new `Expressions.cpp` to serve as the integration point for value parsing. - OverloadSet: Integrated `OverloadSet` utility in `Statements.cpp` to cleanly handle AST node variant visitation for label injection. - Error Handling: Standardized error reporting across all new paths using `langException`, providing specific "expected" messages for delimiters and keywords.	2025-12-25 23:27:06 -06:00
erick-alcachofa	923b8d7e2d	refactor(parser): move parser utility methods to source file Signed-off-by: erick-alcachofa <erick@artichoke.dev> Relocate core parsing utility methods from the header to the implementation file to reduce header bloat and improve compilation times. - Parser API: Moved the definitions of `consume()`, `matchAndConsume()`, and `match()` from `Parser.hpp` to `Parser.cpp`. - Cleanup: Removed an unused `<print>` include in `Types.cpp` discovered during the refactor. - Organization: Methods are now declared in the header and defined in the source file, maintaining a cleaner separation between interface and implementation.	2025-12-25 13:35:50 -06:00
erick-alcachofa	f83f7761e7	chore(license): Added NOTICE header to all source files Signed-off-by: erick-alcachofa <erick@artichoke.dev>	2025-12-25 13:17:08 -06:00
erick-alcachofa	b99f3586dc	chore(license): Added NOTICE header to all source files Signed-off-by: erick-alcachofa <erick@artichoke.dev>	2025-12-25 13:12:41 -06:00
erick-alcachofa	8911702c0d	refactor(parser): overhaul parsing logic and enhance error reporting Signed-off-by: erick-alcachofa <erick@artichoke.dev> Major refactoring of the Parser and Tokenizer components to improve code maintainability, strengthen error messaging, and streamline AST generation. This version intentionally focuses on top-level declarations, with statement parsing stubbed for the next development phase. - Path Sanitization: Added `sanitizePath` to extract filenames from input paths, ensuring consistent `unitName` identification regardless of directory depth. - Improved Output: Wrapped AST string output in Markdown code blocks and added a commented-out entry for the new DOT graph visualization. - Unified Consumption: Replaced manual token checks with a more robust `consume()` method that leverages `peekExpect()` for centralized error handling. - New Predicates: Introduced `match()` and `matchAndConsume()` helpers to handle optional tokens and branching logic without redundant peek/consume calls. - Exception Handling: Standardized the use of `langException` across all parsing functions, providing more descriptive "Expected X, found Y" messages. - Declarations: Refactored `parseTopLevelDeclaration` and sub-parsers (Module, Struct, Enum, Fn) to use the new matching patterns. - Looping Logic: Replaced recursive-style parsing loops with `while(keepParsing)` iterative blocks to prevent stack depth issues and clarify termination conditions (e.g., finding a closing brace or failing to find a comma). - Namespaced Identifiers: Rewrote `parseNamespacedIdentifier` to correctly handle multi-part paths (`A::B::C`) and edge cases. - Generic Support: Improved handling of generic parameter and argument lists, ensuring strict enforcement of delimiters like `<` and `>`. - Contextual Errors: Updated `peekExpect` to accept a custom `message` string, allowing the parser to describe what it was looking for (e.g., "Expected ';'"). 
- Token Lookahead: Enhanced `peek` and `peekExpect` reliability with better bounds checking and buffer management. - Removed `lib/src/Parser/AST/AST.cpp`: Deleted the monolithic AST stringification file in favor of the previously introduced modular implementations. - Build System: Updated `.gitignore` to ignore `cpm-package-lock.cmake`.	2025-12-25 11:41:08 -06:00
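The `consume()`, `match()`, and `matchAndConsume()` helpers and the `while(keepParsing)` loop shape from this commit might look roughly like the sketch below. The `MiniParser` class and `Tok` enum are hypothetical, `std::runtime_error` stands in for the project's `langException`, and the helper signatures are guesses rather than the ones declared in `Parser.hpp`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Tok { LBrace, RBrace, Comma, Ident, End };

class MiniParser {
public:
    explicit MiniParser(std::vector<Tok> toks) : toks_(std::move(toks)) {}

    Tok peek() const { return pos_ < toks_.size() ? toks_[pos_] : Tok::End; }

    // Optional token: test only, never advances.
    bool match(Tok expected) const { return peek() == expected; }

    // Optional token: advances only when it is present.
    bool matchAndConsume(Tok expected) {
        if (!match(expected)) return false;
        ++pos_;
        return true;
    }

    // Required token: reports a contextual message when it is missing.
    void consume(Tok expected, const std::string& message) {
        if (!matchAndConsume(expected)) throw std::runtime_error(message);
    }

    // Parses '{' [ Ident { ',' Ident } ] '}' iteratively and returns the
    // number of members. The loop ends when no comma follows a member.
    int parseMemberList() {
        consume(Tok::LBrace, "Expected '{'");
        int count = 0;
        bool keepParsing = !match(Tok::RBrace);  // empty list: skip the loop
        while (keepParsing) {
            consume(Tok::Ident, "Expected member name");
            ++count;
            keepParsing = matchAndConsume(Tok::Comma);
        }
        consume(Tok::RBrace, "Expected ',' or '}'");
        return count;
    }

private:
    std::vector<Tok> toks_;
    std::size_t pos_ = 0;
};
```

The split between `match()` and `matchAndConsume()` keeps optional tokens from needing a separate peek and advance at every call site, which is the redundancy the commit says it removed.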
erick-alcachofa	e2fa44738f	Merge branch 'main' into parser-dev Signed-off-by: erick-alcachofa <erick@artichoke.dev>	2025-10-19 21:56:38 -06:00
erick-alcachofa	66eca2f24a	feat(Parser): Expanding parser capabilities (might clean later) Signed-off-by: erick-alcachofa <erick@artichoke.dev>	2025-10-16 23:22:19 -06:00
erick-alcachofa	552cda58e7	feat(Parser): Introduce AST toString and basic parser structure Signed-off-by: erick-alcachofa <erick@artichoke.dev> This commit introduces the foundational structure for the parser and Abstract Syntax Tree (AST). It includes a new `Parser.hpp` header that outlines the primary parsing functions for top-level declarations like `modules`, `structs`, `enums`, and `functions`. It also adds a `toString` function for the AST to aid in debugging and visualization. The commit also updates the `Expected.hpp` utility by adding new error codes like `ecUnexpectedToken`, `ecExpectedSemicolon`, `ecImportInsideModule`, and `ecUnimplemented` to provide more granular and descriptive parsing errors. The `Tokenizer` has been updated to use these new, more specific exceptions.	2025-10-15 16:12:19 -06:00