45 Commits

Author SHA1 Message Date
caa6d5d0d2
docs: Extended documentation and updated wiki
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-12-30 03:10:06 +00:00
8f15650a42
docs: add comprehensive language overview and syntax example
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Introduces 'docs/example.arti' as an exhaustive showcase of the
`artichoke` programming language syntax and grammar.

This file serves as a technical specification and "living documentation"
for the language.

Key features demonstrated include:

- Module imports and symbol aliasing (using).
- Generic definitions vs. Turbofish (::<>) instantiations.
- Designated initializers (.field = expr) and member functions (this).
- Tagged enums and explicit error handling via Result<T, E>.
- Pointer (*), mutability ($), and optional (?) type qualifiers.
- Slice syntax, array literals, and specialized suffixes (.*, .#, .[]).
- Advanced control flow: Result/Optional unwrapping (|val|) and labeled
  loops.
- Compile-time reflection using the .@ operator.
2025-12-28 13:43:51 -06:00
bdc1a151aa
docs: update project status and sync EBNF with parser implementation
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Updates the README to reflect the completion of the front-end parser
infrastructure and shifts the roadmap toward semantic analysis.
Simultaneously synchronizes the formal EBNF grammar with recent
syntactic changes.

- Formalize 'Global Turbofish' (::<>) for all type instantiations.
- Implement '.field = value' syntax for named field initializers.
- Add 'type_initiated_literal' to support anonymous type construction.
- Support optional boundaries in slice syntax ([:], [n:], etc.).
- Update README with a technical checklist and usage instructions for
  the artichoke-c frontend.
- Fix various EBNF syntax errors and missing angle brackets.
2025-12-28 12:33:12 -06:00
25486fbace
fix(parser): support optional start and end indices in slice ranges
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Update the SliceAccess postfix operator logic to handle the full variety
of slice range syntaxes. This allows for open-ended slices by making the
start and end expressions optional within the brackets.

- Add logic to detect a leading colon for `[:end]` and `[:]` forms.
- Support trailing colons for `[start:]` forms.
- Differentiate between a single index access and a slice range based on
  the presence of the colon operator.
- Update SliceRangeExprNode construction to handle optional boundaries.
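This is a minimal sketch of the colon-based disambiguation described above. It assumes each bound is a single token and uses hypothetical `IndexAccess` and `SliceRange` types in place of the real `SliceRangeExprNode`:

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified nodes: each "expression" is one token here.
struct IndexAccess {
    std::string index;
};

struct SliceRange {
    std::optional<std::string> start;  // empty for `[:end]` and `[:]`
    std::optional<std::string> end;    // empty for `[start:]` and `[:]`
};

using BracketSuffix = std::variant<IndexAccess, SliceRange>;

// Parses the tokens found between `[` and `]`. The presence of ':' is what
// distinguishes a slice range from a plain index access.
std::optional<BracketSuffix> parseBracketSuffix(const std::vector<std::string>& toks) {
    std::size_t pos = 0;
    std::optional<std::string> start;

    // A leading ':' means the start bound was omitted (`[:end]`, `[:]`).
    if (pos < toks.size() && toks[pos] != ":") {
        start = toks[pos++];
    }

    // No colon at all: a single index access such as `[i]`.
    if (pos == toks.size()) {
        if (!start) return std::nullopt;  // `[]` is rejected in this sketch
        return IndexAccess{*start};
    }

    if (toks[pos] != ":") return std::nullopt;  // unexpected token
    ++pos;

    // Nothing after the colon means the end bound was omitted (`[start:]`).
    std::optional<std::string> end;
    if (pos < toks.size()) end = toks[pos++];

    if (pos != toks.size()) return std::nullopt;  // trailing tokens
    return SliceRange{start, end};
}
```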
2025-12-28 11:26:27 -06:00
c2f37d5702
feat(parser): add support for type-initiated expressions
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Implement TypeExpression AST node to allow types to be used within
expressions, enabling the parsing of anonymous slice and array
initializers like `[]Type { ... }`.

- Register `[` as a prefix-style token (NUD) in the Pratt parser.
- Add `TypeExpression` node to AST and expression variants.
- Update `toDot` and `toString` visitors for AST visualization.
- Update the frontend to open source files directly, fixing issues when
  opening paths.
2025-12-28 10:21:24 -06:00
f024334da5
fix(parser): resolve expression ambiguity in switch/match cases
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Implement precedence capping in `parseExpression` for switch cases to
prevent the parser from misinterpreting the case arrow (`->`) as a
pointer member access operator.

Additionally, the binding power of `ModuleAccess` (::) was increased to
ensure namespaced identifiers are correctly resolved within case
patterns before hitting the precedence limit.

- Use `PointerMemberAccess.right` as the precedence floor for cases.
- Update `ModuleAccess` binding power to {23, 24}.
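A tiny Pratt loop is enough to sketch the precedence cap. Only the `ModuleAccess` binding power {23, 24} comes from this commit. The `->` and `+` values below are assumed for illustration, and operands are single tokens:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical binding-power table; only `::` matches the commit message.
struct BindingPower { int left; int right; };

const std::map<std::string, BindingPower> kInfix = {
    {"+", {9, 10}},    // assumed
    {"->", {21, 22}},  // PointerMemberAccess (assumed values)
    {"::", {23, 24}},  // ModuleAccess
};

struct Parser {
    std::vector<std::string> toks;
    std::size_t pos = 0;

    // Pratt loop: keep folding infix operators while they bind at least as
    // tightly as `minBp`. Returns a parenthesized string showing grouping.
    std::string parseExpression(int minBp) {
        std::string lhs = toks[pos++];  // operands are single tokens here
        while (pos < toks.size()) {
            auto it = kInfix.find(toks[pos]);
            if (it == kInfix.end() || it->second.left < minBp) break;
            std::string op = toks[pos++];
            std::string rhs = parseExpression(it->second.right);
            lhs = "(" + lhs + " " + op + " " + rhs + ")";
        }
        return lhs;
    }

    // Case patterns use PointerMemberAccess.right as the floor, so `->`
    // (left power 21 < 22) stops the loop while `::` (23) still binds.
    std::string parseCasePattern() {
        return parseExpression(kInfix.at("->").right);
    }
};
```

With the cap, `Color::Red -> x` yields the pattern `(Color :: Red)` and leaves `->` for the case parser. An uncapped call would absorb the arrow as a member access.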
2025-12-28 00:44:02 -06:00
3180ca4662
feat(parser): implement object literals to unify struct and slice syntax
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Implement support for object literals using a unified syntax for both
struct and slice initialization. Since the parser lacks the semantic
context to distinguish between a struct or a slice at this stage, both
are represented by the new `ObjectLiteral` AST node.

Two forms of initialization are supported within curly braces following
a type expression:
* **Named Initializers**: Uses the `.field = value` syntax (e.g.,
  `Point { .x = 10, .y = 20 }`).
* **Positional Initializers**: Uses a comma-separated list of
  expressions (e.g., `[]i32 { 1, 2, 3 }`).

* Renamed `StructLiteral` and `SliceLiteral` nodes to `ObjectLiteral`.
* Refactored initialization helper nodes (e.g.,
  `StructLiteralNamedFieldInit` is now `ObjectLiteralNamedFieldInit`).
* Unified the representation in `Expressions.hpp` and `Literals.hpp` to
  use a single `ObjectLiteral` struct containing a `type` and an
  optional `initializer`.
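This is a rough model of the unified node, with strings standing in for expression nodes. Three details are assumptions of the sketch, not confirmed parser behavior: the `RawEntry` type, the rule that empty braces produce no initializer, and the rejection of mixed named/positional entries.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Simplified stand-in: the real AST stores expression node pointers.
using Expr = std::string;

struct NamedFieldInit { std::string field; Expr value; };   // `.x = 10`
using NamedInitializer = std::vector<NamedFieldInit>;       // `{ .x = 10, .y = 20 }`
using PositionalInitializer = std::vector<Expr>;            // `{ 1, 2, 3 }`

// One node for both structs and slices; semantic analysis decides later.
struct ObjectLiteral {
    std::string type;  // e.g. "Point" or "[]i32"
    std::optional<std::variant<NamedInitializer, PositionalInitializer>> initializer;
};

// What the parser saw for each comma-separated entry (hypothetical):
// `.field = value` sets `field`; a bare expression leaves it empty.
struct RawEntry { std::optional<std::string> field; Expr value; };

std::optional<ObjectLiteral> makeObjectLiteral(std::string type,
                                               const std::vector<RawEntry>& entries) {
    ObjectLiteral lit{std::move(type), std::nullopt};
    if (entries.empty()) return lit;  // `T {}`: no initializer (assumed)

    if (entries.front().field) {
        NamedInitializer inits;
        for (const auto& e : entries) {
            if (!e.field) return std::nullopt;  // mixed forms rejected
            inits.push_back({*e.field, e.value});
        }
        lit.initializer = std::move(inits);
    } else {
        PositionalInitializer inits;
        for (const auto& e : entries) {
            if (e.field) return std::nullopt;  // mixed forms rejected
            inits.push_back(e.value);
        }
        lit.initializer = std::move(inits);
    }
    return lit;
}
```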

* Integrated the opening brace `{` (`opLSquirly`) as a high-precedence
  postfix operator (binding power 19).
* Implemented parsing logic in `Expressions.cpp` to handle the
  transition from a type expression to an object initializer.
* Updated `toDot` and `toString` visitors to handle the unified
  `ObjectLiteral` nodes and their respective initializer variants.

* Improved robustness in `Declarations.cpp` by ensuring list parsing
  correctly handles closing braces in specific edge cases.
2025-12-28 00:12:43 -06:00
8dd75e3b8a
feat(parser): support turbofish operator and specialize access expressions
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Overhaul the AST and parser logic to support explicit generic
instantiation in expressions (e.g., `Result::<u32, u32>::Ok(0)`). This
is achieved by implementing the "turbofish" operator (`::<>`) and
specializing how member and module access are handled.

* Added `GenericExpression` to represent generic instantiations in
  expressions.
* Updated the Pratt parser to look for `<` immediately following a `::`
  (ModuleAccess) operator. If found, it parses a `GenericExpression`
  containing the generic arguments.
* This change resolves the ambiguity between generic lists and
  comparison operators in the expression parser.
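The sketch below shows the `::` lookahead at the token level. `PathParser` is hypothetical, and each generic argument is a single token. The key point is that `<` only opens an argument list directly after `::`, so `a < b` still parses as a comparison:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct PathParser {
    std::vector<std::string> toks;
    std::size_t pos = 0;

    bool at(const std::string& t) const { return pos < toks.size() && toks[pos] == t; }

    std::string next() {
        if (pos >= toks.size()) throw std::runtime_error("unexpected end of input");
        return toks[pos++];
    }

    // Parses `A (:: B | ::<T, ...>)*` into a readable nested string.
    std::string parsePath() {
        std::string out = next();
        while (at("::")) {
            ++pos;
            if (at("<")) {  // turbofish: `::<...>` starts generic arguments
                ++pos;
                std::string args;
                while (!at(">")) {
                    if (!args.empty()) args += ", ";
                    args += next();  // one token per argument in this sketch
                    if (at(",")) ++pos;
                }
                ++pos;  // consume '>'
                out = "Generic(" + out + "; " + args + ")";
            } else {    // plain module access: `A::B`
                out = "Module(" + out + ", " + next() + ")";
            }
        }
        return out;  // a bare `<` here is left for the comparison operator
    }
};
```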

* Renamed `PointerAccessExpression` to `PointerMemberAccessExpression`.
* Refactored `MemberAccessExpression` and
  `PointerMemberAccessExpression` to store the member as an
  `ExpressionNode`. This allows the right-hand side of a `.` or `->` to
  be a complex expression (like a generic call).
* Simplified `ModuleAccessExpression` to a binary `left`/`right`
  structure, separating scope resolution from generic instantiation.

* Flattened the `Type` AST: replaced recursive `baseType` structures
  with a `Vector<TypeExpressionNode>` (`typeNodes`) to represent
  namespaced paths (e.g., `std::collections::Map`) more efficiently.
* Removed redundant `NamespacedType` and `NamespacedIdentifier` nodes.
* Simplified `GenericType` and `IdentifierType` to use direct `String`
  type names.

* Refactored `parseType` to iterate through namespaced components and
  populate the new flattened `typeNodes` vector.
* Updated the Pratt infix loop to correctly dispatch to `ModuleAccess`,
  `MemberAccess`, or `GenericExpression` based on the operator and
  lookahead tokens.
* Adjusted `toDot` and `toString` visitors to match the new AST
  definitions.
2025-12-27 23:08:26 -06:00
09c44f3b67
fix(parser): resolve generic nesting ambiguity by splitting >> tokens
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Refactor the termination logic for generic parameter lists in type
parsing to correctly handle nested generics. By replacing manual peeking
with `peekExpect(TokenV::opGt)`, the parser now correctly handles cases
where two closing angle brackets appear consecutively (e.g.,
`List<List<Int>>`).

Previously, the parser manually checked for a literal `>` token. If the
lexer encountered `>>` (a right-shift operator), the parser would fail
to recognize it as two closing brackets. The transition to `peekExpect`
allows the tokenizer to "split" the `>>` token into two individual `>`
tokens when a single closing bracket is expected, resolving the classic
nested template ambiguity.
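Here is a simplified sketch of the splitting idea. It assumes string tokens and uses a hypothetical `TokenStream::expect` in place of the real `peekExpect`:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// When the parser expects a single `>` but the lexer produced `>>`, the
// token is split in place: one `>` is consumed and the other stays in the
// stream for the next call.
class TokenStream {
public:
    explicit TokenStream(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // Consumes one `expected` token and returns true, or returns false.
    bool expect(const std::string& expected) {
        if (pos_ >= toks_.size()) return false;
        std::string& tok = toks_[pos_];
        if (tok == expected) { ++pos_; return true; }
        if (expected == ">" && tok == ">>") {
            tok = ">";  // split: leave the second `>` behind
            return true;
        }
        return false;
    }

    bool done() const { return pos_ == toks_.size(); }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};
```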

Key changes:
- Replaced manual token validation and error reporting with
  `peekExpect`.
- Enabled support for nested generic types without requiring spaces
  between closing brackets.
- Simplified the `keepParsing` loop state in `lib/src/Parser/Types.cpp`.
2025-12-26 23:54:17 -06:00
5762497f56
feat(parser): implement Pratt expression parsing and refactor operator types
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Overhaul the expression parsing mechanism to utilize a Pratt (top-down
operator precedence) parser. This change provides a more scalable and
maintainable way to handle operator precedence and associativity
compared to standard recursive descent.

As part of this transition, the nomenclature for operators has been
refined to reflect their position in the grammar (Prefix, Infix,
Postfix) rather than their arity.

* Renamed `UnaryOperator` and `UnaryExpression` to `PrefixOperator` and
  `PrefixExpression`.
* Renamed `BinaryOperator` and `BinaryExpression` to `InfixOperator` and
  `InfixExpression`.
* Renamed `ScopeAccessExpression` to `ModuleAccessExpression`.
* Introduced `PostfixOperator` enum and associated logic for function
  calls, slicing, and reflection attributes.
* Updated `toDot.cpp` and `toString.cpp` to support the new node types
  and renamed operators.

* Added `Pratt.hpp` and `Pratt.cpp` to define `BindingPower` and map
  operators to their respective precedence levels.
* Added `Operators.cpp` to handle token-to-operator mapping and
  classification (isPrefix, isInfix, isPostfix).
* Refactored `Parser::parseExpression` to implement the core Pratt loop
  using binding power comparisons.
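The compact Pratt loop below shows how prefix, infix, and postfix binding powers interact. The binding-power numbers and the `Pratt` class are illustrative assumptions, not the contents of `Pratt.cpp`. The parser returns an S-expression so the grouping is easy to check:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct BindingPower { int left; int right; };

std::optional<int> prefixBp(const std::string& op) {           // e.g. `-x`, `!x`
    if (op == "-" || op == "!") return 7;
    return std::nullopt;
}
std::optional<BindingPower> infixBp(const std::string& op) {   // left-associative
    if (op == "+" || op == "-") return BindingPower{1, 2};
    if (op == "*" || op == "/") return BindingPower{3, 4};
    return std::nullopt;
}
std::optional<int> postfixBp(const std::string& op) {
    if (op == "()") return 9;  // call; `()` is one token in this sketch
    return std::nullopt;
}

class Pratt {
public:
    explicit Pratt(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    std::string parseExpression(int minBp = 0) {
        std::string tok = toks_.at(pos_++);
        std::string lhs;
        if (auto rbp = prefixBp(tok)) {
            lhs = "(" + tok + " " + parseExpression(*rbp) + ")";
        } else {
            lhs = tok;  // operand
        }
        while (pos_ < toks_.size()) {
            std::string op = toks_[pos_];
            if (auto lbp = postfixBp(op)) {
                if (*lbp < minBp) break;
                ++pos_;
                lhs = "(call " + lhs + ")";
                continue;
            }
            if (auto bp = infixBp(op)) {
                if (bp->left < minBp) break;
                ++pos_;
                std::string rhs = parseExpression(bp->right);
                lhs = "(" + op + " " + lhs + " " + rhs + ")";
                continue;
            }
            break;  // not a known operator: let the caller handle it
        }
        return lhs;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};
```

Because the postfix power (9) exceeds the prefix power (7), `-f()` groups as `-(f())`. The `right > left` infix pairs make `1 - 2 - 3` left-associative.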

* Moved literal parsing logic into a dedicated `Literals.cpp`.
* Implemented explicit parsing methods for `Integer`, `Float`, `Char`,
  `String`, `Boolean`, and `Null` literals.
* Added support for `this` and `_` (underscore) as identifier
  expressions.

* **Prefix**: `!`, `-`, `~`, `&` (MemPtr), `*` (DerefPtr).
* **Infix**: Arithmetic, Comparison, Bitwise, Logical, and all Compound
  Assignments.
* **Postfix**: `()` (Call), `[]` (Slice/Access), `.#` (Slice length),
  `.*` (Slice pointer), and `.@` (Reflection).

* **Missing Literals**: Struct literals and Array literals are not yet
  implemented in the new parsing flow.
* **Node Specialization**: `MemberAccess`, `PointerMemberAccess`, and
  `ModuleAccess` currently use generic infix logic and need to be
  migrated to their specific AST node types.
* **Error Handling**: Literal parsing (specifically `std::stold` and
  `std::stoul`) needs safety checks to prevent potential exceptions
  during conversion.
* **Diagnostics**: Refine the error message for unexpected tokens in
  postfix expressions to explicitly list supported operators.
* **Generic Ambiguity**: Generic type/function instantiation currently
  causes parsing conflicts with comparison operators (e.g., `Foo<T>`).
  This is a known issue that will be resolved by transitioning the
  grammar to a turbofish-style `::<...>` syntax.
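One way to add the missing safety checks is `std::from_chars`, which reports overflow and malformed input through an error code instead of throwing. This is a hypothetical helper, not the current `Literals.cpp` code. The floating-point overloads of `std::from_chars` follow the same pattern where the standard library provides them.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Non-throwing replacement for an unchecked std::stoul call: the whole
// lexeme must be consumed, otherwise the literal is rejected.
std::optional<std::uint64_t> parseIntegerLiteral(std::string_view text, int base = 10) {
    std::uint64_t value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value, base);
    // ec is result_out_of_range on overflow and invalid_argument when no
    // digits were found; ptr != last means trailing characters remain.
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```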
2025-12-26 23:32:49 -06:00
30d64d9b65
fix(parser): support empty blocks, support nested scoping, and refine loop lookahead
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit addresses several critical issues in the recursive descent
parser, specifically regarding the handling of empty constructs,
statement termination, and AST representation of nested scopes. These
changes bring the implementation in line with the Artichoke EBNF
specification.

* **CodeBlock as Statement:** Added `CodeBlockStmtNode` to the
  `StatementNode` variant. This allows a bare `{}` to be treated as a
  valid statement, enabling manual scoping within functions.
* **Visitor Support:** Updated `toDot.cpp` (Graphviz) and `toString.cpp`
  (Pretty-print) to support the new `CodeBlockStmtNode` during AST
  traversal.

* **Empty Member Lists:** Implemented a pre-loop check for the closing
  brace `}` in `parseStruct` and `parseEnum`. This prevents the parser
  from attempting to parse members in empty declarations (e.g., `struct
  Empty {}`).
* **Diagnostic Accuracy:** Enhanced the member-parsing loop to provide
  better error context. If a member is not followed by a comma or a
  closing brace, the parser now explicitly suggests `',' or '}'` as the
  expected tokens.
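The next sketch shows the empty-list guard and the improved diagnostic. It assumes one token per member and plain `std::runtime_error` instead of `langException`. Rejecting a trailing comma is an assumption of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Parses `{ a, b, c }` starting at `pos`; advances `pos` past the `}`.
std::vector<std::string> parseMemberList(const std::vector<std::string>& toks,
                                         std::size_t& pos) {
    auto expect = [&](const std::string& t) {
        if (pos >= toks.size() || toks[pos] != t)
            throw std::runtime_error("Expected '" + t + "'");
        ++pos;
    };

    std::vector<std::string> members;
    expect("{");
    // Pre-loop check: `struct Empty {}` has no members to parse.
    if (pos < toks.size() && toks[pos] == "}") {
        ++pos;
        return members;
    }

    bool keepParsing = true;
    while (keepParsing) {
        if (pos >= toks.size() || toks[pos] == "}" || toks[pos] == ",")
            throw std::runtime_error("Expected member");
        members.push_back(toks[pos++]);
        if (pos < toks.size() && toks[pos] == ",") {
            ++pos;                // another member follows
        } else if (pos < toks.size() && toks[pos] == "}") {
            keepParsing = false;  // closing brace ends the list
        } else {
            throw std::runtime_error("Expected ',' or '}'");
        }
    }
    expect("}");
    return members;
}
```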

* **Nested Scopes:** The parser now correctly identifies a `{` at the
  start of a statement and dispatches to `parseCodeBlock`.
* **Empty Code Blocks:** Added a guard in the block-parsing loop to
  check for `}` immediately after `{`, allowing functions or nested
  scopes to be empty.

* **C-Style For-Loops:** Replaced `match` with `matchAndConsume` for the
  initialization semicolon. This allows the parser to correctly handle
  loops where the initialization is omitted (e.g., `for (; 1; 1)`).

* **Correctness:** Resolves parser hangs or errors when encountering
  empty blocks.
* **Compliance:** Fully supports the EBNF definition of zero-or-more
  members/statements.
* **Visuals:** AST diagrams now accurately reflect nested block
  structures.
2025-12-26 00:28:18 -06:00
a3d5c0ac68
feat(parser): implement full statement parsing and control flow logic
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Complete the transition from a declarations-only parser to a functional
imperative parser. This commit introduces the implementation for all
major statement types, loop constructs, and core control flow logic.

- **Match Case Update**: Updated `grammar.ebnf` to use pipe delimiters
  `|id|` for unwrapped variables in match cases, replacing the previous
  parenthetical syntax.
- **Labels**: Implemented loop labeling using the `ident := loop`
  syntax. Labels are validated to ensure they only prefix valid loop
  constructs.
- **Labels and Ranges**: Standardized the use of the `:=` operator for
  both loop labels (`label := loop`) and range-for declarations (`let i
  := range`).

- **Conditional Branches**:
    - Fully implemented `if` and `else` statements.
    - Added support for optional variable unwrapping (e.g., `if (expr)
      |val|`).
    - Supported `else if` chaining by recursively parsing if-statements
      within else-branches.
- **Loops**:
    - **C-Style For**: Implemented `for (init; cond; post)` with
      optional initializers and post-loop expressions.
    - **Range For**: Implemented `for (let i := range)` with mutability
      controls.
    - **While & Do-While**: Implemented standard condition-based loops.
    - **Infinite Loop**: Added the explicit `loop` keyword for infinite
      iteration.
    - **Loop Dispatch**: Added a lookahead mechanism in
      `parseForLoopStatement` to differentiate between C-style and
      Range-style loops based on token positioning.
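The commit does not spell out the exact lookahead rule. One plausible heuristic is to scan the header for a top-level `;`. It is sketched below with a hypothetical `classifyForHeader`:

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class ForKind { CStyle, Range, Invalid };

// Starting just after `for (`, scan to the matching `)` without consuming
// anything. A `;` at the top nesting level means a C-style loop; a header
// without one is treated as a range loop.
ForKind classifyForHeader(const std::vector<std::string>& toks, std::size_t start) {
    int depth = 0;
    for (std::size_t i = start; i < toks.size(); ++i) {
        const std::string& t = toks[i];
        if (t == "(" || t == "[" || t == "{") {
            ++depth;
        } else if (t == ")" || t == "]" || t == "}") {
            if (depth == 0) return t == ")" ? ForKind::Range : ForKind::Invalid;
            --depth;
        } else if (t == ";" && depth == 0) {
            return ForKind::CStyle;
        }
    }
    return ForKind::Invalid;  // header never closed
}
```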

- **Variables**: Implemented `let`/`def` parsing within local scopes,
  including type annotations and initializers.
- **Defer Logic**: Implemented `defer` and `errdefer` for scope-guarded
  execution.
- **Jumps**: Implemented `break`, `continue` (with optional label
  targets), and `return` (with optional expressions).
- **Match & Switch**: Fully implemented branch parsing, with possible
  default cases via the `_` (underscore) keyword.

- **Expression Integration**: Stubbed `parseExpression` in a new
  `Expressions.cpp` to serve as the integration point for value parsing.
- **OverloadSet**: Integrated `OverloadSet` utility in `Statements.cpp`
  to cleanly handle AST node variant visitation for label injection.
- **Error Handling**: Standardized error reporting across all new paths
  using `langException`, providing specific "expected" messages for
  delimiters and keywords.
2025-12-25 23:27:06 -06:00
923b8d7e2d
refactor(parser): move parser utility methods to source file
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Relocate core parsing utility methods from the header to the
implementation file to reduce header bloat and improve compilation
times.

- **Parser API**: Moved the definitions of `consume()`,
  `matchAndConsume()`, and `match()` from `Parser.hpp` to `Parser.cpp`.
- **Cleanup**: Removed an unused `<print>` include in `Types.cpp`
  discovered during the refactor.
- **Organization**: Methods are now declared in the header and defined
  in the source file, maintaining a cleaner separation between interface
  and implementation.
2025-12-25 13:35:50 -06:00
f83f7761e7
chore(license): Added NOTICE header to all source files
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-12-25 13:17:08 -06:00
b99f3586dc
chore(license): Added NOTICE header to all source files
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-12-25 13:12:41 -06:00
8911702c0d
refactor(parser): overhaul parsing logic and enhance error reporting
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Major refactoring of the Parser and Tokenizer components to improve code
maintainability, strengthen error messaging, and streamline AST
generation.

This version intentionally focuses on top-level declarations, with
statement parsing stubbed for the next development phase.

- **Path Sanitization**: Added `sanitizePath` to extract filenames from
  input paths, ensuring consistent `unitName` identification regardless
  of directory depth.
- **Improved Output**: Wrapped AST string output in Markdown code blocks
  and added a commented-out entry for the new DOT graph visualization.

- **Unified Consumption**: Replaced manual token checks with a more
  robust `consume()` method that leverages `peekExpect()` for
  centralized error handling.
- **New Predicates**: Introduced `match()` and `matchAndConsume()`
  helpers to handle optional tokens and branching logic without
  redundant peek/consume calls.
- **Exception Handling**: Standardized the use of `langException` across
  all parsing functions, providing more descriptive "Expected X, found
  Y" messages.
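Here is a minimal sketch of how the three helpers divide the work. String tokens and `std::runtime_error` stand in for `TokenV` and `langException`:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class Cursor {
public:
    explicit Cursor(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // match: is the next token `t`? Never consumes.
    bool match(const std::string& t) const {
        return pos_ < toks_.size() && toks_[pos_] == t;
    }

    // matchAndConsume: consume `t` only if present (optional tokens).
    bool matchAndConsume(const std::string& t) {
        if (!match(t)) return false;
        ++pos_;
        return true;
    }

    // consume: `t` is mandatory; fail with an "Expected X, found Y" message.
    std::string consume(const std::string& t) {
        if (!match(t)) {
            const std::string found = pos_ < toks_.size() ? toks_[pos_] : std::string("<eof>");
            throw std::runtime_error("Expected '" + t + "', found '" + found + "'");
        }
        return toks_[pos_++];
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};
```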

- **Declarations**: Refactored `parseTopLevelDeclaration` and
  sub-parsers (Module, Struct, Enum, Fn) to use the new matching
  patterns.
- **Looping Logic**: Replaced recursive-style parsing loops with
  `while(keepParsing)` iterative blocks to prevent stack depth issues
  and clarify termination conditions (e.g., finding a closing brace or
  failing to find a comma).
- **Namespaced Identifiers**: Rewrote `parseNamespacedIdentifier` to
  correctly handle multi-part paths (`A::B::C`) and edge cases.
- **Generic Support**: Improved handling of generic parameter and
  argument lists, ensuring strict enforcement of delimiters like `<` and
  `>`.

- **Contextual Errors**: Updated `peekExpect` to accept a custom
  `message` string, allowing the parser to describe *what* it was
  looking for (e.g., "Expected ';'").
- **Token Lookahead**: Enhanced `peek` and `peekExpect` reliability with
  better bounds checking and buffer management.

- **Removed `lib/src/Parser/AST/AST.cpp`**: Deleted the monolithic AST
  stringification file in favor of the previously introduced modular
  implementations.
- **Build System**: Updated `.gitignore` to ignore
  `cpm-package-lock.cmake`.
2025-12-25 11:41:08 -06:00
e2fa44738f
Merge branch 'main' into parser-dev
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-10-19 21:56:38 -06:00
0a3971f18e
refactor(ast): refactor AST printing to support DOT graph and improved string output
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit refactors the AST printing functionality by moving the
human-readable `toString` implementation into its own file
(`lib/src/Parser/AST/toString.cpp`) and introducing a new `toDot`
function in `lib/src/Parser/AST/toDot.cpp` for generating Graphviz DOT
format output.

The `AST.hpp` header is updated to declare both the new `toDot` function
and the modified `toString` function, which now uses an optional
`prefix` parameter for prettier tree output. The `Token.hpp`/`Token.cpp`
files are also adjusted to have `toString(const TokenV &)` return a
`std::string_view`, and `toString(const Token &)` provides a cleaner
string representation using only the token's value.
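The following sketches DOT emission over a toy tree. `Node` stands in for the AST variants, and the `nN` identifier scheme is an assumption. The output can be rendered with Graphviz (`dot -Tpng`).

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for the AST variants: a label plus owned children.
struct Node {
    std::string label;
    std::vector<std::unique_ptr<Node>> children;
};

// Escapes characters that would end or break a quoted DOT label.
std::string escapeLabel(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Pre-order walk: one node statement per node, one edge per parent/child
// pair. `nextId` hands out unique DOT identifiers (n0, n1, ...).
int emitDot(const Node& n, std::ostringstream& out, int& nextId) {
    const int id = nextId++;
    out << "  n" << id << " [label=\"" << escapeLabel(n.label) << "\"];\n";
    for (const auto& child : n.children) {
        const int childId = emitDot(*child, out, nextId);
        out << "  n" << id << " -> n" << childId << ";\n";
    }
    return id;
}

std::string toDot(const Node& root) {
    std::ostringstream out;
    int nextId = 0;
    out << "digraph AST {\n";
    emitDot(root, out, nextId);
    out << "}\n";
    return out.str();
}
```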
2025-10-19 21:54:24 -06:00
66eca2f24a
feat(Parser): Expanding parser capabilities (might clean later)
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-10-16 23:22:19 -06:00
552cda58e7
feat(Parser): Introduce AST toString and basic parser structure
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit introduces the foundational structure for the parser and
Abstract Syntax Tree (AST). It includes a new `Parser.hpp` header that
outlines the primary parsing functions for top-level declarations like
`modules`, `structs`, `enums`, and `functions`. It also adds a
`toString` function for the AST to aid in debugging and visualization.

The commit also updates the `Expected.hpp` utility by adding new error
codes like `ecUnexpectedToken`, `ecExpectedSemicolon`,
`ecImportInsideModule`, and `ecUnimplemented` to provide more granular
and descriptive parsing errors. The `Tokenizer` has been updated to use
these new, more specific exceptions.
2025-10-15 16:12:19 -06:00
583b20230d
fix(Utils): Inherit from Ts in OverloadSet
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit changes the `OverloadSet` utility class to publicly inherit
from its template parameters `Ts...`. This brings the `operator()` of
each provided type into the overload set, fixing functionality that was
broken without the inheritance.

It also includes the missing `<ranges>` header in the test utilities.
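The fix appears to follow the usual overload-set idiom, sketched below. A `using Ts::operator()...` declaration can only name members of base classes, which is why the inheritance is required. `IntLit`, `StrLit`, and `describe` are illustrative names, not project code.

```cpp
#include <string>
#include <variant>

// Inherit from every callable and pull each operator() into one scope.
template <class... Ts>
struct OverloadSet : Ts... {
    using Ts::operator()...;
};

// C++20 deduces this for aggregates; the guide keeps C++17 working too.
template <class... Ts>
OverloadSet(Ts...) -> OverloadSet<Ts...>;

// Example: dispatching on a variant of simplified node types.
struct IntLit { long value; };
struct StrLit { std::string value; };
using Literal = std::variant<IntLit, StrLit>;

std::string describe(const Literal& lit) {
    return std::visit(OverloadSet{
        [](const IntLit& i) { return "int " + std::to_string(i.value); },
        [](const StrLit& s) { return "string \"" + s.value + "\""; },
    }, lit);
}
```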
2025-10-15 16:02:27 -06:00
9626ac07c8
refactor(AST): Make optional some fields in AST nodes
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit refactors several AST nodes to use `Optional<T>` for fields
that are not always present. This includes `attribute` in
`ReflectionExpression`, `elseBranch` in `IfStatement` and
`WhileStatement`, `defaultCase` in `MatchStatement` and
`SwitchStatement`, and `preLoop` and `postLoop` in `CForStatement`. This
change improves the robustness and clarity of the AST by explicitly
modeling optionality.
2025-10-15 15:58:06 -06:00
d979b10bee
feat(AST): Add Uninitialized and missing binary operators to common enums
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit adds `Uninitialized` to several enums in `Common.hpp` to
ensure they are properly initialized.

It also adds missing binary operators like `BitAnd`, `BitXor`,
`Adition`, and `Multiplication`. This change improves the robustness and
functionality of the AST parser.
2025-10-15 15:53:37 -06:00
0e9995ce7e
feat(grammar): enhance aliases and for-loop initializers
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit updates the language grammar to expand the capabilities of
`using` aliases and C-style `for` loops. It also refines where aliases
can be declared. These changes were made after re-analyzing the grammar
while creating the AST node types.

* **Aliases:** A `using` alias can now map to any valid `<type>`, such
  as a pointer (`*i32`) or optional (`?string`), instead of just a
  simple `<namespaced_identifier>`.
* **For Loops:** The initializer in a C-style `for` loop can now be a
  general `<expression>` (e.g., `i = 0`) in addition to a full
  `<variable_declaration>`.
* **Scope:** Alias declarations are now restricted to the top level
  (declarations) and are no longer permitted as statements inside
  function bodies.

BREAKING CHANGE: Alias declarations (`using`) are no longer valid inside
function bodies and must be declared at a module or global scope.
2025-10-12 18:58:27 -06:00
5e94021ae5
feat(AST): Add node factory helper and missing identifier node
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit introduces a utility factory function and structural
improvements to the Abstract Syntax Tree (AST).

* Adds a new `ASTNodePtr` C++20 concept to constrain template types to
  be `std::unique_ptr`s pointing to AST nodes.
* Introduces a `MakeNode<T>()` factory function that uses this concept
  to simplify and standardize the creation of new nodes.
* Fixed `NamespacedType` and added the missing `NamespacedIdentifier`
  node.
2025-10-12 18:55:34 -06:00
91aefc27b3
chore: Remove unused headers in Generator.hpp
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Removed the unused <ranges> and <generator> includes from the
Generator.hpp header file.
2025-10-12 18:54:03 -06:00
c4c3d71cc4
feat(AST): Refactor AST nodes into a multi-file structure
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit refactors the AST (Abstract Syntax Tree) to improve code
organization, clarity, and maintainability. The large single-file AST
definition has been split into multiple, logically grouped header files.

The key changes are:

- **New File Structure**: The single `Node.hpp` file is replaced by a
  modular structure consisting of `Common.hpp`, `Declarations.hpp`,
  `Expressions.hpp`, `Literals.hpp`, `Statements.hpp`, and a new central
  `AST.hpp` header.
- **Improved Naming**: All AST node structs and their aliases have been
  renamed to follow a consistent `[NodeName][NodeType]` convention, such
  as `StructDeclaration` and `StructDeclNode`.
- **Namespace Change**: The `node` namespace has been replaced by
  `arti::lang::ast::nodes` to provide better encapsulation and prevent
  naming conflicts.
- **Type Aliases**: Helper aliases like `String`, `Vector`, and
  `Variant` have been introduced to simplify the code.
2025-10-12 17:40:29 -06:00
9dcd5490e3
feat(AST): Define complete set of AST nodes
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Introduces the comprehensive header file for the Abstract Syntax Tree,
providing the foundational structures for the parser and subsequent
compiler stages.

This initial version defines all node types required to represent the
language's grammar, including:
- Top-level program structure and module declarations.
- All statement types, including control flow, loops, and deferrals.
- A semantic expression tree designed for a Pratt parser (Unary, Binary,
  Function Calls, etc.).
- A robust, recursive type system for handling complex type signatures.

The design employs modern C++ for safety and clarity:
- `std::unique_ptr` establishes clear ownership of child nodes.
- `std::variant` provides type-safe polymorphism for Statement,
  Expression, and Declaration nodes.
- `std::optional` is used to accurately model optional grammar rules.
- `SourceLocation` is included in every node to support detailed error
  reporting.
2025-10-12 02:07:01 -06:00
de58de2a9a
fix(grammar): Correct ambiguities and restructure expression parsing
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

The grammar contained several structural issues and ambiguities,
particularly in expression parsing, operator precedence, and the
`export` keyword. This commit restructures significant parts of the
grammar to resolve these problems and improve its formal correctness,
making it more suitable for parser generation.

The most relevant changes include:

* **Centralized Export Handling:** Corrects the definition of exports by
  introducing a top-level `<declaration>` rule that distinguishes
  between `<exportable_declaration>` and `<non_exportable_declaration>`.
  This removes the repetitive and ambiguous `export?` prefix from
  multiple individual declarations (`module`, `struct`, `fn`, etc.).

* **Unified Postfix Operations:** Integrates scoped access into the
  suffix operations. This provides an unambiguous and unified
  definition for these common constructs.

* **Identifier Chain Issue:** Several rules in the precedence chain
  ultimately resolved to productions starting with an identifier, which
  caused ambiguity during parsing. These rules were refactored to handle
  those cases correctly.

* **Reduced Ambiguity in Statements:** Refactors complex rules like
  `<variable_declaration>` and `<else_statement>` into smaller, more
  explicit sub-rules (`<variable_declaration_tail>`,
  `<else_statement_tail>`). This eliminates potential parsing conflicts
  and improves the overall clarity of the grammar.

* **Simplified Access Expressions:** Removes the separate
  `<scoped_access_expression>` and `<reflection_expression>` rules.
  Their logic has been integrated directly into the more generic and
  powerful postfix expression system, simplifying the grammar.
2025-10-11 11:30:27 -06:00
6809b7ea1d
fix: remove typo in error messages
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

The error message string for unexpected tokens was prepended with an
erroneous 'O'. This commit removes the typo.
2025-10-11 11:28:40 -06:00
ffd66f1f86
feat: Added peekExpect method for token type validation in tokenizer
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

Introduced `peekExpect(std::size_t, TokenV)` to the Tokenizer class, enabling
token lookahead with explicit token type checks. This method returns an
`Unexpected` error with diagnostic info if the expected token type does not
match the peeked token.

Includes special-case handling (a workaround) to distinguish between
`>` and `>>` tokens when parsing the token stream.
2025-10-05 22:51:16 -06:00
f5be339f43
fix(grammar): Simplify and improve EBNF definition
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

The EBNF grammar definition contained several redundancies,
inconsistencies, and minor omissions. This commit refactors the grammar
to make it more concise, readable, and robust for parsing.

Key changes include:

- **Rule Simplification**: Redundant intermediate rules (`fn_params`,
  `statements`, `assign_expression`) have been removed. Rules like
  `code_block` and `import_target` are now more concisely expressed
  using standard EBNF operators (`?`, `*`).

- **EOF Enforcement**: The top-level `program` rule now requires an
  `<eof>` token. This is a crucial fix to ensure the parser consumes the
  entire file and fails on trailing invalid tokens.

- **Optional Generics**: Generic parameters (`<... >`) are now correctly
  marked as optional on `function`, `struct`, and `enum` declarations,
  which was the original intent.

- **Flexible For-Loops**: The update/increment expression (the third
  part) in a C-style `for` loop is now optional, aligning with behavior
  in languages like C and C++.

- **Primary Expressions**: Primary type expressions previously failed to
  parse namespaced elements and types correctly; this is now fixed.
2025-10-05 14:49:50 -06:00
bb58b17528
fix: Fixed typo in keyword and added missing do and typename keywords
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-10-04 10:35:03 -06:00
e1b9e054f3
feat(test, tokenizer): Add test suite, fix issues it caught in Tokenizer, and add range-based API
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit introduces a comprehensive test suite for the tokenizer
using the Catch2 framework. To support this and improve the project
structure, the build system and the tokenizer's API have been
significantly updated.

- Removed `cmake/testing.cmake` as it's no longer needed.
- A new `TokenizerRange` class provides a C++20-style range interface,
  allowing for simple `for-each` loop iteration over tokens. This is
  used extensively in the new tests.

- The CMake build system has been refactored:
    - An `ENABLE_TESTING` option (OFF by default) now controls whether
      the test suite is built.
    - The core library is now compiled into an object library, which is
      then used to produce both a shared (`.so`/`.dll`) and a static
      (`.a`/`.lib`) library. This improves build efficiency and provides
      more flexible linkage options.
    - The frontend executable now links against the static version of
      the library.

- Implemented tokenizer tests using the Catch2 framework, covering cases
  such as identifiers, keywords, and numbers. These tests have already
  caught some issues in the current implementation.

- Several parsing bugs and edge cases in the tokenizer were fixed,
  including the handling of unterminated strings and invalid numeric
  literals. The README has been updated with instructions for building
  and running tests.
2025-10-03 12:54:41 -06:00
0f8688d3ee
fix: Fix install path in library CMakeLists.txt
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-10-02 23:45:04 -06:00
db29d1f8ba
chore: Fix wiki link in readme
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-10-02 00:28:12 -06:00
f3cc5b90c8
chore: Fix naming of wiki source file
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-10-02 00:26:35 -06:00
bce60cfef8
chore: Updated readme
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-10-01 23:39:32 -06:00
e024c03134
chore: Removed unnecessary bold in titles
Signed-off-by: erick-alcachofa <erick@artichoke.dev>
2025-10-01 23:39:17 -06:00
d0599d374f
feat: Add language grammar and adjusted tokenizer
Signed-off-by: erick-alcachofa <erick@artichoke.dev>

This commit lays the foundational groundwork for the artichoke language
parser by introducing the formal language grammar specification.

The tokenizer was updated to include new operators and keywords, and it
can now handle comments.

Key Additions:
- Implemented support for C-style block comments (`/* ... */`),
  including error handling for unclosed comments.
- Added all necessary tokens for missing keywords (e.g., `module`,
  `export`, `using`, `match`, `loop`) and operators (e.g., `+=`, `:=`,
  `.#`, `.*`, `.@`).
- The `Token` enum has been expanded to reflect the full language
  feature set.

Documentation:
- Added `docs/grammar.ebnf` which contains the official, well-structured
  EBNF grammar for the language.
- Added `docs/readme.md` providing a detailed technical overview of the
  language's features, syntax, and semantics.

BREAKING CHANGE: The `kwVariant` and `kwMut` tokens have been removed to
align with the updated language design defined in the new grammar.
2025-10-01 18:51:09 -06:00
f9051e1c21
fix: Minor fixes
Fixed some minor mistakes (wrong messages/errors) caused by copy-pasting
code.

Fixed identifiers not accepting digits.

Also made minor improvements to some functions.
2025-06-30 00:31:10 -06:00
85dd8bc10f
chore: Added compile warnings
Enabled compilation warnings and solved compilation problems.

TODO: Should the warnings be hard-coded in the CMake files?
2025-05-10 21:07:49 -06:00
85a34bdd65
feat: Added Token, Tokenizer, Generator, and some utilities
- Initial version of Tokenizer and Token
- Generator template for coroutines (used in tokenizer)
- Utilities like string-related functions, TrieMap, and error handling

TODO: Add tests for Tokenizer
TODO: Add tests for Generator
2025-03-10 01:20:23 -06:00
0f4474821d
chore: Added CMake project setup
Added CMake files to set up the project build, along with the project's
file tree structure and clangd-related settings.
2025-03-04 12:50:53 -06:00
0f68b149da
chore: Initial commit
Initial commit for repo, added .gitignore, LICENSE, NOTICE and README.md
files
2025-03-01 01:27:46 -06:00