Core Specification

Formal core specification v0.1 — the boundary between language frontends and code generation.

The Core Specification defines the formal boundary used by all language frontends. It is the contract that separates frontend concerns (parsing, keyword resolution, surface normalization) from backend concerns (semantic analysis, code generation, execution).

Version: 0.1 File: multilingualprogramming/core/ir.py


Core Object

CoreIRProgram is the typed container that wraps the shared Core AST:

1
2
3
4
5
6
7
8
from dataclasses import dataclass, field

@dataclass
class CoreIRProgram:
    ast: Program              # required: the Core AST
    source_language: str      # required: e.g., "en", "fr", "ja"
    core_version: str = "0.1" # default: current core version
    frontend_metadata: dict = field(default_factory=dict)  # optional

Fields:

Field Type Required Description
ast Program Yes The parsed Core AST
source_language str Yes ISO language code used to parse the source
core_version str No Core specification version (default: "0.1")
frontend_metadata dict No Frontend-specific metadata (author, file path, etc.)

Core Grammar (Minimal)

This minimal grammar captures the main contract between frontends and backends. The actual parser supports a richer grammar.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
Program      ::= Statement*

Statement    ::= LetDecl
               | Assign
               | IfStmt
               | ForStmt
               | WhileStmt
               | MatchStmt
               | FuncDef
               | AsyncFuncDef
               | ClassDef
               | TryStmt
               | WithStmt
               | ReturnStmt
               | YieldStmt
               | ImportStmt
               | ExprStmt
               | GlobalStmt
               | NonlocalStmt
               | DelStmt
               | AssertStmt
               | RaiseStmt
               | BreakStmt
               | ContinueStmt
               | PassStmt

LetDecl      ::= "LET" Identifier [":" TypeAnnotation] "=" Expr
Assign       ::= Target [":" TypeAnnotation] "=" Expr
AugAssign    ::= Target AugOp Expr

IfStmt       ::= "COND_IF" Expr ":" Block
                 ("COND_ELIF" Expr ":" Block)*
                 ("COND_ELSE" ":" Block)?

ForStmt      ::= "LOOP_FOR" Target "IN" Expr ":" Block
                 ("COND_ELSE" ":" Block)?

WhileStmt    ::= "LOOP_WHILE" Expr ":" Block
                 ("COND_ELSE" ":" Block)?

MatchStmt    ::= "MATCH" Expr ":"
                 ("CASE" Pattern ["IF" Expr] ":" Block)+

FuncDef      ::= ["ASYNC"] "FUNC_DEF" Identifier "(" Params ")"
                 ["->"] TypeAnnotation?  ":" Block

ClassDef     ::= "CLASS_DEF" Identifier ["(" Bases ")"] ":" Block

TryStmt      ::= "TRY" ":" Block
                 ("EXCEPT" [ExceptTarget] ":" Block)*
                 ("COND_ELSE" ":" Block)?
                 ("FINALLY" ":" Block)?

WithStmt     ::= ["ASYNC"] "WITH" ContextList ":" Block

Block        ::= INDENT Statement+ DEDENT

Expr         ::= Literal
               | Identifier
               | Call
               | BinaryOp
               | UnaryOp
               | Compare
               | BoolOp
               | Collection
               | Slice
               | FString
               | Lambda
               | Comprehension
               | Await
               | Walrus
               | Ternary
               | Yield
               | YieldFrom
               | Starred

Literal      ::= Integer | Float | String | Boolean | None | Complex

Collection   ::= List | Dict | Set | Tuple

Semantic Concept Tokens

Concept tokens are the boundary between frontend keyword resolution and backend parsing.

Concept Token Meaning Example surface forms
COND_IF Conditional if, si, wenn, もし, إذا
COND_ELIF Else-if elif, sinonsi, sonstewenn
COND_ELSE Else else, sinon, sonst, そうでなければ
LOOP_FOR For loop for, pour, für,
IN In (loop) in, dans, in, , في
LOOP_WHILE While loop while, tantque, solange,
BREAK Break break, arreter, abbrechen
CONTINUE Continue continue, continuer
PASS Pass pass, passer
LET Variable declaration let, soit, sei, 変数, مان
CONST Constant const, constante
GLOBAL Global scope global, mondial, 大域
NONLOCAL Nonlocal scope nonlocal, 非局所
DEL Delete del, supprimer
ASSERT Assert assert, affirmer
FUNC_DEF Function def def, déf, 定义, 関数, دالة, परिभाषा
RETURN Return return, retourner, 戻る
CLASS_DEF Class def class, classe, クラス, صنف, वर्ग
LAMBDA Lambda lambda, ラムダ
YIELD Yield yield, produire, 産出
YIELD_FROM Yield from yield from, より産出
ASYNC Async async, 非同期
AWAIT Await await, 待機
TRY Try try, essayer, 試す
EXCEPT Except except, sauf, 除いて
FINALLY Finally finally, finalement, 最終的に
RAISE Raise raise, soulever, 発生
WITH With with, avec, 付き
AS As as, comme, として
IMPORT Import import, importer, 取込
FROM From from, de, から
MATCH Match match
CASE Case case

Typing / Validation Rules

Current validation enforces:

  1. ast must be a Program node (instance check)
  2. source_language must be a non-empty string

Creating a CoreIRProgram:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
from multilingualprogramming.core.ir import CoreIRProgram
from multilingualprogramming.core.lowering import lower_to_core_ir
from multilingualprogramming import Lexer, Parser

# Parse source
lexer = Lexer(language="fr")
parser = Parser(language="fr")

source = "soit x = 42\nafficher(x)"
tokens = lexer.tokenize(source)
ast = parser.parse(tokens)

# Wrap in CoreIRProgram
core = lower_to_core_ir(ast, source_language="fr")

print(core.source_language)  # "fr"
print(core.core_version)     # "0.1"
print(type(core.ast).__name__)  # "Program"

Forward-Only Property

The system guarantees this compilation direction:

1
CS_lang → CoreAST → CoreIRProgram → Python/WASM

It does not guarantee:

  • Reconstruction of original source from Core IR
  • Lossless round-trip from Core back to any surface language
  • Source-level equivalence checking (only semantic equivalence)

This is by design: the project is a forward compilation framework, not a refactoring or source-transformation system.


Core Version History

Version Changes
0.1 Initial core specification. Basic statement/expression grammar.

Planned for 0.2:

  • Statement/expression sort checks
  • Typed annotation consistency validation
  • Lowering invariants for restricted subsets