Skip to content

6. Compiler Design: From AST to Python Runtime Engine (Compiler Architecture)

This might be the most hardcore chapter in this documentation. We must emphasize a core fact: The Nexa language is absolutely not just a SDK library with a bunch of @decorator hastily glossing over various API request abstraction classes mounted on top of Python!

It is a real language (Domain Specific Language, DSL), containing a complete frontend lexical analysis (Lexer/Parser), semantic static inference engine, and complex abstract syntax tree (AST), finally connecting to a powerful target code generator (Target Code Generator) that transpiles to the underlying runtime.

To help geeks who want to explore further or developers who want to contribute PRs to the Nexa core better onboard, this section will comprehensively dissect Nexa's Transpiler compilation principles and Runtime high-concurrency execution model design philosophy.


🛠️ Compilation Pipeline Overview: What Happens When You Hit nexa run?

When you've written an elegant main.nx and hit the compile run command in the terminal, its lifecycle will go through the following four ultra-fast stages:

1. Lexical & Parsing Engine

Nexa does not reuse Python's built-in Parser. In src/nexa_parser.py, we have implemented our own state-machine-based grammar parser (early versions used Lark/PLY-based Parser, now refactored).

The parser will tokenize and assemble statements with Nexa-specific natural expressions containing agent, match intent, >>, etc., line by line into a strict tree structure — AST processing nodes. At this step, any low-level errors such as missing semicolons or misaligned brackets will be directly intercepted.

2. Graph Discovery & Static Type Check

Ordinary languages usually just execute after compiling to AST, but as an agent network orchestration expert, Nexa adds an extra graph construction step here.

When detecting syntax like join(AgentA, AgentB).Summarizer, the compiler doesn't treat it as linear call text. Instead, it draws a Directed Acyclic Graph (DAG) of all agent call hierarchies in memory. It uses underlying graph algorithms (such as Kahn's Algorithm) to find which paths don't have upstream-downstream strong data dependencies, and labels them as "Parallelizable". Moreover, using the protocol mentioned earlier, the engine also performs strict static protocol connectivity cross-validation on node data input and output. If there's a type mismatch in parameter passing, you'll immediately get a big red error at compile time (rather than the expensive runtime phase)!

3. Python Async Coroutine Transpilation Layer

Nexa's current main target platform is Python. The backend code_generator.py traverses the just-optimized AST frontend tree, using the Visitor pattern to "translate" node by node:

  • Converts regular methods to async/await.
  • Automatically outputs protocol as class MyProtocol(BaseModel): (importing Pydantic).
  • Automatically builds powerful state machine engine call code.

4. Landing on Nexa Runtime Execution Engine

The transpiled xxx.py internally actually highly depends on and invokes the Nexa dedicated execution layer library src/runtime/ installed locally.


🆕 v1.0 AVM Architecture: Agent Virtual Machine

Nexa v1.0-alpha introduces the revolutionary Agent Virtual Machine (AVM) — a high-performance, securely isolated agent execution engine written in Rust.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    Nexa AVM Architecture                     │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  Nexa       │  │  AST        │  │  Bytecode           │  │
│  │  Compiler   │→ │  Generator  │→ │  Generator          │  │
│  │  (Rust)     │  │  (Rust)     │  │  (Rust)             │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│                              ↓                               │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                    AVM Runtime (Rust)                    │ │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌─────────┐ │ │
│  │  │ Scheduler │ │ Memory    │ │ Tool      │ │ Sandbox │ │ │
│  │  │           │ │ Manager   │ │ Registry  │ │ (WASM)  │ │ │
│  │  └───────────┘ └───────────┘ └───────────┘ └─────────┘ │ │
│  └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Compiler Frontend (Lexer + Parser)

Technology Choices

  • Language: Rust
  • Lexical Analysis: Using logos crate for high-performance lexical analysis
  • Syntax Parsing: Recursive descent parser
  • AST: Type-safe abstract syntax tree representation

Lexer Features

// Nexa Token type definition (simplified version)
#[derive(Logos, Debug, PartialEq)]
enum Token {
    #[token("agent")]
    Agent,

    #[token("flow")]
    Flow,

    #[token("protocol")]
    Protocol,

    #[token("implements")]
    Implements,

    #[token(">>")]
    Pipeline,

    #[token("|>>")]
    FanOut,

    #[token("&>>")]
    Merge,

    // ... more Tokens
}

Bytecode Compiler

Bytecode Instruction Set Design

// Nexa bytecode instruction set
enum NexaOpcode {
    // Agent operations
    AgentCreate { name: StringId },
    AgentRun { agent_id: u32 },
    AgentClone { agent_id: u32 },

    // Flow control - DAG operators
    Pipeline { stages: Vec<u32> },      // >> pipeline
    Fork { agents: Vec<u32> },          // |>> fan-out (parallel wait)
    Merge { strategy: MergeStrategy },  // &>> merge/fan-in
    FireForget { agents: Vec<u32> },    // || fire-and-forget (v0.9.7+)
    Consensus { agents: Vec<u32> },     // && consensus voting (v0.9.7+)
    ConditionalBranch {
        condition: ConditionType,
        true_branch: u32,
        false_branch: u32
    },                                   // ?? conditional branch

    // Memory operations
    MemStore { key: StringId, scope: MemoryScope },
    MemLoad { key: StringId },

    // Tool calls
    ToolCall { tool_id: u32, args: Vec<Value> },

    // Control flow
    Jump { offset: i32 },
    JumpIf { condition: bool, offset: i32 },
    SemanticIf { pattern: StringId },

    // LLM calls
    LLMCall { model: StringId },
    LLMStream { model: StringId },

    // New primitives (v0.9.7+)
    Reason { context: StringId },       // Type-aware reasoning
    WaitForHuman { prompt: StringId },  // Human approval wait
    Break,                              // Loop break
}

AVM Runtime

Core Components

  1. Scheduler
  2. Tokio-based async task scheduling
  3. Priority queue
  4. Load balancing strategies

  5. Memory Manager

  6. Zero-copy message passing
  7. Arena allocator
  8. Vector virtual memory paging

  9. Tool Registry

  10. Dynamic WASM module loading
  11. Python FFI bindings

  12. Sandbox

  13. WASM runtime (wasmtime)
  14. Resource limits
  15. Permission control

🔒 WASM Security Sandbox

Introducing WebAssembly in AVM to provide strong isolation and cross-platform compatibility for external tool execution.

WASM Sandbox Architecture

┌─────────────────────────────────────────────────────────────┐
│                    WASM Sandbox Architecture                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                    Host Interface                      │  │
│  │  ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐  │  │
│  │  │ stdin/stdout│ │ filesystem  │ │ network (HTTP)  │  │  │
│  │  └─────────────┘ └─────────────┘ └─────────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              ↓ WASI                         │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                    WASM Runtime                        │  │
│  │  ┌─────────────────────────────────────────────────┐  │  │
│  │  │  Tool 1 (WASM)  │  Tool 2 (WASM)  │  Tool N     │  │  │
│  │  └─────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                              │
│  Resource Limits: CPU Time | Memory | Filesystem | Network  │
└─────────────────────────────────────────────────────────────┘

Permission Model

// Four-level permission model
enum PermissionLevel {
    None,       // No permissions
    Standard,   // Standard permissions (read, basic network)
    Elevated,   // Elevated permissions (write, more network)
    Full,       // Full permissions
}

// WASM resource limit configuration
struct WasmResourceLimits {
    max_memory_pages: u32,      // Maximum memory pages
    max_table_elements: u32,    // Maximum table elements
    max_cpu_time_ms: u64,       // Maximum CPU time
    max_file_size: u64,         // Maximum file size
    allowed_paths: Vec<PathBuf>, // Allowed access paths
    allowed_hosts: Vec<String>,  // Allowed access hosts
    network_enabled: bool,       // Whether network is allowed
}

impl Default for WasmResourceLimits {
    fn default() -> Self {
        Self {
            max_memory_pages: 256,    // 16MB
            max_table_elements: 1024,
            max_cpu_time_ms: 5000,    // 5 seconds
            max_file_size: 10 * 1024 * 1024, // 10MB
            allowed_paths: vec![],
            allowed_hosts: vec![],
            network_enabled: false,
        }
    }
}

⚡ Smart Scheduler

Dynamically allocating concurrent resources based on system load at the AVM layer.

Scheduling Strategies

Strategy Description Use Case
RoundRobin Round-robin allocation Even load
LeastLoaded Least loaded first Uneven tasks
Adaptive Adaptive adjustment Complex scenarios

Scheduler Features

  • Priority Queue: Task scheduling based on Agent priority
  • Load Balancing: RoundRobin/LeastLoaded/Adaptive strategies
  • DAG Topological Sort: Automatic dependency resolution and parallelism analysis
  • Resource Allocation: Memory, CPU core allocation optimization

🐄 COW Memory: Copy-On-Write State Branching (v1.0.3+)

Nexa v1.0.3-beta introduces COW (Copy-On-Write) memory mechanism, achieving O(1) complexity state branching, enabling efficient implementation of Tree-of-Thoughts and other advanced reasoning patterns.

Core Concept

COW Memory allows creating state branches without copying the entire state:

Original State (S0)
    │
    ├── Branch S1 (only records differences, not full copy)
    │   └── Branch S1' (continues recording differences)
    │
    └── Branch S2 (only records differences)
        └── Branch S2' (continues recording differences)

Working Principle

// COW Memory structure (simplified)
struct CowMemory {
    base_state: Arc<State>,          // Shared base state
    delta_changes: Vec<ChangeRecord>, // Only incremental changes
    parent_ref: Option<Arc<CowMemory>>, // Parent node reference
}

// Create branch: O(1) complexity
fn branch_state(current: &CowMemory) -> CowMemory {
    CowMemory {
        base_state: current.base_state.clone(),  // Shared, not copied
        delta_changes: Vec::new(),                // New delta list
        parent_ref: Some(Arc::new(current.clone())),
    }
}

Performance Advantages

Operation Traditional Copy COW Memory
Create Branch O(n) O(1)
Merge State O(n) O(k) where k = changes
Memory Usage n × branches n + k × branches
Backtrack O(n) O(1)

Use Cases

Tree-of-Thoughts Exploration

// Tree-of-Thoughts exploration example
agent Thinker {
    prompt: "Explore multiple solution approaches"
    memory: "cow"  // Enable COW memory
}

flow explore_solutions {
    problem = "How to optimize system performance?";

    // Create multiple thinking branches
    branch1 = Thinker.run(problem) |>> {
        "Approach A: Database optimization"
    };

    branch2 = Thinker.run(problem) |>> {
        "Approach B: Caching strategy"
    };

    branch3 = Thinker.run(problem) |>> {
        "Approach C: Architecture redesign"
    };

    // Consensus to find best solution
    best_solution = branch1 && branch2 && branch3;
    print(best_solution);
}

Multi-Path Reasoning

// Multi-path reasoning example
agent Reasoner {
    prompt: "Analyze problems from multiple perspectives"
    memory: "cow"
}

flow multi_path_reasoning {
    question = "Is this design decision correct?";

    // Create reasoning branches
    technical = Reasoner.run(question + " from technical perspective");
    business = Reasoner.run(question + " from business perspective");
    user = Reasoner.run(question + " from user experience perspective");

    // Combine decisions
    decision = technical && business && user;
    print(decision);
}

🔄 Work-Stealing Scheduler (v1.0.3+)

v1.0.3-beta also introduces Work-Stealing scheduler, implementing efficient concurrent task scheduling based on Actor model.

Core Mechanism

Work-Stealing scheduler allows idle Workers to "steal" tasks from busy Workers:

┌─────────────────────────────────────────────────┐
│              Global Task Queue                   │
└─────────────────────────────────────────────────┘
         │                │                │
         ▼                ▼                ▼
    ┌─────────┐      ┌─────────┐      ┌─────────┐
    │ Worker1 │      │ Worker2 │      │ Worker3 │
    │ [busy]  │      │ [idle]  │      │ [busy]  │
    └─────────┘      └─────────┘      └─────────┘
         │                │
         │   ← steal task │
         └────────────────┘

Scheduling Algorithm

// Work-Stealing scheduler core logic
struct WorkStealingScheduler {
    workers: Vec<Worker>,
    global_queue: Arc<Mutex<Vec<Task>>>,
}

impl Worker {
    fn run(&self) {
        loop {
            // 1. First get task from local queue
            if let Some(task) = self.local_queue.pop() {
                self.execute(task);
                continue;
            }

            // 2. Local empty, get from global queue
            if let Some(task) = global_queue.lock().pop() {
                self.local_queue.push(task);
                continue;
            }

            // 3. Global empty, steal from other workers
            self.steal_from_others();
        }
    }

    fn steal_from_others(&self) {
        for other in &other_workers {
            if other.local_queue.len() > THRESHOLD {
                // Steal half of tasks
                let stolen = other.local_queue.steal_half();
                self.local_queue.extend(stolen);
                break;
            }
        }
    }
}

Performance Advantages

Feature Traditional Scheduler Work-Stealing
Load Balancing Requires explicit allocation Automatic balance
Idle Workers Wait for tasks Proactively steal
Task Migration Requires extra overhead Zero-cost migration
Concurrency Limited by allocation strategy Maximize utilization

Usage Example

// Concurrent task processing example
@timeout(seconds=120)
agent ParallelProcessor {
    prompt: "Process multiple tasks concurrently"
}

flow parallel_workflow {
    tasks = ["task1", "task2", "task3", "task4", "task5"];

    // Work-Stealing automatically schedules all tasks
    results = tasks |>> ParallelProcessor;

    // Aggregate results
    final = results &>> Aggregator;
    print(final);
}

Configuration Options

// Enable Work-Stealing in project config
// nexa.config.toml
[scheduler]
type = "work_stealing"
workers = 8           # Number of workers
steal_threshold = 2   # Steal threshold
balance_interval = 100  # Load balance check interval (ms)

📄 Vector Virtual Memory Paging

AVM takes over memory and automatically performs vectorized swapping of conversation history.

Core Features

Feature Description
LRU/LFU/Hybrid eviction strategy Intelligent page replacement
Embedding vector similarity search Semantic relevance loading
Transparent page loading Imperceptible memory management
Automatic compression Old page summary compression

Memory Page Management

struct MemoryPage {
    id: PageId,
    content: Vec<Message>,
    embedding: Option<Vec<f32>>,
    last_accessed: Instant,
    access_count: u32,
    compressed: bool,
}

struct ContextPager {
    pages: HashMap<PageId, MemoryPage>,
    active_pages: Vec<PageId>,
    max_active_pages: usize,
    eviction_strategy: EvictionStrategy,
}

🌐 Core AST Node Design: A Perspective on the Codification of "Intent System"

To give you a more intuitive feel, let's take the match intent magic we saw in Chapter 2 as an example:

// Your .nx source code:
match req {
    intent("Check weather") => WeatherBot.run(req)
}

In the underlying parser, it will be encapsulated into the following Python dataclass-based tree node hierarchy (internal simplified version shown):

MatchIntentNode(
    target_variable=VarQueryNode("req"),
    cases=[
        IntentCase(
            condition_literal="Check weather",
            action=RunAgentNode(
                agent_name="WeatherBot", 
                arguments=[VarQueryNode("req")]
            )
        )
    ]
)

Subsequently, the code generator will intelligently transpile this into the following astonishing low-level async call mechanism — directly injected into the runtime scheduling engine, accomplishing in one line what would take you a hundred lines to write manually:

# Generated target Python code overview (extremely simplified version)
_intent_router = runtime.IntentEngine()
_match_res = await _intent_router.classify(
    query=req, 
    candidates=["Check weather", "Other"]
)
if _match_res == "Check weather":
    await runtime.Orchestrator.dispatch("WeatherBot", req)

This dimensional reduction transpilation successfully keeps developers' mental model at the upper-level "natural semantic field", while the underlying layer is taken over by cold and efficient matrix operations and coroutines.


⚙️ Runtime Black Box Revealed: High Concurrency and Defensive Systems

All generated pure .py files ultimately run on the black technologies we preset in the runtime/ namespace. This microkernel contains four guardians:

runtime.memory (Explicit Long-Short Term Memory Scheduling Module)

Mounts and prunes Context Window Management. You can implant sliding window protocols or RAG-based external summaries here, ensuring that long-running Agents won't overflow the context window (Token Limits).

runtime.orchestrator (Pipeline and Parallel Scheduling Center)

Nexa's most hardcore part. It controls the precise blocking and waking of >> pipeline flow, manages concurrent pulling of multiple models under join() actions, and implements the event loop that suspends state to yield to human approval.

runtime.evaluator (Defensive Loop Engine and Strong Validation Layer)

Makes natural language judgments on exit thresholds for loop until, performs implicit retries with reasons for malformed outputs that don't conform to protocol format. This Evaluator can wrap system errors as Context and forcefully throw them back to the large model, demanding it to recognize and correct mistakes!

runtime.rbac (Permission Access Control)

Defines security roles for different Agents or flows, ensuring the principle of least privilege for tool calls.


📊 Performance Comparison

Metric Python Transpiler Rust AVM
Compilation Time ~100ms ~5ms
Startup Time ~500ms ~10ms
Memory Footprint ~100MB ~10MB
Concurrent Agents ~100 ~10000

🔧 Debugging and Transparency

Reject Black Box Frameworks, Embrace Transparency

We deeply understand the extreme pain of troubleshooting deadlocks inside platforms like AutoGPT. Therefore, when you use the pure build mode nexa build xxx.nx, Nexa doesn't hide its transpiler logic. You can see readable source code named out_xxx.py in the same-level directory. You can debug it like a normal Python backend server, directly setting hard breakpoints on this transpiled generated code for single-step debugging! This highly transparent design has won unanimous praise from geeks.

Debug Commands

# Generate Python code
nexa build script.nx

# View generated bytecode (AVM mode)
nexa build script.nx --bytecode

# Runtime debug mode
nexa run script.nx --debug

# Performance profiling
nexa run script.nx --profile

📝 Chapter Summary

In this chapter, we gained deep understanding of:

  1. Compilation Pipeline: The complete process from lexical analysis to code generation
  2. AVM Architecture: High-performance execution engine written in Rust
  3. WASM Sandbox: Four-level permission model and resource limits
  4. Smart Scheduler: Priority queue and load balancing
  5. Vector Virtual Memory: LRU/LFU eviction strategies
  6. Runtime Components: memory, orchestrator, evaluator, rbac

快来问问agent吧!

Nexa Agent

Nexa 文档助手

我是Nexa文档AI助手,可以问我有关文档的一切!

由AI Hub提供支持