Pipeline Architecture

Quorvex AI supports six pipeline types, each designed for a specific workflow. This document explains why they exist, how they differ, and when to choose each one.

Why Multiple Pipelines

A single pipeline cannot serve all use cases well. Converting a markdown spec into a Playwright test requires different stages than extracting features from a PDF or autonomously exploring an unknown application. Rather than overloading one pipeline with conditional branches, the system provides focused pipelines that share common infrastructure (browser pool, memory system, agent runner) but differ in their stage composition.

flowchart LR
    INPUT["Input Type"] --> DECISION{What are you doing?}

    DECISION -->|"Spec to test"| NATIVE["Native Pipeline"]
    DECISION -->|"Spec to test<br/>(complex healing)"| HYBRID["Hybrid Pipeline"]
    DECISION -->|"PDF to tests"| PRD["PRD Pipeline"]
    DECISION -->|"Discover app"| EXPLORE["Exploration Pipeline"]
    DECISION -->|"Custom script"| SKILL["Skill Pipeline"]
    DECISION -->|"Legacy"| STANDARD["Standard Pipeline"]

Pipeline Decision Guide

| Situation | Pipeline | Flag |
| --- | --- | --- |
| You have a markdown spec and want a test | Default | (default) |
| Tests keep failing after 3 healing attempts | Hybrid | `--hybrid` |
| You have a PRD (PDF) and want tests for each feature | PRD | `--prd` |
| You want to discover what an app does | Exploration | `--explore URL` |
| You need network interception or multi-tab | Skill | `--skill-mode` |
| You need the original 4-stage pipeline | Standard | `--standard-pipeline` |
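
The guide can be read as a lookup from flag to pipeline. The sketch below is illustrative only: the flag strings come from the guide, but `FLAG_TO_PIPELINE`, `select_pipeline`, and the returned pipeline names are hypothetical and do not describe how `cli.py` actually parses its arguments.

```python
from typing import Dict, List

# Flags as listed in the decision guide; the pipeline names on the
# right are labels invented for this sketch.
FLAG_TO_PIPELINE: Dict[str, str] = {
    "--hybrid": "hybrid",
    "--prd": "prd",
    "--explore": "exploration",
    "--skill-mode": "skill",
    "--standard-pipeline": "standard",
}


def select_pipeline(argv: List[str]) -> str:
    """Return the pipeline implied by the first recognised flag.

    With no pipeline flag present, the default (native) pipeline is used.
    """
    for arg in argv:
        if arg in FLAG_TO_PIPELINE:
            return FLAG_TO_PIPELINE[arg]
    return "native"
```

For example, `select_pipeline(["--explore", "https://example.com"])` picks the exploration pipeline, while a bare spec path falls through to the default.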

The Subprocess Execution Model

All pipeline stages run as separate subprocesses. This is the most important architectural decision in the pipeline engine.

graph TB
    CLI["cli.py<br/>(main process)"]
    CLI -->|"subprocess.run()"| P["Planner<br/>process"]
    CLI -->|"subprocess.run()"| G["Generator<br/>process"]
    CLI -->|"subprocess.run()"| H["Healer<br/>process"]

    P -->|"init + cleanup"| SDK1["Claude Agent SDK"]
    G -->|"init + cleanup"| SDK2["Claude Agent SDK"]
    H -->|"init + cleanup"| SDK3["Claude Agent SDK"]

Why subprocesses? The Claude Agent SDK uses async cancel scopes internally. When a scope is cancelled during cleanup, it can silently discard the accumulated result text -- the exact output you need from the stage. By running each stage in its own process, SDK cleanup errors are fully isolated. If the generator's cleanup fails, the planner's output is safe on disk and the healer can still start fresh.

Trade-offs:

- Process spawning adds ~1-2 seconds of overhead per stage
- Communication happens through file artifacts (plan.json, .spec.ts, status.txt) and exit codes
- Each process initializes its own SDK instance, consuming additional memory
- Debugging requires checking per-stage logs rather than a single process

The reliability benefit outweighs these costs. A pipeline that occasionally loses its output is useless; a pipeline that is slightly slower but always produces results is production-ready.
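
A minimal sketch of the model described above, assuming each stage is a command that writes one artifact to disk. `run_stage` and `demo` are hypothetical names, and the child process here is a stand-in that writes an empty plan rather than a real planner.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_stage(name: str, command: list[str], artifact: Path) -> Path:
    """Run one stage in its own process and return its artifact path.

    The parent trusts only the exit code and the file on disk, never
    in-memory state from the child.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{name} exited with {result.returncode}: {result.stderr.strip()}"
        )
    if not artifact.exists():
        raise RuntimeError(f"{name} exited cleanly but wrote no artifact at {artifact}")
    return artifact


def demo() -> str:
    """Stand-in planner: a child Python process that writes plan.json."""
    with tempfile.TemporaryDirectory() as workdir:
        plan = Path(workdir) / "plan.json"
        child_code = (
            "import sys, pathlib; "
            "pathlib.Path(sys.argv[1]).write_text('{\"steps\": []}')"
        )
        run_stage("planner", [sys.executable, "-c", child_code, str(plan)], plan)
        return plan.read_text()
```

Because the parent reads the artifact only after the child has exited, anything that goes wrong during the child's teardown cannot reach back and discard output that is already on disk.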

Pipeline (Default)

The native pipeline uses a live browser at every stage, making it the most reliable option for test generation.

sequenceDiagram
    participant CLI as cli.py
    participant SC as Smart Check
    participant NP as Planner
    participant NG as Generator
    participant PW as Playwright Runner
    participant NH as Healer

    CLI->>SC: Check for existing test code
    alt Existing code passes
        SC-->>CLI: Exit success (skip pipeline)
    end

    CLI->>NP: Stage 1 - Plan
    NP->>NP: Browse app via MCP, identify selectors
    NP-->>CLI: plan.json

    CLI->>NG: Stage 2 - Generate
    NG->>NG: Execute steps in live browser
    NG-->>CLI: tests/generated/*.spec.ts

    CLI->>PW: Stage 3 - Execute
    alt Pass
        PW-->>CLI: Exit success
    else Fail
        CLI->>NH: Stage 4 - Heal (up to 3 attempts)
        NH->>NH: test_debug MCP + browser_snapshot
        NH-->>CLI: Fixed code
    end

Smart Check (Stage 0)

Before running the full pipeline, the CLI looks for existing generated code. If a test file already exists and passes, the pipeline exits immediately. If it exists but fails, the system enters healing-only mode. Full regeneration only happens when no code exists or healing fails.

This optimization matters in CI/CD where specs are re-run frequently. A spec that was generated yesterday does not need regeneration today -- it just needs validation.
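
The Smart Check decision can be sketched as a small function. The names `Mode` and `smart_check` are hypothetical, and `passes` stands in for whatever actually runs the existing test.

```python
from enum import Enum
from pathlib import Path
from typing import Callable


class Mode(Enum):
    SKIP = "skip"            # existing test passes: exit immediately
    HEAL_ONLY = "heal_only"  # existing test fails: try healing first
    FULL = "full"            # no generated code yet: run every stage


def smart_check(test_file: Path, passes: Callable[[Path], bool]) -> Mode:
    """Decide how much of the pipeline needs to run for this spec."""
    if not test_file.exists():
        return Mode.FULL
    return Mode.SKIP if passes(test_file) else Mode.HEAL_ONLY
```

A caller that receives `Mode.HEAL_ONLY` would fall back to full regeneration if healing fails, matching the last rule in the description above.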

Why Browser at Every Stage

The planner browses the target application to discover actual selectors, page structure, and navigation flows. The generator executes each test step in a live browser, capturing the selectors that actually work. The healer uses test_debug to pause at the failure point and inspect the real page state.

This "browser-first" approach produces more reliable tests than text-only planning because it works with what the application actually renders, not what the spec describes.

Hybrid Healing

When the native healer's 3 attempts are not enough, hybrid mode escalates to Ralph Validator for deeper, architectural fixes.

| Aspect | Healer | Ralph Validator |
| --- | --- | --- |
| Max attempts | 3 | Up to 17 more (20 total) |
| Context | Fresh each attempt | Persistent conversation history |
| Fix scope | Selector, timing, assertion | May rewrite test structure |
| Completion signal | Test passes | `TESTS_PASSING` token |

Why two healers? The native healer is fast and targeted -- it fixes obvious selector mismatches and timing issues. But some failures require understanding the broader test context: a login flow that changed its redirect behavior, or a form that now uses a multi-step wizard. Ralph maintains conversation history across attempts, building a deeper understanding of the failure pattern.

The cost is time. Hybrid mode can run for up to 20 iterations, each involving a full test execution. Use it when the native healer fails on a spec you know should work.
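
The escalation can be sketched as two loops that share one attempt budget. The attempt counts and the `TESTS_PASSING` token come from the comparison table; `heal`, `native_fix`, and `ralph_fix` are hypothetical callables standing in for the two healers.

```python
from typing import Callable, List

NATIVE_ATTEMPTS = 3   # native healer budget
TOTAL_ATTEMPTS = 20   # 3 native + up to 17 Ralph attempts
DONE_TOKEN = "TESTS_PASSING"


def heal(native_fix: Callable[[], bool],
         ralph_fix: Callable[[List[str]], str]) -> int:
    """Return the attempt number that succeeded, or 0 if none did."""
    # Native healer: each attempt starts fresh, so no state is passed in.
    for attempt in range(1, NATIVE_ATTEMPTS + 1):
        if native_fix():
            return attempt

    # Ralph: the same history list is handed to every attempt, so each
    # one can see what the earlier attempts tried.
    history: List[str] = []
    for attempt in range(NATIVE_ATTEMPTS + 1, TOTAL_ATTEMPTS + 1):
        reply = ralph_fix(history)
        history.append(reply)
        if DONE_TOKEN in reply:
            return attempt
    return 0
```

The only structural difference between the two loops is the `history` argument, which is the "persistent conversation history" row of the table in miniature.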

PRD Pipeline

Converts a PDF Product Requirements Document into multiple test specs and generated tests.

flowchart TB
    PDF["PDF PRD"] --> EXTRACT["Extract text"]
    EXTRACT --> FEATURES["Identify features<br/>(target ~15, max 25)"]
    FEATURES --> CHUNK["Chunk content<br/>(~1500 tokens each)"]
    CHUNK --> STORE["Store in ChromaDB"]

    STORE --> LOOP

    subgraph LOOP["Per Feature"]
        PLAN["Planner queries<br/>PRD chunks via RAG"]
        PLAN --> GEN["Generate test"]
        GEN --> HEAL["Heal if needed"]
    end

Why RAG over full context? PRDs can be 50+ pages. Passing the entire document to each pipeline stage would exceed context limits and dilute focus. By chunking the PRD and storing it in ChromaDB, each feature's planner retrieves only the relevant sections via semantic search. This keeps the AI focused on the specific feature being tested.
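
A rough sketch of the chunk-then-retrieve idea, with two loudly stated simplifications: whitespace-separated words stand in for tokens, and keyword overlap stands in for the semantic search that ChromaDB would perform. `chunk_text` and `top_chunks` are hypothetical helpers, not part of the pipeline.

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 1500) -> List[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word approximates one token.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def top_chunks(chunks: List[str], query: str, k: int = 2) -> List[str]:
    """Rank chunks by how many query words they share (keyword overlap).

    This is a stand-in for embedding-based semantic search, kept here
    only so the example runs without a vector store.
    """
    terms = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(terms & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]
```

Each feature's planner would then see only the `k` highest-ranked chunks instead of the whole document, which is what keeps the prompt within context limits.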

Exploration Pipeline

Autonomous AI-powered application discovery. The agent explores an app to catalog pages, flows, API endpoints, and form behaviors.

Why exploration? Many teams adopt Quorvex AI for applications that lack documentation. Rather than requiring users to write comprehensive specs from scratch, the exploration pipeline discovers what the application does, then feeds those discoveries into requirements generation and RTM creation.

Exploration results flow downstream:

flowchart LR
    EXP["Exploration<br/>Session"] --> REQ["Requirements<br/>Generator"]
    REQ --> REQS["Structured<br/>Requirements"]
    REQS --> RTM["RTM<br/>Generator"]
    SPECS["Existing<br/>Specs"] --> RTM
    RTM --> MATRIX["Traceability<br/>Matrix"]

The exploration agent uses configurable strategies (goal-directed, breadth-first, depth-first) and respects interaction limits to prevent runaway sessions.
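
To illustrate how a strategy and an interaction limit interact, here is a toy traversal over a static link graph. It covers only the breadth-first and depth-first orderings (goal-directed exploration is not modelled), and `explore`, `links`, and `max_interactions` are hypothetical names; the real agent drives a browser rather than a dictionary.

```python
from collections import deque
from typing import Dict, List


def explore(links: Dict[str, List[str]], start: str,
            strategy: str = "breadth-first",
            max_interactions: int = 50) -> List[str]:
    """Visit pages reachable from start, stopping at the interaction limit."""
    frontier = deque([start])
    seen = {start}
    visited: List[str] = []
    while frontier and len(visited) < max_interactions:
        # Breadth-first takes the oldest discovered page,
        # depth-first takes the most recently discovered one.
        page = frontier.popleft() if strategy == "breadth-first" else frontier.pop()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited
```

The `len(visited) < max_interactions` check is what prevents a runaway session: even on a site with unbounded links, the loop ends after a fixed number of page visits.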

Skill Pipeline

The skill pipeline covers scenarios beyond what MCP tools support: network interception, multi-tab coordination, custom retry logic, and performance measurement.

Skill scripts are plain JavaScript files using the Playwright API directly. The skill runner manages browser launch, timeout enforcement, and result capture. This provides an escape hatch when the declarative spec format cannot express a test requirement.

See the decision matrix in .claude/skills/playwright/SKILL.md for when to use skill mode versus a standard spec.

Web Dashboard Integration

When pipelines run via the dashboard, the flow adds browser pool management and process tracking:

sequenceDiagram
    participant FE as Dashboard
    participant API as FastAPI
    participant BP as BrowserPool
    participant PM as ProcessManager
    participant CLI as Pipeline

    FE->>API: POST /specs/{id}/run
    API->>BP: Acquire browser slot
    BP-->>API: Slot granted
    API->>PM: Spawn pipeline process
    PM->>CLI: subprocess.Popen(cli.py ...)

    loop Status polling
        FE->>API: GET /runs/{id}
        API-->>FE: Progress + status
    end

    CLI-->>PM: Process exits
    PM->>BP: Release slot
    PM->>API: Update DB

The ProcessManager tracks running processes and supports cancellation. ProgressReporter writes progress events that the API reads on each poll, giving the dashboard near-real-time feedback without WebSocket complexity.
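
A simplified sketch of process tracking with a bounded slot pool. It folds slot acquisition into the manager for brevity, whereas the diagram has the API acquire the slot and the manager release it. The class body, method names, and the semaphore-based pool are assumptions made for illustration, not the project's actual implementation.

```python
import subprocess
import sys
import threading
from typing import Dict, List


class ProcessManager:
    """Track pipeline processes and cap concurrency with a slot pool."""

    def __init__(self, max_browsers: int = 2) -> None:
        self._slots = threading.BoundedSemaphore(max_browsers)
        self._running: Dict[str, subprocess.Popen] = {}

    def spawn(self, run_id: str, command: List[str]) -> bool:
        """Start a run if a slot is free; return False when the pool is full."""
        if not self._slots.acquire(blocking=False):
            return False
        try:
            self._running[run_id] = subprocess.Popen(command)
        except OSError:
            self._slots.release()  # do not leak the slot on a failed launch
            raise
        return True

    def status(self, run_id: str) -> str:
        """Non-blocking status check, suitable for a polling endpoint."""
        code = self._running[run_id].poll()
        if code is None:
            return "running"
        return "passed" if code == 0 else "failed"

    def finish(self, run_id: str, cancel: bool = False) -> int:
        """Wait for (or cancel) a run, release its slot, return the exit code."""
        proc = self._running.pop(run_id)
        if cancel:
            proc.terminate()
        code = proc.wait()
        self._slots.release()
        return code
```

A status endpoint would call `status()` on each poll, and a cancel request would map to `finish(run_id, cancel=True)`, which frees the slot for the next queued run.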

- System Overview -- How pipelines fit into the larger architecture
- Browser Pool -- Resource management for concurrent pipeline runs
- Infrastructure -- How pipelines run in Docker and Kubernetes
- Getting Started -- Run your first pipeline
- Pipeline Modes -- Practical guide for choosing a pipeline