CLI-first · Agent-ready · Tree-sitter powered

Tree-sitter testing oracle for AI code.

A CLI harness for mutation testing, property testing, fuzzing, coverage feedback, behavior replay, and evolutionary analysis across Rust, Go, Python, TypeScript/JavaScript, and future Tree-sitter language plugins.

Rust today · Go today · Python today · TypeScript + JavaScript today · Tree-sitter symbols · Language plugin API · Budgets, timeouts, CI policy
scan symbols · mutate paths · score confidence
Install
curl -fsSL https://github.com/Jacobious52/veritas/releases/latest/download/install.sh | sh
Paste This For Your Agent
Install and use Veritas for this repo.

Run:
curl -fsSL https://github.com/Jacobious52/veritas/releases/latest/download/install.sh | sh
veritas review-ai
veritas verify --changed --profile ci
veritas score
veritas repair-prompt

Use .veritas/ feedback, replay results, mutation survivors, and the evolution suite to improve tests before marking the change done.
Example report: changed branch
72 confidence: needs stronger tests
  • mutation survived: refund_cents <= available_cents -> refund_cents < available_cents
  • fuzz seed saved: " 12.34 " reproduced parser drift
  • replay drift: AuthorizeRefund("support", 500) changed from allowed to denied
  • agent next step: promote assertion candidate, rerun, keep only if the mutant dies
veritas verify --changed --profile ci
veritas evolve --dry-run
veritas repair-prompt

Tree-sitter language plugins, one verification core

Veritas keeps every language from reinventing the testing dashboard. Tree-sitter gives plugins stable symbols, line ranges, methods, and risk surfaces; the core turns those into the same reports, confidence score, AI feedback, and CI policy.

Rust
Workspace detection, public symbol discovery, proptest generation, cargo test, optional llvm-cov, AST-scoped mutation checks, replay, and generated regression artifacts.
Go
Module and package graph discovery, scoped package tests, fuzz harnesses, coverage, mutation campaigns, reverse dependency selection, corpus persistence, and replay.
Python
Tree-sitter symbol discovery, unittest execution, coverage.py integration when enabled, simple executable mutation checks, multi-argument replay, and the same artifact contract.
TypeScript/JavaScript
Package/config/source detection, Tree-sitter TypeScript/TSX/JavaScript symbols, class methods, arrow functions, executable Bun properties, replay, and mutation checks.
Plugin contract
Language plugins own discovery, generated artifacts, command budgets, and mutation operators while the core owns scoring, baselines, replay manifests, and AI repair prompts.
Shared artifacts
.veritas/report.json, SARIF, JUnit, Markdown, assertion candidates, replay results, mutation campaigns, corpus entries, evolution suites, and repair prompts.

Examples that scale with the risk

Start with one boundary. Keep going until the report has a mutation campaign, fuzz corpus, replay manifest, and selected evolution suite your AI agent can act on. Rust, Go, Python, and TypeScript/JavaScript use plugins today; the core model is built around Tree-sitter-backed language plugins so additional languages can share the same reports and AI loop.

Real Evolution Demo
cargo run -p veritas-cli -- --root examples/go-evolution-loop verify --lang go --target .
cargo run -p veritas-cli -- --root examples/go-evolution-loop score
cargo run -p veritas-cli -- --root examples/go-evolution-loop evolve --dry-run

Before

  • 58% mutation score
  • 4 surviving parser mutants
  • 4 assertion candidates
  • 55 confidence score

Selected Candidate

  • ParseInvoiceTotal
  • 95% fitness
  • Add the smallest assertion that kills the mutant
  • Artifacts: go_suite.json, go_campaign.json, assertions/go_*.json

After

  • 91% mutation score
  • 0 surviving mutants
  • 16 replay cases retained
  • 98 confidence score
value, ok := ParseInvoiceTotal("1000000")
assert ok && value == 1000000

value, ok = ParseInvoiceTotal("not-a-number")
assert !ok && value == 0

value, ok = ParseInvoiceTotal("1000001")
assert !ok && value == 0
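
The three assertions above can be made runnable as a minimal Go sketch. The body of ParseInvoiceTotal and the maxInvoiceTotal cap are assumptions inferred only from those assertions; the real function in examples/go-evolution-loop may differ.

```go
package main

import "strconv"

// maxInvoiceTotal is a hypothetical cap inferred from the assertions above.
const maxInvoiceTotal = 1000000

// ParseInvoiceTotal is an assumed implementation, not the one from the example repo.
// It returns (0, false) for non-numeric input and for values above the cap.
func ParseInvoiceTotal(raw string) (int, bool) {
	value, err := strconv.Atoi(raw)
	if err != nil || value > maxInvoiceTotal {
		return 0, false
	}
	return value, true
}

// checkInvoiceBoundary mirrors the three assertions and returns the first
// failure message, or an empty string when all of them hold.
func checkInvoiceBoundary() string {
	if value, ok := ParseInvoiceTotal("1000000"); !ok || value != 1000000 {
		return "value at the cap should be accepted"
	}
	if value, ok := ParseInvoiceTotal("not-a-number"); ok || value != 0 {
		return "non-numeric input should be rejected"
	}
	if value, ok := ParseInvoiceTotal("1000001"); ok || value != 0 {
		return "value above the cap should be rejected"
	}
	return ""
}
```

The pair "1000000" and "1000001" is what pins the cap: a mutant that shifts the comparison by one fails one of the two checks.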
1. Boundary mutant
Small function, high consequence.
money / permissions
if refund_cents <= available_cents {
    approve_refund()
}
if refund_cents < available_cents {
    approve_refund()
}
mutation survived

Veritas records the survivor, writes an assertion candidate, persists a repro note, and ranks a test that should kill the mutant.

veritas verify --changed --profile ci
veritas promote-regression --index 0
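
A test only kills this mutant if it exercises the exact boundary. The Go sketch below shows why; CanRefund, canRefundMutant, and the case table are hypothetical names for illustration, not Veritas output.

```go
package main

// CanRefund is a hypothetical wrapper around the original comparison.
func CanRefund(refundCents, availableCents int) bool {
	return refundCents <= availableCents
}

// canRefundMutant is the surviving mutant: <= replaced by <.
func canRefundMutant(refundCents, availableCents int) bool {
	return refundCents < availableCents
}

// boundaryCase is one input pair with the expected decision.
type boundaryCase struct {
	refund, available int
	want              bool
}

// boundaryCases brackets the boundary. Only the middle case, where the refund
// equals the available amount, distinguishes the mutant from the original.
var boundaryCases = []boundaryCase{
	{499, 500, true},
	{500, 500, true},
	{501, 500, false},
}

// failures counts how many cases a given implementation gets wrong.
func failures(fn func(int, int) bool) int {
	count := 0
	for _, c := range boundaryCases {
		if fn(c.refund, c.available) != c.want {
			count++
		}
	}
	return count
}
```

Here failures(CanRefund) is 0 and failures(canRefundMutant) is 1, so a suite containing the equal-amounts case moves the mutant from lived to killed.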
2. Properties and fuzzing
Inputs are messy, generated code is optimistic.

Invoice parser an AI just "simplified"

func ParseTotal(raw string) int {
    cents, _ := strconv.Atoi(raw)
    return cents
}

A generated fuzz target feeds `""`, `" 1200 "`, `"12.34"`, and large values. Any parser panic or bad normalization becomes a replayable seed instead of a one-off CI surprise.

property test · fuzz harness · corpus entry

What Veritas writes

FuzzParseTotal
seed: " 12.34 "
expected: no panic, stable cents

.veritas/corpus/go_0.json
.veritas/repros/go_0.md

The next run can replay the same seed instead of rediscovering it by luck.
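
For orientation, a fuzz target in this spirit could look like the sketch below, which uses Go's native fuzzing API. It is not the harness Veritas generates. ParseTotalStrict is a hypothetical hardened parser, and the round-trip check is only one assumed reading of "stable cents".

```go
package main

import (
	"strconv"
	"strings"
	"testing"
)

// ParseTotalStrict is a hypothetical hardened variant of ParseTotal: it trims
// surrounding whitespace and returns an error instead of a silent 0.
func ParseTotalStrict(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// FuzzParseTotal checks "no panic, stable cents" for arbitrary strings.
func FuzzParseTotal(f *testing.F) {
	// Seed corpus: the inputs named above plus the saved seed.
	for _, seed := range []string{"", " 1200 ", "12.34", " 12.34 ", "99999999999999999999"} {
		f.Add(seed)
	}
	f.Fuzz(func(t *testing.T, raw string) {
		cents, err := ParseTotalStrict(raw)
		if err != nil {
			return // rejecting input is fine; a panic would fail the run
		}
		// Stable cents: formatting the result and parsing it again
		// must give back the same value.
		again, err := ParseTotalStrict(strconv.Itoa(cents))
		if err != nil || again != cents {
			t.Fatalf("unstable parse for %q: %d then %d (err=%v)", raw, cents, again, err)
		}
	})
}
```

Placed in a _test.go file of the parser's package, the target can be run with go test -fuzz=FuzzParseTotal. A plain go test run replays only the seeds.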

3. Behavior replay
Review old/new semantics, not just signatures.

Permission API before an AI refactor

AuthorizeRefund("admin", 5000)  => true
AuthorizeRefund("support", 500) => true
AuthorizeRefund("guest", 500)   => false

Veritas turns public API behavior into replay cases before the change, then makes drift explicit after the change.

differential replay · public API · old/new behavior

Replay manifest

{
  "name": "permission_boundary",
  "inputs": ["admin", "support", "guest"],
  "assertion": "privileged, delegated, denied"
}

Replay results become assertion candidates when behavior drift matters.
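
The idea behind differential replay can be sketched in a few lines of Go. The replayCase type, the authorizeRefundAfter refactor, and the message format are illustrative assumptions; the actual manifest schema is the one Veritas writes.

```go
package main

import "fmt"

// replayCase records one public API call and the result observed before the change.
type replayCase struct {
	Role   string
	Cents  int
	Before bool
}

// recorded holds the three cases listed above.
var recorded = []replayCase{
	{"admin", 5000, true},
	{"support", 500, true},
	{"guest", 500, false},
}

// authorizeRefundAfter is a hypothetical refactor that accidentally drops
// the support role.
func authorizeRefundAfter(role string, cents int) bool {
	return role == "admin" && cents <= 5000
}

// replayDrift re-runs every recorded case against fn and describes each
// result that differs from the recorded behavior.
func replayDrift(cases []replayCase, fn func(string, int) bool) []string {
	var drift []string
	for _, c := range cases {
		if got := fn(c.Role, c.Cents); got != c.Before {
			drift = append(drift, fmt.Sprintf(
				"AuthorizeRefund(%q, %d) changed from %t to %t",
				c.Role, c.Cents, c.Before, got))
		}
	}
	return drift
}
```

Replaying the recorded cases against the refactor reports a single drift, for the support case. That is the kind of change a signature-only review would miss.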

4. Evolution suite
Let the report choose the next best tests.

A real next-test queue

survivor: ApplyServiceFeeCents + -> -
survivor: CapDiscountPercent > -> >=
seed: " 12.34 "
replay: permission_boundary

Veritas ranks what to try first, so the agent does not spray random tests across the repo.

candidates: 8
selected: 6
fitness: 73%
top action: kill mutant

Selected candidate

{
  "target_id": "go:score.go:ApplyServiceFeeCents",
  "strategy": "add_assertion",
  "status": "selected",
  "fitness": { "score_percent": 95 },
  "keep_if": "mutant moves from lived to killed"
}
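
An agent or script could consume such a candidate roughly as in the Go sketch below. The struct mirrors only the fields shown above, so the real schema may carry more. Ordering selected candidates by score_percent is an assumption, not documented Veritas behavior.

```go
package main

import (
	"encoding/json"
	"sort"
)

// candidate mirrors only the fields visible in the JSON above.
type candidate struct {
	TargetID string `json:"target_id"`
	Strategy string `json:"strategy"`
	Status   string `json:"status"`
	Fitness  struct {
		ScorePercent int `json:"score_percent"`
	} `json:"fitness"`
	KeepIf string `json:"keep_if"`
}

// parseCandidate decodes a single candidate object.
func parseCandidate(data []byte) (candidate, error) {
	var c candidate
	err := json.Unmarshal(data, &c)
	return c, err
}

// rankSelected keeps candidates whose status is "selected" and orders them
// by fitness, highest first (assumed ordering, for illustration only).
func rankSelected(all []candidate) []candidate {
	var out []candidate
	for _, c := range all {
		if c.Status == "selected" {
			out = append(out, c)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Fitness.ScorePercent > out[j].Fitness.ScorePercent
	})
	return out
}
```

The keep_if field stays a plain string here. Whether it holds is decided by rerunning Veritas and checking that the mutant actually moved from lived to killed.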
5. AI verification loop
The agent gets a work queue, not a vague warning.

Agent prompt becomes concrete

veritas review-ai
veritas verify --changed --profile ci
veritas score

The agent reads `.veritas/ai/agent_feedback.md` and `.veritas/evolution/go_suite.json`, promotes selected candidates into tests, reruns Veritas, and keeps only improvements.

Snapshot after verification

Confidence: medium
Positive: 6 selected evolutionary candidates
Risk: 2 surviving mutants
Next: add assertions for:
- ApplyServiceFeeCents fee direction
- CapDiscountPercent boundary at 100
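
The two suggested assertions might look like the Go sketch below. Both function bodies are hypothetical, since the source shows only their names and the mutated operators. The second return value of CapDiscountPercent is an assumption made so that the boundary is observable.

```go
package main

// ApplyServiceFeeCents is a hypothetical implementation that adds a flat fee.
// The surviving mutant replaces + with -.
func ApplyServiceFeeCents(totalCents, feeCents int) int {
	return totalCents + feeCents
}

// CapDiscountPercent is a hypothetical implementation that clamps to 100 and
// reports whether clamping happened. The surviving mutant replaces > with >=.
func CapDiscountPercent(percent int) (int, bool) {
	if percent > 100 {
		return 100, true
	}
	return percent, false
}

// checkNextAssertions returns the names of the assertions that fail.
func checkNextAssertions() []string {
	var failed []string
	// Fee direction: a positive fee must increase the total.
	if ApplyServiceFeeCents(1000, 50) != 1050 {
		failed = append(failed, "fee direction")
	}
	// Boundary at 100: exactly 100 passes through and is not reported as capped.
	if got, capped := CapDiscountPercent(100); got != 100 || capped {
		failed = append(failed, "discount boundary at 100")
	}
	// Just above the boundary: the value is clamped and reported as capped.
	if got, capped := CapDiscountPercent(101); got != 100 || !capped {
		failed = append(failed, "discount above 100")
	}
	return failed
}
```

If the cap function returned only the clamped value, changing > to >= would return 100 at the boundary either way and no assertion could kill that mutant. Survivors like that need a different assertion target or an explicit waiver.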

The testing hub for AI development

Rust and Go are the deepest production paths today, with Python and TypeScript/JavaScript now on the same plugin path. Veritas discovers symbols with Tree-sitter and runs each language through a plugin, so new languages can share the same mutation campaign, replay, corpus, evolution, and confidence model without rewriting the core.

Mutation testing
Domain-aware mutants for auth, money, parsing, serialization, permissions, booleans, comparisons, returns, and error paths.
Property testing
Generated and discovered properties are scored for assertion strength, determinism, and no-panic coverage.
Fuzzing
Go fuzz targets and Rust repro paths feed persistent corpus entries and replay commands.
Coverage feedback
Coverage gaps become agent-readable prompts for the next test-generation pass.
Differential replay
Public API behavior cases make old/new semantic drift reviewable before acceptance.
Evolution suites
Ranked next-generation candidates tell agents which assertions, seeds, replay checks, or budget fixes to try first.
Confidence score
veritas score rolls findings, survivors, corpus replay, properties, budgets, and generated tests into one readiness signal.

Built toward serious repos, not toy demos

The goal is a verification hub that an AI agent can use safely on a large monorepo: verify the risky changed surface first, keep commands inside budgets, explain what happened, and leave durable artifacts for CI and reviewers.

Today

  • Changed-target selection: git hunks map to Tree-sitter symbol ranges when possible.
  • Scoped execution: Go package graphs and reverse dependencies keep tests focused; Rust workspaces scan package sources.
  • Safety budgets: command timeouts, fuzz caps, mutation caps, CI profiles, policy gates, and optional Rust systemd CPU/memory limits.
  • Telemetry: every report tracks discovery, generation, test, coverage, replay, synthesis, and total phase timings.
  • Confidence harness: richer fixtures, seeded mutation examples, benchmark suites, and external canaries keep the tool honest.
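
Changed-target selection, the first bullet above, comes down to intersecting line ranges. The Go sketch below shows that idea with hypothetical types. It does not call git or Tree-sitter and is not Veritas's implementation.

```go
package main

// lineRange is an inclusive, 1-based span of lines.
type lineRange struct{ Start, End int }

// symbol is a hypothetical record for a discovered symbol and its line range.
type symbol struct {
	Name  string
	Lines lineRange
}

// overlaps reports whether two inclusive ranges share at least one line.
func overlaps(a, b lineRange) bool {
	return a.Start <= b.End && b.Start <= a.End
}

// changedSymbols returns the names of symbols touched by any hunk,
// in the order the symbols were given.
func changedSymbols(symbols []symbol, hunks []lineRange) []string {
	var names []string
	for _, s := range symbols {
		for _, h := range hunks {
			if overlaps(s.Lines, h) {
				names = append(names, s.Name)
				break // one overlapping hunk is enough to select the symbol
			}
		}
	}
	return names
}
```

A hunk that falls outside every symbol range selects nothing in this sketch. The "when possible" qualifier in the bullet suggests the real tool has a fallback for that case.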

Performance goals

  • Scale horizontally: plugin-safe schedulers for concurrent fuzz, mutation, replay, and package work.
  • Spend budget where risk is highest: adaptive mutation sampling from changed code, semantic domains, and historical survivor data.
  • Cache what is stable: reusable symbol graphs, package graphs, baselines, corpus entries, and coverage summaries.
  • Make agents accountable: repair prompts should name the next test, the proof command, and the artifact that improved.
  • Stay generic: every scale feature should land in the plugin contract so Rust, Go, Python, TypeScript/JavaScript, and future Tree-sitter languages benefit together.
scope: changed first
limits: budgeted
reports: timed
future: parallel