Tree-sitter testing oracle for AI code.
A CLI harness for mutation testing, property testing, fuzzing, coverage feedback, behavior replay, and evolutionary analysis across Rust, Go, Python, TypeScript/JavaScript, and future Tree-sitter language plugins.
curl -fsSL https://github.com/Jacobious52/veritas/releases/latest/download/install.sh | sh
Install and use Veritas for this repo.
Run:
curl -fsSL https://github.com/Jacobious52/veritas/releases/latest/download/install.sh | sh
veritas review-ai
veritas verify --changed --profile ci
veritas score
veritas repair-prompt
Use .veritas/ feedback, replay results, mutation survivors, and the evolution suite to improve tests before marking the change done.
- mutation survived: refund_cents <= available_cents -> refund_cents < available_cents
- fuzz seed saved: " 12.34 " reproduced parser drift
- replay drift: AuthorizeRefund("support", 500) changed from allowed to denied
- agent next step: promote assertion candidate, rerun, keep only if the mutant dies
veritas verify --changed --profile ci
veritas evolve --dry-run
veritas repair-prompt
Tree-sitter language plugins, one verification core
Veritas keeps every language from reinventing the testing dashboard. Tree-sitter gives plugins stable symbols, line ranges, methods, and risk surfaces; the core turns those into the same reports, confidence score, AI feedback, and CI policy.
Rust: Workspace detection, public symbol discovery, proptest generation, cargo test, optional llvm-cov, AST-scoped mutation checks, replay, and generated regression artifacts.
Go: Module and package graph discovery, scoped package tests, fuzz harnesses, coverage, mutation campaigns, reverse dependency selection, corpus persistence, and replay.
Python: Tree-sitter symbol discovery, unittest execution, coverage.py integration when enabled, simple executable mutation checks, multi-argument replay, and the same artifact contract.
TypeScript/JavaScript: Package/config/source detection, Tree-sitter TypeScript/TSX/JavaScript symbols, class methods, arrow functions, executable Bun properties, replay, and mutation checks.
Language plugins own discovery, generated artifacts, command budgets, and mutation operators while the core owns scoring, baselines, replay manifests, and AI repair prompts.
.veritas/report.json, SARIF, JUnit, Markdown, assertion candidates, replay results, mutation campaigns, corpus entries, evolution suites, and repair prompts.
Examples that scale with the risk
Start with one boundary. Keep going until the report has a mutation campaign, fuzz corpus, replay manifest, and selected evolution suite your AI agent can act on. Rust, Go, Python, and TypeScript/JavaScript use plugins today; the core model is built around Tree-sitter-backed language plugins, so additional languages can share the same reports and AI loop.
cargo run -p veritas-cli -- --root examples/go-evolution-loop verify --lang go --target .
cargo run -p veritas-cli -- --root examples/go-evolution-loop score
cargo run -p veritas-cli -- --root examples/go-evolution-loop evolve --dry-run
Before
- 58% mutation score
- 4 surviving parser mutants
- 4 assertion candidates
- 55 confidence score
Selected Candidate
- ParseInvoiceTotal
- 95% fitness
- Add the smallest assertion that kills the mutant
- Artifacts: go_suite.json, go_campaign.json, assertions/go_*.json
After
- 91% mutation score
- 0 surviving mutants
- 16 replay cases retained
- 98 confidence score
value, ok := ParseInvoiceTotal("1000000")
assert ok && value == 1000000
value, ok = ParseInvoiceTotal("not-a-number")
assert !ok && value == 0
value, ok = ParseInvoiceTotal("1000001")
assert !ok && value == 0
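The three assertion candidates above can be written as a table-driven check. The following is a minimal sketch, not the example project's code: the body of `ParseInvoiceTotal`, the `maxInvoiceTotal` constant, and the rejection of negative values are assumptions inferred only from the three cases shown.

```go
package main

import (
	"fmt"
	"strconv"
)

// maxInvoiceTotal is an assumed upper bound, inferred only from the cases
// above ("1000000" is accepted, "1000001" is rejected).
const maxInvoiceTotal = 1000000

// ParseInvoiceTotal is a hypothetical implementation. It returns (0, false)
// for anything that is not an integer in [0, maxInvoiceTotal].
func ParseInvoiceTotal(raw string) (int, bool) {
	value, err := strconv.Atoi(raw)
	if err != nil || value < 0 || value > maxInvoiceTotal {
		return 0, false
	}
	return value, true
}

func main() {
	cases := []struct {
		raw    string
		want   int
		wantOK bool
	}{
		{"1000000", 1000000, true}, // exactly at the bound
		{"not-a-number", 0, false}, // parse failure
		{"1000001", 0, false},      // one past the bound
	}
	for _, c := range cases {
		got, ok := ParseInvoiceTotal(c.raw)
		if got != c.want || ok != c.wantOK {
			panic(fmt.Sprintf("ParseInvoiceTotal(%q) = (%d, %v), want (%d, %v)",
				c.raw, got, ok, c.want, c.wantOK))
		}
	}
	fmt.Println("all boundary cases hold")
}
```

The pair of cases on either side of the bound is what matters here: a mutant that changes `>` to `>=` fails on `"1000000"`, so the check kills it.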
// original
if refund_cents <= available_cents {
    approve_refund()
}
// surviving mutant
if refund_cents < available_cents {
    approve_refund()
}
Veritas records the survivor, writes an assertion candidate, persists a repro note, and ranks a test that should kill the mutant.
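A test kills this mutant only if it exercises the case where the refund equals the available balance. The sketch below illustrates that idea in Go; `ApproveRefund`, `approveRefundMutant`, and the case table are hypothetical names introduced for illustration and are not part of Veritas or its generated output.

```go
package main

import "fmt"

// ApproveRefund is a hypothetical Go rendering of the original condition.
func ApproveRefund(refundCents, availableCents int) bool {
	return refundCents <= availableCents
}

// approveRefundMutant reproduces the surviving mutant (<= became <).
func approveRefundMutant(refundCents, availableCents int) bool {
	return refundCents < availableCents
}

type refundCase struct {
	refund, available int
	want              bool
}

// boundaryCases probes just below, exactly at, and just above the limit.
var boundaryCases = []refundCase{
	{499, 500, true},
	{500, 500, true}, // equality: the only case that separates <= from <
	{501, 500, false},
}

// passes reports whether impl satisfies every case in the table.
func passes(impl func(int, int) bool, cases []refundCase) bool {
	for _, c := range cases {
		if impl(c.refund, c.available) != c.want {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("original passes:", passes(ApproveRefund, boundaryCases))
	fmt.Println("mutant killed:", !passes(approveRefundMutant, boundaryCases))
}
```

Without the `{500, 500, true}` row, both versions pass and the mutant survives, which is the situation the report describes.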
veritas verify --changed --profile ci
veritas promote-regression --index 0
Invoice parser an AI just "simplified"
func ParseTotal(raw string) int {
	// The error is discarded, so invalid input silently becomes 0.
	cents, _ := strconv.Atoi(raw)
	return cents
}
A generated fuzz target feeds `""`, `" 1200 "`, `"12.34"`, and large values. The parser panic or bad normalization becomes a replayable seed instead of a one-off CI surprise.
What Veritas writes
FuzzParseTotal
seed: " 12.34 "
expected: no panic, stable cents
.veritas/corpus/go_0.json
.veritas/repros/go_0.md
The next run can replay the same seed instead of rediscovering it by luck.
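Replaying a saved seed amounts to rerunning the parser on the recorded input and checking the same properties. The sketch below assumes a hardened parser, `ParseTotalCents`, and a helper, `checkSeed`; both are hypothetical names, and the choice to reject `"12.34"` with an error rather than convert it is an assumption, not documented Veritas behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseTotalCents is a hypothetical hardened variant of ParseTotal: it trims
// surrounding whitespace and returns an error instead of silently returning 0.
func ParseTotalCents(raw string) (int, error) {
	cents, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return 0, fmt.Errorf("invalid total %q: %w", raw, err)
	}
	return cents, nil
}

// checkSeed replays one seed and checks two properties: the parser does not
// panic, and the result does not depend on surrounding whitespace.
func checkSeed(seed string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("seed %q panicked: %v", seed, r)
		}
	}()
	a, errA := ParseTotalCents(seed)
	b, errB := ParseTotalCents(strings.TrimSpace(seed))
	if a != b || (errA == nil) != (errB == nil) {
		return fmt.Errorf("seed %q: result changes with whitespace", seed)
	}
	return nil
}

func main() {
	// Seeds taken from the inputs named above, plus one oversized value.
	seeds := []string{"", " 1200 ", "12.34", " 12.34 ", "99999999999999999999"}
	for _, s := range seeds {
		if err := checkSeed(s); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Printf("ok: %q\n", s)
	}
}
```

The same whitespace property would fail against the simplified `ParseTotal`, which returns 0 for `" 1200 "` and 1200 for `"1200"`. In a real Go project the loop body would more naturally live in a native fuzz target in a `_test.go` file, with the saved seeds registered as corpus entries.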
Permission API before an AI refactor
AuthorizeRefund("admin", 5000) => true
AuthorizeRefund("support", 500) => true
AuthorizeRefund("guest", 500) => false
Veritas turns public API behavior into replay cases before the change, then makes drift explicit after the change.
Replay manifest
{
  "name": "permission_boundary",
  "inputs": ["admin", "support", "guest"],
  "assertion": "privileged, delegated, denied"
}
Replay results become assertion candidates when behavior drift matters.
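Drift detection of this kind can be pictured as comparing recorded results against the current implementation. The sketch below is illustrative only: `ReplayCase`, `Drift`, and the deliberately broken `AuthorizeRefund` body are hypothetical and do not reflect Veritas's replay manifest schema or internals.

```go
package main

import "fmt"

// ReplayCase is a hypothetical record of one call and the result observed
// before the refactor.
type ReplayCase struct {
	Role  string
	Cents int
	Want  bool
}

// AuthorizeRefund stands in for a post-refactor implementation that
// accidentally dropped the "support" role.
func AuthorizeRefund(role string, cents int) bool {
	return role == "admin" && cents <= 5000
}

func verdict(allowed bool) string {
	if allowed {
		return "allowed"
	}
	return "denied"
}

// Drift returns one message per case whose current result differs from the
// recorded baseline.
func Drift(cases []ReplayCase, fn func(string, int) bool) []string {
	var drift []string
	for _, c := range cases {
		if got := fn(c.Role, c.Cents); got != c.Want {
			drift = append(drift, fmt.Sprintf(
				"AuthorizeRefund(%q, %d) changed from %s to %s",
				c.Role, c.Cents, verdict(c.Want), verdict(got)))
		}
	}
	return drift
}

func main() {
	// Baseline taken from the three cases listed above.
	baseline := []ReplayCase{
		{"admin", 5000, true},
		{"support", 500, true},
		{"guest", 500, false},
	}
	for _, d := range Drift(baseline, AuthorizeRefund) {
		fmt.Println("replay drift:", d)
	}
}
```

With the broken implementation above, only the `"support"` case is reported, because the admin and guest results still match the baseline.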
A real next-test queue
survivor: ApplyServiceFeeCents + -> -
survivor: CapDiscountPercent > -> >=
seed: " 12.34 "
replay: permission_boundary
Veritas ranks what to try first, so the agent does not spray random tests across the repo.
Selected candidate
{
  "target_id": "go:score.go:ApplyServiceFeeCents",
  "strategy": "add_assertion",
  "status": "selected",
  "fitness": { "score_percent": 95 },
  "keep_if": "mutant moves from lived to killed"
}
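An agent or script could read a candidate like this with Go's standard `encoding/json` package. The sketch below models only the fields visible in the example above; the real schema may contain more, and the `Actionable` filter with its fitness threshold is a hypothetical helper, not a Veritas rule.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Candidate mirrors only the fields shown in the example; it is an assumed
// shape, not the authoritative schema.
type Candidate struct {
	TargetID string `json:"target_id"`
	Strategy string `json:"strategy"`
	Status   string `json:"status"`
	Fitness  struct {
		ScorePercent int `json:"score_percent"`
	} `json:"fitness"`
	KeepIf string `json:"keep_if"`
}

const sample = `{
  "target_id": "go:score.go:ApplyServiceFeeCents",
  "strategy": "add_assertion",
  "status": "selected",
  "fitness": { "score_percent": 95 },
  "keep_if": "mutant moves from lived to killed"
}`

// ParseCandidate decodes a single candidate object.
func ParseCandidate(data []byte) (Candidate, error) {
	var c Candidate
	err := json.Unmarshal(data, &c)
	return c, err
}

// Actionable is a hypothetical filter: the candidate must be selected and
// meet a caller-chosen minimum fitness.
func Actionable(c Candidate, minFitness int) bool {
	return c.Status == "selected" && c.Fitness.ScorePercent >= minFitness
}

func main() {
	c, err := ParseCandidate([]byte(sample))
	if err != nil {
		panic(err)
	}
	if Actionable(c, 90) {
		fmt.Printf("try %s on %s, keep if: %s\n", c.Strategy, c.TargetID, c.KeepIf)
	}
}
```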
Agent prompt becomes concrete
veritas review-ai
veritas verify --changed --profile ci
veritas score
The agent reads `.veritas/ai/agent_feedback.md` and `.veritas/evolution/go_suite.json`, promotes selected candidates into tests, reruns Veritas, and keeps only improvements.
Snapshot after verification
Confidence: medium
Positive: 6 selected evolutionary candidates
Risk: 2 surviving mutants
Next: add assertions for:
- ApplyServiceFeeCents fee direction
- CapDiscountPercent boundary at 100
The testing hub for AI development
Rust and Go are the deepest production paths today, with Python and TypeScript/JavaScript now on the same plugin path. Veritas discovers symbols with Tree-sitter and runs each language through a plugin, so new languages can share the same mutation campaign, replay, corpus, evolution, and confidence model without rewriting the core.
Domain-aware mutants for auth, money, parsing, serialization, permissions, booleans, comparisons, returns, and error paths.
Generated and discovered properties are scored for assertion strength, determinism, and no-panic coverage.
Go fuzz targets and Rust repro paths feed persistent corpus entries and replay commands.
Coverage gaps become agent-readable prompts for the next test-generation pass.
Public API behavior cases make old/new semantic drift reviewable before acceptance.
Ranked next-generation candidates tell agents which assertions, seeds, replay checks, or budget fixes to try first.
veritas score rolls findings, survivors, corpus replay, properties, budgets, and generated tests into one readiness signal.
Built toward serious repos, not toy demos
The goal is a verification hub that an AI agent can use safely on a large monorepo: verify the risky changed surface first, keep commands inside budgets, explain what happened, and leave durable artifacts for CI and reviewers.
Today
- Changed-target selection: git hunks map to Tree-sitter symbol ranges when possible.
- Scoped execution: Go package graphs and reverse dependencies keep tests focused; Rust workspaces scan package sources.
- Safety budgets: command timeouts, fuzz caps, mutation caps, CI profiles, policy gates, and optional Rust systemd CPU/memory limits.
- Telemetry: every report tracks discovery, generation, test, coverage, replay, synthesis, and total phase timings.
- Confidence harness: richer fixtures, seeded mutation examples, benchmark suites, and external canaries keep the tool honest.
Performance goals
- Scale horizontally: plugin-safe schedulers for concurrent fuzz, mutation, replay, and package work.
- Spend budget where risk is highest: adaptive mutation sampling from changed code, semantic domains, and historical survivor data.
- Cache what is stable: reusable symbol graphs, package graphs, baselines, corpus entries, and coverage summaries.
- Make agents accountable: repair prompts should name the next test, the proof command, and the artifact that improved.
- Stay generic: every scale feature should land in the plugin contract so Rust, Go, Python, TypeScript/JavaScript, and future Tree-sitter languages benefit together.