Train coding models through executable verification.

Code that runs, not code that sounds right.

The planned model learns from compiler results, unit tests, patch application, runtime errors, repository checks, and benchmark harnesses.
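As a rough illustration of what one of these signals could look like, the sketch below runs a candidate solution together with its unit tests in a fresh interpreter and turns the outcome into a binary reward. It is not the project's verifier: the names `VerifierResult`, `verify`, and `reward` are hypothetical, and a real setup would need proper sandboxing instead of a bare subprocess.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class VerifierResult:
    """Outcome of one execution check (hypothetical structure)."""
    passed: bool
    returncode: int
    stderr: str


def verify(candidate_src: str, test_src: str, timeout_s: float = 10.0) -> VerifierResult:
    """Run candidate code followed by its tests in a separate Python process."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(candidate_src + "\n\n" + test_src, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            # A hang counts as a failure; -1 is an arbitrary marker, not a real exit code.
            return VerifierResult(False, -1, "timeout")
    return VerifierResult(proc.returncode == 0, proc.returncode, proc.stderr)


def reward(result: VerifierResult) -> float:
    """Binary reward: 1.0 if every assertion passed, else 0.0 (an assumed scheme)."""
    return 1.0 if result.passed else 0.0
```

A call such as `verify("def add(a, b):\n    return a + b", "assert add(2, 3) == 5")` returns a passing result, while a wrong implementation fails and keeps the traceback in `stderr`, which could be fed back as a runtime-error signal.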

Targeting real coding evaluations

The benchmark plan covers LiveCodeBench, HumanEval+, MBPP+, MultiPL-E, Aider editing, RepoQA, SWE-bench Verified, and long-horizon SWE-style tasks.

No benchmark claims before evaluation

The repo has a local verifier demo and a training plan. Public scores come only after the checkpoint, harness, compute, and contamination notes are recorded.
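One way to enforce that rule mechanically is a small gate that refuses to publish a score until all four items are filled in. This is only a sketch under assumptions: the `EvalRecord` type, its field names, and the helper functions are hypothetical and not taken from the repo.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class EvalRecord:
    """Metadata that must accompany a public score (hypothetical field names)."""
    checkpoint: str
    harness: str
    compute: str
    contamination_notes: str


def missing_fields(record: EvalRecord) -> List[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


def ready_to_publish(record: EvalRecord) -> bool:
    """A score may be published only when every field is recorded."""
    return not missing_fields(record)
```

With a record whose `contamination_notes` is empty, `ready_to_publish` returns `False` and `missing_fields` names the gap, so the report stays private until the note is written.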

Training stack

The planned path is baseline evaluation, core SFT, edit SFT, multilingual compiler loops, long-repo training, patch search, CodeWorldModel, verifier RL, and one-shot distillation.

bench: HumanEval+, MBPP+, LiveCodeBench

repo: Aider editing, RepoQA, SWE-bench Verified

train: SFT, edit training, compiler feedback, patch search

compute: compute plan, measured reports, contamination notes

Roadmap: from verifier demo to trained model.

The first public step is the local verifier demo. The next steps are dataset review, baseline evaluation, small checkpoints, edit training, repository tasks, and measured benchmark reports.

Compute support is described in the funding brief, but the site focuses on the project: training code models with execution feedback and publishing measured results.

contact@avixosec.xyz