We ran the two new quantization-aware-training (QAT) 4-bit Gemma 4 models,
gemma4-31b-qat (dense) and gemma4-26b-a4b-qat (MoE), both self-hosted
and free, over all 9 corpus cases: three non-leaking prompting arms
(open / plan-first / CWE-checklist) × 2 trials per arm, at temperature 0.5.
Multi-file cases (nomad/craft/ghost) audit every baseline file; a case is
detected if ANY file localizes within 10 lines of the planted hunk (the
benchmark's gate). Motivation: at 8-bit, the 26B-A4B MoE tied the
field-leading 4/9 detection, so we probe the QAT 4-bit edges as we did for Qwen 3.6.
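
The detection gate can be sketched as below. This is a minimal sketch, not the benchmark's code: it assumes each model finding reduces to a (file, line) pair, the planted hunk is a contiguous line range in a single file, and "within 10 lines" means the distance to the nearest edge of that range. `Finding`, `PlantedHunk`, `line_distance`, and `case_detected` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    file: str  # file the model pointed at
    line: int  # line the model localized the bug to


@dataclass
class PlantedHunk:
    file: str   # file containing the planted vulnerability (assumed single file)
    start: int  # first line of the planted hunk
    end: int    # last line of the planted hunk


def line_distance(line: int, hunk: PlantedHunk) -> int:
    """0 if the line falls inside the hunk, else the gap to the nearest hunk edge."""
    if line < hunk.start:
        return hunk.start - line
    if line > hunk.end:
        return line - hunk.end
    return 0


def case_detected(findings: list[Finding], hunk: PlantedHunk, tolerance: int = 10) -> bool:
    """Detected if ANY finding, across all audited files, lands within `tolerance` lines of the hunk."""
    return any(
        f.file == hunk.file and line_distance(f.line, hunk) <= tolerance
        for f in findings
    )
```

Under this reading, a finding 10 lines past the hunk's last line still passes the gate and one 11 lines past does not; findings in other files of a multi-file case simply never match.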
Cell = case-detected trials / completed trials for that arm (a trial counts as a detection if any of the case's files hits). — = no data. Hard-miss cases (baseline ≤ 1) are in bold.

gemma4-26b-a4b-qat (MoE):

| case | baseline | open | plan | checklist |
|---|---|---|---|---|
| GHSA-w52v-v783-gw97 CWE-89 · JS · hit anchor | 17/20 | 0/1 | 1/2 | 1/2 |
| GHSA-j273-m5qq-6825 CWE-22 · Java · guard | 12/20 | 1/2 | 2/2 | 1/1 |
| GHSA-f26g-jm89-4g65 CWE-77 · Rust · medium | 11/19 | 0/2 | 1/2 | 0/2 |
| CVE-2026-7474 CWE-22 · Go · medium-hard | 7/21 | 0/2 | 0/2 | 0/2 |
| GHSA-9f49-8x56-jmjc CWE-416 · C · hard | 2/21 | 0/1 | 0/1 | 0/1 |
| **GHSA-mpxh-8fq3-x8mh CWE-787 · C · hard** | 1/17 | 0/2 | 0/1 | 0/1 |
| **GHSA-x9h5-r9v2-vcww CWE-122 · C · hard** | 1/20 | 0/1 | — | — |
| **GHSA-cc7p-2j3x-x7xf CWE-863 · PHP · hardest** | 0/20 | 0/2 | 1/2 | 0/2 |
| **CVE-2026-5199 CWE-639 · Go · hardest** | 0/20 | 2/2 | 0/2 | 0/2 |

gemma4-31b-qat (dense):

| case | baseline | open | plan | checklist |
|---|---|---|---|---|
| GHSA-w52v-v783-gw97 CWE-89 · JS · hit anchor | 17/20 | 2/2 | 2/2 | 2/2 |
| GHSA-j273-m5qq-6825 CWE-22 · Java · guard | 12/20 | 2/2 | 2/2 | 2/2 |
| GHSA-f26g-jm89-4g65 CWE-77 · Rust · medium | 11/19 | 2/2 | 0/2 | 0/2 |
| CVE-2026-7474 CWE-22 · Go · medium-hard | 7/21 | 1/2 | 0/2 | 1/2 |
| GHSA-9f49-8x56-jmjc CWE-416 · C · hard | 2/21 | 0/2 | 0/2 | 0/2 |
| **GHSA-mpxh-8fq3-x8mh CWE-787 · C · hard** | 1/17 | 0/2 | 0/2 | 0/2 |
| **GHSA-x9h5-r9v2-vcww CWE-122 · C · hard** | 1/20 | 0/2 | 0/2 | 0/1 |
| **GHSA-cc7p-2j3x-x7xf CWE-863 · PHP · hardest** | 0/20 | 2/2 | 0/2 | 0/2 |
| **CVE-2026-5199 CWE-639 · Go · hardest** | 0/20 | 1/2 | 1/2 | 0/2 |
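
Each cell reduces a case's trials under one arm to a ratio, with timeouts and infra errors excluded rather than scored as misses. A minimal sketch, assuming each trial's outcome is recorded as True (detected), False (miss), or None (timeout/infra_error); `cell` is a hypothetical helper name.

```python
from typing import Optional


def cell(outcomes: list[Optional[bool]]) -> str:
    """Format one table cell: detected trials / completed trials, or '—' if nothing completed."""
    # None marks a timeout or infra_error: it is dropped, never counted as a miss.
    completed = [o for o in outcomes if o is not None]
    if not completed:
        return "—"
    return f"{sum(completed)}/{len(completed)}"  # True counts as 1, False as 0
```

This is why some cells show a denominator of 1: one of the two trials produced no data.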

Cases each arm detected in at least one trial, split by whether the baseline solved the case (baseline ≥ 2; a regression guard) or missed it (baseline ≤ 1; the real question). Denominators count only cases with at least one completed trial in that arm.

| model | arm | solved-cases | hard-miss-cases |
|---|---|---|---|
| gemma4-26b-a4b-qat | open | 1/5 | 1/4 |
| gemma4-26b-a4b-qat | plan | 3/5 | 1/3 |
| gemma4-26b-a4b-qat | checklist | 2/5 | 0/3 |
| gemma4-31b-qat | open | 4/5 | 2/4 |
| gemma4-31b-qat | plan | 2/5 | 1/4 |
| gemma4-31b-qat | checklist | 3/5 | 0/4 |
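
The split can be recomputed from the per-case cells. The sketch below copies the first per-case table (the gemma4-26b-a4b-qat cells) and assumes a case is "solved" when its baseline has at least 2 hits, the complement of the hard-miss rule; a cell enters the denominator only if it has a completed trial. `ROWS_26B` and `split` are hypothetical names. It reproduces the 26B-A4B rows: 1/5 and 1/4 (open), 3/5 and 1/3 (plan), 2/5 and 0/3 (checklist).

```python
from typing import Optional

# A cell is (detected trials, completed trials), or None when the arm has no data.
Cell = Optional[tuple[int, int]]

# (case, baseline hits, {arm: cell}) copied from the gemma4-26b-a4b-qat table.
ROWS_26B: list[tuple[str, int, dict[str, Cell]]] = [
    ("GHSA-w52v-v783-gw97", 17, {"open": (0, 1), "plan": (1, 2), "checklist": (1, 2)}),
    ("GHSA-j273-m5qq-6825", 12, {"open": (1, 2), "plan": (2, 2), "checklist": (1, 1)}),
    ("GHSA-f26g-jm89-4g65", 11, {"open": (0, 2), "plan": (1, 2), "checklist": (0, 2)}),
    ("CVE-2026-7474", 7, {"open": (0, 2), "plan": (0, 2), "checklist": (0, 2)}),
    ("GHSA-9f49-8x56-jmjc", 2, {"open": (0, 1), "plan": (0, 1), "checklist": (0, 1)}),
    ("GHSA-mpxh-8fq3-x8mh", 1, {"open": (0, 2), "plan": (0, 1), "checklist": (0, 1)}),
    ("GHSA-x9h5-r9v2-vcww", 1, {"open": (0, 1), "plan": None, "checklist": None}),
    ("GHSA-cc7p-2j3x-x7xf", 0, {"open": (0, 2), "plan": (1, 2), "checklist": (0, 2)}),
    ("CVE-2026-5199", 0, {"open": (2, 2), "plan": (0, 2), "checklist": (0, 2)}),
]


def split(rows: list[tuple[str, int, dict[str, Cell]]], arm: str,
          hard_miss_max: int = 1) -> tuple[str, str]:
    """Return (solved-cases, hard-miss-cases) as 'detected/with-data' strings for one arm."""
    # Each bucket holds [cases detected in >= 1 trial, cases with a completed run].
    tallies = {"solved": [0, 0], "hard": [0, 0]}
    for _case, baseline, cells in rows:
        c = cells[arm]
        if c is None or c[1] == 0:
            continue  # no completed trial: excluded from the denominator
        bucket = "hard" if baseline <= hard_miss_max else "solved"
        tallies[bucket][1] += 1
        if c[0] > 0:
            tallies[bucket][0] += 1
    solved, hard = tallies["solved"], tallies["hard"]
    return f"{solved[0]}/{solved[1]}", f"{hard[0]}/{hard[1]}"
```

The plan and checklist hard-miss denominators drop to 3 because GHSA-x9h5-r9v2-vcww has no completed trial in those arms.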

gemma4-31b-qat (dense, 10.20.30.1) and
gemma4-26b-a4b-qat (MoE, 10.20.30.2), self-hosted on llama-server, free,
QAT 4-bit weights. `seed=-1` randomizes the seed per request. A timeout or infra_error
counts as no data, never as a miss.

Generated from nelson-gemma-promptlab.db · 2 models · 9 cases × 3 arms × 2 trials.