Gemma 4 (QAT-4bit) prompt-lab — full 9-case corpus

The two new Quantization-Aware-Training (QAT) 4-bit Gemma 4 models (gemma4-31b-qat dense, gemma4-26b-a4b-qat MoE), self-hosted and free, run over all 9 corpus cases. Three non-leaking prompting arms (open / plan-first / CWE-checklist) × 2 trials/arm at temperature 0.5. Multi-file cases (nomad/craft/ghost) audit every baseline file; a case is detected if the audit of ANY file localizes within 10 lines of the planted hunk (the benchmark's gate). Motivation: at 8-bit the 26B-A4B MoE tied the field-leading 4/9 detection, so we probe the edges of the QAT-4bit variants as we did for Qwen 3.6.
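The localization gate described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the `Hunk` and `Finding` types, the function names, and the file paths are hypothetical, and it assumes the 10-line tolerance is measured inclusively from the edges of the planted hunk.

```python
from dataclasses import dataclass

TOLERANCE = 10  # "within 10 lines of the planted hunk"


@dataclass(frozen=True)
class Hunk:
    """Planted region in one file (hypothetical type; 1-indexed, inclusive)."""
    path: str
    start: int
    end: int


@dataclass(frozen=True)
class Finding:
    """One line flagged by the model in one audited file (hypothetical type)."""
    path: str
    line: int


def localizes(finding: Finding, hunk: Hunk, tol: int = TOLERANCE) -> bool:
    """True if the flagged line is in the hunk's file and within tol lines of it."""
    if finding.path != hunk.path:
        return False
    # Assumption: distance is measured from the hunk's first and last line.
    return hunk.start - tol <= finding.line <= hunk.end + tol


def case_detected(findings, hunks, tol: int = TOLERANCE) -> bool:
    """A case counts as detected if ANY finding in ANY file localizes."""
    return any(localizes(f, h, tol) for f in findings for h in hunks)


hunk = Hunk("src/db.js", 120, 128)
print(case_detected([Finding("src/db.js", 135)], [hunk]))    # 7 lines past the hunk
print(case_detected([Finding("src/db.js", 139)], [hunk]))    # 11 lines past the hunk
print(case_detected([Finding("src/util.js", 125)], [hunk]))  # wrong file
```

Under these assumptions the three calls print `True`, `False`, `False`. For a multi-file case, the findings from every audited file go into one list, so a single localized hit in any file is enough.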

Headline
gemma4-26b-a4b-qat: best arm solves 3/5 solved-tier cases and cracks 1/4 hard-miss cases.
gemma4-31b-qat: best arm solves 4/5 solved-tier cases and cracks 2/4 hard-miss cases.
(Counts are unions: a case counts if at least one trial of that arm detected it. Detection is localization-only, with no truth-judge yet; confirm the hard-case hits with the judge before claiming them.)

Detection matrix

Cell = case-detected trials / completed trials for that arm (a case counts if any of its files hit). An empty cell means no data for that arm. Hard-miss cases are those with baseline ≤1.

gemma4-26b-a4b-qat

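| case | CWE · language | tier | baseline | open | plan | checklist |
|---|---|---|---|---|---|---|
| GHSA-w52v-v783-gw97 | CWE-89 · JS | hit anchor | 17/20 | 0/1 | 1/2 | 1/2 |
| GHSA-j273-m5qq-6825 | CWE-22 · Java | guard | 12/20 | 1/2 | 2/2 | 1/1 |
| GHSA-f26g-jm89-4g65 | CWE-77 · Rust | medium | 11/19 | 0/2 | 1/2 | 0/2 |
| CVE-2026-7474 | CWE-22 · Go | medium-hard | 7/21 | 0/2 | 0/2 | 0/2 |
| GHSA-9f49-8x56-jmjc | CWE-416 · C | hard | 2/21 | 0/1 | 0/1 | 0/1 |
| GHSA-mpxh-8fq3-x8mh | CWE-787 · C | hard | 1/17 | 0/2 | 0/1 | 0/1 |
| GHSA-x9h5-r9v2-vcww | CWE-122 · C | hard | 1/20 | 0/1 | | |
| GHSA-cc7p-2j3x-x7xf | CWE-863 · PHP | hardest | 0/20 | 0/2 | 1/2 | 0/2 |
| CVE-2026-5199 | CWE-639 · Go | hardest | 0/20 | 2/2 | 0/2 | 0/2 |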

gemma4-31b-qat

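| case | CWE · language | tier | baseline | open | plan | checklist |
|---|---|---|---|---|---|---|
| GHSA-w52v-v783-gw97 | CWE-89 · JS | hit anchor | 17/20 | 2/2 | 2/2 | 2/2 |
| GHSA-j273-m5qq-6825 | CWE-22 · Java | guard | 12/20 | 2/2 | 2/2 | 2/2 |
| GHSA-f26g-jm89-4g65 | CWE-77 · Rust | medium | 11/19 | 2/2 | 0/2 | 0/2 |
| CVE-2026-7474 | CWE-22 · Go | medium-hard | 7/21 | 1/2 | 0/2 | 1/2 |
| GHSA-9f49-8x56-jmjc | CWE-416 · C | hard | 2/21 | 0/2 | 0/2 | 0/2 |
| GHSA-mpxh-8fq3-x8mh | CWE-787 · C | hard | 1/17 | 0/2 | 0/2 | 0/2 |
| GHSA-x9h5-r9v2-vcww | CWE-122 · C | hard | 1/20 | 0/2 | 0/2 | 0/1 |
| GHSA-cc7p-2j3x-x7xf | CWE-863 · PHP | hardest | 0/20 | 2/2 | 0/2 | 0/2 |
| CVE-2026-5199 | CWE-639 · Go | hardest | 0/20 | 1/2 | 1/2 | 0/2 |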

Arm summary (union over 2 trials)

Cases each arm detected in at least one trial, split by whether the baseline solved the case (regression guard) or missed it (the real question). Denominators count only cases with a completed run.

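| model | arm | solved-cases | hard-miss-cases |
|---|---|---|---|
| gemma4-26b-a4b-qat | open | 1/5 | 1/4 |
| gemma4-26b-a4b-qat | plan | 3/5 | 1/3 |
| gemma4-26b-a4b-qat | checklist | 2/5 | 0/3 |
| gemma4-31b-qat | open | 4/5 | 2/4 |
| gemma4-31b-qat | plan | 2/5 | 1/4 |
| gemma4-31b-qat | checklist | 3/5 | 0/4 |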
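
The union counts in this summary can be recomputed from the detection matrix. The sketch below does so for gemma4-26b-a4b-qat, with cells transcribed from that model's matrix. It is not the report generator: the `ROWS_26B` layout and the `arm_summary` function are hypothetical, and it assumes a case is "hard-miss" when its baseline hit count is ≤1 and that arms with no completed trial are left out of the denominator.

```python
HARD_MISS_MAX_BASELINE_HITS = 1  # "baseline ≤1" marks a hard-miss case

# case -> (baseline hits, {arm: (detected trials, completed trials) or None})
ROWS_26B = {
    "GHSA-w52v-v783-gw97": (17, {"open": (0, 1), "plan": (1, 2), "checklist": (1, 2)}),
    "GHSA-j273-m5qq-6825": (12, {"open": (1, 2), "plan": (2, 2), "checklist": (1, 1)}),
    "GHSA-f26g-jm89-4g65": (11, {"open": (0, 2), "plan": (1, 2), "checklist": (0, 2)}),
    "CVE-2026-7474":       (7,  {"open": (0, 2), "plan": (0, 2), "checklist": (0, 2)}),
    "GHSA-9f49-8x56-jmjc": (2,  {"open": (0, 1), "plan": (0, 1), "checklist": (0, 1)}),
    "GHSA-mpxh-8fq3-x8mh": (1,  {"open": (0, 2), "plan": (0, 1), "checklist": (0, 1)}),
    "GHSA-x9h5-r9v2-vcww": (1,  {"open": (0, 1), "plan": None, "checklist": None}),
    "GHSA-cc7p-2j3x-x7xf": (0,  {"open": (0, 2), "plan": (1, 2), "checklist": (0, 2)}),
    "CVE-2026-5199":       (0,  {"open": (2, 2), "plan": (0, 2), "checklist": (0, 2)}),
}


def arm_summary(rows):
    """Return {(arm, tier): (cases detected in >=1 trial, cases with a completed run)}."""
    summary = {}
    for baseline_hits, cells in rows.values():
        tier = "hard-miss" if baseline_hits <= HARD_MISS_MAX_BASELINE_HITS else "solved"
        for arm, cell in cells.items():
            if cell is None or cell[1] == 0:
                continue  # no completed run: excluded from the denominator
            detected, _completed = cell
            hit, total = summary.get((arm, tier), (0, 0))
            summary[(arm, tier)] = (hit + int(detected > 0), total + 1)
    return summary


for (arm, tier), (hit, total) in sorted(arm_summary(ROWS_26B).items()):
    print(f"{arm:9s} {tier:9s} {hit}/{total}")
```

With the transcribed cells this reproduces the gemma4-26b-a4b-qat rows of the summary: open 1/5 and 1/4, plan 3/5 and 1/3, checklist 2/5 and 0/3.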

Method & caveats

Generated from nelson-gemma-promptlab.db · 2 models · 9 cases × 3 arms × 2 trials.