29 competitors — 63 case hits across 252 audited cases
| # | Competitor | Size | Detect | det+½ | Hits/Elig | Partial | Precision | FP/case | Other real | Cost/case | Latency | Tokens/case | Cases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gpt-5.5-pro† | large | 50%† | 50% | 2/4 | — | 100% | 0.00 | 2 | $22.82 | 576s | 581k | 4 |
| 2 | mimo-v2.5-pro ★ | large | 44% | 50% | 4/9 | 1 | 100% | 0.00 | 5 | $0.08 | 475s | 397k | 9 |
| 3 | gpt-5.5 ★ | large | 44% | 44% | 4/9 | — | 100% | 0.00 | 4 | $1.12 | 191s | 766k | 9 |
| 4 | opus-4.8 ★ | large | 44% | 44% | 4/9 | — | 91% | 0.11 | 6 | $0.73 | 137s | 501k | 9 |
| 5 | gemini-3.5-flash | medium | 44% | 44% | 4/9 | — | 78% | 0.22 | 3 | $0.68 | 181s | 381k | 9 |
| 6 | deepseek-v4 (alias) ★ | large | 44% | 44% | 4/9 | — | 75% | 0.22 | 2 | $0.10 | 91s | 623k | 9 |
| 7 | gemma4-26b-a4b† ★ | small | 43%† | 43% | 3/7 | — | 100% | 0.00 | — | $0.00 | 638s | 329k | 7 |
| 8 | qwen3.7-max† | large | 38%† | 44% | 3/8 | 1 | 100% | 0.00 | 5 | $0.32 | 447s | 332k | 8 |
| 9 | qwen3.6-27b† | small | 38%† | 38% | 3/8 | — | 67% | 0.38 | 3 | $0.00 | 1278s | 733k | 8 |
| 10 | minimax-m3 | large | 33% | 33% | 3/9 | — | 86% | 0.11 | 2 | $0.23 | 488s | 718k | 9 |
| 11 | glm-5.2 | large | 33% | 39% | 3/9 | 1 | 75% | 0.22 | 3 | $0.49 | 305s | 329k | 9 |
| 12 | gemini-3.1-pro-preview | large | 33% | 44% | 3/9 | 2 | 59% | 1.00 | 9 | $1.45 | 242s | 334k | 9 |
| 13 | hy3-preview | large | 25% | 31% | 2/8 | 1 | 69% | 0.44 | 7 | $0.02 | 386s | 167k | 9 |
| 14 | haiku-4.5 | small | 25% | 25% | 2/8 | — | 53% | 0.78 | 6 | $0.35 | 201s | 1.6M | 9 |
| 15 | nemotron-3-nano-omni | small | 22% | 22% | 2/9 | — | 100% | 0.00 | 1 | $0.00 | 311s | 64k | 9 |
| 16 | nex-n2-pro | large | 22% | 22% | 2/9 | — | 100% | 0.00 | 1 | $0.00 | 207s | 914k | 9 |
| 17 | sonnet-4.6 | medium | 22% | 22% | 2/9 | — | 80% | 0.33 | 10 | $0.45 | 207s | 324k | 9 |
| 18 | north-mini-code | small | 22% | 22% | 2/9 | — | 67% | 0.22 | 1 | $0.00 | 593s | 579k | 9 |
| 19 | gemma4-31b | small | 22% | 33% | 2/9 | 2 | 60% | 0.44 | 4 | $0.00 | 1390s | 233k | 9 |
| 20 | glm-5.1 | large | 22% | 22% | 2/9 | — | 54% | 0.67 | 5 | $0.55 | 733s | 654k | 9 |
| 21 | nemotron-3-super-120b | small | 22% | 22% | 2/9 | — | 50% | 0.44 | 2 | $0.04 | 491s | 371k | 9 |
| 22 | laguna-xs.2 | small | 11% | 11% | 1/9 | — | 100% | 0.00 | — | $0.09 | 380s | 896k | 9 |
| 23 | kimi-k2.6 | large | 11% | 11% | 1/9 | — | 80% | 0.11 | 3 | $0.35 | 928s | 447k | 9 |
| 24 | owl-alpha | large | 11% | 22% | 1/9 | 2 | 40% | 0.67 | 3 | $0.00 | 406s | 611k | 9 |
| 25 | nemotron-3-ultra | medium | 11% | 11% | 1/9 | — | 33% | 0.89 | 3 | $0.35 | 1846s | 666k | 9 |
| 26 | kimi-k2.7-code | large | 11% | 17% | 1/9 | 1 | 33% | 0.67 | 2 | $0.51 | 438s | 605k | 9 |
| 27 | laguna-m.1 | medium | 0% | 11% | 0/9 | 2 | 9% | 1.11 | 1 | $0.00 | 465s | 858k | 9 |
| 28 | mistral-medium ★ | medium | 0% | 0% | 0/9 | — | — | 0.00 | — | $0.00 | 63s | 253k | 9 |
| 29 | vibethinker-3b | small | 0% | 0% | 0/9 | — | — | 0.00 | — | $0.00 | 65s | 3k | 9 |
Detect = case hits / eligible cases, where eligible = hits + partials + genuine misses; undetermined, refused, and auth/infra-excluded cases are not in the denominator. Partial = cases where the model localized the right spot but the judge ruled it a different bug (right place, wrong bug). A partial is an eligible non-hit: it sits in the denominator exactly where a miss would, so it never moves Detect or the ranking. det+½ = (hits + 0.5·partials) / eligible shows what half credit would be worth and is informational only. Precision = true findings / (true findings + false positives). Other real = confirmed real bugs the model found that are not the planted target CVE (extra capability, but not counted as detection). Cost and latency are the competitor's own spend per audited case. ★ = on a Pareto frontier below.
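The definitions above can be sketched as a few small functions. This is a minimal illustration, not the leaderboard's actual scoring code: the `Tally` class and the function names are hypothetical. The qwen3.7-max counts come from the table (3 hits, 1 partial, 8 eligible), while the precision counts are made-up numbers chosen only to show the arithmetic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tally:
    """Per-competitor case counts (hypothetical container, not from the source)."""
    hits: int
    partials: int
    misses: int  # genuine misses only; refused/undetermined/excluded cases are not counted

    @property
    def eligible(self) -> int:
        return self.hits + self.partials + self.misses


def detect(t: Tally) -> Optional[float]:
    """Detect = hits / eligible. Partials sit in the denominator only."""
    return None if t.eligible == 0 else t.hits / t.eligible


def det_half(t: Tally) -> Optional[float]:
    """det+1/2 = (hits + 0.5 * partials) / eligible, informational only."""
    return None if t.eligible == 0 else (t.hits + 0.5 * t.partials) / t.eligible


def precision(true_findings: int, false_positives: int) -> Optional[float]:
    """Precision = true / (true + false positives); None when nothing was reported."""
    total = true_findings + false_positives
    return None if total == 0 else true_findings / total


qwen = Tally(hits=3, partials=1, misses=4)   # 8 eligible cases, as in the table
print(f"{detect(qwen):.0%} {det_half(qwen):.0%}")  # 38% 44%
print(f"{precision(10, 1):.0%}")                   # 91% (illustrative counts)
```

Turning the partial into a plain miss (`Tally(3, 0, 5)`) leaves `detect` unchanged at 3/8, which is why partials cannot move the ranking.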
† Partial coverage: this competitor completed fewer than the full 9 cases (see the Cases column). Its detection rate is therefore based on fewer audited cases and is not directly rank-comparable with full-corpus competitors — read it alongside the Cases count, not the rank.
Quality = detection rate × precision (precision is treated as 1.0 when a competitor reported no scorable findings). Green points are non-dominated: no other competitor is at least as good on quality while also being cheaper or faster. Size is shown in the table; because it is categorical, it is not used as a numeric Pareto axis.
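A minimal sketch of the non-dominated test, assuming cost and latency are each paired with quality on a separate frontier (the text says "cheaper/faster" but does not spell out the axes). The `Row` type and the `quality` and `frontier` functions are hypothetical names; the eight rows are copied from the leaderboard table, with precision taken at its rounded table value.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    name: str
    hits: int
    eligible: int
    precision: Optional[float]  # None when there were no scorable findings
    cost: float                 # $ per case
    latency: float              # seconds per case


def quality(r: Row) -> float:
    """Quality = detection rate x precision, with missing precision treated as 1.0."""
    p = 1.0 if r.precision is None else r.precision
    return (r.hits / r.eligible) * p


def frontier(rows: List[Row], axis: str) -> List[str]:
    """Names of rows that no other row dominates (higher quality, lower axis value)."""
    def dominates(b: Row, a: Row) -> bool:
        qb, qa = quality(b), quality(a)
        xb, xa = getattr(b, axis), getattr(a, axis)
        # At least as good on both, strictly better on one.
        return qb >= qa and xb <= xa and (qb > qa or xb < xa)

    return [a.name for a in rows if not any(dominates(b, a) for b in rows)]


ROWS = [
    Row("mimo-v2.5-pro", 4, 9, 1.00, 0.08, 475),
    Row("gpt-5.5", 4, 9, 1.00, 1.12, 191),
    Row("opus-4.8", 4, 9, 0.91, 0.73, 137),
    Row("gemini-3.5-flash", 4, 9, 0.78, 0.68, 181),
    Row("deepseek-v4 (alias)", 4, 9, 0.75, 0.10, 91),
    Row("gemma4-26b-a4b", 3, 7, 1.00, 0.00, 638),
    Row("mistral-medium", 0, 9, None, 0.00, 63),
    Row("vibethinker-3b", 0, 9, None, 0.00, 65),
]

print(frontier(ROWS, "cost"))
print(frontier(ROWS, "latency"))
```

On this subset the cost frontier is mimo-v2.5-pro and gemma4-26b-a4b, and the latency frontier is gpt-5.5, opus-4.8, deepseek-v4 (alias), and mistral-medium. A zero-quality competitor can still sit on a frontier when nothing else is as fast, which is worth keeping in mind when reading the ★ marks.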
These charts omit gpt-5.5-pro (4/9) for two reasons. First, a competitor that audited fewer than 75% of the 9 cases measures its quality over a smaller, self-selected subset, so its point is not comparable with the full-corpus competitors, and its position would imply a quality ranking the partial run does not establish. Second, a cost-capped probe sits so far out on the cost axis that every other competitor collapses into one indistinguishable cluster, making the trade-off unreadable. It remains in the leaderboard table above (marked †).
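The coverage rule can be expressed as a one-line filter. This is a sketch under the stated 75% threshold; the `charted` function and constant names are hypothetical, and the audited-case counts are taken from the Cases column.

```python
TOTAL_CASES = 9
MIN_COVERAGE = 0.75  # threshold stated in the text


def charted(cases_audited: int, total: int = TOTAL_CASES,
            min_coverage: float = MIN_COVERAGE) -> bool:
    """True when a competitor audited enough of the corpus to appear on the charts."""
    return cases_audited / total >= min_coverage


audited = {
    "gpt-5.5-pro": 4,
    "gemma4-26b-a4b": 7,
    "qwen3.7-max": 8,
    "mimo-v2.5-pro": 9,
}
omitted = [name for name, n in audited.items() if not charted(n)]
print(omitted)  # ['gpt-5.5-pro']
```

The other † competitors clear the bar (7/9 ≈ 78%, 8/9 ≈ 89%), so under this rule they stay on the charts even though their ranks carry the partial-coverage caveat.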
Mean total tokens per audited case (prompt + completion, with the ReAct loop's resent context counted on each turn); the trailing number is mean latency per case. Bars are linear, so brute-force models dwarf frugal ones. Data-quality caveat: these are the tokens the provider's API reported. Some OpenAI-compatible endpoints under-report usage (the tell is a near-zero bar with input ≈ output), so a suspiciously short bar may mean broken metering rather than a frugal model, in which case that competitor's cost/case is an underestimate.
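One way to turn the "near-zero bar with input ≈ output" tell into a check is sketched below. The reasoning is that a ReAct loop resends its context every turn, so prompt tokens should far exceed completion tokens; a small total with the two roughly equal suggests the endpoint is not counting the resent context. The function name, both thresholds, and the sample numbers are assumptions for illustration, not values from the benchmark.

```python
def looks_undermetered(prompt_tokens: int, completion_tokens: int, *,
                       max_total: int = 10_000, max_ratio: float = 2.0) -> bool:
    """Heuristic flag for suspicious usage reports (thresholds are assumed)."""
    total = prompt_tokens + completion_tokens
    if total == 0:
        return True   # nothing reported at all
    if total > max_total:
        return False  # bar is not near zero
    lo, hi = sorted((prompt_tokens, completion_tokens))
    # Input roughly equal to output: larger side within max_ratio of the smaller.
    return lo > 0 and hi / lo <= max_ratio


print(looks_undermetered(1_600, 1_400))     # True: tiny total, input close to output
print(looks_undermetered(480_000, 21_000))  # False: large, prompt-heavy total
print(looks_undermetered(9_000, 500))       # False: small but prompt-heavy
```

A flag from a check like this is only a prompt to inspect the raw usage records; a genuinely frugal model can also produce a short bar.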
| Competitor | CVE-2026-5199 | CVE-2026-7474 | GHSA-9f49-8x56-jmjc | GHSA-cc7p-2j3x-x7xf | GHSA-f26g-jm89-4g65 | GHSA-j273-m5qq-6825 | GHSA-mpxh-8fq3-x8mh | GHSA-w52v-v783-gw97 | GHSA-x9h5-r9v2-vcww |
|---|---|---|---|---|---|---|---|---|---|
| gpt-5.5-pro | miss | HIT | HIT | excl | excl | excl | excl | excl | miss |
| mimo-v2.5-pro | miss | HIT | miss | miss | HIT | part | HIT | HIT | miss |
| gpt-5.5 | miss | HIT | miss | miss | HIT | HIT | miss | HIT | miss |
| opus-4.8 | miss | miss | miss | miss | HIT | HIT | miss | HIT | HIT |
| gemini-3.5-flash | miss | HIT | miss | miss | HIT | HIT | miss | HIT | miss |
| deepseek-v4 (alias) | miss | HIT | miss | miss | HIT | HIT | miss | HIT | miss |
| gemma4-26b-a4b | excl | miss | HIT | miss | HIT | miss | miss | HIT | excl |
| qwen3.7-max | miss | HIT | miss | miss | HIT | part | excl | HIT | miss |
| qwen3.6-27b | miss | miss | miss | miss | HIT | HIT | excl | HIT | miss |
| minimax-m3 | miss | miss | miss | miss | HIT | HIT | miss | HIT | miss |
| glm-5.2 | miss | HIT | miss | miss | HIT | part | miss | HIT | miss |
| gemini-3.1-pro-preview | miss | part | miss | miss | HIT | HIT | part | HIT | miss |
| hy3-preview | miss | miss | miss | miss | HIT | HIT | refu | part | miss |
| haiku-4.5 | miss | miss | miss | miss | jerr | HIT | miss | HIT | miss |
| nemotron-3-nano-omni | miss | miss | miss | miss | HIT | miss | miss | HIT | miss |
| nex-n2-pro | miss | miss | miss | miss | miss | HIT | miss | HIT | miss |
| sonnet-4.6 | miss | miss | miss | miss | miss | HIT | miss | HIT | miss |
| north-mini-code | miss | miss | miss | miss | miss | HIT | miss | HIT | miss |
| gemma4-31b | miss | HIT | miss | part | miss | part | miss | HIT | miss |
| glm-5.1 | miss | miss | miss | miss | miss | HIT | miss | HIT | miss |
| nemotron-3-super-120b | miss | miss | miss | miss | miss | HIT | miss | HIT | miss |
| laguna-xs.2 | miss | miss | miss | miss | miss | miss | miss | HIT | miss |
| kimi-k2.6 | miss | miss | miss | miss | miss | miss | miss | HIT | miss |
| owl-alpha | miss | part | miss | miss | miss | part | miss | HIT | miss |
| nemotron-3-ultra | miss | miss | miss | miss | miss | miss | miss | HIT | miss |
| kimi-k2.7-code | miss | miss | miss | miss | miss | part | miss | HIT | miss |
| laguna-m.1 | miss | miss | miss | part | miss | miss | miss | part | miss |
| mistral-medium | miss | miss | miss | miss | miss | miss | miss | miss | miss |
| vibethinker-3b | miss | miss | miss | miss | miss | miss | miss | miss | miss |
HIT = detected; part = right spot, wrong bug (eligible non-hit; half credit in det+½ only); miss = looked, found nothing; jerr = judge undetermined (out of denominator); refu = model refused the task (out of denominator, never a miss); excl = auth/infra failure (out of denominator, never a miss); · = not run. A HIT marked n/N was found in only n of N trials (flaky); a bare HIT was found in every trial.
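As a sketch of how one matrix row maps onto the Hits/Elig column, the snippet below tallies status codes and keeps only HIT, part, and miss in the denominator. The `tally` helper is a hypothetical name, and flaky `HIT n/N` cells are not handled because the legend does not say how they are scored. The two rows are copied from the matrix above.

```python
from collections import Counter
from typing import Iterable, Tuple

# Only these statuses count toward the denominator; jerr, refu, excl and "·" do not.
ELIGIBLE = ("HIT", "part", "miss")


def tally(cells: Iterable[str]) -> Tuple[int, int, int]:
    """Return (hits, partials, eligible) for one competitor's row."""
    counts = Counter(cells)
    eligible = sum(counts[status] for status in ELIGIBLE)
    return counts["HIT"], counts["part"], eligible


hy3 = "miss miss miss miss HIT HIT refu part miss".split()
pro = "miss HIT HIT excl excl excl excl excl miss".split()

print(tally(hy3))  # (2, 1, 8): the refusal drops out, the partial stays in
print(tally(pro))  # (2, 0, 4): five excluded cases leave four eligible
```

These match the 2/8 and 2/4 entries in the leaderboard table.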