MLB Stadium
2026 Contest - Mid-Season Diagnostic

The Field Is Down 95 Units

All four AIs underwater. 314 picks graded. Estimated -17.5% ROI across the contest. Here is the full statistical breakdown of what's gone wrong - by the numbers.

Combined Record (Mar 25 - Apr 10): 138-155-21 (314 graded picks across 4 AI models)
Combined Units: -95.08u (Estimated ROI: -17.5%)
Combined Win Rate: 47.1%, pushes excluded (Break-even at -110: 52.4%)
Days Underwater: 14 / 17 (all 4 AIs have lost units on net since March 28)

AI Performance Summary

Rank  AI       Record   Win %  Units    ROI     Avg Unit  Avg Odds  Implied Prob
#1    Claude   40-46-6  46.5%  -18.30u  -13.7%  1.45u     -55       53.0%
#2    Gemini   27-30-4  47.4%  -21.26u  -18.0%  1.93u     -121      57.3%
#3    ChatGPT  32-36-7  47.1%  -26.18u  -20.0%  1.75u     -101      56.8%
#4    Grok     39-43-4  47.6%  -29.34u  -18.1%  1.88u     -109      56.9%

ROI is estimated using cumulative units divided by (avg unit size x graded picks). Avg odds and implied probability use parsed pick odds (~98% coverage of slate).
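As a check on the ROI column, the formula in the note can be applied row by row. This is a minimal sketch: the names `SUMMARY`, `estimated_roi`, and `win_rate` are illustrative, the inputs are copied from the table, and graded picks are assumed to be wins + losses + pushes. Because the average unit size is itself rounded, the result should match the table only to within about a tenth of a percentage point.

```python
# Estimated ROI = cumulative units / (avg unit size x graded picks).
# Inputs are copied from the summary table above.
SUMMARY = {
    "Claude":  {"record": (40, 46, 6), "units": -18.30, "avg_unit": 1.45},
    "Gemini":  {"record": (27, 30, 4), "units": -21.26, "avg_unit": 1.93},
    "ChatGPT": {"record": (32, 36, 7), "units": -26.18, "avg_unit": 1.75},
    "Grok":    {"record": (39, 43, 4), "units": -29.34, "avg_unit": 1.88},
}


def estimated_roi(units, avg_unit, graded_picks):
    """Return ROI as a fraction of the estimated total amount risked."""
    return units / (avg_unit * graded_picks)


def win_rate(wins, losses):
    """Win rate over decided picks only; pushes are excluded."""
    return wins / (wins + losses)


for name, row in SUMMARY.items():
    wins, losses, pushes = row["record"]
    graded = wins + losses + pushes  # assumption: pushes count as graded picks
    roi = estimated_roi(row["units"], row["avg_unit"], graded)
    print(f"{name}: {graded} graded, win rate {win_rate(wins, losses):.1%}, ROI {roi:.1%}")
```

The four records sum to 314 graded picks, and the combined win rate of 138 / (138 + 155) comes out at 47.1%.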

Cumulative Units By Date - The Rise and Fall

Date   Claude   ChatGPT  Gemini   Grok
03-28  +18.64u  +11.96u  +2.30u   +23.32u
03-29  +17.34u  +8.48u   +0.17u   +18.59u
03-30  +12.84u  -8.72u   -16.32u  -9.41u
04-01  +12.84u  -8.72u   -16.32u  -9.41u
04-03  +13.64u  -9.55u   -11.82u  -12.49u
04-04  +13.05u  -10.30u  -8.32u   -8.98u
04-05  -0.94u   -18.70u  -14.08u  -20.73u
04-06  -1.64u   -19.02u  -18.23u  -19.99u
04-07  -4.94u   -20.22u  -19.38u  -22.19u
04-08  -18.30u  -26.18u  -21.26u  -29.34u
04-10  -18.30u  -26.18u  -21.26u  -29.34u

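Snapshots from each daily picks page after grading. Note: 04-08 and 04-10 show identical values because both pages were updated together after the April 10 slate was graded. The 03-30 and 04-01 rows are also identical, which may reflect the same kind of batched update, so a gap between two snapshots can cover more than one slate.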

Peak vs Crash - Drawdown From Season High

AI       Peak Units  Peak Date  Current  Drawdown
Claude   +18.64u     03-28      -18.30u  -36.94u
ChatGPT  +11.96u     03-28      -26.18u  -38.14u
Gemini   +2.30u      03-28      -21.26u  -23.56u
Grok     +23.32u     03-28      -29.34u  -52.66u

Every AI hit their season high on March 28 - the end of opening weekend. The market figured them out fast.
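The drawdown figures are simply current units minus peak units. A short sketch of that calculation over the cumulative snapshots follows; the names `DATES`, `CUMULATIVE`, and `peak_and_drawdown` are illustrative, and the numbers are copied from the cumulative-units table.

```python
# Cumulative units per snapshot date, copied from the cumulative-units table.
DATES = ["03-28", "03-29", "03-30", "04-01", "04-03", "04-04",
         "04-05", "04-06", "04-07", "04-08", "04-10"]
CUMULATIVE = {
    "Claude":  [18.64, 17.34, 12.84, 12.84, 13.64, 13.05, -0.94, -1.64, -4.94, -18.30, -18.30],
    "ChatGPT": [11.96, 8.48, -8.72, -8.72, -9.55, -10.30, -18.70, -19.02, -20.22, -26.18, -26.18],
    "Gemini":  [2.30, 0.17, -16.32, -16.32, -11.82, -8.32, -14.08, -18.23, -19.38, -21.26, -21.26],
    "Grok":    [23.32, 18.59, -9.41, -9.41, -12.49, -8.98, -20.73, -19.99, -22.19, -29.34, -29.34],
}


def peak_and_drawdown(series):
    """Return (peak date, peak units, current units, drawdown from peak)."""
    peak_index = max(range(len(series)), key=lambda i: series[i])
    peak = series[peak_index]
    current = series[-1]
    return DATES[peak_index], peak, current, round(current - peak, 2)


for ai, series in CUMULATIVE.items():
    print(ai, peak_and_drawdown(series))

# Field-wide series: sum the four AIs at each snapshot date.
field = [round(sum(day), 2) for day in zip(*CUMULATIVE.values())]
print("Field", peak_and_drawdown(field))
```

Summing the four series gives a field peak of +56.22u on 03-28 and a current total of -95.08u, a swing of about 151 units.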

The April 5 Massacre - Worst Single Day

AI        Day Record  Day Units
Claude    5-10-2      -13.99u
ChatGPT   5-7-1       -8.40u
Gemini    2-5-1       -5.76u
Grok      4-7-2       -11.75u
Combined  16-29-6     -39.90u

A single Sunday slate erased nearly 40 units across the four AIs. Claude alone dropped 13.99u that day - more than half of the entire season-to-date loss for ChatGPT (-26.18u) or Gemini (-21.26u).
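The April 5 figures can be cross-checked against the cumulative snapshots by subtracting the 04-04 row from the 04-05 row. The sketch below does that and also sums the day records; the variable and function names are illustrative and the values are copied from the two tables.

```python
# Cumulative units on the two snapshot dates, copied from the cumulative table.
SNAPSHOT_0404 = {"Claude": 13.05, "ChatGPT": -10.30, "Gemini": -8.32, "Grok": -8.98}
SNAPSHOT_0405 = {"Claude": -0.94, "ChatGPT": -18.70, "Gemini": -14.08, "Grok": -20.73}

# April 5 day records as (wins, losses, pushes), copied from the table above.
DAY_RECORDS = {
    "Claude": (5, 10, 2),
    "ChatGPT": (5, 7, 1),
    "Gemini": (2, 5, 1),
    "Grok": (4, 7, 2),
}


def day_change(before, after):
    """Units won or lost between two cumulative snapshots, per AI."""
    return {ai: round(after[ai] - before[ai], 2) for ai in before}


changes = day_change(SNAPSHOT_0404, SNAPSHOT_0405)
combined_units = round(sum(changes.values()), 2)
combined_record = tuple(sum(column) for column in zip(*DAY_RECORDS.values()))

print(changes)
print("Combined:", combined_record, combined_units)
```

The per-AI differences match the Day Units column, and the combined line comes out at 16-29-6 and -39.90u.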

Pick Distribution By Bet Type

AI       ML            RL            Total         TT           F5            Total Picks
Claude   22 (25%) 32u  8 (9%) 11u    36 (40%) 49u  7 (8%) 12u   16 (18%) 25u  89
ChatGPT  17 (24%) 26u  5 (7%) 10u    32 (46%) 55u  5 (7%) 10u   11 (16%) 25u  70
Gemini   10 (17%) 18u  4 (7%) 9u     30 (50%) 58u  6 (10%) 14u  10 (17%) 17u  60
Grok     10 (12%) 16u  12 (14%) 24u  42 (51%) 84u  7 (8%) 14u   12 (14%) 19u  83

All four AIs lean Totals-heavy (overs and unders dominate). Grok and Claude post the most volume; ChatGPT and Gemini are more selective. F5 = First 5 Innings, TT = Team Total, RL = Run Line, ML = Moneyline.
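The percentages in the table are each bet type's share of that AI's parsed picks, and dividing units by picks gives the average stake per bet type. A small sketch under those assumptions; `BET_TYPES`, `pick_shares`, and `average_stake` are illustrative names, and each tuple is (picks, units) copied from the table.

```python
# (picks, units) per bet type, copied from the distribution table above.
BET_TYPES = {
    "Claude":  {"ML": (22, 32), "RL": (8, 11), "Total": (36, 49), "TT": (7, 12), "F5": (16, 25)},
    "ChatGPT": {"ML": (17, 26), "RL": (5, 10), "Total": (32, 55), "TT": (5, 10), "F5": (11, 25)},
    "Gemini":  {"ML": (10, 18), "RL": (4, 9), "Total": (30, 58), "TT": (6, 14), "F5": (10, 17)},
    "Grok":    {"ML": (10, 16), "RL": (12, 24), "Total": (42, 84), "TT": (7, 14), "F5": (12, 19)},
}


def pick_shares(by_type):
    """Fraction of an AI's picks that fall in each bet type."""
    total_picks = sum(picks for picks, _ in by_type.values())
    return {bet: picks / total_picks for bet, (picks, _) in by_type.items()}


def average_stake(by_type):
    """Average units risked per pick, for each bet type."""
    return {bet: units / picks for bet, (picks, units) in by_type.items()}


for ai, by_type in BET_TYPES.items():
    share = pick_shares(by_type)["Total"]
    stake = average_stake(by_type)["Total"]
    print(f"{ai}: Totals are {share:.0%} of picks at {stake:.2f}u per pick")
```

For Grok, for example, Totals are 42 of 83 picks (51%) at an average of 2.00u each.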

Favorite vs Underdog Lean - The Chalk Problem

AI       Fav Picks  Fav Units  Dog Picks  Dog Units  Avg Odds  Avg Implied Prob
Claude   48 (70%)   66.0u      21 (30%)   33.5u      -55       53.0%
ChatGPT  54 (82%)   94.5u      12 (18%)   18.5u      -101      56.8%
Gemini   42 (89%)   84.0u      5 (11%)    8.5u       -121      57.3%
Grok     57 (85%)   107.0u     10 (15%)   18.0u      -109      56.9%

Every AI bets favorites significantly more often than underdogs. Gemini is the most extreme (89% favorites), Grok next (85%). When you pay -110 to -130 on most of your slate and hit 47%, the math is brutal: you need 52.4% to break even at -110.
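The break-even figure follows directly from the odds: at American odds of -110 you risk 110 to win 100, so you need to win 110 / 210 of the time. A minimal sketch of the standard conversion (the function name `break_even_probability` is illustrative):

```python
def break_even_probability(american_odds):
    """Win probability needed to break even at the given American odds."""
    if american_odds < 0:
        risk = -american_odds          # e.g. -110 means risk 110 to win 100
        return risk / (risk + 100)
    return 100 / (american_odds + 100)  # e.g. +150 means risk 100 to win 150


for odds in (-110, -120, -130, 150):
    print(f"{odds:+d}: break-even at {break_even_probability(odds):.1%}")
```

At -110 the break-even point is 52.4%, at -130 it rises to 56.5%, and at +150 it drops to 40.0%, which is why a 47% win rate hurts far more on a chalk-heavy slate.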

Distribution By Odds Range

AI       Heavy Fav  Big Fav   Small Fav  Pick'em   Small Dog  Big Dog  Heavy Dog
Claude   2 (3%)     9 (13%)   31 (45%)   8 (12%)   17 (25%)   2 (3%)   0 (0%)
ChatGPT  4 (6%)     18 (27%)  27 (41%)   11 (17%)  6 (9%)     0 (0%)   0 (0%)
Gemini   3 (6%)     15 (32%)  17 (36%)   8 (17%)   4 (9%)     0 (0%)   0 (0%)
Grok     5 (7%)     19 (28%)  27 (40%)   8 (12%)   8 (12%)    0 (0%)   0 (0%)

The largest single bucket for every AI is "Small Fav (-150 to -110)" - the worst possible price range to live in. You're paying juice but not getting enough win probability to make up for it. Heavy underdogs (+150 or longer) are nearly absent from every slate.

Distribution By Unit Size

AI       0.5u  1u  1.5u  2u  2.5u  3u+
Claude   8     28  27    20  6     1
ChatGPT  2     17  19    20  8     7
Gemini   4     9   13    11  13    11
Grok     1     17  18    21  19    8
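Most picks live in the 1-2u range. Plays at 2.5u+ are rare for Claude (7 of 90) but make up about a fifth of ChatGPT's slate (15 of 73) and roughly a third or more of Grok's (27 of 84) and Gemini's (24 of 61). No AI is "swinging for the fences," but sizing is not uniform: average stake runs from 1.45u (Claude) to 1.93u (Gemini).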
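The unit-size counts can be tied back to the Avg Unit column in the summary table. The sketch below reads the six columns of the unit-size table as the counts shown in `COUNTS`, and it assumes every "3u+" pick is exactly 3u, so the averages are a lower bound if any pick was larger. All names are illustrative.

```python
# Stake sizes for the six columns; "3u+" is treated as exactly 3u (assumption).
SIZES = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

# Picks per stake size, read from the unit-size table above.
COUNTS = {
    "Claude":  [8, 28, 27, 20, 6, 1],
    "ChatGPT": [2, 17, 19, 20, 8, 7],
    "Gemini":  [4, 9, 13, 11, 13, 11],
    "Grok":    [1, 17, 18, 21, 19, 8],
}


def average_unit(counts):
    """Weighted average stake across all picks."""
    return sum(size * n for size, n in zip(SIZES, counts)) / sum(counts)


def share_at_or_above(counts, threshold):
    """Fraction of picks staked at `threshold` units or more."""
    big = sum(n for size, n in zip(SIZES, counts) if size >= threshold)
    return big / sum(counts)


for ai, counts in COUNTS.items():
    print(f"{ai}: {sum(counts)} picks, avg {average_unit(counts):.2f}u, "
          f"{share_at_or_above(counts, 2.5):.0%} at 2.5u+")
```

Under that reading the averages come out at about 1.45u, 1.75u, 1.93u, and 1.88u, which agrees with the Avg Unit column.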

The Diagnosis - Why The Field Is Down 95 Units

Problem #1: Chalk Addiction

All four AIs put the largest share of their slate (36% to 45%) on favorites priced between -110 and -150. Average odds range from -55 (Claude) to -121 (Gemini), with implied probabilities of 53% to 57%. To break even on a -120 line you need to win 54.5% of the time. The combined field is hitting 47.1%. That's a gap of more than 7 points - and 7 points of edge against you compounds violently across 314 picks.
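The gap can also be expressed as an expected loss per unit risked. The sketch below uses the standard expected-value formula for a single bet and ignores pushes; treating the whole slate as if it were priced at one line (-120 or -110) is a simplifying assumption, and the function name is illustrative.

```python
def expected_value_per_unit(win_probability, american_odds):
    """Expected profit per 1u risked at the given odds, ignoring pushes."""
    if american_odds < 0:
        payout = 100 / -american_odds   # profit per 1u risked on a favorite
    else:
        payout = american_odds / 100    # profit per 1u risked on an underdog
    return win_probability * payout - (1 - win_probability)


field_win_rate = 0.471  # combined win rate from the summary
for odds in (-110, -120):
    ev = expected_value_per_unit(field_win_rate, odds)
    print(f"47.1% winners at {odds}: {ev:+.1%} per unit risked")
```

At a 47.1% win rate this works out to roughly -10% per unit at -110 and roughly -14% at -120, the same ballpark as the estimated -17.5% ROI.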

Problem #2: The Hot Start Was a Mirage

Every AI peaked on March 28 - just four days into the season. Claude was +18.64u, Grok was +23.32u, and the field looked sharp. From March 29 through April 10, the combined number went from +56u to -95u - a 151-unit reversal in 13 days. That's not bad luck. That's the market closing the gap on whatever edges existed in the small spring sample.

Problem #3: April 5 Was a Bloodbath

One Sunday slate erased 40 units in a single day. Claude dropped 13.99u, Grok dropped 11.75u, ChatGPT dropped 8.40u, Gemini dropped 5.76u. The slate had nearly every AI on heavy chalk and everything went sideways. When you live on favorites, you don't have plus-money parlays to bail you out on a bad day.

Problem #4: Volume Without Edge

Claude posted 90+ picks (most volume). Grok posted 84. The combined slate is 314 picks in 17 days - about 18 picks per day. When your win rate is 47%, more volume just means more losses. Gemini posts the fewest (61) and sits at -21.26u, a smaller loss than ChatGPT's or Grok's, though Claude is down the least (-18.30u) despite the most volume. Selective sizing matters less than selective frequency.

Problem #5: Totals Heavy, And Totals Are 50/50

Totals (overs and unders) are the largest bet type for every AI - 30 to 42 picks each. Totals are notoriously close to 50/50 propositions in the long run, and the lines are sharp. You can't outwork the market on a -110 total without a real edge. The AIs don't appear to have one.

The Math, Cleanly

Combined: 138 wins, 155 losses, 21 pushes. Estimated risk: ~542 units across all picks. Net: -95.08u. That's an estimated -17.5% ROI. For context, a coin flip at -110 costs about 4.5% of the amount risked in vig alone: roughly 14u over 314 flat 1u picks, or roughly 25u on the ~542u actually risked. The AIs are losing roughly 4x what pure chance would cost on the same risk (and more than 6x the flat-1u figure) - meaning the picks are actively negative-EV, not just unlucky.
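The vig-only baseline is easy to reproduce. The sketch below assumes a true 50% win probability and every bet priced at -110, and it compares two staking assumptions: a flat 1u on each of the 314 picks, and the roughly 542u of estimated total risk quoted above. The function name `coin_flip_loss` is illustrative.

```python
def coin_flip_loss(total_risked, american_odds=-110):
    """Expected result of betting coin flips at the given odds (negative = loss)."""
    if american_odds < 0:
        payout = 100 / -american_odds
    else:
        payout = american_odds / 100
    ev_per_unit = 0.5 * payout - 0.5  # win half, lose half
    return total_risked * ev_per_unit


actual_net = -95.08

for label, risked in (("314 flat 1u picks", 314), ("~542u actually risked", 542)):
    baseline = coin_flip_loss(risked)
    print(f"{label}: vig-only {baseline:.1f}u, actual is {actual_net / baseline:.1f}x that")
```

The flat-1u baseline is about -14u and the stake-matched baseline is about -25u, so the multiple depends on which one is used for comparison.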