AI Analysis

Can AI Actually Pick MLB Games? What We Have Learned Testing Models in 2026

February 20, 2026 · 5 min read · Daily MLB Picks

Here's the honest truth about using AI to pick baseball games: it's better than a coin flip and worse than you probably hoped. We've spent months now throwing every tool we can find at this problem, from custom machine learning models to ChatGPT to purpose-built AI sports prediction platforms, and the results are genuinely fascinating. They're also a little humbling. The short answer is yes, AI can pick MLB games with a real edge. The longer answer involves a lot of caveats, some ugly spring training data, and the kind of hard-won lessons you only get by actually running the experiments instead of theorizing about them.

Where AI Prediction Models Stand Right Now

Let's talk numbers, because that's what this is really about. Platforms like Leans AI (running a model called Remi) are reporting a 53 to 58 percent win rate against the spread across major sports, with MLB specifically sitting around 54.5 percent ATS. We know, we know, 54.5 percent doesn't sound like it belongs on a billboard. But here's the thing: at standard -110 juice, the break-even win rate is about 52.4 percent, so consistently clearing 53 percent is where you start actually making money after the vig eats its share. So 54.5 percent? That's not nothing. That's a real edge if you can sustain it.
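If you want to check that break-even math yourself, it falls out of the odds directly. A minimal sketch (the function name is ours, not any platform's API):

```python
def break_even_prob(american_odds: int) -> float:
    """Win rate needed to break even at the given American odds.

    At -110 you risk 110 to win 100, so you need
    p * 100 = (1 - p) * 110, i.e. p = 110 / 210.
    """
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

# Standard -110 juice: ~52.4% just to break even, which is why
# a sustained 54.5% ATS is a real (if unglamorous) edge.
print(round(break_even_prob(-110), 4))  # 0.5238
```

Anything above that 52.4 percent line is profit before variance; anything below it is a slow bleed no matter how smart the model sounds.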

On the research side, things get even more interesting. Studies using logistic regression and random forest algorithms have hit test accuracies of 58.9 and 59.6 percent respectively for predicting MLB game outcomes. When those models get to cheat a little by incorporating betting odds data, accuracy jumps to 61.1 percent. And if you really want to get your hopes up, deeper neural network approaches like Artificial Neural Networks and Support Vector Machines have reached accuracies above 94 percent in controlled research settings. We should probably mention that real-world performance tends to be, well, considerably less impressive than lab conditions. But the trajectory is clear: these tools are getting better, and they're getting better fast.
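To see why those research numbers cluster near 60 percent rather than 90, it helps to simulate it. The studies' actual features and datasets aren't reproduced here; this is a toy NumPy sketch where we invent a weak synthetic signal (stand-ins for things like run-differential edge or starter ERA gap) and fit a plain logistic regression to it:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical game features: home run-differential edge,
# starter ERA gap, bullpen rest advantage (all synthetic).
n = 2000
X = rng.normal(size=(n, 3))

# The true signal is deliberately weak: baseball outcomes are noisy,
# which caps achievable accuracy well below 100%.
logits = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)

# Plain logistic regression via full-batch gradient descent.
w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= lr * (X.T @ (p - y) / n)
    b -= lr * (p - y).mean()

preds = 1 / (1 + np.exp(-(X @ w + b))) >= 0.5
accuracy = (preds == y.astype(bool)).mean()
print(f"in-sample accuracy: {accuracy:.3f}")
```

Even with the model learning the true weights almost perfectly, accuracy lands around 60 percent, because the irreducible noise in the outcomes sets the ceiling. That's the shape of the MLB prediction problem in miniature.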

Combined ML models incorporating betting odds data have reached 61.1% prediction accuracy in MLB game outcome testing

What ChatGPT Gets Right and Where It Falls Apart

We've been putting ChatGPT through the wringer, and one of the more illuminating tests we've found came from Techopedia, who used it to make picks on 10 sports bets. Their approach was refreshingly simple: ask the model for a full matchup breakdown covering injuries, lineups, odds, matchup tendencies, and implied probabilities, then just take whatever pick it spit out. No second-guessing, no adjustments. Pure AI, for better or worse.

The results? Look, they were revealing in the best possible way. ChatGPT was genuinely solid at structured information analysis, pulling together public data points into reasoning that actually made sense. But it couldn't account for bad officiating, weather shifts, in-game injuries, or any of the chaotic stuff that actually swings outcomes in ways nobody sees coming. It works strictly with publicly available information and has zero access to the proprietary models that shape sportsbook lines. That's a hard ceiling, and no amount of clever prompting is going to punch through it.

For what we're doing here at Daily MLB Picks, this lines up exactly with what we've seen ourselves. LLMs are incredible research assistants. They can synthesize pitcher matchup history, parse platoon splits, and flag relevant trends faster than any human analyst could dream of. But asking them to be the one making the final call on a pick? That's asking a librarian to pitch the ninth inning. Wrong tool for the job.

Claude Has Been the Standout and It's Not Even Close

We need to talk about Claude, because out of everything we've tested, this one caught us off guard. Anthropic's model has been genuinely exceptional at baseball analysis in ways that the other tools haven't matched. Where ChatGPT tends to give you a competent but somewhat generic breakdown of a matchup, Claude digs into the stuff that actually matters for prediction: bullpen sequencing, platoon splits against specific pitch types, how a team's run prevention changes in day games versus night games. It doesn't just compile data. It thinks about the data in a way that feels closer to how an actual sharp bettor breaks down a game.

What's really stood out is how Claude handles context. You give it a pitcher's game log, a team's recent offensive splits, the park factors, and the weather forecast, and it comes back with analysis that connects all of those dots in ways we weren't expecting. It caught a bullpen fatigue angle on a Guardians game last season that none of our other tools flagged, and that ended up being the difference in the outcome. It's not just regurgitating numbers. There's a quality of reasoning underneath that makes us genuinely excited to run it through a full 162-game season.

We're planning to feature Claude heavily in our 2026 AI prediction testing. We want to see what happens when you pair its analytical depth with real-time Statcast data and our custom matchup models over a full slate of games, night after night, for six months straight. If it can sustain the kind of edge we've been seeing in our preliminary testing, this could be the tool that pushes our overall accuracy into territory that actually moves the needle for bettors. We'll have full transparency on every pick, every result, every win and every loss. No hiding the bad nights. That's the whole point.

Claude's analytical depth on pitcher matchups, bullpen fatigue, and situational splits has made it our most promising AI tool heading into the 2026 MLB season

Spring Training Is the Hardest Test for Any Model

OK, here's where we have to get real about something that makes this work genuinely difficult. Spring training data is basically a minefield for machine learning models, and it's not hard to see why. Managers are splitting innings between 15 different pitchers. Lineup cards look like someone let a toddler play with a roster spreadsheet. Non-roster invitees are getting extended looks right alongside All-Stars who are out there working on a new slider grip instead of trying to win a baseball game.

All those patterns that ML models depend on, stuff like pitcher performance against specific lineups, bullpen usage tendencies, batting order optimization, none of it works the way it does during the regular season. Every model we've tested shows a noticeable accuracy drop when we feed it spring training data, and honestly, it would be weird if they didn't. The signal-to-noise ratio is atrocious. Players aren't trying to win. They're trying to get ready. Those are fundamentally different activities, and models that can't tell the difference are going to struggle.

That said, spring training isn't completely useless, and this is actually one of the more exciting parts of what we're exploring. Pitcher velocity readings, swing decisions, and batted-ball quality metrics from Statcast can serve as early indicators of player readiness heading into the regular season. The tricky part, the part we're still figuring out, is knowing which spring training signals actually matter and which ones are just noise from a reliever throwing two innings against a lineup of minor leaguers who'll be in Double-A by April.

The ABS Challenge System Adds a New Variable

Here's something that's got us genuinely excited and a little nervous at the same time. MLB's rolling out the Automated Ball-Strike Challenge System across all games in 2026, including spring training. It uses Hawk-Eye tracking technology to let pitchers, hitters, and catchers challenge ball-strike calls by tapping their cap or helmet. Think of it like replay review, but for every pitch. During Spring Training 2025 testing across 288 games, there were an average of 4.1 challenges per game, with challenges succeeding 52.2 percent of the time.

For prediction models, this is a whole new headache in the best possible way. A more accurate strike zone could shift pitcher performance profiles, alter walk rates, and change how batters approach counts in ways we haven't seen before. One detail that caught our eye: defensive players succeeded on challenges at a 54.4 percent rate compared to 50 percent for hitters during testing, which suggests pitchers may get a slight boost from the system. We won't really know until we've got a few weeks of regular season data under the new rules, though. For now, we're treating it as the most interesting unknown in our models.

What We Are Actually Doing With AI at Daily MLB Picks

So after all this testing, all these experiments, all these moments of staring at spreadsheets wondering if we're onto something or fooling ourselves, here's where we've landed for 2026. We're not letting any single AI model make the final call. That would be lazy, and frankly, the data doesn't support it. Instead, we're running a layered system. Custom models handle the data aggregation, hunting for statistical edges in pitcher matchups, park factors, and platoon splits. LLMs handle the qualitative research, pulling together injury reports, managerial tendencies, and situational context into something a human can actually digest quickly. And then a human brain makes the call.
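The layered flow above can be sketched as a toy pipeline. Every name here (`PickCandidate`, `surface_for_review`, the 3 percent edge threshold) is hypothetical for illustration, not our actual codebase; the point is the shape: models score, LLM research attaches context, and the system only surfaces candidates for a human to decide on.

```python
from dataclasses import dataclass, field


@dataclass
class PickCandidate:
    game: str
    model_prob: float            # win probability from the quantitative model
    market_prob: float           # implied probability from the betting line
    llm_flags: list = field(default_factory=list)  # qualitative LLM research notes


def edge(c: PickCandidate) -> float:
    """Model's estimated edge over the market."""
    return c.model_prob - c.market_prob


def surface_for_review(candidates, min_edge=0.03):
    """Filter and rank candidates; a human still makes the final call."""
    return sorted(
        (c for c in candidates if edge(c) >= min_edge),
        key=edge,
        reverse=True,
    )


candidates = [
    PickCandidate("NYY @ BOS", 0.58, 0.52, ["bullpen fatigue note"]),
    PickCandidate("LAD @ SD", 0.50, 0.49),   # edge too thin, filtered out
    PickCandidate("CLE @ DET", 0.55, 0.51, ["day-game splits"]),
]
board = surface_for_review(candidates)
print([c.game for c in board])  # ['NYY @ BOS', 'CLE @ DET']
```

The design choice that matters: the threshold and ranking keep the human reviewing a short board of genuine edges instead of a full slate, which is where the "AI as research assistant, human as closer" division of labor actually pays off.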

We're not going to sugarcoat this: the best AI sports prediction tools in 2026 are delivering somewhere in the range of 54 to 61 percent accuracy on MLB games, depending on methodology. That's genuinely useful, especially when you pair it with disciplined bankroll management. But if someone's telling you their AI model hits at 80 or 90 percent on real baseball games, they're either lying or they haven't tested it on enough games yet. The inherent randomness of a 162-game season, where the best teams lose 60 games and the worst teams win 60, creates a natural accuracy ceiling that nobody's convincingly broken through. Not us, not anybody.

We'll keep testing. We'll keep iterating. And we'll keep telling you exactly what we find, including the parts that don't make us look like geniuses. That's the whole point of this project. Spring training gives us a few weeks to recalibrate before the real games start in late March, and we plan to use every single one of them.