March Madness 2021: Assessing Our Statistical Model’s Performance


Last month, I released my model's predictions for the NCAA Division I men's basketball tournament. This was the second version of the machine learning model, built after the first performed exceptionally well in 2019. Unfortunately, the bracket had a lot less green this time around.

Gonzaga was favored to win the tournament by every predictive model, but they collapsed with an embarrassing performance in the national title game at the hands of the Baylor Bears. While the model correctly picked three of the Final Four teams, it did not pick Baylor to advance to the national title game, let alone win it all. Since one-third of the maximum points in an ESPN bracket come from the final three games, Baylor's unexpected success took quite a toll on the model's final performance. Still, we can evaluate its performance as we did two years ago.
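For context, here's a quick sketch of the standard ESPN Tournament Challenge scoring, under which a correct pick doubles in value each round so that every round is worth 320 points in total. The final three games (the two national semifinals plus the championship) account for 640 of the 1,920 possible points, which is where the "one-third" figure comes from:

```python
# Standard ESPN Tournament Challenge scoring: a correct pick in round r
# (0 = Round of 64, ..., 5 = championship) is worth 10 * 2**r points.
games_per_round = [32, 16, 8, 4, 2, 1]
points_per_game = [10 * 2**r for r in range(6)]  # 10, 20, 40, 80, 160, 320

round_totals = [g * p for g, p in zip(games_per_round, points_per_game)]
max_points = sum(round_totals)   # 1920; each round contributes 320
final_three = 2 * 160 + 320      # two Final Four games + the title game

print(round_totals)              # [320, 320, 320, 320, 320, 320]
print(final_three / max_points)  # 0.333... -- one-third of the maximum
```

This is also why a single Final Four miss (like Houston over Baylor) costs 160 points on its own.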

Hits & Misses


Our model started off with a 3-1 record in the First Four after picking all four of the underdogs. Most of the win probabilities were quite close and all of the First Four games were pretty close too, so it doesn’t really mean much.

In the initial article, I said that "USC and Oregon both have relatively solid odds to make it to the Elite Eight." The six-seed Trojans had a modeled probability of 45% to beat three-seed Kansas (they did) and the seven-seed Oregon Ducks had a modeled probability of 45% to beat the two-seed Iowa Hawkeyes (they did). The model also favored USC to beat Oregon in their matchup and Gonzaga to annihilate USC (although that wasn't a hot take). Not a bad job in the West region.

Houston advancing to the national title game was the surprising pick from the model. While they didn’t, they at least made it to the Final Four instead of the one-seed Illinois Fighting Illini. But they also had a historically easy route to the Final Four, so who knows how good they really were?


The East region was a bit of a mess. UCLA obviously shook everything up — the model didn’t pick them to advance past the first round and they ended up winning the entire region. Alabama lost early, Texas lost early, Connecticut was a sleeper pick and they lost in the first round, etc.

The model's performance in the South region also wasn't superb, but it's at least a bit more understandable: I don't think anyone saw Oral Roberts winning two games. The model also did not like Arkansas' chances in the region, and while Arkansas did advance to the Elite Eight, they didn't do it convincingly.

Baylor annihilating Houston in the Final Four cost our bracket 160 points, so it was obviously an impactful miss. It would’ve been more understandable if the game was close (after all, Houston was given a 63% win probability, nowhere near a guarantee), but the Cougars didn’t look like they belonged on the same court as the eventual champion Baylor Bears.

Comparison to Other Models

Compared to the three other most prominent March Madness models, The Spax comes in third, ahead of FiveThirtyEight and behind Model284 and SportsLine.

The Spax's model was the highest performer by a slim margin through the Elite Eight, as it had a good start in the Round of 64 while also correctly picking three of the Final Four teams (unlike FiveThirtyEight, which incorrectly had Illinois in the Final Four instead of Houston).

Model284 and SportsLine pulled ahead in the Final Four, where their models forecast Baylor to move on to the national title game. That one game, worth 160 points, ended up making the difference. All four models incorrectly predicted that Gonzaga would win the championship.


Every year, I submit a bracket to the ESPN Tournament Challenge that simply follows the model's picks. The 2019 bracket finished with 1,360 points (ranked 138,776th in the world, 99.2nd percentile). I created 24 variations of that bracket, none of which performed any better. This time around, the model's bracket finished with 920 points (ranked 3,115,289th in the world, 78.8th percentile). Not so good. I did create 15 variations, though. Unfortunately, I was overconfident in Houston beating Baylor, so only one of those brackets had Baylor winning the championship. That bracket was fantastic, accumulating 1,400 points and ranking 64,554th in the world (99.6th percentile). Of course, no bracket pool lets you submit 25 entries, so that doesn't mean much.

How can our model be improved for next year? There are plenty of ideas I'd like to implement for future tournaments, such as accounting for injuries, a superstar factor 1, tournament experience, etc.

While the model was unable to live up to the crazy standards set by its performance in 2019, it wasn’t too far off from how other models performed. Maybe it’ll have more success next year.

  1. It seems as if low-seeded teams with high-scoring guards are more likely to pull off upsets. It would be interesting to explore the data to see if this relationship actually exists.
