Ahmed Cheema, Author at The Spax

March Madness 2023: Tournament Simulation Results

Ahmed Cheema — Wed, 15 Mar 2023 09:30:00 +0000

Marvin Gentry – USA TODAY Sports

In my last article, I posted this year’s iteration of my annual model for predicting the outcome of the NCAA Division I Men’s Basketball Tournament. I provided the modeled win probability of all 2278 possible matchups along with an “expected” bracket representing the outcome if the modeled favorite wins every game. Check that out here.

That bracket isn’t terribly practical, though. Just because the model gives Team A a 55% chance of winning doesn’t mean they’re going to win. That’s not much different from a coinflip! Furthermore, it doesn’t really reflect the other matchups that a team may have to go through. If Team A has a win probability of 20% against Team B in the second round, that doesn’t look too good for them. However, what if Team B’s win probability in the first round against Team C is only 51% and Team A’s win probability against Team C would be 80%? The “expected” bracket doesn’t show this, but that suddenly makes Team A’s outlook much better.

That’s where simulations come in. We can use the modeled win probabilities to simulate the entire tournament. We can run this simulation any number of times and compile the results to see how likely each team is to reach each round of the tournament, assuming that the modeled win probabilities are accurate. I ran 10,000 simulations of the tournament – here are the results.
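For anyone curious what such a simulation looks like, below is a minimal sketch, not the actual code behind these results. It assumes the play-in games are already resolved so the field is a power of two, and the four-team bracket, the ratings, and the `toy_prob` function are made up purely for illustration.

```python
import random
from collections import Counter


def simulate_bracket(teams, win_prob, rng):
    """Play one single-elimination bracket; return how many games each team won.

    teams: list in bracket order (length must be a power of two).
    win_prob(a, b): modeled probability that team a beats team b.
    """
    wins = Counter()
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        # Adjacent teams in the list play each other.
        for a, b in zip(alive[0::2], alive[1::2]):
            winner = a if rng.random() < win_prob(a, b) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins


def run_simulations(teams, win_prob, n_sims=10_000, seed=0):
    """Estimate each team's chance of winning at least k games, for every k."""
    rng = random.Random(seed)
    n_rounds = len(teams).bit_length() - 1
    reached = {t: [0] * n_rounds for t in teams}
    for _ in range(n_sims):
        wins = simulate_bracket(teams, win_prob, rng)
        for team, w in wins.items():
            for k in range(w):
                reached[team][k] += 1
    return {t: [c / n_sims for c in counts] for t, counts in reached.items()}


# Toy four-team example with made-up ratings (not the article's model).
ratings = {"A": 4.0, "B": 1.0, "C": 3.0, "D": 2.0}
toy_prob = lambda a, b: ratings[a] / (ratings[a] + ratings[b])
results = run_simulations(list(ratings), toy_prob, n_sims=2_000)
```

In this toy setup, `results["A"][1]` is the share of simulations in which team A wins both of its games, which is the same kind of number as the title probabilities quoted below.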

South Region

The top overall seed Alabama Crimson Tide are unsurprisingly the favorites to make it to the Final Four and win the region. Furthermore, their 12.5% probability of winning the tournament tops the region with No. 2 Arizona being the next best at 5.2%. Arizona looks to be a formidable team in their own right with a solid 20% likelihood of making the Final Four.

Afterwards, there’s a drop-off before the next tier of teams: No. 3 Baylor and No. 5 San Diego State. Their odds of making the Final Four, reaching the National Championship, and winning it all are virtually the same. Everyone else is a long way down and not really worth a mention.

Midwest Region

Well, this one looks like a two-man show.

There’s a lot to say about the Houston Cougars. They’re the favorites in Vegas to win the tournament and they boast one of the most stifling defenses in the country. With star Marcus Sasser set to return from injury shortly, the Cougars are poised to show off their best team in years. That’s saying a lot, given that they’re coming off of a Final Four appearance in 2021 and an Elite Eight loss in 2022.

Then we have the No. 2 Texas Longhorns. Texas is ranked 18th in offensive efficiency and 11th in defensive efficiency according to KenPom.com – with a 26-8 record and a recent Big 12 title, this balanced roster is coming into the tournament with some positive momentum.

The Cougars have an astonishing 44.6% likelihood of making the Final Four according to our model, with Texas following them up at 23.0%. The two favorites have 20.8% and 5.5% respective probabilities to win the entire tournament. No one else in the region is over one percent. It’s clear that one of these teams should be the one to make it out of the Midwest.

East Region

No. 1 Purdue leads the East with a 5.7% probability of winning the tournament, followed closely behind by No. 2 Marquette at 4.9%. Spoiler alert, but this is the lowest mark for a region’s top team. Of all four regions, our model has the East as the least likely to field the tournament’s eventual champion.

Purdue and Marquette are virtually neck-and-neck for Final Four odds. Marquette is viewed as having a more favorable path to the Elite Eight, while Purdue is favored in a matchup between the two teams.

Afterwards, we have No. 4 Tennessee and No. 5 Duke – sleepers to steal the region and sneak into the Final Four, but not really viewed as massive title contenders (2.0% and 1.2% championship odds respectively).

West Region

Finally, we’ve got the West. Here’s a fun one – take a look at how much this model hates the No. 1 Kansas Jayhawks. It has the remaining top five seeds as more likely to make it to the Final Four – even No. 5 Saint Mary’s! If you want a one seed to fade, the Jayhawks may be the one for you.

At the top, we have the No. 2 UCLA Bruins with a fantastic 36.2% probability to win the region and a very nice 13.5% mark of winning the tournament. That’s behind Houston but slightly ahead of Alabama for the 2nd best overall. In order, they’re followed up by No. 3 Gonzaga, No. 4 Connecticut, No. 5 Saint Mary’s, and No. 1 Kansas.

Commentary

I think simulation analysis has a lot of value in revealing details in how the bracket shaped paths for the top teams. Last year, my simulations revealed that Kansas had the highest probability of making it to the Final Four. They weren’t viewed as the best team in the tournament and were not favored to win it, but the simulations demonstrated that they had the easiest path to the Final Four. Picking the Final Four teams correctly is a huge boost, so knowing which teams have easy/hard paths is immensely important.

This time around, Kansas isn’t so fortunate. They look like a fade candidate rather than a team poised to win their region. On the other hand, Houston has the highest Final Four probability in the tournament – if you want a safe Final Four pick, they might be your best bet!

Unfortunately, these margins are very small. We’re just playing the numbers here – anything can happen once these teams step on the court. There’s a reason no human has selected a perfect bracket yet, and there’s a reason it’s called March Madness. Don’t put too much stock in the statistics – there’s a lot that they can’t quantify. Most pre-tournament analyses like this one did not foresee the No. 8 North Carolina Tar Heels embarking on a miraculous Final Four run last season. We’re guaranteed to see more unexpected results this March, so don’t expect perfection. If you want to win your bracket pool, all you can do is play the numbers and hope for the best.

The post March Madness 2023: Tournament Simulation Results appeared first on The Spax.

March Madness 2023: Forecasting the NCAA Division I Men’s Basketball Tournament

Ahmed Cheema — Wed, 15 Mar 2023 09:00:00 +0000

Petre Thomas – USA TODAY Sports

For the fourth time in this website’s history, I’ll be training a model to predict the outcome of March Madness, the annual postseason tournament to crown a champion of men’s Division I college basketball.

Over the past four years, The Spax’s modeled bracket has varied in performance. Since a hot debut in 2019 where our model correctly predicted three of the Final Four teams (including five-seed Auburn) and the national champion Virginia Cavaliers, it has certainly cooled down.

The Spax’s March Madness model’s bracket performance through the years

Of course, scoring a lot of points is generally going to be difficult if you don’t predict the champion correctly. In two straight years of picking Gonzaga to win it all, the Bulldogs were not able to finish the job. Granted, most models in both years had Gonzaga winning it – they were the national favorites for a reason. That’s why an optimal bracket strategy would not be to simply pick the team with the highest probability to win each game like I do for the purpose of this analysis – rather, you should consider each team’s win probability relative to the public’s assessment of their win probability. Maybe some one-seed is modeled as the favorite to win their region, but not by that much, so you choose to pick the five-seed that has a solid 40% modeled probability to beat them head-to-head.

Anyway, let’s get to this year’s picks:

First Four

I slacked off this year and didn’t get this article out before the play-in games occurred, so all four of these games have actually ended by now. The forecast winners ended up going 3-1, the sole miss being Pittsburgh’s 60-59 win over Mississippi State.

South Region

Top Team: The top overall seed in the country is the University of Alabama, and they’re unsurprisingly the clear top dogs in their region as well. It would be a shock to see them fall prior to the Elite Eight, and they would also be favored to beat the two-seed Arizona Wildcats in that matchup. With top NBA Draft prospect Brandon Miller leading the way, Alabama has a chance to make a transformational run.

Potential Disappointment: There’s not a whole lot to say here – both No. 2 Arizona and No. 3 Baylor are forecast to make it to the Sweet Sixteen, and losing to one another wouldn’t really be a huge disappointment. However, it’s certainly possible that one of them falls in the Round of 32 – failing to make it to the Sweet Sixteen because they suffered a loss to No. 6 Creighton or No. 10 Utah State would certainly be a slap in the face for either program.

Sleeper Pick: This is probably the least interesting region, to be frank. I don’t think there’s a great sleeper pick, but if I had to pick one I would probably go with No. 6 Creighton. I think there’s a chance that they snag a couple of early wins to find themselves in a Sweet Sixteen matchup against No. 2 Arizona.

Most Probable First-Round Upsets: No. 10 Utah State over No. 7 Missouri (70.70 percent chance)

Midwest Region

Top Team: The Alabama Crimson Tide are the top overall seed in the tournament, but according to sportsbooks, the Houston Cougars are the favorites to win it all. The Cougars have had a great run over the past five or so years, with coach Kelvin Sampson leading them to the Elite Eight in 2022, the Final Four in 2021, and the Sweet Sixteen in 2019. That’s a fantastic run, especially considering the injuries they endured (missing their most talented player in Marcus Sasser in 2022). They’ve already had postseason success, but the 2023 Cougars are even better than past iterations of the team and now all eyes are on a title. Sasser is entering the tournament with an injury and is questionable to play in the first round, but with a healthy lineup, the Cougars should be the safest Final Four pick in the tournament.

Potential Disappointment: The model has a Round of 32 matchup between No. 3 Xavier and No. 6 Iowa State as a virtual coinflip. That early of a loss would be a letdown for the Musketeers, and with a tough potential outing against Texas in the next round anyway, expectations shouldn’t be too high for Xavier.

Sleeper Pick: I’ll go with another six-seed here, this time the Iowa State Cyclones. Like I said, they’ve a shot to knock off Xavier if they can get past the first round. However, something interesting I noticed is that Mississippi State would’ve been forecast to beat Iowa State in the Round of 64. As you know, they lost to Pittsburgh in the First Four. Pittsburgh, on the other hand, are not at all favored to beat Iowa State. These sorts of discrepancies are unsurprisingly common because the transitive property doesn’t apply to sports, but I wouldn’t be shocked to see Pittsburgh upset Iowa State.

Most Probable First-Round Upsets: No. 10 Penn State over No. 7 Texas A&M (37.73 percent chance), No. 13 Kent State over No. 4 Indiana (38.82 percent chance), No. 14 Kennesaw State over No. 3 Xavier (31.85 percent chance)

East Region

Top Team: The No. 1 Purdue Boilermakers enter the tournament with high expectations after 7’4 superstar Zach Edey led them to a 29-5 record and a Big Ten title. They’re tied with the Jayhawks for the 3rd best odds (behind Houston and Alabama) of winning the tournament per Bovada, and both our model and sportsbooks view them as the favorite to win the East.

Potential Disappointment: … but it’s not necessarily going to be a walk in the park. After the first round, the “easiest” predicted matchup for Purdue is in the Sweet Sixteen against Duke. In that affair, the Blue Devils are given a 34.87% win probability. Memphis has a modeled win probability of 37.74% in the Round of 32, and Marquette pose a threat of their own in the Elite Eight. It’s clear that the two candidates for disappointing one-seed are Purdue and Kansas this year.

Sleeper Pick: Watch out for the No. 8 Memphis Tigers. They’re coming off of a win over the Houston Cougars (albeit sans Sasser) to win the AAC Tournament as they enter the Big Dance with some good momentum. The same is true for Purdue, of course, who will undoubtedly be favored to win a potential matchup between the two teams in the Round of 32, but I wouldn’t write off the Tigers completely.

Most Probable First-Round Upsets: No. 10 USC over No. 7 Michigan State (40.47 percent chance), No. 11 Providence over No. 6 Kentucky (49.33 percent chance), No. 14 Montana State over No. 3 Kansas State (31.37 percent chance)

West Region

Top Team: The forecast winners of this region are the two-seed UCLA Bruins. While it always sticks out when we don’t pick a one-seed to advance, this result isn’t actually an upset – the Bruins are currently the favorites in Vegas to win the West region at +275, followed by Kansas at +350. The Bruins’ program has had a modern resurgence, with a Final Four appearance in 2021 and a Sweet Sixteen run in 2022. They’ll almost certainly make that three straight Sweet Sixteen appearances this year, with a key potential matchup against Gonzaga testing their ability to go the distance.
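If you’re not used to reading American odds, the standard conversion to an implied probability can be sketched as follows. The function name is my own, and the result still includes the sportsbook’s margin, so these numbers typically run a bit higher than a "true" probability.

```python
def implied_probability(american_odds):
    """Convert American (moneyline) odds to an implied probability.

    The sportsbook's margin is not removed, so implied probabilities
    across a whole field usually add up to more than 100%.
    """
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


for team, odds in [("UCLA", 275), ("Kansas", 350)]:
    print(team, f"{implied_probability(odds):.1%}")
# UCLA 26.7%
# Kansas 22.2%
```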

Potential Disappointment: Whenever a one-seed doesn’t get out of their region, it can be considered a disappointing tournament for them. The defending champion Kansas Jayhawks enter the West as the top seed, but as I mentioned before, they are not favored to come out on top. Furthermore, they’re actually forecast to fall to four-seed Connecticut in the Sweet Sixteen. It is clear that the Jayhawks will not have an easy road to the Final Four.

Sleeper Pick: Well, Connecticut’s a four-seed predicted to make the Elite Eight, so it suffices to say that they’re a solid sleeper pick to throw into your bracket. There’s not much else I’d buy into – I think picking Arkansas or Arizona State to get into the Sweet Sixteen would be a little too risky.

Most Probable First-Round Upsets: No. 11 Arizona State over No. 6 TCU (52.59 percent chance), No. 10 Boise State over No. 7 Northwestern (55.78 percent chance)

Final Four

The model forecasts the Final Four matchups to be Houston v. UCLA and Alabama v. Purdue, with the favorites winning and moving on to the National Championship Game. The Cougars and the Crimson Tide are the clear two favorites to win the tournament, so it’s only fitting that they’d meet in the title game. If they do, we predict Houston to have the edge.

Needless to say, this is a pretty chalky Final Four. Just because this is viewed as the most likely outcome does not mean that it’s going to happen – it almost certainly won’t! Maybe Kansas will make it in over UCLA. Maybe Connecticut will get in over either. Perhaps Purdue falls in the Sweet Sixteen and Marquette takes their place. Or maybe some random double-digit seed comes in and takes a spot. Who knows? A good bracket probably shouldn’t pick these four teams in the end.

I’ve only shown a fraction of the possible matchups that can occur in March Madness. I’ll provide the win probabilities for every single one of them, though! If you’d like to see the model’s projections for matchups not listed above, you can search for them in the table below, which consists of 2,278 rows, one for every possible matchup. Just type the name of the two teams separated by a space. For instance, if you want to search for the Houston-Kansas matchup, type “Houston Kansas” into the search bar to get the win probability (69.34% for Houston).
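The 2,278 figure is just the number of ways to pair up 68 teams. Here is a rough sketch of how a table like this could be built and queried in either team order. The helper names are hypothetical, and only the Houston-Kansas probability comes from the model.

```python
from itertools import combinations
from math import comb

assert comb(68, 2) == 2278  # one row per unordered pair of the 68 teams


def build_table(teams, win_prob):
    """Map each unordered pair to (team_a, team_b, P(team_a beats team_b))."""
    return {frozenset((a, b)): (a, b, win_prob(a, b))
            for a, b in combinations(teams, 2)}


def lookup(table, team, opponent):
    """Return the probability that `team` beats `opponent`, in either order."""
    a, b, p = table[frozenset((team, opponent))]
    return p if team == a else 1.0 - p


# Only the Houston-Kansas figure comes from the article.
table = build_table(["Houston", "Kansas"], lambda a, b: 0.6934)
print(round(lookup(table, "Kansas", "Houston"), 4))  # 0.3066
```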

In the next article, we’ll take a look at simulation results using these win probabilities.

The post March Madness 2023: Forecasting the NCAA Division I Men’s Basketball Tournament appeared first on The Spax.

Schadenfreude: Which NBA Teams’ Losses Drew the Most Attention?

Ahmed Cheema — Sun, 17 Apr 2022 05:35:39 +0000

Alonzo Adams – USA Today Sports

Yeah, yeah, I know. You read the title and immediately knew the number-one answer. It’s not exactly a surprise to anyone who has paid attention to the NBA this season, and I certainly made no attempt to hide it with my choice of thumbnail. Statistics don’t have to be surprising, though – let’s dive into the numbers anyway.

Background

Reddit is a popular social media network split into various communities based on different interests. Each community is known as a subreddit and they’re referred to with the prefix ‘r/’ before the community name. There’s a subreddit out there for anything you can think of. For example, you can browse r/sports for general sports discussion or r/soccer for general soccer discussion. Even more specifically, you can browse r/barca for news and discussion specifically pertaining to F.C. Barcelona.

In this article, we’ll focus on the NBA subreddit (r/nba), which is the most popular sport-specific community on the website.

At the time of this article being written, r/nba has over 4.6 million subscribers. r/soccer is the next closest sport-specific subreddit but still has more than one million fewer subscribers than r/nba. It is one of the most active communities on Reddit and is constantly buzzing with basketball discussion.

After every NBA game, any user can submit a post onto r/nba called a “Post Game Thread.” A typical post game thread (PGT) has a title including the two teams who played, the score of the game, and perhaps any other significant details (like a particularly outstanding individual performance). As an example, the post game thread after today’s game between the Timberwolves and Grizzlies was titled “The Minnesota Timberwolves (1-0) pull out the Game 1 road win against the Memphis Grizzlies (0-1), 130-117, behind a 36 point playoff debut from Anthony Edwards.” The content of the post includes the box score for the game and it currently has 1531 comments as users discuss the game. Users can also upvote or downvote the post – a submission’s score is determined by subtracting the downvotes from the upvotes. The MIN-MEM post game thread has a score of 5781 with an upvote ratio of 97% (so 97% of the votes were upvotes, the other 3% were downvotes).
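Taking the score as upvotes minus downvotes and the ratio as upvotes divided by total votes, you can back out approximate vote counts. This is only a back-of-the-envelope sketch: the displayed ratio is rounded, and the function name is my own.

```python
def estimate_votes(score, upvote_ratio):
    """Back out approximate (upvotes, downvotes) from a score and upvote ratio.

    Uses score = up - down and ratio = up / (up + down), which together give
    total = score / (2 * ratio - 1). Requires a ratio above 50%.
    """
    total = score / (2 * upvote_ratio - 1)
    upvotes = upvote_ratio * total
    return upvotes, total - upvotes


# The MIN-MEM post game thread: score of 5781 with a 97% upvote ratio.
up, down = estimate_votes(5781, 0.97)
```

For the MIN-MEM thread this works out to roughly 6,150 total votes, nearly all of them upvotes.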

The goal of this article is to analyze post game threads from every game this season in order to determine which teams’ losses receive more positive attention than their wins. A year ago, I conducted the same analysis regarding the 2019-20 season. It was a rather simple project – for any given team, I obtained the median score of a post game thread in their wins and their losses and subtracted the two values. This year, I wanted to make the analysis more complex.

Methodology

I started by obtaining as many post game threads from the 2020-21 season as I could using the PushShift API in Python. If the original poster of a post game thread deleted the post after the fact (or their account was deleted), it was not included in this analysis. Thus, the data set is not fully complete. In the end, I obtained a post game thread for 73.2% of 2020-21 regular season games.

The next step was to collect relevant characteristics for these games that I thought may affect how high the score would be. This includes the difference in score, whether the game went to overtime, the Vegas odds (so as to determine whether the game was an upset), etc. In addition, I included time variables like day of the week and hour of the day which may impact how many viewers a game had. Finally, a regression was run using these independent variables to predict the dependent variable of a post game thread’s score.

Now we could turn to the 2021-22 regular season. I obtained 1117 post game threads (accounting for 90.8% of games played) along with the aforementioned relevant variables that may affect a post game thread’s final score. Then the previous regression was used to predict the score for each game and this predicted value was subtracted from the actual score to represent the “score over expectation” (SOE).

The median SOE was calculated for each team in their wins and losses and these values were subtracted to determine their median SOE difference. A positive value suggests that their wins received more upvotes than their losses, while a negative value suggests that their losses were better received. Note: a logarithm was applied to the dependent variable to deal with heteroskedasticity.
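The last step can be sketched roughly as follows. This is an illustration, not the code used for the analysis: it takes SOE as actual minus predicted (so positive means more upvotes than expected), assumes the comparison is made on the log scale because of the log transform, and uses hypothetical dictionary keys and made-up toy games.

```python
import math
from statistics import median


def soe(actual_score, predicted_log_score):
    """Score over expectation, measured on the log scale (an assumption)."""
    return math.log(actual_score) - predicted_log_score


def median_soe_difference(games, team):
    """Median SOE in a team's wins minus median SOE in its losses.

    games: iterable of dicts with hypothetical keys
    'winner', 'loser', 'score', 'predicted_log_score'.
    """
    win_soe = [soe(g["score"], g["predicted_log_score"])
               for g in games if g["winner"] == team]
    loss_soe = [soe(g["score"], g["predicted_log_score"])
                for g in games if g["loser"] == team]
    return median(win_soe) - median(loss_soe)


# Made-up toy data: the regression expects a score of 1000 for every game.
expected = math.log(1000)
toy_games = [
    {"winner": "X", "loser": "Y", "score": 1000, "predicted_log_score": expected},
    {"winner": "X", "loser": "Z", "score": 2000, "predicted_log_score": expected},
    {"winner": "Y", "loser": "X", "score": 4000, "predicted_log_score": expected},
    {"winner": "Z", "loser": "X", "score": 8000, "predicted_log_score": expected},
]
diff = median_soe_difference(toy_games, "X")  # negative: X's losses drew more upvotes
```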

Results

In the graph below I’ve plotted each team’s median SOE difference along with their win percentage.

In general, teams that win more receive more attention for their losses. That makes sense – the Suns were clearly the best team in the NBA in the regular season, so it’s bigger news when they lose games compared to their wins. The biggest tanking teams like the Rockets and Thunder received more upvotes when they won, which also makes sense – a Houston win wasn’t exactly a common occurrence. The disparity isn’t as drastic as it is for winning teams like the Suns, though – fans don’t really care all that much about the worst teams in the league.

There are some winning teams that get more love for their wins than their losses. The Raptors and Cavaliers are the most obvious ones in this category, likely due to their great performance relative to preseason expectations. It was hard not to enjoy the Cavaliers’ success with a young core of Garland, Mobley, and Allen even with an incredibly injury-riddled season. It’s also not surprising to see the Warriors get more positive attention for wins than for losses. Despite being a recent dynasty, the Warriors somehow felt like a bit of a comeback story due to their lack of success over the past two seasons. And of course, Steph Curry is one of the most beloved players in the NBA.

On the other side, the Nets lacked a spectacular record yet received far more attention for their losses than their wins. Why? Well, they entered the season with championship expectations and a big three of Kyrie Irving, James Harden, and Kevin Durant. Naturally, most fans are gonna be rooting against them.

There are also some variables that are unaccounted for. A big one is whether a comeback occurred. The Utah Jazz were extraordinarily adept at choking away large leads in the second half of games. If the Jazz lose a game in which they held a 25 point second half lead, obviously there will be more post game discussion than if they had lost in “normal” fashion.

Of course, the biggest thing that stands out is the Los Angeles Lakers – right there in the bottom center of the graph. A subpar team that everyone likes to see lose more than any other team.

The Los Angeles Lakers

The graph above clearly showed that the disparity in score for post game threads corresponding to Lakers wins and losses was greater than that for any other team despite their incredible mediocrity as a team. That’s not exactly a surprise. Let’s look into the numbers a bit more, though.

Shown below is a box plot of the score of post game threads after Lakers wins and after Lakers losses.

Only one Lakers win prompted a post game thread that had a higher score (4914) than the average post game thread for a Lakers loss. All it took was a 56 point performance from LeBron James against the Warriors in a tight win. And there were still 19 Lakers losses that led to post game threads with a higher score than that incredible game. An interesting parallel is a Bucks win on November 17th where Giannis dropped 47/9/3 against a LeBron-less Lakers team. This thread hit a score of 5635, 721 points higher than the LeBron 56 point performance over the Warriors. Why?

Some theorize that having a larger fanbase can drive positive narratives surrounding that team on the NBA subreddit. However, the Lakers have the largest fanbase in the league. Even though the Bucks are coming off of a championship, the size of their Internet following is meager relative to the Lakers’. Despite that, the Lakers-Warriors post game thread had a 94% upvote ratio versus 96% for the Bucks-Lakers thread. One may point out that the Warriors also have an incredibly large fanbase that may have actually downvoted the post game thread for their team losing. By that logic, though, Lakers fans could do the same for the Bucks loss.

It seems that the factor that actually matters is what other fans think of a team. The Lakers’ fanbase isn’t big enough to outweigh the hate that virtually every other fanbase has for their team.

We all know the Lakers aren’t exactly a team that’s widely adored among NBA fans. Most fans can be grouped into one of two categories: Lakers fans or Lakers haters. Our previous analysis from the 2020 season found that the Lakers were also the team that r/nba users enjoyed seeing lose the most. That was a bit different, though – the Lakers ended up winning the NBA Finals and were clearly a contender. The opposite was true this season – despite massive expectations, the Lakers fell short and ended up not only missing the playoffs, but even falling short of qualifying for the play-in tournament. Perhaps falling completely short of preseason expectations added to the enjoyment fans got out of seeing the Lakers lose.

Of course, there’s also the LeBron factor. Most fans know that LeBron James is one of the most polarizing figures in sports – he perhaps has more “haters” than any other athlete in the four major American pro sports leagues. It’s no surprise to see that hate translate to the team he plays for.

Last time I did a project aimed at answering this question, I wasn’t fully satisfied with the results due to the lack of variables accounted for. This time around, I’m more content with the depth of analysis done and I think the results speak for themselves.

The post Schadenfreude: Which NBA Teams’ Losses Drew the Most Attention? appeared first on The Spax.

March Madness 2022: Tournament Simulation Results

Ahmed Cheema — Tue, 15 Mar 2022 11:46:40 +0000

Eric Gay – Associated Press

In yesterday’s article, I described our annual logistic regression model for predicting the outcome of the NCAA Division I Men’s Basketball Tournament. I applied it to this year’s tournament to get the model’s “expected” bracket and the win probability for each team in every possible matchup. You can click the link here to see all of that good stuff.

Another good way to show the results is through a simulation – we can use the modeled win probabilities to simulate the entire tournament in less than a second. For example, I just ran a simulation of the tournament and in this hypothetical scenario, the Final Four teams are Gonzaga, Villanova, Kentucky, and Kansas. The Jayhawks went on to top Gonzaga in the championship game. Not at all difficult to imagine that playing out.

We can run this simulation any number of times and compile the results to see how likely each team is to reach each round of the tournament, assuming that the modeled win probabilities are accurate. We’ll go through each region to see the results.

West Region

Gonzaga won the West in 34.0% of our simulations, ahead of Duke at 26.1% and Texas Tech at 16.9%. There’s a drop-off afterwards before we get No. 4 Arkansas at 5.2% and No. 5 Connecticut at 6.2%. In terms of who can actually win the tournament, we see the same pattern of a “Big 3” with Gonzaga at 14.3%, Duke at 7.8%, and Texas Tech at 3.5%.

South Region

Although the previous article’s “expected” bracket (the outcome if the modeled favorite wins every game) had Houston in the Final Four, Arizona has a substantially higher chance of reaching the Final Four in our simulated runs. The one-seed Wildcats reach the Final Four in 33.9% of simulations versus 20.7% for Houston, followed by 15.7% for No. 3 Tennessee and 15.0% for No. 2 Villanova. These are also the only four teams who won the tournament in more than 1% of simulations.

Houston is a super interesting case. Most metrics love them, but there is plenty of reason for concern – a weak schedule, two key injuries, and the fact that they’ll probably run into Arizona in the Sweet Sixteen. I would personally wager that the analytics are overrating the Cougars’ chances, so I’d be wary of picking them over Arizona to make the Final Four. Their performance in the tournament is probably the thing I’m most interested in keeping an eye on.

East Region

Now we have the East, the most tightly contested region. Baylor is favored to win it at 27.1% followed by Kentucky, Purdue, and UCLA at 24.8%, 18.7%, and 16.3% respectively. That’s an incredibly tight race between the top four seeds.

It’s worth noting that the Baylor Bears are dealing with injuries right now (not explicitly accounted for in the model) which should give even more reason for you to not pick them in your Final Four. I also do not think any teams outside of the top four seeds in the region are worth picking – I would choose between Kentucky, Purdue, and UCLA.

Midwest Region

Finally, the Kansas Jayhawks won the Midwest in 37.8% of simulations. Auburn represented the region in 21.0% of the iterations followed by the five-seed Iowa Hawkeyes at 15.3%. Impressively, the 11-seed Iowa State Cyclones managed to break into the Final Four in 4.5% of the simulations.

Based on this data, it also seems that the model views the Jayhawks’ path as being the easiest. It’s not easy to see any other team coming out of the region and winning the entire tournament.

Commentary

The two most important factors in picking a successful March Madness bracket are having an accurate Final Four and accurately picking the champion. And part of having a strong Final Four is to be smart with the seeds that you select. On average, 1.7 one seeds make it to the Final Four, so you don’t want to go all “chalk.” Baylor is probably not a one-seed that should be selected, leaving you to pick between Gonzaga/Kansas/Arizona.

It is also fairly common for a seven-seed or lower to make it to the Final Four. A few interesting picks to make it to the Final Four are No. 8 San Diego State (3.0%), No. 9 Memphis (2.5%), and No. 11 Iowa State (4.5%). Or maybe you’d go with slightly higher seeded sleepers like No. 5 Iowa (15.3%), No. 4 UCLA (16.3%), and No. 5 Houston (20.7%). Based on ESPN bracket data, all six of these “sleepers” are picked to advance to the Final Four at a lower rate than they actually do in simulations – perhaps they’re strong value picks in larger pools.

As for who’s going to win… well, that’s a tough question. Gonzaga (14.3%), Kansas (14.0%), and Arizona (12.8%) won the tournament most often during these 10,000 simulations, followed by a drop-off before we get teams like Houston, Duke, Baylor, and Kentucky. Gonzaga is actually given a significantly higher chance of winning by sportsbooks (25% implied odds) and the public (27.8% of ESPN brackets picking them to win). Whether it’s rightful or not, the model does not view Gonzaga’s chances as highly as others. In this case, Kansas is an interesting contrarian pick as just 8.5% of brackets have them winning it all.
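One simple way to quantify a contrarian pick is to divide the modeled title probability by the share of brackets picking that team. The "leverage" ratio below is my own shorthand for illustration, not an established metric or part of the model. Values above 1 mean the public is picking a team less often than the simulations say it wins.

```python
def leverage(model_prob, public_pick_rate):
    """Ratio of modeled title probability to the share of brackets picking the team."""
    return model_prob / public_pick_rate


# (modeled title probability, share of ESPN brackets) from the figures above.
picks = {"Gonzaga": (0.143, 0.278), "Kansas": (0.140, 0.085)}
for team, (model, public) in picks.items():
    print(team, round(leverage(model, public), 2))
# Gonzaga 0.51
# Kansas 1.65
```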

Unfortunately, these margins are very small. We’re just playing the numbers here – anything can happen once these teams step on the court. There’s a reason no human has selected a perfect bracket yet, and there’s a reason it’s called March Madness. Don’t put too much stock in the statistics – there’s a lot that they can’t quantify. Most pre-tournament analyses like this one did not foresee the No. 11 UCLA Bruins embarking on a miraculous Final Four run last season. We’re guaranteed to see more unexpected results this March, so don’t expect perfection. If you want to win your bracket pool, all you can do is play the numbers and hope for the best.

The post March Madness 2022: Tournament Simulation Results appeared first on The Spax.

March Madness 2022: Modeling the NCAA Division I Men’s Basketball Tournament

Ahmed Cheema — Mon, 14 Mar 2022 07:15:00 +0000

Trevor Ruszkowski – USA Today Sports

Since creating this website in November 2018, I’ve created yearly statistical models for the purpose of predicting the annual NCAA Division I Men’s Basketball Tournament.

The first time was in 2019 when we successfully forecasted the national champion Virginia Cavaliers and three of the four Final Four teams (including the five-seed Auburn Tigers). The predicted bracket finished in the 99th percentile of ESPN’s Tournament Challenge.

The tournament returned in 2021 after the COVID-19 pandemic forced the cancellation of the 2020 edition. We weren’t quite as successful this time around – while the model picked three of the four Final Four teams again (including the two-seed Houston Cougars), the Baylor Bears unexpectedly bested the Gonzaga Bulldogs in the National Championship Game. Not picking the correct champion (which is worth by far the most points) meant the model’s corresponding bracket finished in just the 79th percentile of all brackets submitted to ESPN. Although I did submit a variation of it that had Baylor correctly besting Gonzaga in the national championship and finished in the 99.6th percentile, most bracket pools don’t allow multiple entries so that isn’t particularly relevant.

In any case, I believe the first two iterations of The Spax’s March Madness model have been reasonably successful and I’m glad to bring it back again this year with some changes to its methodology, including a bonus for games previously won in the tournament. For example, if we were predicting the probability of Georgia State (16) beating Boise State (8) in the Round of 32, we’d automatically factor in the fact that Georgia State must have beaten Gonzaga (1) in the first round in order for that matchup to even take place. A few other variables have been added as well, including one for a team’s tournament experience.

With that said, let’s get started by taking a look at the First Four matchups.

First Four

Notre Dame v. Rutgers is viewed as essentially a coinflip while Texas Southern and Indiana winning their respective games seems to be a “safer” bet. Meanwhile, Bryant is predicted to have a ~58% probability of besting Wright State to earn a R64 matchup against Arizona.

Most of these odds roughly line up with sportsbooks’ odds with the exception of Bryant vs. Wright State – Bryant will enter that game as a 3.5-point underdog.

West Region

Top Team: Gonzaga will be playing their third consecutive tournament as the top seed in their region. While they definitely aren’t viewed as strongly as they were last year (NCG loss vs. Baylor), they are still considered the favorite to win the tournament and they have the best modeled probability of reaching the Final Four. An upset before then would not be the craziest thing to happen, though – instead of an 88% win probability in the Round of 32 like last season, the Bulldogs have “just” a 75% win probability of beating Memphis in the second round this year. While they should make it out of the region, their path is not the easiest it could be and it’ll definitely be something to watch.

Potential Disappointment: The Duke Blue Devils will be seeking a magical farewell run for the legendary Coach K, but it will not be an easy path for them at all. While their potential Round of 32 matchup against Michigan State isn’t considered as worrisome by the model as some fans & analysts view it, Texas Tech is given a slight edge over them in the Sweet Sixteen matchup. Interestingly, though, if Duke and Gonzaga do match up in the Elite Eight, the Blue Devils have a respectable modeled win probability of 42.44%. I could realistically see Duke losing at any point after the first round so I’m not sure exactly what to expect.

Sleeper Pick: It’s not obvious from looking at the bracket, but the nine-seed Memphis Tigers are an interesting team to watch this year. Given how good Gonzaga is and what Memphis’ seeding is, a 25% win probability is relatively interesting. In a potential Sweet Sixteen matchup against five-seed Connecticut, Memphis has a modeled 43.15% win probability and a 29.05% modeled win probability in an Elite Eight bout against Texas Tech. None of these are even close to guaranteed and obviously even their first round matchup is viewed as a tight one, but I do think Memphis is the most interesting mid-low seed in the region.

Most Probable First-Round Upsets: No. 10 Davidson over No. 7 Michigan State (36.43 percent chance)

South Region

Top Team: According to our model, the five-seed Houston Cougars would represent the South region in the Final Four if the modeled favorite won every game. This may come off as a surprise, but analytics do seem to like the Cougars – they rank 4th in KenPom’s rankings and 2nd in BartTorvik’s. It’s not at all difficult to make the case that Houston may be underseeded. On the other hand, two of their four most important players are likely out for the season (Marcus Sasser and Tramon Mark). The model does not explicitly account for injuries (it does place extra weight on more recent games where Mark and Sasser did not play, though) so this should definitely be something to keep in mind when filling out a bracket. You don’t want to just put one seeds in the Final Four so naturally Houston is a pretty good Final Four pick to go against the grain, but giving them a 51.01% win probability over Arizona is probably unrealistic. They will also have to go through Illinois and Tennessee to reach the Final Four, so there’s no reason for excessive optimism – it’ll be quite the tough path. Hell, UAB is even a popular 12 seed upset pick in the first round.

Potential Disappointment: Anytime a one-seed doesn’t make it out of the Sweet Sixteen, it can be considered a disappointment. And in a hypothetical third round matchup against Houston, the top seed Arizona Wildcats have a modeled win probability of 48.99%. While that’s basically a 50/50, it is an interesting possibility for an early exit for the team with the second-best odds of winning the tournament according to sportsbooks. It’s worth noting that Arizona would have modeled win probabilities of 67.96% and 76.19% in hypothetical Elite Eight matchups against No. 2 Villanova and No. 3 Tennessee respectively. If they manage to get past Houston, or escape a matchup against them, they have as good of a chance of making the Final Four as the other one seeds.

Sleeper Pick: This region has the potential to be a bit wild – a couple first round teams to watch are No. 13 Chattanooga and No. 11 Michigan. The model gives Michigan a 60% probability of pulling off the first round upset while Chattanooga has a relatively high 33% probability for the massive upset over Illinois.

Most Probable First-Round Upsets: No. 10 Loyola Chicago over No. 7 Ohio State (41.92 percent chance), No. 11 Michigan over No. 6 Colorado State (60.00 percent chance), No. 13 Chattanooga over No. 4 Illinois (32.89 percent chance)

East Region

Top Team: The one-seed Baylor Bears are the modeled favorites to come out of the East. The Bears won the 2021 NCAA Division I Men’s Basketball Tournament (also as the one-seed) with a blowout win over the undefeated Gonzaga Bulldogs in the National Championship Game. While going for back-to-back titles will be an incredibly difficult task, it would not at all be a surprise to see them in the Final Four again …

Potential Disappointment: … but it also wouldn’t be surprising to see an earlier exit for the Baylor Bears. They’re not expected to have an easy time in the East – potential matchups against No. 2 Kentucky, No. 3 Purdue, and No. 4 UCLA pose a very real danger to the Bears. Baylor would have modeled win probabilities of 62.65%, 60.40%, and 62.49% in these matchups respectively. While that makes them consistent favorites, they’re far from a sure thing. Any of those top four teams could realistically be seen in the Final Four. In particular, the two-seed Kentucky Wildcats are a very formidable threat to win the East and disappoint the Baylor Bears.

Sleeper Pick: In my opinion, the four-seed UCLA Bruins are the most interesting team in the East. I expect them to advance to a Sweet Sixteen matchup against No. 1 Baylor where they have a solid 37.51% modeled win probability. And in hypothetical Elite Eight matchups against No. 2 Kentucky and No. 3 Purdue, the Bruins have modeled win probabilities of 35.77% and 53.65% respectively. In 2021, the No. 11 Bruins unexpectedly made a miraculous run all the way to the Final Four (and went toe-to-toe with Gonzaga) and they look even better in 2022.

Most Probable First-Round Upsets: No. 10 San Francisco over No. 7 Murray State (64.36 percent chance), No. 11 Virginia Tech over No. 6 Texas (36.43 percent chance)

Midwest Region

Top Team: Bill Self has coached the Kansas Jayhawks to the one-seed for the ninth time and after dominating the Big 12 Tournament, they look like a team ready for the big dance. The one-seed Jayhawks should probably not have much of a problem advancing to the Sweet Sixteen (although they’d “only” have a modeled win probability of 71.26% in a R32 matchup against SDSU). Iowa and Auburn may not be walks in the park but Kansas could have a harder path.

Potential Disappointment: The No. 11 Iowa State Cyclones actually have an above 50% modeled win probability in the first round against No. 6 LSU – despite the Tigers being 3.5-point favorites in Vegas, the model really doesn’t like their chances against the Cyclones. And in a potential second round matchup against the No. 3 Wisconsin Badgers, the Cyclones have a modeled win probability of 43.02% – pretty remarkable for a 3 vs. 11 matchup. As such, the three-seed Badgers are a glaring candidate for an early exit according to the model.

Sleeper Pick: The model clearly considers the aforementioned Iowa State Cyclones as a possibility for a surprising run. Looking elsewhere for another sleeper, their in-state rivals serve as an interesting candidate. The Iowa Hawkeyes are expected to get past Richmond and South Dakota State in the first two rounds to meet the one-seed Jayhawks in the Sweet Sixteen, and a 30.58% modeled win probability against Kansas is not at all bad. In the Elite Eight, Iowa would have a 42.68% modeled win probability versus No. 2 Auburn and a whopping 68.21% win probability against No. 3 Wisconsin. And just for fun, Iowa would have a 60.81% edge over Iowa State in an in-state Elite Eight matchup.

Most Probable First-Round Upsets: No. 11 Iowa State over No. 6 LSU (55.78 percent chance)

Final Four

And we’ve reached the Final Four! We have three one-seeds (Gonzaga, Baylor, Kansas) and a five-seed in the Houston Cougars. Funny enough, it reminds me of last season where the model forecasted Houston beating Baylor in the Final Four to advance to the NCG where they were predicted to lose to Gonzaga.

This time around, Houston is barely favored over the Jayhawks while Gonzaga has a decent edge (as far as you can expect from a 1 vs. 1 matchup) over Baylor in a rematch of last season’s National Championship Game. And once again, the model foresees a Gonzaga vs. Houston matchup in the championship, with Gonzaga coming out on top again.

If we predict the same championship matchup every year, it has to happen eventually. Right?

Jokes aside, it would be odd if Gonzaga wasn’t favored. They are viewed as the favorites by most statistical models, Vegas’ odds, and public opinion. But we play the games for a reason – anything can happen. If Kentucky reaches the Final Four instead of Baylor, Gonzaga’s modeled win probability would drop to 57.65% – a much closer matchup.

Or maybe we get No. 2 Duke vs. No. 1 Baylor (50.86% win probability for Baylor). Or No. 2 Duke vs No. 2 Kentucky (56.76% win probability for Kentucky). Or No. 1 Arizona vs. No. 1 Kansas (58.28% win probability for Kansas).

Needless to say, there are a lot of possible matchups in March Madness. We’ve only presented a subset of the possibilities so far. If you’d like to see the model’s projections for matchups not listed above, you can search for them in the table below, which consists of 2,278 rows, one for every possible matchup. Just type the names of the two teams separated by a space. For instance, if you want to search for the Gonzaga-Houston matchup, type “Gonzaga Houston” into the search bar to get the win probability.
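For the curious, 2,278 is simply the number of unordered pairs you can form from a 68-team field (68 × 67 / 2). Here is a quick Python check, along with a hypothetical order-independent lookup key. The dictionary and the `matchup_key` helper are illustrative assumptions and not how the table on this page is built; the one probability shown is the Duke vs. Baylor figure quoted above.

```python
from itertools import combinations

N_TEAMS = 68  # tournament field size, including the First Four teams

# Every unordered pair of teams is one possible matchup.
n_matchups = len(list(combinations(range(N_TEAMS), 2)))


def matchup_key(team_a, team_b):
    """Hypothetical helper: a key that ignores the order of the two teams."""
    return frozenset((team_a, team_b))


# Illustrative lookup with a single entry taken from the text above:
# Baylor is given a 50.86% win probability against Duke.
win_probs = {matchup_key("Duke", "Baylor"): ("Baylor", 0.5086)}

print(n_matchups)
print(win_probs[matchup_key("Baylor", "Duke")])
```

Using a `frozenset` as the key means “Baylor Duke” and “Duke Baylor” resolve to the same row, which mirrors how the search bar is meant to be used.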

In the next article, we’ll take a look at simulation results using these win probabilities.

The post March Madness 2022: Modeling the NCAA Division I Men’s Basketball Tournament appeared first on The Spax.

Mathematically Optimizing an NBA Player Guessing Game

Ahmed Cheema — Wed, 02 Mar 2022 21:49:30 +0000

David Richard – USA Today Sports

Background

If you’ve been paying attention to social media at all recently, you’ve probably heard about Wordle. It’s a popular game where users are given six attempts at guessing that day’s mystery word. You guess a five-letter word and are given clues that help you reach the answer for that day. It has become a daily exercise for many users around the world and thanks to its creative design (you’ve probably seen those colorful diagrams consisting of green, yellow, and black squares), it has become a ubiquitous staple of life in many social circles.

Unsurprisingly, the success of Wordle has sparked the creation of variants. These parodies include copies of the game ported to different languages or more difficult versions of it, such as Quordle where users essentially have to solve four Wordles at the same time. There are also many variants of Wordle that are guessing games for a different topic, such as Worldle where users try to guess a country or Nerdle, a math spinoff.

As an avid NBA fan, one Wordle variant that caught my eye was Poeltl. Named after San Antonio Spurs center Jakob Poeltl, the game gives you eight attempts at guessing a mystery NBA player. In each guess, you input an active NBA player and you are then given clues based on some characteristics.

For example, suppose my first guess for Poeltl #6 (03/02/2022) was CJ McCollum. Here’s the output:

McCollum’s team is not highlighted in green or yellow. If it was green, it would mean that the mystery player is currently on that team. If it was yellow, it would mean that the mystery player was previously on that team. As it is not highlighted at all, we know that the mystery player never played for the New Orleans Pelicans.

However, the conference is highlighted in green. As such, we know that the mystery player is currently on one of the other fourteen teams in the Western Conference.

We also know the player is not in the Southwest Division. That leaves the Pacific and Northwest divisions – only ten possible teams!

We are told that the player is a guard as well. If the color of the position was yellow, it would mean that the player could be a G-F or F-G (both a guard and forward). As that is not the case, we know that the mystery player is either solely a point guard or solely a shooting guard.

We also know that the player’s height is not 6’3. The arrow pointing up indicates that the true height is greater than 6’3, but the yellow color tells us that our guess was within two inches. Thus, the mystery player can only be 6’4 or 6’5.

The same mechanic can be reversed to help us with age. We know the mystery player is less than 30-years-old, and because McCollum’s age is not yellow, it means that our guess was not within two years of the mystery player’s true age. Thus, the player we’re looking for is at most 27-years-old.

Finally, jersey number. We got lucky! The mystery player shares the same number on their jersey as CJ McCollum. Otherwise, we would have used the same logic as we did with height and age to narrow our options down (use arrows & the presence or lack thereof of the yellow highlight).

That’s a lot of information to process. We got a lot of good information out of it, though – there can’t be that many 6’4 or 6’5 guards with the jersey number #3 on one of ten teams in the NBA who are also at most 27-years-old. In fact, only three players meet these criteria. The hard part, of course, is identifying who they are. In this case, the answer was Jordan Poole, 6’4 22-year-old shooting guard for the Golden State Warriors (Pacific Division).

Goal

After my first time playing this game, my first thought was that it was pretty fun and well-designed. My second thought was, “I wonder what the most optimal first guess is.” It was a popular question among fans of Wordle (and controversial). Some people liked to go with a word stacked with vowels like “adieu” while some analyses supported words like “salet” or “crate.”

In the case of Poeltl, I wondered what characteristics would make a good starting guess. Maybe you want a journeyman guy who has been on a lot of teams. Perhaps you’d like someone with average height and age so the arrows are more meaningful. How about a 6’6 journeyman 27-year-old with a jersey number in the 20s? That seems optimal. Does that exist? If they do, is that even an optimal guess? How much does it really matter?

I was bored and wanted to tackle these questions, so I decided to whip up Python and algorithmically determine the best starting guess in Poeltl.

Methodology

The first step was data collection – I needed all of the information that is a part of the game. That means obtaining the height, age, position, team, and former teams of every active NBA player. That could be complicated for something like position which isn’t really objective. Fortunately, experimentation made it obvious that the game got all of its data from nba.com/stats, so I did the same.

I put together a data frame of 496 players currently on an NBA roster. As far as I can tell, these players are the only ones you are allowed to guess in the game. Go ahead and try guessing Matt Ryan (no, not the quarterback) and their jersey number will come up as N/A. Why? They don’t have a jersey number listed on the Celtics’ team page. I was fairly confident that my data aligned perfectly with the data Poeltl uses.

I coded a function that takes two inputs: a guess and an answer. It outputs a dictionary representing the information that a user would hypothetically learn if they inputted that guess into Poeltl when the mystery player is that answer. Then once that dictionary is inputted into another function, the players that meet those criteria are outputted. Here’s an example using our previous CJ McCollum guess:

PossibleAnswers(GuessInformation('CJ McCollum','Jordan Poole')).player

['Jordan Poole', 'Terence Davis', 'Trent Forrest']

We already know the answer is Jordan Poole, so GuessInformation('CJ McCollum','Jordan Poole') will output a dictionary representing the information we learned from that first guess. If you wanted to use this to cheat prior to knowing the answer you could always input that manually, but that’s not the point here. Based on the information we received, PossibleAnswers reveals that there were only three possible answers, one of which was obviously the true mystery player.
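To make the mechanics concrete, here is a minimal sketch of how a pair of functions like these could work. It is not the code behind this article: the `Player` class, the snake_case function names, the clue encoding, and the two “Hypothetical” players are all assumptions for illustration. The McCollum and Poole attributes come from the walkthrough above. One simplification is that the filter takes the guess as an argument and keeps every candidate that would have produced exactly the same clues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    team: str
    former_teams: frozenset
    conference: str
    division: str
    position: str   # "G", "F", "C", or a hybrid such as "G-F"
    height_in: int  # height in inches
    age: int
    jersey: int


def numeric_clue(guess_value, answer_value, window=2):
    """Clue for height, age, or jersey number as (color, arrow)."""
    if guess_value == answer_value:
        return ("green", None)
    color = "yellow" if abs(guess_value - answer_value) <= window else "none"
    arrow = "up" if answer_value > guess_value else "down"
    return (color, arrow)


def position_clue(guess_pos, answer_pos):
    """Green on an exact match, yellow when the positions overlap (G vs. G-F)."""
    if guess_pos == answer_pos:
        return "green"
    shared = set(guess_pos.split("-")) & set(answer_pos.split("-"))
    return "yellow" if shared else "none"


def team_clue(guess, answer):
    """Green for the same current team, yellow if the answer used to play there."""
    if guess.team == answer.team:
        return "green"
    return "yellow" if guess.team in answer.former_teams else "none"


def guess_information(guess, answer):
    """Everything a user would learn from `guess` when `answer` is the mystery player."""
    return {
        "team": team_clue(guess, answer),
        "conference": "green" if guess.conference == answer.conference else "none",
        "division": "green" if guess.division == answer.division else "none",
        "position": position_clue(guess.position, answer.position),
        "height": numeric_clue(guess.height_in, answer.height_in),
        "age": numeric_clue(guess.age, answer.age),
        "jersey": numeric_clue(guess.jersey, answer.jersey),
    }


def possible_answers(guess, info, pool):
    """Names of the players in `pool` that would have produced the same clues."""
    return [p.name for p in pool if guess_information(guess, p) == info]


# McCollum and Poole use the attributes from the walkthrough above;
# the two "Hypothetical" players are made up for illustration only.
mccollum = Player("CJ McCollum", "Pelicans", frozenset({"Trail Blazers"}),
                  "West", "Southwest", "G", 75, 30, 3)
poole = Player("Jordan Poole", "Warriors", frozenset(),
               "West", "Pacific", "G", 76, 22, 3)
hyp_guard = Player("Hypothetical Guard", "Hypothetical West Team", frozenset(),
                   "West", "Northwest", "G", 77, 25, 3)
hyp_forward = Player("Hypothetical Forward", "Hypothetical East Team", frozenset(),
                     "East", "Atlantic", "F", 80, 24, 37)
pool = [mccollum, poole, hyp_guard, hyp_forward]

info = guess_information(mccollum, poole)
remaining = possible_answers(mccollum, info, pool)
print(info)
print(remaining)
```

Comparing clue dictionaries directly keeps the filter short: a candidate survives only if guessing McCollum against that candidate would have lit up the board in exactly the same way.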

But suppose the mystery player was, I don’t know, Matt Ryan (again, not the quarterback). How many possibilities would there have been after our first guess?

len(PossibleAnswers(GuessInformation('CJ McCollum','Matt Ryan')).player)

Okay, wow. Not nearly as useful. The only information you really receive is that the mystery player is in the East, is not a guard, is taller than 6’5, and is younger than 28-years-old. That leaves a lot of possibilities.

As a matter of fact, based on possibilities leftover afterwards, Matt Ryan is the worst-case-scenario mystery player for a first guess of CJ McCollum.

For 50.4% of possible mystery players, a first guess of CJ McCollum would leave six or fewer possibilities based on the information given by the game. But there is that 13.9% of possible mystery players for which the McCollum guess would leave over 100 possibilities. Overall, the mean number of possibilities left by a first guess of CJ McCollum is 22.3 and the average reduction in number of possibilities is 95.5%.

One issue – this assumes that all 496 active NBA players are possible answers to the game. Common sense tells us that a well-designed game would not allow this to be a possibility. Did you even know that a player named Matt Ryan was in the NBA? I sure didn’t until now.

According to an interview with the game’s creator Gabe Danon, there are “only 300 or so mystery players in Danon’s pool of candidates.” Hm. How do we tackle that?

I took a slightly unscientific approach and went through all 496 players and decided whether or not I thought they were relevant enough to be an answer. There were more objective methods I could’ve taken, but I’m confident enough in my NBA fandom to think that this process would be reasonable enough. I generally took out players I never heard of (while using games played and minutes per game as a reference to make sure I didn’t take out actual key players) or players who I knew of but didn’t think would be put into the game.

I think I was fairly generous. I ended up with 353 players. Maybe a little more than “only 300 or so” (Or would you use that wording to describe the number 353? Probably not, right?) but I think it’s preferable to removing too many players. In any case, I do not think the results of this analysis would change dramatically.

I proceeded to track the mean number of possibilities left by a first guess of a player, the maximum number of possibilities left, and the average percent reduction in number of possibilities for all 496 possible guesses. I did this under the false assumption that any of the 496 players could be the answer, and then once more under the more reasonable assumption that the answer could only be in a self-filtered list of 353 players. Onto the results!
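As a rough illustration of those three metrics, here is a small sketch that scores a first guess against a pool of candidate answers. It is an assumption-laden toy rather than the code behind this article: the `feedback` function only models a single numeric clue (an arrow plus a within-two highlight), and the list of ages is made up.

```python
from collections import Counter


def feedback(guess, answer, window=2):
    """Toy clue for one numeric attribute: (color, arrow)."""
    if guess == answer:
        return ("green", None)
    color = "yellow" if abs(guess - answer) <= window else "none"
    arrow = "up" if answer > guess else "down"
    return (color, arrow)


def evaluate_first_guess(guess, answers):
    """Mean and max possibilities left, plus the mean percent reduction."""
    # Answers that produce identical clues cannot be told apart, so the
    # number of possibilities left for an answer is the size of its bucket.
    buckets = Counter(feedback(guess, a) for a in answers)
    remaining = [buckets[feedback(guess, a)] for a in answers]
    n = len(answers)
    mean_left = sum(remaining) / n
    return {
        "mean_left": mean_left,
        "max_left": max(remaining),
        "mean_pct_reduction": 100 * (1 - mean_left / n),
    }


# Hypothetical ages, for illustration only.
ages = [19, 21, 22, 24, 25, 25, 27, 30, 33, 37]

middle = evaluate_first_guess(25, ages)   # a guess near the middle of the pool
extreme = evaluate_first_guess(37, ages)  # a guess at the edge of the pool
print(middle)
print(extreme)
```

In this toy pool the middle guess leaves 2.4 possibilities on average with a worst case of 3, while the extreme guess leaves 8.2 on average with a worst case of 9. The same bookkeeping works with a richer clue function and either the 496-player or the 353-player pool.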

Results

The table below contains the results of our analysis for the scenario in which we assume that all 496 active NBA players could be the mystery player on a given day.

The first guess that provides information eliminating the most possibilities on average is Lindy Waters III, who is currently signed by the Oklahoma City Thunder on a two-way contract. The most options that you could be left with after guessing Waters first is just 15 out of 496.

Lamar Stevens is similar – 6’6, 24 years old, and wears #8. So is Derrick Jones Jr., a 25-year-old 6’6 forward who wears #5.

What makes Waters such a great guess? Well, the median NBA jersey number is 13 and Waters rocks the number 12. The median age of active NBA players is 25, and Waters is 24 years old. Waters also happens to have the same height as the average NBA player at 6’6. By effectively splitting the player base in half with each of these three numeric characteristics, you’re bound to cut down on the possible answers with a first guess of Lindy Waters III.

How about the worst guesses? The worst guess by far is Boban Marjanovic, the 33-year-old 7’4 center for the Dallas Mavericks who wears a relatively high jersey number of 51. The percentages of players with a greater height, age, and jersey number than Boban are 5.4%, 0.0%, and 3.6% respectively. Yeah, you’re getting very little information there. In general, the worst guesses appear to be big men, specifically ones that are especially young or old.
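The “split the pool in half” intuition is easy to check for any single attribute. The sketch below reports what share of a pool sits below, at, and above a guessed value. The jersey numbers are made up for illustration and `split_shares` is a hypothetical helper, not part of the code behind this article.

```python
def split_shares(values, guess_value):
    """Percent of the pool below, equal to, and above the guessed value."""
    n = len(values)
    below = sum(v < guess_value for v in values)
    above = sum(v > guess_value for v in values)
    return {
        "below": 100 * below / n,
        "equal": 100 * (n - below - above) / n,
        "above": 100 * above / n,
    }


# Hypothetical jersey numbers, for illustration only.
jerseys = [0, 1, 3, 5, 8, 12, 13, 15, 22, 30, 35, 51]

near_median = split_shares(jerseys, 13)
at_extreme = split_shares(jerseys, 51)
print(near_median)
print(at_extreme)
```

A guess near the median sends roughly half the pool each way, so the arrow is informative whichever direction it points. A guess at the extreme almost always gets the same arrow, which tells you next to nothing.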

Next, we apply our analysis to the filtered subset of players that were deemed to be realistic answers.

There is not much substantial change here – in fact, the correlation coefficient between the mean percent decrease for the two analyses is a whopping 0.997. The physical characteristics of the players removed from the data set are not significantly different from those that were not removed, so there is no reason to think that the results change all that much.

With that said, Lindy Waters III is no longer the most optimal guess. That title now belongs to Ish Wainright. Wainright is a 6’5 27-year-old forward who wears #12 for the Phoenix Suns. The average number of valid possibilities after a first guess of Ish Wainright is just three, while the maximum is 12. Wainright is three years older than Waters, which makes sense – the more ‘relevant’ players in the NBA (who are more likely to be a possible mystery player) are probably going to be older than the young guys who are relatively unknown. Thus, the most optimal guess on the filtered data set should be a bit older.

If you want an optimal guess that has the best shot at also being a potential answer, Wainright is probably not your best bet. Cody Martin might be a good shot, as he averages 26.8 minutes per game for the Charlotte Hornets and is essentially just as good of a first guess as Wainright. Other options include Taurean Prince, Jae’Sean Tate, and Derrick Jones Jr. Jaylen Brown and Zach LaVine are similarly fantastic picks if you want All-Star caliber guys.

Looking Past the First Guess

If you want to get the most theoretical information out of your first guess, Ish Wainright is your man. But what about the second guess? The third? Should we be considering how the game might play out after your first pick?

Recall that the function GuessInformation(guess,answer) outputs a dictionary representing the information learned from a guess. We can set guess="Ish Wainright" and iterate all 353 candidate mystery players as answer to get the 353 possible information dictionaries for Ish Wainright. For each one, we can then cut our data set down to the players that are still possible answers based on the information given, and we can then repeat the previous process to find the best second guess in each situation and track how many possible valid answers there still are after the second guess. If that number is equal to 1, it means our algorithm can get to the right answer within three guesses.
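Here is a compact sketch of that lookahead, generalized into a loop that keeps guessing until it lands on the answer. Several things here are assumptions for illustration and not the code behind this article: the toy `feedback` function models one numeric clue, the ages are made up, follow-up guesses are drawn only from the players still possible, and ties are broken by pool order.

```python
from collections import Counter


def feedback(guess, answer, window=2):
    """Toy clue for one numeric attribute: (color, arrow)."""
    if guess == answer:
        return ("green", None)
    color = "yellow" if abs(guess - answer) <= window else "none"
    arrow = "up" if answer > guess else "down"
    return (color, arrow)


def expected_left(guess, candidates):
    """Mean number of candidates still possible after making `guess`."""
    buckets = Counter(feedback(guess, c) for c in candidates)
    return sum(size * size for size in buckets.values()) / len(candidates)


def guesses_needed(first_guess, answer, pool):
    """Guesses used by a greedy player who always picks the most informative guess."""
    candidates = list(pool)
    guess = first_guess
    count = 1
    while guess != answer:
        info = feedback(guess, answer)
        # Keep only candidates consistent with the clues, and never
        # repeat a guess that has already been shown to be wrong.
        candidates = [c for c in candidates
                      if c != guess and feedback(guess, c) == info]
        guess = min(candidates, key=lambda g: expected_left(g, candidates))
        count += 1
    return count


# Hypothetical, distinct ages, for illustration only.
ages = [19, 21, 22, 24, 25, 27, 30, 33, 37]

counts = {answer: guesses_needed(25, answer, ages) for answer in ages}
mean_guesses = sum(counts.values()) / len(counts)
max_guesses = max(counts.values())
print(counts)
print(max_guesses)
```

With a first guess of 25 on this toy pool, every answer is found within three guesses. Swapping in a full clue function and the real candidate list gives the per-player averages and maximums discussed in this section.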

Using Ish Wainright as our first guess, the mean number of guesses needed to get to the mystery player is approximately 2.425 and the maximum is three. Using this strategy, it’ll never take more than three guesses to win Poeltl assuming that our list of 353 players includes all of the “300 or so” actual possible answers. Doing the same with Cody Martin results in an average of 2.419 guesses and a maximum of four. We could go through and do this for every player, but that would be computationally time-consuming so I’ll leave it at the top ten first guesses in the previous table:

There isn’t really much of a noticeable difference in any of these, similar to the best Wordle words. Given that its maximum attempts to guess the mystery player is just three while also eliminating the most possibilities on the first guess, I’d stick with Ish Wainright as the best first guess if you don’t care about the hole-in-one chance. Otherwise, Cody Martin might be the better choice.

Conclusion

I used Lindy Waters III as my first guess on the last two Poeltls (#5 and #6). Both times I was able to guess the mystery player on my second try because the information from the first guess actually left just one possibility (which is true about one-third of the time for the most optimal guesses).

However, I’m sure if I continued with this strategy I’d have some moments where I was unable to identify the right player even if I was given all of the information theoretically needed to do so. It’s worth mentioning that players like Wainright appear to be the strongest guesses for an entity that knows the jersey number, age, position, team, height, etc. of every active NBA player. It’s likely (and true in my case) that a user won’t know every player’s jersey number, for example, in which case the value of a first guess like Ish Wainright would be diminished greatly.

When the discussion of optimizing Wordle occurs, it’s usually met with blowback by fans who think it ruins the fun of the game and that people shouldn’t be using the same first guess every day. While I use my favorite starting word of “raise” every day in Wordle, I do understand that perspective and in the case of Poeltl, I don’t think using Ish Wainright as a first guess is particularly enjoyable and I definitely won’t be doing that moving forward.

This exercise was simply an example of using simple coding and statistics to answer a random question that I had, and I consider it a success.

As an aside, this research does make me think that the game of Poeltl could maybe be updated to increase the difficulty of the game. Perhaps the introduction of a ‘hard mode’ would spice things up. The mechanic of telling you whether or not a player’s age or height is within two of your guess makes life far easier for the user and for avid fans, I don’t really see how you could miss the right answer in eight attempts unless it’s a lesser known player. I focused on guys like Lindy Waters III and Ish Wainright in this article, but essentially any first guess is going to severely limit the number of possibilities.

The average number of valid possibilities left after the first guess is just 10.39 – that’s a 97% drop. Then again, this is still under the assumption that you would be able to identify those possibilities based on information as mundane as jersey number. Make of that what you will! In any case, Poeltl is a fun game and I would definitely recommend it to any NBA fan.

The code used for this article can be found here.

The post Mathematically Optimizing an NBA Player Guessing Game appeared first on The Spax.

The Next Great Portland Guard Has Arrived

Ahmed Cheema — Fri, 25 Feb 2022 04:47:00 +0000

Dan Hamilton – USA TODAY Sports

The history of the Portland Trail Blazers is not lacking in offensively skilled players at the guard position. From the sustained dominance of Clyde Drexler to the unrealized promise of Brandon Roy to the recent dynamic duo of Damian Lillard & CJ McCollum, the Blazers have been fortunate to consistently have guards that could be trusted to put up points reliably.

The Blazers picked up Anfernee Simons with the 24th overall pick of the 2018 NBA Draft, a reasonable position in the draft to take a gamble on a teenager with a ton of potential. Simons was drafted straight out of high school at just 18-years-old – naturally, he didn’t have expectations to be much of an instant contributor. Nonetheless, his talent was undeniable. While his slender frame and poor defense were criticized during draft season, these were traits that an 18-year-old could be excused for. Simons’ elite shooting along with his incredible speed, quickness and athleticism offered plenty of reason for excitement.

As expected, Simons’ rookie season was not eventful. Up until the last game of the season, he had played in just 19 games for the Trail Blazers and averaged 2.0 points on 4.9 minutes per game. No one expected a teenager to usurp a spot in the rotation from guards like Lillard, McCollum, Seth Curry, Nik Stauskas and Rodney Hood.

However, the Blazers gave us a tease of the future in their 2018-19 regular season finale.

Portland entered their final game of the regular season with a 52-29 record, slated to play the already eliminated Sacramento Kings. The Blazers had already clinched home-court advantage in the first round (the 3rd or 4th seed). Winning the game gave them a chance at the three seed, but when the team announced that Lillard and McCollum would sit due to “load management,” it appeared that they were content with entering the postseason with the four seed and a likely postseason matchup against the Utah Jazz. Or maybe they just wanted the rest for their stars and didn’t particularly care about the matchup. In any case, Anfernee Simons would start the first game of his career.

In a wild game in which they fielded just six players and trailed by as many as 28 points, the Blazers rallied back behind a stunning 37 points, 9 assists, and 6 rebounds from the 19-year-old Anfernee Simons. Simons played a full 48 minutes and shot 13-21 from the field and 7-11 from deep in his first career start to unexpectedly secure the three seed for the Blazers. In his breakout game, Simons was efficient from every spot on the floor and created shots for his teammates like a seasoned vet despite entering the draft with playmaking concerns. Oh, and the win ended up working out as far as seeding goes – the Blazers went as far as the Western Conference Finals.

Over the next two seasons, Simons played 134 games and maintained a consistent 8/1/2 average statline. He remained a distant option in the guard rotation, but steadily showed progress – his 2020-21 season saw an uptick in efficiency as his 3P% jumped to 42.6%. The young guard was still just 21 years old and continued to develop as a player. He also won the 2021 NBA Slam Dunk Contest – not particularly important, but it’s a testament to his bonkers athletic capability.

The 2020-21 Trail Blazers entered the postseason with their three main stars (Lillard, McCollum, and Nurkic) all healthy. Also at their disposal were trade acquisitions Norman Powell and Robert Covington, both brought in to make an instant impact. Carmelo Anthony also came off the bench as a key sixth man. Meanwhile, the Nuggets were missing both of their starting guards and thus entered the series with a backcourt of Facundo Campazzo and Austin Rivers. Nonetheless, the Nuggets won the series in six games despite a heroic effort from Lillard.

It was arguably the most embarrassing moment of the Lillard era in Portland, up there with the first-round sweep at the hands of the New Orleans Pelicans in 2018. The offseason was full of noise on Lillard potentially wanting out – if a fully healthy and tooled Portland squad lost to a hobbled Nuggets team despite Lillard averaging an efficient 34/10, clearly they weren’t even close to truly competing.

While Lillard has thus far stuck around, a lingering abdomen injury prevented him from maintaining his previous peak level of play and he has been sidelined indefinitely after receiving surgery. McCollum was traded to the New Orleans Pelicans in the middle of his eighth season with the Blazers. It was already clear at this point that the Blazers’ season was over, but the silver lining was that Anfernee Simons would finally have a chance to lead the team for longer than one meaningless game at the end of his rookie season.

Since Lillard’s last appearance on the court (December 31st), the 22-year-old Simons has played (and started) 24 games. He’s averaging 23.6 points and 6.0 assists per game while shooting 42.2% from three with a true shooting percentage of 61.4%. This stretch includes a blistering 43-point game in a close win over the Hawks, a clutch 29 in a nailbiter finish against LeBron’s Lakers, and a three-game stretch of 30, 30, and 31 in wins over the Knicks, Bucks, and Grizzlies.

In this span, Simons also averaged more potential assists than playmakers like LaMelo Ball, LeBron James, Stephen Curry, and Josh Giddey. According to tracking data, Simons drove to the hoop 10.4 times per game in this stretch. Among players with at least 10 drives per game, Simons had the seventh-best FG% behind DeMar DeRozan, Chris Paul, De’Aaron Fox, Jrue Holiday, Miles Bridges, and Dejounte Murray. Not bad company.

So, what’s so special about Simons? The numbers make it clear that his most impressive characteristic is his perimeter jump shooting. Over the past two seasons, Simons has quickly become one of the league’s premier spot-up shooters.

Among players with at least 50 catch-and-shoot 3PA, Simons boasts the third-highest C&S 3P% behind Joe Harris and Tony Snell. His efficiency on these shots is comparable to players like Zach LaVine and Seth Curry on similar volume. Oh, and he’s substantially younger than anyone else labeled on the chart – LaVine is the second-youngest and is still four years older than Simons.

The impressive thing is that the bulk of Simons’ volume comes from games with Lillard and/or McCollum out and Simons expected to carry a heavy scoring load on a tanking roster as a 22-year-old. And he’s somehow pulling through!

We can also evaluate the change in Simons’ on-court impact with the use of an all-in-one lineup-based impact metric such as Estimated Plus-Minus, which is generally considered one of the better performing public metrics of its kind.

Since becoming a rotational player in his second season, Simons’ offensive EPM has steadily improved to +2.5 this season, which ranks in the 94th percentile of NBA players. His composite EPM has reached the positives for the first time in his career despite still being one of the worst defenders in the league statistically.

That’s obviously the elephant in the room – despite Simons becoming a truly elite offensive talent while being just 22 years old, one can’t really ignore that he’s still a defensive liability. The bright side is that he doesn’t really lack the length to be a disruptive defender – he’s 6’4 with a solid 6’9 wingspan. He just has a slender frame at this point in time, which isn’t surprising given his young age. If he can widen out and withstand more contact, there might be hope for Simons to reach average levels on defense – or even the level of a guy like Damian Lillard who’s below average but good enough offensively to make up for it (and then some). There isn’t really an excuse for him to continue being as bad defensively as he is now.

A big concern with Simons on the offensive end is his inability to get to the line. I also would hope that this improves as he bulks up – he isn’t able to seek out and absorb the contact around the rim necessary to consistently get to the line. NBA fans often forget that drawing fouls is a skill, one that some players are never able to master. Take CJ McCollum for example – he’s one of the more skilled one-on-one scorers in the league, but never being able to consistently get free throws at any point in his career severely limited his potential. One can only hope that Simons will be able to turn the tide on that aspect of his game.

In any case, his shooting ability genuinely seems generational and I could see him becoming a record-breaking shooter in his prime. The sky is the limit for his offensive game, and we can only hope that he’ll be able to patch up the holes in his game to unlock his potential as a player. While he clearly has some glaring weaknesses, he’s further along at 22 years old than realistic projections expected. It’s hard not to be excited for his future.

The post The Next Great Portland Guard Has Arrived appeared first on The Spax.

PRYA: Modeling NFL Punt Returns for Player Evaluation

Ahmed Cheema — Sat, 05 Feb 2022 23:28:26 +0000

Kirby Lee – USA Today Sports

Introduction

Special teams and its influence on field position is the most underrated component of the football game. American football is a game of inches – every yard matters, and the ability to pin opponents deep, fight for extra yards on returns, and finish surefire tackles can make the difference in any given game.

For this project, we will be attempting to model punt return yards. We hope to develop an accurate model that can be used to evaluate punt returners, gunners, and teams as a whole and draw additional insights on the punt play as a whole. The model will be used to predict the punt return yards (PRY) of any given return. The prediction will be referred to as expected punt return yards (xPRY) and we’ll call the difference between the two values punt return yards added (PRYA).
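As a minimal sketch of the definition above, PRYA is simply the actual return yardage minus the model's expectation. The function name is illustrative, and the two sample returns use the actual and expected yardage quoted later in this article.

```python
def punt_return_yards_added(actual_yards: float, expected_yards: float) -> float:
    """PRYA: actual punt return yards (PRY) minus expected yards (xPRY)."""
    return actual_yards - expected_yards


# (actual yards, xPRY) for two returns discussed later in the article.
spencer = punt_return_yards_added(83.0, 4.4)   # long touchdown return
cooper = punt_return_yards_added(-3.0, 7.8)    # return that lost three yards

print(round(spencer, 1), round(cooper, 1))  # 78.6 -10.8
```

Summing these per-play values by returner gives the season totals used in the rankings.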

Methodology

The data includes returned punts from 2018 through 2020, excluding plays with penalties. Many different variables were tracked over a total of 1877 plays, including the coordinates of each player relative to the returner, their speed, orientation, direction of motion, and Euclidean distance from the returner.

I also experimented with Voronoi features, such as the area of the returner’s Voronoi region and the x-value of the leftmost vertex of the returner’s Voronoi region (after standardizing play direction). I also calculated the proportion of the returner’s Voronoi region’s perimeter that bordered a defender’s Voronoi region and the area of return unit players’ Voronoi regions that bordered the returner’s region. I tested indicators such as whether the line between a defender and the returner went through a blocker’s Voronoi region.
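To make the Voronoi features more concrete, here is a rough sketch of how the returner's Voronoi region, its area, and its leftmost vertex could be computed. It is not the code used for the model: it assumes the region is bounded by the field rectangle and builds the cell by clipping that rectangle against the perpendicular bisector between the returner and every other player. All coordinates and function names are hypothetical.

```python
def clip_to_nearer_side(polygon, p, q):
    """Keep the part of `polygon` that is at least as close to p as to q."""
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2   # midpoint of p and q
    dx, dy = q[0] - p[0], q[1] - p[1]               # direction from p to q

    def side(pt):
        # Negative or zero on p's side of the perpendicular bisector.
        return (pt[0] - mx) * dx + (pt[1] - my) * dy

    clipped = []
    for i, cur in enumerate(polygon):
        prev = polygon[i - 1]
        s_prev, s_cur = side(prev), side(cur)
        if (s_prev <= 0) != (s_cur <= 0):
            # The edge crosses the bisector: keep the crossing point.
            t = s_prev / (s_prev - s_cur)
            clipped.append((prev[0] + t * (cur[0] - prev[0]),
                            prev[1] + t * (cur[1] - prev[1])))
        if s_cur <= 0:
            clipped.append(cur)
    return clipped


def voronoi_cell(site, others, boundary):
    """Voronoi region of `site`, restricted to the convex `boundary` polygon."""
    cell = list(boundary)
    for other in others:
        cell = clip_to_nearer_side(cell, site, other)
    return cell


def polygon_area(polygon):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x0, y0 = polygon[i - 1]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


# Hypothetical frame: field rectangle in yards, returner, and nearby players.
field = [(0.0, 0.0), (120.0, 0.0), (120.0, 53.3), (0.0, 53.3)]
returner = (30.0, 26.0)
nearby = [(38.0, 24.0), (41.0, 30.0), (36.0, 35.0), (33.0, 20.0)]

cell = voronoi_cell(returner, nearby, field)
print("region area:", round(polygon_area(cell), 1))
print("leftmost vertex x:", round(min(x for x, _ in cell), 1))
```

Which direction counts as "left" depends on how play direction is standardized, so the minimum x-value here is only a stand-in for that feature.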

After feature selection, a parsimonious XGBoost model was trained using only variables representing how far each defender is from the returner, the x-value of the leftmost vertex of the returner’s Voronoi region, the area of the region, and the area of offensive players’ Voronoi regions that bordered the returner’s region.

I trained the model on data from the 2018 and 2019 season so that analysis could be done on the 2020 season.

Returner Evaluation

The most obvious application of xPRY is to evaluate individual punt returners based on their ability to get more yards than expected. We can predict the outcome of each returned punt in 2020 that didn’t have a flag thrown and compare the yardage to the actual output for each returner. The top ten returners in total PRYA are shown below.

First-team All-Pro returner Gunner Olszewski finished with the most punt return yards added in the 2020 NFL season while second-team selection Jakeem Grant came in third. Saints returner Deonte Harris had another strong season after a first-team All-Pro campaign in 2019.

Diontae Spencer had the most valuable return of the season in terms of PRYA, as he was expected to gain just 4.4 yards on an 83-yard punt return touchdown against the Panthers on December 13th, 2020.

Spencer was immediately met by two gunners on a play in which most returners probably would’ve called for a fair catch. Instead, he escaped the initial pressure and found an opening through the left side of the field, gaining 78.6 more yards than expected.

We can also rank the players who performed the worst relative to xPRY.

The average punt return for these players gained fewer yards than the model predicted. It’s interesting that many of these players continued to be the primary punt returner for their teams despite the poor performances. Perhaps the numbers don’t tell the whole picture. After all, Pharoh Cooper was an All-Pro as both a kick returner and a punt returner in 2017. Has he regressed so much? Here’s his worst punt return this season based on xPRY.

On this October 11th return against the Falcons, Cooper received the punt at the 15-yard line and was then predicted to gain 7.8 yards. Instead, Cooper lost three yards.

It may seem strange that Cooper is being dinged (-10.8 PRYA) for a return where he was met so quickly. It should be mentioned that Cooper’s expected gain on this play of 7.8 yards is below average. The average xPRY for 2020 punt returns is 8.9 yards, so the model wasn’t expecting Cooper to do anything crazy.

A possible reason for the prediction not being lower, though, is the apparent opening on Cooper’s left side – the gunner at the top of the field was not in position to stop Cooper from breaking away to that side. Instead, Cooper hesitates and tries to break to the opposite sideline, which may have worked if he had been able to get past the first defender, but that defender was in strong position, which forced Cooper to turn back around. By that point, it was too late.

A common theme from negative returns is hesitation or “trying to do too much.” For example, Christian Kirk has multiple plays where he could surge forward for a mediocre but respectable gain, but he tries to run laterally towards the sideline looking for the big play even when the opportunity is not there. These are the habits that NFL coaches may want to weed out as field position is too important to throw away yards.

Gunner Evaluation

The task of evaluating gunners is more complex than it was for returners. When we evaluated individual returners, the main metric of interest was PRYA. Similar to a running back’s rushing yards over expectation, the task is simple – get more yards than expected. In the case of punt coverage, there are arguably two important and separate skills: limiting a returner’s xPRY and then after the catch, limiting their return to less than their xPRY.

If a gunner is consistently forcing low xPRY numbers, they’re probably speeding down the field and limiting the returner’s space. That’s an important part of their role – tackling/slowing down the returner is another.

In the interest of brevity, I’ll simply rank the best gunner duos this season based on cumulative PRYA.

The best duo appears to be J.T. Gray and Justin Hardee for the Saints. Through 12 punts that were returned, the two gunners held the returner to a low average of just 5.92 expected return yards. They put themselves in position to succeed and then capitalized, actually holding those 12 returners to just a two-yard average. Overall, the unit is credited with saving 47 yards. This type of holistic evaluation based on multiple metrics allows us to quantify different types of skills that are all related to a gunner’s duties.

Case Study: xPRY-Driven Evaluation of Washington Football Team Gunners

Our statistical analysis of punt return tracking data can be a beneficial complement to the work of NFL coaches and scouts. In a brief case study of the Washington Football Team’s gunners in the 2020 season, we’ll study the production of their two most frequent gunner combinations: Troy Apke & Danny Johnson and Cam Sims & Danny Johnson.

Each respective gunner duo defended against eight punts that were returned in the 2020 season. The actual return yards allowed are plotted on the y-axis along with the expected return yards allowed based on the model.

It seems that the two gunner pairs gave up roughly the same amount of predicted yards. The difference comes in the end result of the play – not a single return covered by Sims/Johnson gained more yards than expected, while three such punts did for Apke/Johnson (points above the expectation curve).

The observed difference in average allowed PRYA is 4.57, so the Apke/Johnson duo allowed 4.57 more yards over expectation on average than the Sims/Johnson duo. The sample size is quite small, though. We can estimate the probability of observing this difference due to random chance through a two-sided permutation test. The permutation test gives a probability of ~2.38% of observing a difference greater than or equal to 4.57 or less than or equal to -4.57.
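For readers unfamiliar with permutation tests, the sketch below shows one way a two-sided test on the difference in mean PRYA allowed could be set up. With eight returns per duo there are only 12,870 ways to relabel the plays, so every relabeling can be enumerated exactly. The PRYA values are made up for illustration and will not reproduce the 2.38% figure.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = count = 0
    # Enumerate every way to label n_a of the pooled plays as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        count += 1
        # Small tolerance so ties with the observed value are counted.
        if abs(diff) >= observed - 1e-12:
            extreme += 1
    return extreme / count


# Hypothetical PRYA allowed on eight returns for each gunner duo.
duo_one = [6.1, -2.0, 9.4, 1.2, -0.5, 12.3, -3.1, 0.8]
duo_two = [-4.2, -1.0, -6.3, -0.4, -2.2, -7.5, -3.0, -1.9]

print(permutation_p_value(duo_one, duo_two))
```

The returned value is the share of relabelings whose absolute difference in means is at least as large as the observed one, which is the probability described above.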

Of course, conclusions can’t be drawn on these numbers alone. Many variables are not accounted for, such as the skill of the returner, other teammates on the field, etc. These results could give the Washington Football Team coaching staff a reason to analyze the performance of the two gunner combinations more deeply.

For instance, they could go on to study the sixteen plays in question. Which players were actually credited with the tackles? They would find that Apke was not credited with any tackles on the eight returned punts in which he played the gunner role with Danny Johnson. Meanwhile, Sims was credited with three tackles that held the returner to less than the yards they were expected to gain.

Of course, this initial analysis is just one piece of the puzzle. Watching the film of the sixteen returns would reveal that one of the ‘big plays’ given up by the Apke/Johnson duo occurred when Apke dove at the returner to slow them down enough for Deshazor Everett to clean up the tackle. However, Everett stumbled in space and the returner gained an extra seven yards.

Also, PRYA only considers punts that are actually returned. It’s certainly possible that Apke’s 4.34 40-yard dash speed allows him to get down the field and force fair catches more often than Sims. We can quantify this in a vanilla way by crediting the ‘forced fair catch’ to the player on the kicking team closest to the returner at the time of the fair catch. We find that since 2018, Sims forced a fair catch on six punts out of 32 (0.188) versus Apke’s eight out of 73 (0.110). Furthermore, Sims’ average distance from the returner at the time of the fair catch is 3.8 yards versus 5.8 yards for Apke. While this topic can be looked into further, these are the type of numbers that can serve as a reference point for NFL coaches and scouts.
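The crediting rule described above can be sketched in a few lines: at the moment of the fair catch, find the kicking-team player with the smallest Euclidean distance to the returner. The player labels and coordinates below are hypothetical; the two rates at the end use the counts quoted in the text.

```python
import math


def nearest_coverage_player(returner_xy, coverage_players):
    """Return (player, distance) for the kicking-team player closest to the returner."""
    return min(
        ((name, math.dist(returner_xy, xy)) for name, xy in coverage_players.items()),
        key=lambda pair: pair[1],
    )


def forced_fair_catch_rate(forced, punts_covered):
    """Share of covered punts on which the player was credited with a forced fair catch."""
    return forced / punts_covered


# Hypothetical tracking frame at the time of a fair catch (x, y in yards).
returner = (25.0, 30.0)
coverage = {"gunner_left": (28.0, 34.0), "gunner_right": (31.0, 22.0), "long_snapper": (45.0, 27.0)}

player, distance = nearest_coverage_player(returner, coverage)
print(player, round(distance, 1))  # gunner_left 5.0

print(f"{forced_fair_catch_rate(6, 32):.3f}", f"{forced_fair_catch_rate(8, 73):.3f}")
```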

Team Evaluation

In addition to its value in player evaluation, xPRY can be used to evaluate the punt coverage and punt return ability of NFL teams.

We can calculate the average expected yards given up on punt returns by each team along with their actual yards allowed.

We found that the difference in predicted and actual punt return yards allowed by NFL teams is correlated with their punt DVOA, or Defense-adjusted Value Over Average (from Football Outsiders) where a higher DVOA represents a better punt unit. Teams who give up more yards than they were expected to also tend to have a worse punt DVOA. The opposite is also true – teams who limit returners to fewer yards than expected have a better punt DVOA on average.

The Chargers had the worst punt DVOA in the league by a wide margin (-37.8, with MIN next at -13.7) and these numbers partially explain why that is. The Chargers gave up the second-most xPRY in the league on average (10.8) while also allowing the second-most average punt return yards (16.8). Not a good combination.

Approximately 32% of the variation in punt DVOA is explained by a team’s average PRYA allowed. This is particularly noteworthy because xPRY does not include the many punts that are never returned while DVOA obviously does. Thus, an xPRY approach to team evaluation can provide valuable insight into a specific component of a team’s special teams performance.
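The "variation explained" figure is an R² value. Assuming it comes from a simple one-predictor linear fit, R² equals the squared Pearson correlation, so 32% corresponds to a correlation of roughly 0.57 in absolute value. A small sketch of that calculation follows, with made-up team values standing in for the real data.

```python
def r_squared(x, y):
    """Share of variance in y explained by a simple linear fit on x (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical team-level values: average PRYA allowed and punt DVOA.
avg_prya_allowed = [-2.1, -0.8, 0.3, 1.5, 2.4, 6.0]
punt_dvoa = [9.5, 1.0, 4.2, -6.3, 2.8, -20.1]

print(round(r_squared(avg_prya_allowed, punt_dvoa), 2))
```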

The same approach can be taken to evaluate a team’s punt returning ability.

We found that a team’s return yards over expectation are once again correlated with DVOA. In this case, average PRYA explained 66% of the variance in punt return DVOA. Once again, xPRY-based analysis is linked with DVOA, the gold standard of team evaluation metrics available to the public.

The post PRYA: Modeling NFL Punt Returns for Player Evaluation appeared first on The Spax.

Quantifying the In-Game Impact of Counter-Strike: Global Offensive Players

Ahmed Cheema — Fri, 04 Feb 2022 04:57:28 +0000

Introduction

Counter-Strike: Global Offensive (CS:GO) is a first-person shooter that was released in 2012 and has been played competitively as an e-sport ever since. In CS:GO, a single game is played between two teams of five players and consists of two halves, each containing a maximum of fifteen rounds. One team plays the first half as the terrorist (T) side while the other team plays as the counter terrorist (C) side. The teams switch sides at the end of the first half and the game stops if one team reaches 16 rounds before the other team wins 15 rounds (if the game ends 15-15, an overtime period begins which was not included in this analysis).

The professional CS:GO scene has grown dramatically over the past five years. The last major tournament finished on November 7, 2021 in Stockholm, Sweden and consisted of 24 teams competing for $2 million in prize money. The grand final was won by Ukrainian side Natus Vincere with a peak of 2.74 million international viewers. Despite its apparent popularity worldwide, statistical analysis in e-sports has lagged compared to traditional professional sports like football and basketball, where analytics are used to inform decisions and improve evaluation.

In this analysis, we will leverage public match data on CS:GO professional matches to train a new system of player evaluation based on regularized regression. We hope to expand on the traditional rating system based off of simple statistics such as a player’s eliminations, assists, and deaths. The objective of our new player evaluation framework is to isolate and quantify an individual’s contribution to winning.

Methods

Data Collection and Preparation

We scraped hltv.org, a website dedicated to CS:GO coverage and statistics, to obtain data for 19034 professional games since December 3rd, 2015. HLTV includes a star rating from zero to five for each game where no stars indicate that neither of the teams competing were ranked in the world top 20, while five stars indicate a map played between two top three teams. We limited our analysis to games with at least one star – including games without any top 20 teams would introduce data for over 50000 more matches, including many semi-professional teams in lower tier leagues. Most of these less successful teams & players would never actually compete against the best players and teams, thus limiting the model’s ability to estimate their value due to the lack of interactions.

The data set was split into two rows for each game, each representing one half of play. The score of each match was recorded and the percentage of rounds won by the T team was calculated to account for varying half lengths. For example, if the T side won three rounds in a half versus five for the C side, the response variable tWinPct would be $\frac{3}{3+5} = 0.375$.

The objective was to have a column representing each player who played in any of the 19034 games over the past six years. If a player played for the T side in the game represented by any row, the value for their respective column would be $\frac{1}{r_t}$, where $r_t$ represents the player rating for a T player $t$ in that half. If a player played for the C side, the value for that column would be $-\frac{1}{r_c}$ for a C player $c$. If a player did not play in that game, the value would be $0$. A player’s rating was obtained from hltv.org and is included in the analysis to serve as a way to approximate how much credit one player should get for their team’s performance. The rating is calculated using statistics such as kills, assists, deaths, survival rating, damage rating, etc. (Milovanovic, 2017).

The reciprocal of the rating $r$ is taken for each player because higher ratings are better and we want higher model coefficients to correspond with better players. Greater input values would correlate with greater model coefficients, so the reciprocal is calculated. Recall that the response variable tWinPct measures the performance of the T side. Thus, the column value is $-\frac{1}{r_c}$ for C players because the negative sign indicates that a greater performance from a C player $c$ would be expected to have a negative impact on the performance of the T side.

Ridge Regression

The data set consists of 2263 players. Each row will consist of 10 nonzero values (five positive, five negative) for these 2263 players because there are five players on each side. In addition, dummy variables based on the setting (or “map”) of the game are added to account for any bias. Certain maps are more favorable to the T side than the C side, so this adjustment introduces eleven dummy variables for the eleven different maps played in the given time frame.

Thus, the input matrix $X$ consists of 2274 variables and 38068 observations while the output vector $y$ (or tWinPct) has 38068 values. We also have a weight vector $w$ representing the number of rounds played in each observation. Then we are looking to find the estimates for the vector $\beta$ where $y = X\beta + \varepsilon$.

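For example, suppose a game is played between a team $A$ with players $a_1, \dots, a_5$ and a team $B$ with players $b_1, \dots, b_5$. Team $A$ begins the game on the T side and ends the first half up nine rounds to six for team $B$. Team $B$ then plays as the T team in the second half, winning six rounds again while team $A$ wins the seven rounds needed to win the game with an overall score of 16-12. The game was played on map $m$. Then the corresponding rows of $X$ would be equivalent to the matrix below, where $r_p$ represents the player rating of the corresponding player $p$ in that half.

$$
\begin{array}{c|ccccccc}
 & a_1 & \cdots & a_5 & b_1 & \cdots & b_5 & m \\ \hline
\text{first half} & \frac{1}{r_{a_1}} & \cdots & \frac{1}{r_{a_5}} & -\frac{1}{r_{b_1}} & \cdots & -\frac{1}{r_{b_5}} & 1 \\
\text{second half} & -\frac{1}{r_{a_1}} & \cdots & -\frac{1}{r_{a_5}} & \frac{1}{r_{b_1}} & \cdots & \frac{1}{r_{b_5}} & 1
\end{array}
$$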

All other player and map variables in $X$ would have a corresponding value of $0$ for these two rows. The weight vector $w$ would contain values $15$ and $13$, representing the total number of rounds played in each half. Finally, the vector $y$ representing the response variable would have values $\frac{9}{15} = 0.6$ and $\frac{6}{13} \approx 0.462$, representing the percentage of rounds won by the T side.
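The two rows for this example game can be sketched in code as follows. Each row is stored sparsely as a dictionary, so any player or map not listed is implicitly zero. The ratings are made-up values, and the player labels, map label and function name are placeholders rather than anything taken from the scraped data.

```python
def half_row(t_players, c_players, ratings, game_map, t_rounds, c_rounds):
    """Build one sparse design row, its weight, and its response for a single half."""
    row = {f"map_{game_map}": 1.0}          # dummy variable for the map played
    for player in t_players:
        row[player] = 1.0 / ratings[player]     # T-side players enter positively
    for player in c_players:
        row[player] = -1.0 / ratings[player]    # C-side players enter negatively
    weight = t_rounds + c_rounds            # rounds played in the half
    t_win_pct = t_rounds / weight           # share of rounds won by the T side
    return row, weight, t_win_pct


team_a = ["a1", "a2", "a3", "a4", "a5"]
team_b = ["b1", "b2", "b3", "b4", "b5"]

# Hypothetical per-half ratings for the ten players.
ratings_h1 = dict(zip(team_a + team_b, [1.31, 1.05, 0.98, 1.12, 0.87, 0.92, 1.01, 0.76, 1.10, 0.95]))
ratings_h2 = dict(zip(team_a + team_b, [1.02, 1.20, 0.91, 0.99, 1.08, 1.15, 0.88, 1.03, 0.97, 1.00]))

# First half: team A on the T side, wins 9 rounds to 6.
first = half_row(team_a, team_b, ratings_h1, "m", 9, 6)
# Second half: team B on the T side, wins 6 rounds to 7.
second = half_row(team_b, team_a, ratings_h2, "m", 6, 7)

print(first[1], round(first[2], 3))    # 15 0.6
print(second[1], round(second[2], 3))  # 13 0.462
```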

The traditional ordinary least squares (OLS) solution would be to compute the estimates for $\beta$ as $\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y$. To handle collinearity in the data (as teammates will be playing with each other at the same time), we introduce a penalty term that reduces variance by shrinking the coefficients towards zero. Thus, the ridge regression solution is denoted as $\hat{\beta}_{ridge} = (X^T X + \lambda I)^{-1} X^T y$. The ridge parameter $\lambda$ is a constant that represents the degree of regularization; when $\lambda = 0$, the ridge solution is equivalent to the OLS solution. If $\lambda \to \infty$, then all of the coefficient estimates would be zero.
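The shrinkage behavior can be illustrated on a toy problem. The sketch below assumes the standard textbook closed form, solving (XᵀX + λI)β = Xᵀy with a small hand-written linear solver, and ignores the round weights. It is not how glmnet fits the model: glmnet uses coordinate descent and scales its penalty differently, so its λ values are not directly comparable to the ones here.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy data: three observations, two predictors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

for lam in (0.0, 1.0, 100.0):
    print(lam, [round(b, 3) for b in ridge(X, y, lam)])
```

With `lam = 0.0` the result is the OLS fit, and the coefficients move towards zero as `lam` grows.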

We ran k-fold cross validation to find an optimal $\lambda$ using the “one-standard-error” rule suggested by the authors of the glmnet package used for modeling (Friedman et al., 2010). The cross validation plot of mean squared error and $\lambda$ is shown in Figure 1.
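The one-standard-error rule picks the most heavily regularized model whose cross-validated error is within one standard error of the best error, which is what glmnet reports as `lambda.1se`. A minimal sketch of the selection step, using invented cross validation results:

```python
def one_standard_error_lambda(cv_results):
    """Pick lambda by the one-standard-error rule.

    cv_results: list of (lam, mean_cv_error, standard_error) tuples.
    """
    best = min(cv_results, key=lambda r: r[1])
    threshold = best[1] + best[2]
    # Largest lambda whose mean error is within one SE of the minimum error.
    return max(lam for lam, err, _ in cv_results if err <= threshold)


# Hypothetical cross validation summary: (lambda, mean squared error, standard error).
cv_results = [
    (0.01, 0.050, 0.004),
    (0.10, 0.048, 0.004),
    (1.00, 0.051, 0.004),
    (10.0, 0.060, 0.005),
]

print(one_standard_error_lambda(cv_results))  # 1.0
```

Here the minimum error occurs at 0.10, but 1.00 is still within one standard error of it, so the rule prefers the simpler, more regularized fit.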

Using this value of $\lambda$, we were then able to use the ridge regression solution to compute the model estimates corresponding to each variable in $X$. The effect of $\lambda$ on the model coefficients can be seen in Figure 2, where a vertical line denotes our chosen ridge parameter. The reduction in variance and the shrinkage towards zero of the model coefficients as $\lambda$ increases can be seen in this graph.

Results

The ridge regression model’s coefficients for each map are shown in Table 1 along with the average T side win percentage on that map.

The coefficients for the remaining variables represent the estimated impact of each player in the model. The players with the ten highest coefficient estimates are shown in Table 2. We included their HLTV ratings and the corresponding percentile based on the rating as a way to compare the model results.

We explored the relationship between the coefficient estimates and the HLTV player ratings that were used as prior information in the model. Figure 3 shows a plot of a player’s average HLTV rating versus their coefficient estimate among players with at least 300 games played.

The correlation coefficient between rating and coefficient estimate was found to be 0.684 for these points, and the weighted correlation coefficient with games played as the weight for all points was found to be 0.658.
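A weighted correlation replaces the usual means, variances and covariance with their weighted counterparts, so players with more games played count for more. The sketch below shows one common definition; the ratings, coefficients and game counts are invented for illustration.

```python
import math


def weighted_correlation(x, y, w):
    """Pearson correlation of x and y with observation weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


# Hypothetical players: average rating, model coefficient, games played.
rating = [1.25, 1.10, 1.02, 0.95, 0.90]
coefficient = [0.21, 0.12, 0.02, 0.05, -0.08]
games = [900, 650, 400, 120, 30]

print(round(weighted_correlation(rating, coefficient, games), 3))
```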

We also examined the relationship between the aforementioned HLTV rating and coefficient estimate with a player’s overall win rate in the time frame of interest. A weighted correlation matrix (weighted on games played) between these three variables is shown in Table 3.

Discussion

The map coefficient estimates in Table 1 are clearly related to the proportion of rounds won by the T side in the data set. If the T side has a win proportion of greater than 0.5 on any given map, the coefficient estimate is positive. Otherwise if the T side has a win proportion below 0.5, the corresponding estimate for that map is negative. The absolute value of the coefficient estimate depends on the difference between the proportion and 0.5, along with the proportion’s confidence interval. The 95% confidence interval was computed for the proportion of rounds won by the T side on each map (tWinPct) and the size of the interval varies based on how large the sample size is for each map. The map Tuscan has a massive 95% confidence interval for tWinPct from 0.335 to 0.541 because the map was only played in eight games (a total of 89 rounds), so it also has the coefficient estimate closest to zero.
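The width of such an interval can be checked with a quick calculation. The text does not say how the interval was computed or how many of the 89 Tuscan rounds the T side won, so the sketch below rests on two assumptions: a normal-approximation (Wald) interval and 39 T-side round wins. That combination happens to reproduce the quoted bounds.

```python
import math


def wald_interval(wins, rounds, z=1.96):
    """Normal-approximation confidence interval for a win proportion."""
    p = wins / rounds
    half_width = z * math.sqrt(p * (1 - p) / rounds)
    return p - half_width, p + half_width


# Assumed: 39 T-side wins out of the 89 rounds played on Tuscan.
low, high = wald_interval(39, 89)
print(round(low, 3), round(high, 3))  # 0.335 0.541
```

Because the half-width shrinks with the square root of the number of rounds, maps with thousands of rounds get far narrower intervals.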

The remaining model coefficient estimates represent the estimated impact of each player. As shown in Table 2, we’ve found that seven of the players with a top ten coefficient estimate also had an HLTV rating in at least the 95th percentile. The players with the five highest HLTV ratings (minimum 100 maps played) are all in the top six for coefficient estimates: s1mple, ZywOo, sh1ro, device, and NiKo. Oleksandr “s1mple” Kostyliev, Nicolai “device” Reedtz, and Nikola “NiKo” Kovač are all regarded as some of the greatest CS:GO players of all-time, while Mathieu “ZywOo” Herbaut and Dmitriy “sh1ro” Sokolov are young stars who are considered top five players in the world today. These claims seem to be supported by the players’ statistical dominance.

The weighted correlation matrix of player rating, win rate, and coefficient estimate (Table 3) suggests that the model estimates blend together both contribution to winning (0.609) and individual performance (0.658) in a way that neither metric can do on its own. By being able to combine both intertwined aspects of the game, teams can identify players that put up “empty stats” (racking up a high number of eliminations, but not contributing as much to their team’s winning chances) or players that quietly impact the game more than their basic statistics suggest.

Regularization has previously been used in a similar way for player evaluation in the National Basketball Association (Sill, 2010) and the same methodology appears to be applicable to an e-sports context. Future research should begin to explore the predictive capabilities of a regularization framework and continue to expand the use of advanced statistical methods in e-sports. Our regularization methodology can also be altered to better handle players with fluctuating skill levels throughout the time frame covered in the data set. For example, a player who peaked at a high skill level and then regressed later on would be treated as a single variable despite having varying impact throughout the data set. Also, the map dummy variables do not account for updates to each map that have taken place over the past years – these updates can affect the impact each map has on the terrorist team’s win rate. Despite the room for improvement, we believe that our research is a strong starting point for more advanced methods of player evaluation in e-sports.

References

Friedman, J. H., T. Hastie, and R. Tibshirani. “Regularization Paths for Generalized Linear Models via Coordinate Descent”. Journal of Statistical Software, vol. 33, no. 1, Feb. 2010, pp. 1-22, doi:10.18637/jss.v033.i01.
Milovanovic, P. (2017, June 14). Introducing rating 2.0. HLTV.org. Retrieved December 6, 2021, from https://www.hltv.org/news/20695/introducing-rating-20.
Sill, Joseph. “Improved NBA Adjusted +/- Using Regularization and Out-of-Sample Testing.” Proceedings of the 2010 MIT Sloan Sports Analytics Conference. 2010.

The post Quantifying the In-Game Impact of Counter-Strike: Global Offensive Players appeared first on The Spax.