# Mathematically Optimizing an NBA Player Guessing Game

### Background

If you’ve been paying attention to social media at all recently, you’ve probably heard about Wordle. It’s a popular game where users are given six attempts at guessing that day’s mystery word. You guess a five-letter word and are given clues that help you reach the answer for that day. It has become a daily exercise for many users around the world and thanks to its creative design (you’ve probably seen those colorful diagrams consisting of green, yellow, and black squares), it has become a ubiquitous staple of life in many social circles.

Unsurprisingly, the success of Wordle has sparked the creation of variants. These parodies include copies of the game ported to different languages or more difficult versions of it, such as Quordle where users essentially have to solve four Wordles at the same time. There are also many variants of Wordle that are guessing games for a different topic, such as Worldle where users try to guess a country or Nerdle, a math spinoff.

As an avid NBA fan, one Wordle variant that caught my eye was Poeltl. Named after San Antonio Spurs center Jakob Poeltl, the game gives you eight attempts at guessing a mystery NBA player. In each guess, you input an active NBA player and you are then given clues based on some characteristics.

For example, suppose my first guess for Poeltl #6 (03/02/2022) was CJ McCollum. Here’s the output:

McCollum’s team is not highlighted in green or yellow. If it was green, it would mean that the mystery player is currently on that team. If it was yellow, it would mean that the mystery player was previously on that team. As it is not highlighted at all, we know that the mystery player never played for the New Orleans Pelicans.

However, the conference is highlighted in green. As such, we know that the mystery player is currently on one of the other fourteen teams in the Western Conference.

We also know the player is not in the Southwest Division. That leaves the Pacific and Northwest divisions – only ten possible teams!

We are told that the player is a guard as well. If the color of the position was yellow, it would mean that the player could be a G-F or F-G (both a guard and forward). As that is not the case, we know that the mystery player is either solely a point guard or solely a shooting guard.

We also know that the player’s height is not 6’3. The arrow pointing up indicates that the true height is greater than 6’3, but the yellow color tells us that our guess was within two inches. Thus, the mystery player can only be 6’4 or 6’5.

The same mechanic can be reversed to help us with age. We know the mystery player is younger than 30, and because McCollum’s age is not yellow, our guess was not within two years of the mystery player’s true age. Thus, the player we’re looking for is at most 27 years old.

Finally, jersey number. We got lucky! The mystery player shares the same number on their jersey as CJ McCollum. Otherwise, we would have used the same logic as we did with height and age to narrow our options down (use arrows & the presence or lack thereof of the yellow highlight).

That’s a lot of information to process. We got a lot of good information out of it, though – there can’t be that many 6’4 or 6’5 guards with the jersey number #3 on one of ten teams in the NBA who are also at most 27 years old. In fact, only three players meet these criteria. The hard part, of course, is identifying who they are. In this case, the answer was Jordan Poole, the 6’4 22-year-old shooting guard for the Golden State Warriors (Pacific Division).
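The height, age, and jersey-number clues all follow the same rule, so the deductions above can be sketched in a few lines of Python. This is only an illustration: the function name `possible_values` and the `low`/`high` bounds are my own assumptions, not anything taken from the game.

```python
from __future__ import annotations


def possible_values(guess: int, direction: str, close: bool,
                    low: int, high: int, window: int = 2) -> list[int]:
    """Values consistent with one numeric clue.

    direction: "up" (answer is higher), "down" (answer is lower) or "equal".
    close:     True when the cell is yellow (answer within `window` of guess).
    low/high:  assumed plausible bounds for the attribute.
    """
    if direction == "equal":
        return [guess]
    if direction == "up":
        lo, hi = (guess + 1, guess + window) if close else (guess + window + 1, high)
    elif direction == "down":
        lo, hi = (guess - window, guess - 1) if close else (low, guess - window - 1)
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return list(range(max(lo, low), min(hi, high) + 1))


# CJ McCollum in the walkthrough: 6'3 (75 inches) and 30 years old.
heights = possible_values(75, "up", close=True, low=60, high=90)
ages = possible_values(30, "down", close=False, low=19, high=45)

print(heights)    # [76, 77], i.e. 6'4 or 6'5
print(max(ages))  # 27
```

A yellow cell pins the value to a two-wide band next to the guess, while an uncolored cell rules that band out entirely, which is why both outcomes carry information.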

### Goal

After my first time playing this game, my first thought was that it was pretty fun and well-designed. My second thought was, “I wonder what the most optimal first guess is.” It was a popular question among fans of Wordle (and controversial). Some people liked to go with a word stacked with vowels like “adieu” while some analyses supported words like “salet” or “crate.”

In the case of Poeltl, I wondered what characteristics would make a good starting guess. Maybe you want a journeyman guy who has been on a lot of teams. Perhaps you’d like someone with average height and age so the arrows are more meaningful. How about a 6’6 journeyman 27-year-old with a jersey number in the 20s? That seems optimal. Does such a player exist? If so, is he even an optimal guess? How much does it really matter?

I was bored and wanted to tackle these questions, so I decided to whip up Python and algorithmically determine the best starting guess in Poeltl.

### Methodology

The first step was data collection – I needed all of the information that is a part of the game. That means obtaining the height, age, position, team, and former teams of every active NBA player. That could be complicated for something like position which isn’t really objective. Fortunately, experimentation made it obvious that the game got all of its data from nba.com/stats, so I did the same.

I put together a data frame of 496 players currently on an NBA roster. As far as I can tell, these players are the only ones you are allowed to guess in the game. Go ahead and try guessing Matt Ryan (no, not the quarterback) and their jersey number will come up as N/A. Why? They don’t have a jersey number listed on the Celtics’ team page. I was fairly confident that my data aligned perfectly with the data Poeltl uses.

I coded a function that takes two inputs: a guess and an answer. It outputs a dictionary representing the information that a user would hypothetically learn if they inputted that guess into Poeltl when the mystery player is that answer. Then once that dictionary is inputted into another function, the players that meet that criteria are outputted. Here’s an example using our previous CJ McCollum guess:

`PossibleAnswers(GuessInformation('CJ McCollum','Jordan Poole')).player`
`['Jordan Poole', 'Terence Davis', 'Trent Forrest']`

We already know the answer is Jordan Poole, so `GuessInformation('CJ McCollum','Jordan Poole')` will output a dictionary representing the information we learned from that first guess. If you wanted to use this to cheat prior to knowing the answer you could always input that manually, but that’s not the point here. Based on the information we received, `PossibleAnswers` reveals that there were only three possible answers, one of which was obviously the true mystery player.
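For readers who want to see the shape of such a pair of functions, here is a minimal sketch. It is not my original code: the names `guess_information` and `possible_answers`, the `Player` class, the position rule (yellow when two positions share a letter), and the two-unit window are assumptions based on the clue rules described in the Background section. It uses plain Python objects rather than a data frame, and the "Placeholder" players are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    team: str
    former_teams: frozenset
    conference: str
    division: str
    position: str   # e.g. "G", "F", "C", "G-F"
    height: int     # inches
    age: int
    number: int


def numeric_clue(guess, answer, window=2):
    """Return (direction, is_yellow) for height, age or jersey number."""
    if guess == answer:
        return ("equal", False)
    direction = "up" if answer > guess else "down"
    return (direction, abs(answer - guess) <= window)


def position_clue(guess, answer):
    # Assumption: an exact match is green, a shared letter is yellow.
    if guess == answer:
        return "green"
    return "yellow" if set(guess.split("-")) & set(answer.split("-")) else "none"


def team_clue(guess, answer):
    if guess.team == answer.team:
        return "green"
    return "yellow" if guess.team in answer.former_teams else "none"


def guess_information(guess, answer):
    """Everything the game would reveal for this guess/answer pair."""
    return {
        "team": team_clue(guess, answer),
        "conference": guess.conference == answer.conference,
        "division": guess.division == answer.division,
        "position": position_clue(guess.position, answer.position),
        "height": numeric_clue(guess.height, answer.height),
        "age": numeric_clue(guess.age, answer.age),
        "number": numeric_clue(guess.number, answer.number),
    }


def possible_answers(guess, info, players):
    """Players who would have produced exactly the same clues."""
    return [p for p in players if guess_information(guess, p) == info]


# Tiny illustrative roster; only McCollum and Poole use values from the article.
mccollum = Player("CJ McCollum", "Pelicans", frozenset({"Trail Blazers"}),
                  "West", "Southwest", "G", 75, 30, 3)
poole = Player("Jordan Poole", "Warriors", frozenset(),
               "West", "Pacific", "G", 76, 22, 3)
guard = Player("Placeholder Guard", "Kings", frozenset(),
               "West", "Pacific", "G", 77, 25, 3)
wing = Player("Placeholder Wing", "Lakers", frozenset(),
              "West", "Pacific", "G-F", 77, 24, 3)
big = Player("Placeholder Big", "Celtics", frozenset(),
             "East", "Atlantic", "C", 84, 25, 50)
roster = [mccollum, poole, guard, wing, big]

info = guess_information(mccollum, poole)
names = [p.name for p in possible_answers(mccollum, info, roster)]
print(names)  # ['Jordan Poole', 'Placeholder Guard']
```

The filter simply replays the guess against every player and keeps those whose clues match, so the clue logic only has to be written once.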

But suppose the mystery player was, I don’t know, Matt Ryan (again, not the quarterback). How many possibilities would there have been after our first guess?

`len(PossibleAnswers(GuessInformation('CJ McCollum','Matt Ryan')).player)`
`126`

Okay, wow. Not nearly as useful. The only information you really receive is that the mystery player is in the East, is not a guard, is taller than 6’5, and is younger than 28. That leaves a lot of possibilities.

As a matter of fact, based on possibilities leftover afterwards, Matt Ryan is the worst-case-scenario mystery player for a first guess of CJ McCollum.

For 50.4% of possible mystery players, a first guess of CJ McCollum would leave six or fewer possibilities based on the information given by the game. But for 13.9% of possible mystery players, the McCollum guess would leave over 100 possibilities. Overall, the mean number of possibilities left by a first guess of CJ McCollum is 22.3 and the average reduction in number of possibilities is 95.5%.
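Summary numbers like these can be computed without filtering the player list once per answer. If the filter keeps exactly the players who would produce identical clues, then all answers sharing a clue pattern leave the same set behind, so it is enough to count how many answers fall into each pattern. The sketch below assumes that property; the name `first_guess_stats` and the toy `(conference, height)` players are hypothetical.

```python
from collections import Counter


def numeric_clue(guess, answer, window=2):
    if guess == answer:
        return ("equal", False)
    return ("up" if answer > guess else "down", abs(answer - guess) <= window)


def toy_feedback(guess, answer):
    """Toy clue pattern for (conference, height) players; must be hashable."""
    return (guess[0] == answer[0], numeric_clue(guess[1], answer[1]))


def first_guess_stats(guess, answers, feedback):
    """Mean/max possibilities left and mean reduction for one first guess."""
    buckets = Counter(feedback(guess, a) for a in answers)
    n = len(answers)
    # A bucket of size k is hit by k answers, and each of them leaves k players.
    mean_left = sum(k * k for k in buckets.values()) / n
    return {
        "mean_left": mean_left,
        "max_left": max(buckets.values()),
        "mean_reduction": 1 - mean_left / n,
    }


players = [("W", 75), ("W", 76), ("W", 77), ("W", 80),
           ("E", 76), ("E", 79), ("E", 81), ("E", 83)]
stats = first_guess_stats(("W", 77), players, toy_feedback)
print(stats)  # {'mean_left': 1.5, 'max_left': 2, 'mean_reduction': 0.8125}
```

The same relationship links the McCollum figures above: a mean of 22.3 players left out of 496 is a reduction of 1 - 22.3/496, or about 95.5%.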

One issue – this assumes that all 496 active NBA players are possible answers to the game. Common sense tells us that a well-designed game would not allow this to be a possibility. Did you even know that a player named Matt Ryan was in the NBA? I sure didn’t until now.

According to an interview with the game’s creator Gabe Danon, there are “only 300 or so mystery players in Danon’s pool of candidates.” Hm. How do we tackle that?

I took a slightly unscientific approach and went through all 496 players and decided whether or not I thought they were relevant enough to be an answer. There were more objective methods I could’ve taken, but I’m confident enough in my NBA fandom to think that this process would be reasonable enough. I generally took out players I never heard of (while using games played and minutes per game as a reference to make sure I didn’t take out actual key players) or players who I knew of but didn’t think would be put into the game.

I think I was fairly generous. I ended up with 353 players. Maybe a little more than “only 300 or so” (Or would you use that wording to describe the number 353? Probably not, right?) but I think it’s preferable to removing too many players. In any case, I do not think the results of this analysis would change dramatically.

I proceeded to track the mean number of possibilities left by a first guess of a player, the maximum number of possibilities left, and the average percent reduction in number of possibilities for all 496 possible guesses. I did this under the false assumption that any of the 496 players could be the answer, and then once more under the more reasonable assumption that the answer could only be in a self-filtered list of 353 players. Onto the results!

### Results

The table below contains the results of our analysis for the scenario in which we assume that all 496 active NBA players could be the mystery player on a given day.

The first guess that provides information eliminating the most possibilities on average is Lindy Waters III, who is currently signed by the Oklahoma City Thunder on a two-way contract. The most options that you could be left with after guessing Waters first is just 15 out of 496.

What makes Waters such a great guess? Well, the median NBA jersey number is 13 and Waters rocks the number 12. The median age of active NBA players is 25 and Waters is 24 years old. Waters also happens to have the same height as the average NBA player at 6’6. By effectively splitting the player base in half with each of these three numeric characteristics, you’re bound to cut down on the possible answers with a first guess of Lindy Waters III.

Lamar Stevens is similar – 6’6, 24 years old, and wears #8. So is Derrick Jones Jr., a 25-year-old 6’6 forward who wears #5.
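The "split the field in half" intuition is easy to check on a single numeric attribute. The toy example below assumes one hypothetical player at every age from 19 to 37 (not the real NBA age distribution) and scores each possible guess by the average number of players left after the age clue alone.

```python
from collections import Counter


def numeric_clue(guess, answer, window=2):
    if guess == answer:
        return ("equal", False)
    return ("up" if answer > guess else "down", abs(answer - guess) <= window)


def expected_left(guess, values):
    """Average number of candidates left after one numeric clue."""
    buckets = Counter(numeric_clue(guess, v) for v in values)
    return sum(k * k for k in buckets.values()) / len(values)


ages = list(range(19, 38))  # toy pool: one player per age, median 28
scores = {g: expected_left(g, ages) for g in ages}
best = min(scores, key=scores.get)

print(best)                  # 28, the median
print(round(scores[28], 2))  # 5.63
print(round(scores[19], 2))  # 13.74
```

Guessing the youngest age leaves more than twice as many candidates on average as guessing the median, because almost every answer lands in the same "older, not close" bucket.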

How about the worst guesses? The worst guess by far is Boban Marjanovic, the 33-year-old 7’4 center for the Dallas Mavericks who wears a relatively high jersey number of 51. The percentages of players who are older, taller, and wear a higher jersey number than Boban are 5.4%, 0.0%, and 3.6%, respectively. Yeah, you’re getting very little information there. In general, the worst guesses appear to be big men, specifically ones that are especially young or old.

Next, we apply our analysis to the filtered subset of players that were deemed to be realistic answers.

There is not much substantial change here – in fact, the correlation coefficient between the mean percent decrease for the two analyses is a whopping 0.997. The physical characteristics of the players removed from the data set are not significantly different from those that were not removed, so there is no reason to think that the results change all that much.

With that said, Lindy Waters III is no longer the most optimal guess. That title now belongs to Ish Wainright. Wainright is a 6’5 27-year-old forward who wears #12 for the Phoenix Suns. The average number of valid possibilities after a first guess of Ish Wainright is just three, while the maximum is 12. Wainright is three years older than Waters, which makes sense – the more ‘relevant’ players in the NBA (who are more likely to be a possible mystery player) are probably going to be older than the young guys who are relatively unknown. Thus, the most optimal guess on the filtered data set should be a bit older.

If you want an optimal guess that has the best shot at also being a potential answer, Wainright is probably not your best bet. Cody Martin might be a good shot, as he averages 26.8 minutes per game for the Charlotte Hornets and is essentially just as good of a first guess as Wainright. Other options include Taurean Prince, Jae’Sean Tate, and Derrick Jones Jr. Jaylen Brown and Zach LaVine are similarly fantastic picks if you want All-Star caliber guys.

### Looking Past the First Guess

If you want to get the most theoretical information out of your first guess, Ish Wainright is your man. But what about the second guess? The third? Should we be considering how the game might play out after your first pick?

Recall that the function `GuessInformation(guess, answer)` outputs a dictionary representing the information learned from a guess. We can set `guess="Ish Wainright"` and iterate over all 353 candidate mystery players as `answer` to get the 353 possible information dictionaries for Ish Wainright. For each one, we can cut our data set down to the players that are still possible answers based on the information given. We can then repeat the previous process to find the best second guess in each situation and track how many valid answers remain after the second guess. If that number is equal to 1, it means our algorithm can get to the right answer within three guesses.
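A greedy version of this look-ahead can be sketched as follows. This is an approximation of the process described above rather than my exact code: the names `guesses_needed` and `expected_left`, the tie-break in favor of guesses that could still be the answer, and the toy pool of integer "players" are all assumptions made for illustration.

```python
from collections import Counter


def numeric_clue(guess, answer, window=2):
    if guess == answer:
        return ("equal", False)
    return ("up" if answer > guess else "down", abs(answer - guess) <= window)


def expected_left(guess, candidates, feedback):
    """Average number of candidates left if `guess` is played next."""
    buckets = Counter(feedback(guess, c) for c in candidates)
    return sum(k * k for k in buckets.values()) / len(candidates)


def guesses_needed(first_guess, answer, guess_pool, answer_pool, feedback,
                   max_turns=8):
    """Play greedily and return the turn on which `answer` is guessed."""
    candidates = list(answer_pool)
    guess = first_guess
    for turn in range(1, max_turns + 1):
        if guess == answer:
            return turn
        info = feedback(guess, answer)
        # Keep only candidates that would have produced the same clues.
        candidates = [c for c in candidates
                      if c != guess and feedback(guess, c) == info]
        if len(candidates) == 1:
            guess = candidates[0]
        else:
            # Pick the guess leaving the fewest candidates on average,
            # preferring one that could itself be the answer on ties.
            guess = min(guess_pool,
                        key=lambda g: (expected_left(g, candidates, feedback),
                                       g not in candidates))
    return None  # not solved within max_turns


pool = list(range(21))  # toy "players" identified by a single number
results = {a: guesses_needed(10, a, pool, pool, numeric_clue) for a in pool}
print(sum(results.values()) / len(results))  # mean guesses for first guess 10
print(max(results.values()))                 # worst case for first guess 10
```

Swapping in the real clue function and the 353-player pool, then looping over candidate first guesses, would give the kind of mean and maximum guess counts discussed in this section. Each second-guess search scans the whole guess pool, which is where the computational cost comes from.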

Using Ish Wainright as our first guess, the mean number of guesses needed to get to the mystery player is approximately 2.425 and the maximum is three. Using this strategy, it’ll never take more than three guesses to win Poeltl assuming that our list of 353 players includes all of the “300 or so” actual possible answers. Doing the same with Cody Martin results in an average of 2.419 guesses and a maximum of four. We could go through and do this for every player, but that would be computationally time-consuming so I’ll leave it at the top ten first guesses in the previous table:

There isn’t really much of a noticeable difference in any of these, similar to the best Wordle words. Given that it never needs more than three guesses and also eliminates the most possibilities on the first guess, I’d stick with Ish Wainright as the best first guess if you don’t care about the hole-in-one chance. Otherwise, Cody Martin might be the better choice.

### Conclusion

I used Lindy Waters III as my first guess on the last two Poeltls (#5 and #6). Both times I was able to guess the mystery player on my second try because the information from the first guess actually left just one possibility (which is true about one-third of the time for the most optimal guesses).

However, I’m sure if I continued with this strategy I’d have some moments where I was unable to identify the right player even if I was given all of the information theoretically needed to do so. It’s worth mentioning that players like Wainright appear to be the strongest guesses for an entity that knows the jersey number, age, position, team, height, etc. of every active NBA player. It’s likely (and true in my case) that a user won’t know every player’s jersey number, for example, in which case the value of a first guess like Ish Wainright would be diminished greatly.

When the discussion of optimizing Wordle occurs, it’s usually met with blowback by fans who think it ruins the fun of the game and that people shouldn’t be using the same first guess every day. While I use my favorite starting word of “raise” every day in Wordle, I do understand that perspective. In the case of Poeltl, I don’t think using Ish Wainright as a first guess is particularly enjoyable, and I definitely won’t be doing that moving forward.

This exercise was simply an example of using simple coding and statistics to answer a random question that I had, and I consider it a success.

As an aside, this research does make me think that Poeltl could be updated to increase its difficulty. Perhaps the introduction of a ‘hard mode’ would spice things up. The mechanic of telling you whether or not a player’s age or height is within two of your guess makes life far easier for the user. For avid fans, I don’t really see how you could miss the right answer in eight attempts unless it’s a lesser-known player. I focused on guys like Lindy Waters III and Ish Wainright in this article, but essentially any first guess is going to severely limit the number of possibilities.

The average number of valid possibilities left after the first guess is just 10.39 – that’s a 97% drop. Then again, this is still under the assumption that you would be able to identify those possibilities based on information as mundane as jersey number. Make of that what you will! In any case, Poeltl is a fun game and I would definitely recommend it to any NBA fan.