Analyzing NBA Lineups With Player Classifications, Part II


In Part I, I fit a ridge regression model using the different player clusters on the floor at a given time as features and that lineup's net rating as the label. The model gave us the following coefficient estimates, which illustrate the magnitude and direction of each player cluster's impact on the net rating of a lineup.

These coefficients demonstrate the great value of a ball-dominant scorer and the lack of value in a traditional center. However, we can't simply use these coefficients to find an optimal lineup. If we did, we'd conclude that the best possible lineup is one with five ball-dominant scorers, which probably isn't accurate.

Instead, I’ll use an implementation of gradient boosted decision trees known as XGBoost to predict the net rating for every possible combination of player classifications within a five-man lineup.

First, we need to determine the optimal hyperparameters for XGBRegressor. Each hyperparameter has a reasonable default, but because these settings can significantly affect the regression results, it's worth tuning them for our specific task rather than accepting the defaults. I went through a basic cross-validation process for each hyperparameter to determine an optimal value. For example, the following code creates a graph that illustrates how the model score changes as max_depth changes.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from xgboost.sklearn import XGBRegressor

# random 80/20 train/test split of the lineup data frame
msk = np.random.rand(len(tf)) < 0.8
train = tf[msk].reset_index(drop=True)
test = tf[~msk].reset_index(drop=True)

row_list = []

# fit a model for each candidate max_depth and score it on the held-out lineups
for n in range(2,11):
    model = XGBRegressor(max_depth=n,objective='reg:squarederror')
    model.fit(train[features].values,train.nrtg.values)
    score = model.score(test[features].values,test.nrtg.values)
    row_list.append({'parameter':n,'score':score})
pdf = pd.DataFrame(row_list)

plt.plot(pdf.parameter,pdf.score)
plt.scatter(pdf.parameter,pdf.score)
plt.xlabel('max_depth')
plt.ylabel('score')
plt.title('hyperparameter tuning (max_depth)')
plt.tight_layout()
plt.show()

The model score is maximized when max_depth=3. I can repeat this process multiple times to make sure that the result stays consistent before moving on to the next hyperparameter. It’s tedious work, so I won’t include all of the code for it.
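For what it's worth, here's a minimal sketch of how that repetition could be wrapped up, assuming the same tf and features from above. The tuning_scores helper, the five-trial default, and the 80/20 split are my own choices, not code from the original notebook:

def tuning_scores(param_name, values, n_trials=5):
    """Average the test score for each candidate value over several random splits."""
    rows = []
    for v in values:
        scores = []
        for _ in range(n_trials):
            # fresh 80/20 split each trial so one lucky split doesn't decide the winner
            msk = np.random.rand(len(tf)) < 0.8
            train, test = tf[msk], tf[~msk]
            m = XGBRegressor(**{param_name: v}, objective='reg:squarederror')
            m.fit(train[features].values, train.nrtg.values)
            scores.append(m.score(test[features].values, test.nrtg.values))
        rows.append({'parameter': v, 'score': np.mean(scores)})
    return pd.DataFrame(rows)

# e.g. tuning_scores('max_depth', range(2, 11))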

Also, I refer to a data frame called tf in the previous code. In case you forgot from Part I, this image shows a snippet of tf.

The first eight columns are stored in a variable called features, and they represent the number of players in a lineup belonging to a certain player classification. For example, two players in the second five-man unit belong to the eighth cluster. The nrtg column indicates each lineup’s Bayesian net rating, an approach I borrowed from this article.

Anyway, after determining optimal values for the hyperparameters, we can fit the model with our data from tf.

model = XGBRegressor(learning_rate=0.1,
                     n_estimators=100,
                     max_depth=3,
                     gamma=0,
                     objective='reg:squarederror')

model.fit(tf[features].values,tf.nrtg.values)
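As an optional sanity check (not part of the original workflow), a quick k-fold cross-validation of the tuned configuration can flag whether the chosen hyperparameters generalize; cross_val_score clones the estimator, so it doesn't disturb the fitted model:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated R^2 for the tuned configuration
cv_scores = cross_val_score(model, tf[features].values, tf.nrtg.values, cv=5)
print(cv_scores.mean(), cv_scores.std())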

With our model now trained on all our data, we can look at the feature importance for each player cluster/classification:

# per-feature importance, measured by the average gain of the splits that use each feature
gain = model.get_booster().get_score(importance_type='gain')
fimp = [gain[n] for n in features]

data = pd.DataFrame(data=fimp, index=clusters, columns=["score"]).sort_values(by="score")
data.plot(kind='barh', edgecolor='black', legend=None)
plt.title('feature importance')
plt.xlim(0, 400)
plt.tight_layout()
plt.show()

I set the importance_type to "gain" to measure each feature's relative contribution to the model's predictions. The most important feature by far is the ball-dominant scorer cluster. In other words, the player classification with the largest impact on a lineup's predicted net rating is the ball-dominant scorer. It stands to reason that one of the more important factors in predicting net rating is whether a team has a ball-dominant scorer on the floor at any given time (and how many). Most of the other features show similar importance, except for sharpshooters and high-usage big men, which are significantly less important in predicting net rating than the other six clusters.

Our next task is to actually use the model to make predictions. Specifically, we’ll generate every possible lineup combination and calculate the expected net rating of each one based upon our model.

import itertools as it

lineups = [i for i in it.product(range(0,6),repeat=8) if sum(i)==5]

df = pd.DataFrame(data=lineups,columns=features)

df.shape
# (792, 8)
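That count checks out combinatorially: distributing five roster spots across eight classifications is a stars-and-bars problem, so we should get C(5+8-1, 5) = C(12, 5) = 792 combinations. A quick assertion (math.comb requires Python 3.8+) confirms the enumeration matches:

import math

# multisets of size 5 drawn from 8 classifications: C(12, 5) = 792
assert math.comb(5 + 8 - 1, 5) == len(lineups) == 792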

With eight different player classifications and a maximum of five players on the floor at once, there are 792 possible unique lineup combinations. Let’s predict the net rating for all of them.

# predicted net rating for every hypothetical lineup composition
df['nrtg'] = model.predict(df[features].values)

Now we can dive into our findings.

Results

Firstly, I’m going to plot a histogram of the net rating predictions to see if the distribution seems skewed in any way.

plt.hist(df.nrtg,edgecolor='black',bins=np.linspace(-15,15,31))
plt.xlim(-13,14)
plt.xlabel('predicted lineup net rating')
plt.ylabel('frequency')
plt.title('frequency of lineup net rating predictions ')
plt.tight_layout()
plt.show()

There appear to be more lineups with a predicted net rating slightly below zero than above it, but that's about it. The distribution seems mostly unskewed.

Now, I can sort df based on the nrtg column to observe the lineups with extreme predictions. I have exported df at this point; you can find it at this link. The cluster columns are ordered alphabetically, so c1 refers to the ball-dominant scorers cluster, while c8 refers to the versatile forwards cluster. In other words, it directly follows the variable clusters that I created and sorted near the beginning of Part I: the first value of clusters corresponds to c1, the second to c2, and so on.
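If you'd rather reproduce that locally instead of downloading the file, the sort itself is straightforward (ranked is just my name for the sorted copy):

# rank every hypothetical lineup from best to worst predicted net rating
ranked = df.sort_values('nrtg', ascending=False).reset_index(drop=True)
print(ranked.head(10))   # highest predicted net ratings
print(ranked.tail(10))   # lowest predicted net ratings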

Anyway, one thing is absolutely clear from the results: the importance of ball-dominant scorers. The 17 lineups with the highest predicted net ratings all contain multiple ball-dominant scorers. The 24th-highest-rated lineup is the best one without a single ball-dominant scorer. It contains two floor generals, one high-usage big man, and two sharpshooters, and it's given a predicted net rating of 8.10. Not too shabby.

The best lineup, with a predicted net rating of 13.49, is one with three ball-dominant scorers, one sharpshooter, and one versatile forward. In fact, the top seven lineups all have at least three ball-dominant scorers and at least one versatile forward. That’s all you need, apparently.

The average predicted net rating for a lineup with at least one traditional center is -1.13. That’s by far the worst for any cluster. It’s not impossible to have a good lineup with multiple traditional centers, though. A five-man unit with one ball-dominant scorer, two low-usage role players, and two traditional centers has a predicted net rating of 7.81.
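Here's a rough sketch of how that kind of conditional average can be computed for every classification at once; the exact numbers will depend on your own run of the model:

# average predicted net rating for lineups containing at least one player
# from each classification
for col in features:
    avg = df.loc[df[col] >= 1, 'nrtg'].mean()
    print(col, round(avg, 2))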

What's the worst lineup according to our model? Don't throw one high-usage big man, three sharpshooters, and a stretch forward / big man on the floor together. That group of players is predicted to put up a net rating of -12.5. Yikes. The six worst lineups all have one clear thing in common: multiple sharpshooters but no ball-dominant scorers or floor generals. The problem appears to be a lack of creation. Sharpshooters aren't good playmakers or shot creators; they need someone to create shots for them.

The worst lineup without any sharpshooters has a predicted net rating of -9.83. It contains two high-usage big men, one stretch forward / big man, and two traditional centers. That's basically a lineup made up entirely of big men. No wonder. There's a reason there's no "big ball revolution."

The worst lineup with a ball-dominant scorer is one with one ball-dominant scorer, one high-usage big man, and three sharpshooters. Honestly, this seems like a pretty good unit to me, and I'm not entirely sure why the model predicts a net rating of -10.03 for it. Meanwhile, all 120 lineup combinations with multiple ball-dominant scorers are predicted to have positive net ratings. The worst of them is a five-man unit consisting of two ball-dominant scorers, one high-usage big man, and two stretch forwards / big men. Too many tall players, but the predicted net rating is still 0.32.
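A quick filter reproduces that last claim, assuming features follows the same alphabetical ordering as clusters (so features[0] is the ball-dominant scorer column); adjust the index if your ordering differs:

# lineups with at least two ball-dominant scorers
multi_bds = df[df[features[0]] >= 2]
print(len(multi_bds))        # 120 combinations
print(multi_bds.nrtg.min())  # worst predicted net rating among them (should still be positive)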

Why don’t we visualize how the average lineup rating changes as we change the number of players with a certain classification within that lineup?

# average predicted net rating, grouped by how many players of each
# classification a lineup contains
df_list = []
for n in features:
    df_list.append(df.groupby(n).mean()[['nrtg']])
onoff = pd.concat(df_list, axis=1)
onoff.columns = ['c1','c2','c3','c4','c5','c6','c7','c8']

# one line per cluster: x = players from that cluster, y = avg predicted net rating
for i, col in enumerate(onoff.columns):
    plt.plot(onoff.index, onoff[col], label=clusters[i], linewidth=3)
    plt.scatter(onoff.index, onoff[col])
plt.xlim(0,5)
plt.axhline(y=0, c='black', linewidth=1.5)
plt.legend(bbox_to_anchor=(1, 0.75), loc='best', ncol=1)
plt.xlabel('frequency of player classification')
plt.ylabel('avg lineup net rating')
plt.title('lineup net rating vs player freq')
plt.tight_layout()
plt.show()

The more ball-dominant scorers, the better. Well, until you reach three. Then there isn’t any more improvement to be had — but the net rating doesn’t go down either. Meanwhile, the lineup net rating for sharpshooters and floor generals peaks at two and then drastically drops off.

For most of the other clusters, the best value appears to be zero with a steady decrease after that. Just take a look at the pink line at the bottom. Lineups have a positive average net rating without any traditional centers. For every one you add after that, you dip further and further below zero.

Low-usage role players are a bit weird. They're really the mystery of this whole project. Why are they so valuable? All this graph does is create more questions. Having two low-usage role players is great, as is having four. But if you go in the middle and have three, the average net rating is almost zero? That doesn't really make sense.

That's just one of the results that calls into question the validity of the model. A lineup with five low-usage role players has a predicted net rating of 2.25? How's that? If they're all low-usage, where is the actual, y'know, usage coming from? Someone has to handle the ball, after all!

Some of these issues may have arisen during extrapolation. Remember how we found 792 possible lineup combinations? Our original data contains only 275 of those possibilities, meaning over 65% of these predictions are for lineups the model has never seen before. There's only one record of a lineup with four low-usage role players, and that five-man unit put up a net rating of 10.24. That's probably gonna have an impact on the results.
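If you want to check that coverage yourself, an inner join between the 792 hypothetical compositions and the distinct compositions in tf does the trick (this is my own check, not from the original code):

# how many of the 792 hypothetical lineup compositions appear in the observed data?
observed = tf[features].drop_duplicates()
seen = df[features].merge(observed, on=features, how='inner')
print(len(seen), 'of', len(df), 'compositions observed')   # 275 of 792, per the text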

So, this model probably isn't incredibly useful for those extreme situations. It doesn't have any data on lineups with more than two traditional centers, so how is it supposed to accurately predict what's gonna happen when you throw in five of them? But for the more plausible situations where there's more data, I think the model does hold some weight.

Before I wrap up this short series, let's address the elephant in the room. You can't discuss NBA lineups over the past seven seasons without mentioning arguably the scariest lineup in league history: Stephen Curry, Klay Thompson, Andre Iguodala, Kevin Durant, and Draymond Green. Curry and Durant are both ball-dominant scorers, Thompson is a sharpshooter, Iguodala is a low-usage role player, and Draymond's classification has shifted over the years among stretch forward / big man, low-usage role player, and versatile forward. Depending on how you classify Draymond, our model predicts a lineup with those player classifications to record a net rating between 5.94 and 9.19. Not bad.
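For anyone who wants to reproduce that range, here's a sketch. The cluster indices below are my reading of the alphabetical ordering described earlier (0 = ball-dominant scorers, 3 = low-usage role players, 4 = sharpshooters, 5 = stretch forwards / big men, 7 = versatile forwards); double-check them against your own clusters variable before trusting the output:

base = np.zeros(8, dtype=int)
base[0] = 2   # Curry and Durant: ball-dominant scorers
base[4] = 1   # Thompson: sharpshooter
base[3] = 1   # Iguodala: low-usage role player

# try each of Draymond's possible classifications
for dray_idx, label in [(5, 'stretch forward / big man'),
                        (3, 'low-usage role player'),
                        (7, 'versatile forward')]:
    lineup = base.copy()
    lineup[dray_idx] += 1
    pred = model.predict(lineup.reshape(1, -1))[0]
    print(label, round(float(pred), 2))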

Including this article, my last five posts have been related to the clustering and/or classification of NBA players. It's been a good run and I think it was useful, but this article will probably be the last one on the subject (at least in the near future). The code used in this article can be found here, and most of the code used in all five articles can be found in this GitHub repository.
