Evaluating Individual Defense in the NBA With Python

In August, I released an article introducing a new way to evaluate defensive performance in the NBA. I thought I’d make a short article this year to show exactly how I use Python to calculate the metric.

The metric, which I’ll call Defensive Points Saved (DPS), quantifies the impact a defender has based on individual defensive matchup data. This data tells us how many possessions a player guarded another player and the outcomes of these possessions. Based on a player’s season averages, we can estimate the performance they’d be expected to have in a particular matchup and then compare that to their actual performance to see how well the defender fared. That’s the premise of this metric.

I start by importing the required modules and defining the request headers needed to scrape the official NBA Stats website.

import pandas as pd
import numpy as np
import requests
import json

headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Firefox/55.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Referer': 'https://stats.nba.com/',
    'x-nba-stats-origin': 'stats',
    'x-nba-stats-token': 'true',
    'DNT': '1',
}

In order to determine the offensive player’s expected performance in any given matchup based on the number of possessions, we’ll need a data frame with the 2019-20 offensive stats per possession for every player. Using the method described in this article, one can find the link that contains the data for any data table on the official NBA Stats website. This can then be converted to a data frame which we’ll call ‘db.’

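# url is the link to the per-possession stats table, found with the method mentioned above
payload = requests.get(url, headers=headers).json() # named payload so the imported json module isn't shadowed

data = payload['resultSets'][0]['rowSet']
columns = payload['resultSets'][0]['headers']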

db = pd.DataFrame.from_records(data, columns=columns)
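
Since the same conversion is needed again for the matchup data, it could be wrapped in a small helper. This is only a sketch: result_set_to_frame is a hypothetical name, and the sample payload is made up to mimic the 'resultSets' shape used above rather than taken from the real site.

```python
import pandas as pd

def result_set_to_frame(payload, index=0):
    """Turn one entry of a 'resultSets' list into a data frame."""
    result_set = payload['resultSets'][index]
    return pd.DataFrame.from_records(result_set['rowSet'], columns=result_set['headers'])

# Made-up payload with the same shape as the response parsed above
sample = {
    'resultSets': [{
        'headers': ['PLAYER_ID', 'PLAYER_NAME', 'FGM'],
        'rowSet': [[1, 'Player A', 0.12], [2, 'Player B', 0.08]],
    }]
}

demo = result_set_to_frame(sample)
print(demo)
```

With a helper like this, each scrape would reduce to one requests.get call followed by one call to the helper.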

Unfortunately, ‘db’ has 65 columns. That’s far more than needed. I’ll cut it down, introduce a few new columns for 2P%, 3P%, and FT%, and then rename all of the columns.

db = db[['PLAYER_ID','PLAYER_NAME','FGM','FGA','FG3M','FG3A','FTM','FTA','TOV','BLKA']]

db['two_pct'] = (db.FGM - db.FG3M) / (db.FGA - db.FG3A)
db['three_pct'] = db.FG3M / db.FG3A
db['ft_pct'] = db.FTM / db.FTA

db = db[['PLAYER_ID','PLAYER_NAME','two_pct','three_pct','ft_pct','TOV','FTA','BLKA']]

db.columns = ['off_player_id','off_player_name','two_pct','three_pct','ft_pct','tov_p','fta_p','blka_p'] # _p represents a per possession stat
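
One thing to watch for: a player with no attempts of a given type produces 0 / 0 in the percentage columns, which pandas stores as NaN, and NaN carries through any later arithmetic. The article's code doesn't say how such rows are handled, so the sketch below is just one possible choice (treating "no attempts" as 0%), shown on made-up numbers.

```python
import pandas as pd

# Toy per-possession rows (made up): Player B never attempted a free throw
toy = pd.DataFrame({
    'PLAYER_NAME': ['Player A', 'Player B'],
    'FTM': [0.04, 0.0],
    'FTA': [0.05, 0.0],
})

raw_ft_pct = toy.FTM / toy.FTA        # 0 / 0 gives NaN for Player B
safe_ft_pct = raw_ft_pct.fillna(0)    # assumption: treat "no attempts" as 0%

print(raw_ft_pct.isna().tolist())
print(safe_ft_pct.tolist())
```

Whether 0% is the right stand-in is a judgment call; the league-average percentage would be another reasonable option.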

Now, we have the information needed to process the matchup data. We just need to scrape the actual matchup data. Unfortunately, we have to go to every individual player page to find the matchups for that player, so a for-loop will be necessary. I’ll just show how the process works for a single player (LeBron James!). Let’s start off by taking the link, converting it to a data frame, and cleaning it up by removing unneeded columns.

x = 2544 # lebron's player id

url = 'https://stats.nba.com/stats/leagueseasonmatchups?DateFrom=&DateTo=&DefPlayerID=' + str(x) + '&LeagueID=00&Outcome=&PORound=0&PerMode=Totals&Season=2019-20&SeasonType=Regular+Season'

payload = requests.get(url, headers=headers).json() # named payload so the imported json module isn't shadowed

data = payload['resultSets'][0]['rowSet']
columns = payload['resultSets'][0]['headers']

df = pd.DataFrame.from_records(data, columns=columns)

df = df[['OFF_PLAYER_ID','MATCHUP_MIN','PARTIAL_POSS','PLAYER_PTS','TEAM_PTS','MATCHUP_TOV','MATCHUP_BLK','MATCHUP_FGA','MATCHUP_FG3A']]

I’ll rename the columns to suit my preferences before I create a new column for two-point attempts (which is simply total attempts minus three-point attempts).

df.columns = ['off_player_id','matchup_min','poss','player_pts','team_pts','tov','blk','fga','fg3a']

df['fg2a'] = df.fga - df.fg3a

Alright, so there are two data frames now. All of the data for matchups in the 2019-20 NBA season where LeBron James is the defender is in the ‘df’ data frame. Meanwhile, a database with per possession offensive statistics for every player is in the ‘db’ data frame. Let’s merge the two.

df = pd.merge(df,db,on='off_player_id')
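
By default, pd.merge performs an inner join, so any matchup whose offensive player is missing from ‘db’ is dropped without warning. If you want to see whether that happens, a left join with indicator=True is one way to check. The frames below are made-up stand-ins for ‘df’ and ‘db’, not real data.

```python
import pandas as pd

# Made-up stand-ins for the matchup frame (df) and the season frame (db)
df_demo = pd.DataFrame({'off_player_id': [1, 2, 3], 'poss': [40.0, 12.5, 3.0]})
db_demo = pd.DataFrame({'off_player_id': [1, 2], 'two_pct': [0.54, 0.48]})

# A left join keeps every matchup; indicator=True adds a '_merge' column
checked = pd.merge(df_demo, db_demo, on='off_player_id', how='left', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']

print(unmatched[['off_player_id', 'poss']])
```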

Here’s what a sample of the merged data frame looks like at this point:

Take a look at the first row. It tells us that LeBron James guarded DeMar DeRozan for 46.28 partial possessions.¹ On these plays, DeRozan scored 10 points on 12 shots, committed 1 turnover, and was blocked by James once. Meanwhile, the right side of the data frame (the last six columns) gives us DeMar’s season stats — his 2P%, 3P%, FT%, how many turnovers he commits per possession, how many free throw attempts he gets per possession, and how many times he’s blocked per possession. We can use this to determine his expected performance. For example, he took 12 two-pointers and his 2P% is 54.1%, so he should hit 6.49 of his shots on average for about 13 points. Add another three points for free throws and take away two for turnovers / blocks, and you’re at 14 expected points. LeBron held him to 8 according to the DPS methodology (player_pts – (tov + blk)), so that’s a good defensive performance. Here’s the code for it:

df['x_fta'] = df.fta_p * df.poss # x_ = expected_
df['x_tov'] = df.tov_p * df.poss
df['x_blk'] = df.blka_p * df.poss
df['x_fg2m'] = df.fg2a * df.two_pct
df['x_fg3m'] = df.fg3a * df.three_pct
df['x_ftm'] = df.x_fta * df.ft_pct

df['x_pts'] = ((df.x_fg2m * 2) + (df.x_fg3m * 3) + (df.x_ftm * 1)) - df.x_blk - df.x_tov

df['val'] = df.x_pts - (df.player_pts - df.blk - df.tov)
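
As a sanity check, the DeRozan row described earlier can be run through the same formulas by hand. The snippet uses only the rounded figures quoted in the text (three expected free-throw points, two expected turnovers plus blocks), so it approximates that row rather than reproducing it exactly.

```python
# Rounded figures quoted in the DeRozan example above
fg2a, two_pct = 12, 0.541
x_ft_pts = 3      # expected free-throw points, as rounded in the text
x_tov_blk = 2     # expected turnovers plus blocks, as rounded in the text
player_pts, tov, blk = 10, 1, 1

x_pts = fg2a * two_pct * 2 + x_ft_pts - x_tov_blk   # expected net points
actual = player_pts - (tov + blk)                   # net points allowed
val = x_pts - actual                                # points saved in this matchup

print(round(x_pts, 2), actual, round(val, 2))       # 13.98 8 5.98
```

So LeBron saved roughly six points against DeRozan over those 46.28 partial possessions.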

Now we have a column called ‘val’ which gives a rating evaluating LeBron’s performance in every matchup he was involved in as a defender. We can use this column along with the column with partial possessions to arrive at a single value — the number of points LeBron saves per 100 possessions:

dps = (df.val.sum() / df.poss.sum()) * 100

print(dps)

4.857884011059122

Not bad! Note that the scale has shifted since the previous article because the league has changed the way it comes up with matchup data. I’ll recalculate past DPS eventually. A 4.86 would have led the league by a wide margin last year. It’s just the 20th highest among 160 eligible players this season, but that’s still good. Anyway, after putting the above code into a for-loop, I created a data frame called ‘final’ with the DPS rating for every NBA player. I filtered this data frame to include players with more than 500 possessions defended, sorted it by DPS, and reset the index:

final = final[final.poss>500].sort_values(by='dps',ascending=False).reset_index(drop=True)
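
The loop that builds ‘final’ isn’t shown here. A rough sketch of how the pieces might fit together is below; build_final and dps_from_matchups are hypothetical helpers, and the toy frames stand in for the per-player matchup frames that the scraping and ‘val’ steps would produce.

```python
import pandas as pd

def dps_from_matchups(df):
    """Points saved per 100 possessions, given a frame with 'val' and 'poss' columns."""
    return (df.val.sum() / df.poss.sum()) * 100

def build_final(matchups_by_player):
    """Assemble one row per defender from a {name: matchup frame} mapping."""
    rows = [
        {'player': name, 'poss': frame.poss.sum(), 'dps': dps_from_matchups(frame)}
        for name, frame in matchups_by_player.items()
    ]
    return pd.DataFrame(rows, columns=['player', 'poss', 'dps'])

# Made-up matchup frames standing in for the per-player scrape results
toy = {
    'Defender A': pd.DataFrame({'poss': [400.0, 200.0], 'val': [30.0, 12.0]}),
    'Defender B': pd.DataFrame({'poss': [300.0, 250.0], 'val': [-10.0, -6.5]}),
    'Defender C': pd.DataFrame({'poss': [100.0], 'val': [20.0]}),
}

final_demo = build_final(toy)
final_demo = final_demo[final_demo.poss > 500].sort_values(by='dps', ascending=False).reset_index(drop=True)
print(final_demo)
```

In the toy data, Defender C is dropped by the possession filter, which is the same job the 500-possession cutoff does in the real table.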

Let’s look at the top ten.

final.head(10)

                    player    poss        dps
0              Rudy Gobert  723.63  13.550069
1            Anthony Davis  748.29  12.665262
2             Jimmy Butler  581.02   9.615515
3          Wesley Matthews  570.79   8.970503
4              Bam Adebayo  695.24   8.890664
5            Kawhi Leonard  555.56   8.554308
6         Montrezl Harrell  734.76   7.994497
7    Giannis Antetokounmpo  766.18   7.440318
8           Jonathan Isaac  580.30   7.328540
9       Karl-Anthony Towns  700.37   7.306286

These results certainly pass the eye test despite a small sample size of just over 15 games. Gobert and Davis are the two clear early front-runners for Defensive Player of the Year, and Kawhi Leonard, Jimmy Butler, Giannis Antetokounmpo, and Karl-Anthony Towns have all been recognized for playing superb defense this season. Players like Wesley Matthews, Bam Adebayo, and Jonathan Isaac are also known to be stalwarts on that end of the floor. Matthews in particular has been superb for Milwaukee this year: he’s allowing the lowest FG% in the league so far. The only thing that surprised me was Montrezl Harrell’s spot on the list. He actually has a reputation as a mediocre defender. He did get a below-average rating (-1.47) in my post in August, so maybe a larger sample size will sort things out. All things considered, this is a very reasonable top ten.

And now for the bottom ten!

final.tail(10)

                    player    poss        dps
...
150            Buddy Hield  682.06  -4.601858
151       Maurice Harkless  559.90  -4.626677
152        DeAndre' Bembry  598.43  -5.028635
153        Nemanja Bjelica  540.16  -5.227452
154             Coby White  630.99  -5.306599
155          Rui Hachimura  554.20  -5.660784
156          DeMar DeRozan  826.29  -6.259388
157          Derrick White  516.21  -6.492176
158            Jeff Teague  534.84  -6.729818
159            Bryn Forbes  672.66  -7.836013

Nine of the players in this list have a negative D-PIPM (Defensive Player Impact Plus-Minus), which suggests that they have a negative impact on their team’s defensive performance. The odd man out is Maurice Harkless, who is actually known to be a good perimeter defender. In fact, he’s teammates with Montrezl Harrell … it almost feels like their spots on here should be flipped!

Considering the extremely small sample size (the 2019-20 season so far) of this experiment, though, I’m satisfied with these results. I think matchup data is more useful than box-score metrics like DBPM. While the best defensive metrics are still probably adjusted plus / minus stats like D-PIPM and DRPM, DPS appears to have some value.

¹ Here’s an article which gives an in-depth explanation of matchup data like partial possessions. To save you a click, though, partial possessions are defined as “the sum total of partial possessions that were spent defending that player. For example: if Kawhi guards LeBron for 10 seconds of a 20-second possession, that is a 0.5 partial possession that would be added to this column for this matchup.”
