
In August, I released an article introducing a new way to evaluate defensive performance in the NBA. I thought I’d make a short article this year to show exactly how I use Python to calculate the metric.
The metric, which I’ll call Defensive Points Saved (DPS), quantifies the impact a defender has based on individual defensive matchup data. This data tells us how many possessions a player guarded another player and the outcomes of these possessions. Based on a player’s season averages, we can estimate the performance they’d be expected to have in a particular matchup and then compare that to their actual performance to see how well the defender fared. That’s the premise of this metric.
I start by importing the required modules and defining the request headers needed to scrape the official NBA Stats website.
import pandas as pd
import numpy as np
import requests
import json

headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Firefox/55.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Referer': 'https://stats.nba.com/',
    'x-nba-stats-origin': 'stats',
    'x-nba-stats-token': 'true',
    'DNT': '1',
}
In order to determine the offensive player’s expected performance in any given matchup based on the number of possessions, we’ll need a data frame with the 2019-20 offensive stats per possession for every player. Using the method described in this article, one can find the link that contains the data for any data table on the official NBA Stats website. This can then be converted to a data frame which we’ll call ‘db.’
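# `url` is the link to the per-possession stats table, found as described above
payload = requests.get(url, headers=headers).json()  # named `payload` so it doesn't shadow the json module
data = payload['resultSets'][0]['rowSet']
columns = payload['resultSets'][0]['headers']
db = pd.DataFrame.from_records(data, columns=columns)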
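For anyone who hasn’t seen the response before, here is a minimal sketch of the structure that code relies on: a `resultSets` list whose first entry holds `headers` and `rowSet`. The payload below is a made-up two-row sample, and `result_set_to_frame` is a hypothetical helper name used only for illustration.

```python
import pandas as pd

# Made-up sample mimicking the response structure; the rows are NOT real stats.
sample_payload = {
    "resultSets": [
        {
            "headers": ["PLAYER_ID", "PLAYER_NAME", "FGM", "FGA"],
            "rowSet": [
                [1, "Player A", 0.10, 0.22],
                [2, "Player B", 0.08, 0.19],
            ],
        }
    ]
}


def result_set_to_frame(payload, index=0):
    """Hypothetical helper: convert one result set into a DataFrame."""
    result = payload["resultSets"][index]
    return pd.DataFrame.from_records(result["rowSet"], columns=result["headers"])


sample_db = result_set_to_frame(sample_payload)
print(sample_db)
```

Each inner list in `rowSet` becomes one row, and `headers` supplies the column names in the same order.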
Unfortunately, ‘db’ has 65 columns. That’s far more than needed. I’ll cut it down, introduce a few new columns for 2P%, 3P%, and FT%, and then rename all of the columns.
# .copy() so that adding columns below doesn't raise a SettingWithCopyWarning
db = db[['PLAYER_ID','PLAYER_NAME','FGM','FGA','FG3M','FG3A','FTM','FTA','TOV','BLKA']].copy()
db['two_pct'] = (db.FGM - db.FG3M) / (db.FGA - db.FG3A)
db['three_pct'] = db.FG3M / db.FG3A
db['ft_pct'] = db.FTM / db.FTA
db = db[['PLAYER_ID','PLAYER_NAME','two_pct','three_pct','ft_pct','TOV','FTA','BLKA']]
# _p represents a per possession stat
db.columns = ['off_player_id','off_player_name','two_pct','three_pct','ft_pct','tov_p','fta_p','blka_p']
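One edge case worth flagging: a player with zero three-point (or free throw) attempts gets a percentage of 0 / 0, which pandas stores as NaN. That NaN carries through any later multiplication, and pandas’ `sum()` skips NaN values by default. The sketch below shows the behavior on made-up numbers along with one possible way to handle it. Filling with 0 is my assumption for illustration; it is not part of the calculation used for the results in this article.

```python
import pandas as pd

# Made-up per-possession rows; Player B never attempted a three.
toy = pd.DataFrame({
    "PLAYER_NAME": ["Player A", "Player B"],
    "FG3M": [0.02, 0.0],
    "FG3A": [0.05, 0.0],
})

# 0 / 0 produces NaN for Player B
toy["three_pct"] = toy.FG3M / toy.FG3A

# Assumption: treat "no attempts" as 0% so later arithmetic stays numeric
toy["three_pct_filled"] = toy.three_pct.fillna(0)

print(toy)
```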
Now, we have the information needed to process the matchup data. We just need to scrape the actual matchup data. Unfortunately, we have to go to every individual player page to find the matchups for that player, so a for-loop will be necessary. I’ll just show how the process works for a single player (LeBron James!). Let’s start off by taking the link, converting it to a data frame, and cleaning it up by removing unneeded columns.
x = 2544  # lebron's player id
url = 'https://stats.nba.com/stats/leagueseasonmatchups?DateFrom=&DateTo=&DefPlayerID=' + str(x) + '&LeagueID=00&Outcome=&PORound=0&PerMode=Totals&Season=2019-20&SeasonType=Regular+Season'
payload = requests.get(url, headers=headers).json()  # named `payload` so it doesn't shadow the json module
data = payload['resultSets'][0]['rowSet']
columns = payload['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
df = df[['OFF_PLAYER_ID','MATCHUP_MIN','PARTIAL_POSS','PLAYER_PTS','TEAM_PTS','MATCHUP_TOV','MATCHUP_BLK','MATCHUP_FGA','MATCHUP_FG3A']]
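Since the same link has to be rebuilt for every defender, it can be tidier to assemble it from a dictionary of query parameters. This is just a sketch: `matchup_url` and `BASE_URL` are names I made up, and the parameters are copied from the link above rather than from any official documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://stats.nba.com/stats/leagueseasonmatchups"


def matchup_url(def_player_id, season="2019-20"):
    """Hypothetical helper: rebuild the matchup link for one defender."""
    params = {
        "DateFrom": "",
        "DateTo": "",
        "DefPlayerID": def_player_id,
        "LeagueID": "00",
        "Outcome": "",
        "PORound": 0,
        "PerMode": "Totals",
        "Season": season,
        "SeasonType": "Regular Season",  # urlencode turns the space into '+'
    }
    return BASE_URL + "?" + urlencode(params)


print(matchup_url(2544))
```

Because dictionaries keep insertion order, the result matches the hand-built string for LeBron character for character.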
I’ll rename the columns to meet my preference before I create a new column for two-point attempts (which is obviously equal to total attempts minus three-point attempts).
df.columns = ['off_player_id','matchup_min','poss','player_pts','team_pts','tov','blk','fga','fg3a']
df['fg2a'] = df.fga - df.fg3a
Alright, so there are two data frames now. All of the data for matchups in the 2019-20 NBA season where LeBron James is the defender is in the ‘df’ data frame. Meanwhile, a database with per possession offensive statistics for every player is in the ‘db’ data frame. Let’s merge the two.
df = pd.merge(df,db,on='off_player_id')
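Keep in mind that `pd.merge` performs an inner join by default, so any matchup whose offensive player is missing from the season table is dropped without warning. A quick way to check for that, shown here on made-up frames (`matchups` and `season` are placeholder names), is a left join with `indicator=True`:

```python
import pandas as pd

# Made-up stand-ins for the matchup and season tables
matchups = pd.DataFrame({"off_player_id": [1, 2, 3], "poss": [10.0, 5.0, 2.5]})
season = pd.DataFrame({"off_player_id": [1, 2], "two_pct": [0.50, 0.45]})

# Default behavior: inner join, player 3 disappears
inner = pd.merge(matchups, season, on="off_player_id")

# Left join with an indicator column to list the rows that found no match
checked = pd.merge(matchups, season, on="off_player_id", how="left", indicator=True)
missing = checked[checked["_merge"] == "left_only"]

print(len(inner))
print(missing.off_player_id.tolist())
```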
Here’s what a sample of the merged data frame looks like at this point:

Take a look at the first row. It tells us that LeBron James guarded DeMar DeRozan for 46.28 partial possessions (see the note at the end of this article). On these plays, DeRozan scored 10 points on 12 shots, committed 1 turnover, and was blocked by James once. Meanwhile, the right side of the data frame (the last six columns) gives us DeMar’s season stats: his 2P%, 3P%, FT%, how many turnovers he commits per possession, how many free throw attempts he gets per possession, and how many times he’s blocked per possession. We can use this to determine his expected performance. For example, he took 12 two-pointers and his 2P% is 54.1%, so he should hit 6.49 of his shots on average for about 13 points. Add another three points for free throws and take away two for turnovers / blocks, and you’re at 14 expected points. LeBron held him to 8 according to the DPS methodology (player_pts – (tov + blk)), so that’s a good defensive performance. Here’s the code for it:
# x_ = expected_
df['x_fta'] = df.fta_p * df.poss
df['x_tov'] = df.tov_p * df.poss
df['x_blk'] = df.blka_p * df.poss
df['x_fg2m'] = df.fg2a * df.two_pct
df['x_fg3m'] = df.fg3a * df.three_pct
df['x_ftm'] = df.x_fta * df.ft_pct
df['x_pts'] = ((df.x_fg2m * 2) + (df.x_fg3m * 3) + (df.x_ftm * 1)) - df.x_blk - df.x_tov
df['val'] = df.x_pts - (df.player_pts - df.blk - df.tov)
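To check the DeRozan example by hand, the same arithmetic can be written out with plain numbers. The free throw figure (3) and the turnover / block figure (2) are the rounded values quoted in the text, not exact values from the data, so treat the result as approximate.

```python
# Figures quoted in the text for the LeBron vs. DeRozan matchup
fg2a = 12          # two-point attempts in the matchup
two_pct = 0.541    # DeRozan's season 2P%
x_ft_pts = 3       # expected free throw points (rounded figure from the text)
x_tov_blk = 2      # expected turnovers + blocks (rounded figure from the text)
player_pts, tov, blk = 10, 1, 1

x_fg2m = fg2a * two_pct                     # expected two-point makes
x_pts = x_fg2m * 2 + x_ft_pts - x_tov_blk   # expected points after penalties
actual = player_pts - (tov + blk)           # what he actually produced
val = x_pts - actual                        # points saved in this matchup

print(round(x_fg2m, 2), round(x_pts, 2), actual, round(val, 2))
```

With these inputs the defender comes out roughly six points ahead of expectation in this one matchup.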
Now we have a column called ‘val’ which gives a rating evaluating LeBron’s performance in every matchup he was involved in as a defender. We can use this column along with the column with partial possessions to arrive at a single value — the number of points LeBron saves per 100 possessions:
dps = (df.val.sum() / df.poss.sum()) * 100
print(dps)
4.857884011059122
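Dividing the summed values by the summed possessions makes this a possession-weighted rate, which is not the same as averaging each matchup’s own per-100 rate. A small sketch on made-up numbers shows the difference:

```python
import pandas as pd

# Made-up matchups: possessions defended and points saved in each
toy = pd.DataFrame({"poss": [40.0, 35.0, 25.0], "val": [6.0, -1.0, 2.5]})

# Same formula as above: total value over total possessions, per 100
weighted = toy.val.sum() / toy.poss.sum() * 100

# Naive alternative: average the per-matchup rates, ignoring sample size
unweighted = (toy.val / toy.poss * 100).mean()

print(weighted, unweighted)
```

The weighted version keeps a matchup that lasted only a handful of possessions from counting as much as one that lasted forty.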
Not bad! Note that the scale differs from the previous article because the league has changed the way it produces matchup data. I’ll recalculate past DPS eventually. A rating of 4.86 would have led the league by a wide margin last year. It’s just the 20th highest among 160 eligible players this season, but that’s still good. Anyway, after putting the above code into a for-loop, I created a data frame called ‘final’ with the DPS rating for every NBA player. I filtered this data frame to include players with more than 500 possessions defended, sorted it by DPS, and reset the index:
final = final[final.poss>500].sort_values(by='dps',ascending=False).reset_index(drop=True)
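The loop itself isn’t shown above, so here is a minimal sketch of how it could be organized. It assumes the matchup frame for each defender has already been cleaned and renamed as described earlier. `dps_for_defender`, `build_final`, and `fake_fetch` are hypothetical names, and the data is made up so the sketch runs without touching the NBA Stats website.

```python
import pandas as pd


def dps_for_defender(df, db):
    """Apply the steps above to one defender's cleaned matchup frame."""
    m = pd.merge(df, db, on="off_player_id")
    x_fta = m.fta_p * m.poss
    x_pts = (m.fg2a * m.two_pct * 2
             + m.fg3a * m.three_pct * 3
             + x_fta * m.ft_pct
             - m.blka_p * m.poss
             - m.tov_p * m.poss)
    val = x_pts - (m.player_pts - m.blk - m.tov)
    poss = m.poss.sum()
    return poss, val.sum() / poss * 100


def build_final(players, fetch_matchups, db):
    """players: {player_id: name}. fetch_matchups: hypothetical callable that
    returns the cleaned matchup frame for one defender."""
    rows = []
    for player_id, name in players.items():
        poss, dps = dps_for_defender(fetch_matchups(player_id), db)
        rows.append({"player": name, "poss": poss, "dps": dps})
    return pd.DataFrame(rows)


# Made-up season table standing in for 'db'
db = pd.DataFrame({
    "off_player_id": [10], "two_pct": [0.5], "three_pct": [0.4], "ft_pct": [0.8],
    "tov_p": [0.02], "fta_p": [0.05], "blka_p": [0.01],
})


def fake_fetch(player_id):
    """Stand-in for the scraping step; returns one made-up matchup row."""
    return pd.DataFrame({
        "off_player_id": [10], "poss": [100.0], "player_pts": [10],
        "tov": [1], "blk": [1], "fga": [10], "fg3a": [5], "fg2a": [5],
    })


final = build_final({1: "Defender A"}, fake_fetch, db)
print(final)
```

In practice `fake_fetch` would be replaced by the request and cleanup code shown for LeBron, called once per player ID.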
Let’s look at the top ten.
final.head(10)
                  player    poss        dps
0            Rudy Gobert  723.63  13.550069
1          Anthony Davis  748.29  12.665262
2           Jimmy Butler  581.02   9.615515
3        Wesley Matthews  570.79   8.970503
4            Bam Adebayo  695.24   8.890664
5          Kawhi Leonard  555.56   8.554308
6       Montrezl Harrell  734.76   7.994497
7  Giannis Antetokounmpo  766.18   7.440318
8         Jonathan Isaac  580.30   7.328540
9     Karl-Anthony Towns  700.37   7.306286
These results certainly match the eye-test despite a small sample size of just over 15 games. Gobert and Davis are the two clear early front runners for Defensive Player of the Year, and Kawhi Leonard, Jimmy Butler, Giannis Antetokounmpo, and Karl-Anthony Towns have all been recognized for playing superb defense this season. Players like Wesley Matthews, Bam Adebayo, and Jonathan Isaac are also known to be stalwarts on that end of the floor. Matthews particularly has been superb for Milwaukee this year — he’s allowing the lowest FG% in the league so far. The only thing that surprised me was Montrezl Harrell’s spot on the list. He actually has a reputation as a mediocre defender. He did get a below average rating (-1.47) in my post in August, so maybe a larger sample size will sort things out. All things considered, this is a very reasonable top ten.
And now for the bottom ten!
final.tail(10)
               player    poss        dps
...
150       Buddy Hield  682.06  -4.601858
151  Maurice Harkless  559.90  -4.626677
152   DeAndre' Bembry  598.43  -5.028635
153   Nemanja Bjelica  540.16  -5.227452
154        Coby White  630.99  -5.306599
155     Rui Hachimura  554.20  -5.660784
156     DeMar DeRozan  826.29  -6.259388
157     Derrick White  516.21  -6.492176
158       Jeff Teague  534.84  -6.729818
159       Bryn Forbes  672.66  -7.836013
Nine of the players on this list have a negative D-PIPM (Defensive Player Impact Plus-Minus), which suggests that they have a negative impact on their team’s defensive performance. The odd man out is Maurice Harkless, who is actually known to be a good perimeter defender. In fact, he’s teammates with Montrezl Harrell … it almost feels like their spots on here should be flipped!
Considering the extremely small sample size (the 2019-20 season so far) of this experiment, though, I’m satisfied with these results. I think matchup data is more useful than box-score metrics like DBPM. While the best defensive metrics are still probably adjusted plus / minus stats like D-PIPM and DRPM, DPS appears to have some value.
- Here’s an article which gives an in-depth explanation of matchup data like partial possessions. To save you a click, though, partial possessions are defined as “the sum total of partial possessions that were spent defending that player. For example: if Kawhi guards LeBron for 10 seconds of a 20-second possession, that is a 0.5 partial possession that would be added to this column for this matchup.”
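The quoted definition boils down to a simple ratio that gets summed across possessions. The sketch below only illustrates that arithmetic. `partial_possession` is a hypothetical helper, and the possessions after the first are made-up numbers; how the league tracks guarding time internally isn’t covered here.

```python
def partial_possession(seconds_guarded, possession_seconds):
    """Hypothetical helper: share of one possession spent guarding a player."""
    if possession_seconds <= 0:
        raise ValueError("possession_seconds must be positive")
    return seconds_guarded / possession_seconds


# The example from the quote: 10 seconds of a 20-second possession
kawhi_on_lebron = partial_possession(10, 20)

# Summing over several (seconds guarded, possession length) pairs gives the
# matchup total that shows up in the partial possessions column
total = sum(partial_possession(g, p) for g, p in [(10, 20), (6, 24), (18, 18)])

print(kawhi_on_lebron, total)
```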