Creating an Analytic Data Set of NFL Player Value

Python code

ProFootball Reference’s Approximate Value (AV), created by PFR’s founder Doug Drinen, is a great NFL analytics measure of player value. Similar to baseball’s WAR or basketball’s PER, AV puts a single value on a player’s season. More detail can be found on their website here:

https://www.sports-reference.com/blog/approximate-value/

There are other great posts out there analyzing the NFL draft using ProFootballReference’s Approximate Value (AV) metric for player value, including statsbylopez’s “Approximate value and the NFL draft” and OvertheCap’s “Examining Draft Pick Approximate Value & Team Success in the Rookie Wage Scale Era”.

The basics of how AV is calculated

AV is calculated for every position with the following basic tenets:

  • For both offense and defense, a total “pool” of AV points is calculated by comparing the per drive scoring (or scoring allowed for defense) of a team vs. the league average per drive scoring (scoring allowed)
  • These points are then divided up across positional groups based on modeling. On offense, 45% of the total offensive points are allocated to the OL and the remaining 55% is allocated to skill position players. For defense, 67% of defensive points are allocated to the front 7 and the remaining 33% are allocated to the secondary.
  • Individual positions are further broken down using their contribution compared to the team’s total and a weighting of how much that position impacts the overall team.
  • Offensive positional AV is calculated fairly directly from a player’s share of the team’s per drive scoring. Defensive positions are harder to measure directly and are calculated differently. Defensive AV still ultimately compares per drive points allowed with the league average per drive points allowed, but a defensive player’s individual AV also pulls in games played and started, All Pro awards, and defensive stats: tackles, interceptions, fumble recoveries, and defensive TDs.

A more detailed explanation of the calculation is available on PFR’s site: https://www.sports-reference.com/blog/approximate-value-methodology/
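
To make the pool-and-split idea concrete, here is a minimal sketch in plain Python. The 45/55 and 67/33 shares come from the description above. The pool formula (team per drive scoring divided by league average, times 100) and all function names are my assumptions for illustration, not PFR’s exact method.

```python
# Positional shares as described in the text above.
OFFENSE_SHARES = {"OL": 0.45, "skill": 0.55}
DEFENSE_SHARES = {"front7": 0.67, "secondary": 0.33}


def offensive_pool(team_pts_per_drive, league_pts_per_drive, scale=100.0):
    """Toy pool of points: team scoring relative to league average.

    The ratio-times-scale form is an assumption for illustration only.
    """
    return scale * team_pts_per_drive / league_pts_per_drive


def split_pool(pool, shares):
    """Divide a pool of points across positional groups by fixed shares."""
    return {group: pool * share for group, share in shares.items()}


# A team scoring 2.5 points per drive in a league averaging 2.0.
pool = offensive_pool(2.5, 2.0)
offense = split_pool(pool, OFFENSE_SHARES)

# A defense with a pool of 100 points (value chosen for illustration).
defense = split_pool(100.0, DEFENSE_SHARES)

for group, points in {**offense, **defense}.items():
    print(f"{group}: {points:.2f}")
```

The individual player step (dividing each group’s points by contribution and positional weight) is left out because it depends on details only the PFR write-up covers.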

Thoughts on AV as a metric

Is it perfect? No, and nothing ever will be, especially given the complexity of the NFL. Some questions I have after digging into the AV calculation are:

  • Despite the positional weighting, do specific positions skew high based on their opportunity for stats? For example, LBs will have more tackles, driving their AV up but does that sync with the NFL’s relative view on the value of LBs?
  • Are trends in the NFL shifting positional values, and do NFL teams’ views on positions sync with AV scoring? Two examples where I think AV data skews a bit are RBs and LBs. “Base” defense today is nickel or dime, with teams averaging 70% of snaps in nickel or dime vs. a traditional 3–4 or 4–3 base defense. With more defensive backs on the field the majority of the time, does allocating two-thirds of defensive points to the front seven (which is really a front five or six now) make sense? And for RBs, AV (rightfully) weights scoring, which gives RBs higher AVs relative to other positions, but this diverges from the lower value teams put on RBs today and the ease of replacing them.
  • What if a defensive back is so good that they aren’t thrown at, and therefore have far fewer opportunities to earn points from interceptions and tackles?
  • Does AV really reflect OL value when OL play is so nuanced? OL AV uses team scoring as the other positions do, but also pulls in All Pro and Pro Bowl awards and assigns relative values to C, G, and T.

That said, I think AV does a really good job at valuing players, especially at a population (aggregate) level and over time. This is true of most statistics. For example, a person’s BMI is at best directional and at worst irresponsible to rely on at an individual level, but it has extreme value over a population. The charts below show AV per game by draft pick location and Career AV by draft round. Both show what you would expect: earlier picks have better performance (higher AVs/Career AVs).

AV per game by draft pick
Career AV by draft round

I do believe there are some positional skews in the AV data. The chart below shows AV per game (AVperG) by position in descending order, along with the percentage of each position’s picks made in each round (which is meant to be a proxy for how teams value different positions). QB, as expected, is the top valued position and also the position with the most valuable draft capital used (26% of QB draft picks are made in round 1). Second is Tackle, again with the second most valuable draft capital used (22% of Tackle picks are made in round 1). LBs are the fourth highest valued position by AV per game but only the 8th highest position group in terms of how teams spend draft capital. Other divergences between AVperG and how teams spend top draft picks are Edge and DB: both rate lower in AVperG, yet teams are willing to expend greater draft capital on these positions, which are generally accepted as premium positions today.

Picks by round by position
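
Round-share figures like the ones quoted above (for example, 26% of QB picks in round 1) can be produced with a simple count per position. This is a sketch in plain Python. The function name and the sample picks are made up for illustration and are not taken from the PFR data.

```python
from collections import Counter, defaultdict


def round_share_by_position(picks):
    """Return {position: {round: percent of that position's picks}}.

    `picks` is an iterable of (position, draft_round) pairs.
    """
    counts = defaultdict(Counter)
    for position, draft_round in picks:
        counts[position][draft_round] += 1

    shares = {}
    for position, by_round in counts.items():
        total = sum(by_round.values())
        shares[position] = {
            rnd: 100 * n / total for rnd, n in sorted(by_round.items())
        }
    return shares


# Made-up picks, only to show the shape of the result.
picks = [
    ("QB", 1), ("QB", 1), ("QB", 3), ("QB", 6),
    ("LB", 2), ("LB", 4), ("LB", 4), ("LB", 5), ("LB", 7),
]
shares = round_share_by_position(picks)
for position, by_round in shares.items():
    print(position, by_round)
```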

A more rigorous analysis would be needed to refine AV, but the discrepancies in LB, Edge, and DB make some sense. Since defensive stats (interceptions, tackles, sacks) are bigger parts of defensive AV, LBs will have outsized tackle stats compared to how LBs are used today (with fewer LBs on the field in general). A DB’s true value is in quarterback rating allowed or completion percentage, not interceptions and tackles. Edges are valued higher in the draft than their AV numbers suggest, probably due to the importance of pass rush. Applying an adjustment to AV based on actual draft capital weighting might improve the cross-position valuations.

While I believe certain positions have a bias high or low in the AV calculation, when you look at which players have higher AVs at their position, it is hard to argue that AV does not properly value players relative to their position. For DBs and Edges, the top AVperG players (AVperG > 0.55) over the past 10 drafts are the following:

CB and Edge AV per G leaders

My data set

PFR has a great dataset (linked here) that provides every player drafted, their individual stats, games and years played, and college, along with AV by season and Career AV.

I largely used PFR’s dataset but did enrich the data with the following:

AV per G: I wanted a metric that would normalize a player’s value rather than focus on Career AV, which weights a player drafted years ago higher than a recent player.
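
As a rough sketch, AV per G can be computed as Career AV divided by games played. The field names (CareerAV, G, AVperG) and the sample players are assumptions for illustration. Plain Python is used so the example runs without any dependencies.

```python
def av_per_game(career_av, games):
    """Career AV divided by games played; None when no games were played."""
    if not games:
        return None
    return career_av / games


# Made-up players: the veteran has the larger Career AV,
# but the recent pick has the higher per-game value.
players = [
    {"Player": "Veteran", "CareerAV": 80, "G": 160},
    {"Player": "Recent pick", "CareerAV": 20, "G": 32},
    {"Player": "Never played", "CareerAV": 0, "G": 0},
]
for p in players:
    p["AVperG"] = av_per_game(p["CareerAV"], p["G"])
    print(p["Player"], p["AVperG"])
```

Returning None for players with zero games avoids a division error and keeps them out of later per-game rankings.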

% Years Started: The share of the years in a player’s career in which they were a starter.
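
One way to sketch this calculation, assuming the data has a count of years as a starter plus the first and last season played (the parameter names here are hypothetical, and expressing the result on a 0 to 100 scale is my assumption):

```python
def pct_years_started(years_started, first_year, last_year):
    """Percent of career seasons in which the player was a starter.

    Career length is counted inclusively, so 2011 through 2018 is 8 seasons.
    Returns None when the season range is invalid.
    """
    years_played = last_year - first_year + 1
    if years_played <= 0:
        return None
    return 100.0 * years_started / years_played


# A made-up player who started 6 of 8 seasons.
example_pct = pct_years_started(6, 2011, 2018)
print(example_pct)
```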

PosGroup: The PFR dataset has some positional assignments that I felt were better grouped. First, I labeled positions beginning with “Off” and “Def” to make analysis of offensive and defensive players easier. Second, I combined players listed as “DB” and “CB” into “Def — DB”, “NT” and “DT” into “Def — IDL”, “RB” and “FB” into “Off — RB”, “G” and “C” into “Off — OL”, and “ILB” and “LB” into “LB”.
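
A lookup table is the simplest way to express this regrouping. The sketch below covers only the combinations named above. The label strings are written with a plain hyphen, and positions not mentioned in the text are passed through unchanged because their labels are not given here. Both choices are assumptions.

```python
# Raw position -> grouped label, following the groupings listed above.
# Plain hyphens in the labels are an assumption about the stored strings.
POS_GROUP = {
    "DB": "Def - DB", "CB": "Def - DB",
    "NT": "Def - IDL", "DT": "Def - IDL",
    "RB": "Off - RB", "FB": "Off - RB",
    "G": "Off - OL", "C": "Off - OL",
    "ILB": "LB", "LB": "LB",
}


def pos_group(pos):
    """Return the grouped label, or the raw position if it is not remapped."""
    return POS_GROUP.get(pos.strip().upper(), pos)


for raw in ["CB", "NT", "FB", "C", "ILB", "QB"]:
    print(raw, "->", pos_group(raw))
```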

PickGrouping: The initial dataset has round and pick and I added a pick grouping by 10s to make certain analyses easier when round was too rough but pick was too precise. For this, all picks from 1–10 will be grouped into PickGroup 10, 11–20 into PickGroup 20, etc.
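
The grouping by 10s amounts to rounding each pick up to the next multiple of 10. A minimal sketch (the function name is mine):

```python
def pick_group(pick, size=10):
    """Round a pick number up to the top of its bucket: 1-10 -> 10, 11-20 -> 20."""
    if pick < 1:
        raise ValueError("pick numbers start at 1")
    return ((pick - 1) // size + 1) * size


for pick in [1, 10, 11, 20, 21, 254]:
    print(pick, "->", pick_group(pick))
```

Using integer arithmetic rather than floating point rounding keeps the bucket edges exact, so pick 10 lands in group 10 and pick 11 in group 20.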

PositionPct: This is the important one. Given the above points on potential positional bias in the calculations, one way to normalize the data is to compare players within their position group and calculate their percentile rank. That is what this field does. For example, Patrick Peterson at a 97 PositionPct is valued in the top 3% of DBs.
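
Here is a sketch of a within-position percentile rank. I am assuming the percentile is the percent of players in the same position group whose AVperG is at or below the player’s own. Other percentile definitions exist and give slightly different numbers. The field names and sample values are made up.

```python
from bisect import bisect_right
from collections import defaultdict


def add_position_pct(players):
    """Add a PositionPct field: percentile rank of AVperG within PosGroup."""
    by_group = defaultdict(list)
    for p in players:
        by_group[p["PosGroup"]].append(p["AVperG"])
    for values in by_group.values():
        values.sort()

    for p in players:
        values = by_group[p["PosGroup"]]
        # Count of players in the group with AVperG <= this player's.
        at_or_below = bisect_right(values, p["AVperG"])
        p["PositionPct"] = round(100 * at_or_below / len(values))
    return players


# Made-up values: QBs score higher than DBs in raw AVperG,
# but each player is ranked only against their own group.
sample = add_position_pct([
    {"Player": "DB one", "PosGroup": "DB", "AVperG": 0.10},
    {"Player": "DB two", "PosGroup": "DB", "AVperG": 0.20},
    {"Player": "DB three", "PosGroup": "DB", "AVperG": 0.30},
    {"Player": "DB four", "PosGroup": "DB", "AVperG": 0.40},
    {"Player": "QB one", "PosGroup": "QB", "AVperG": 0.50},
    {"Player": "QB two", "PosGroup": "QB", "AVperG": 0.90},
])
for p in sample:
    print(p["Player"], p["PositionPct"])
```

In the sample, the best DB ranks at the top of its group even though its raw AVperG is below both QBs, which is the point of normalizing by position.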

PlayerClass: And lastly, to make grouping of players easier for certain analyses, I created a class field and classified players the following way:

  • “Elite” — 90th percentile (top 10%) in AVperG at their position
  • “AboveAvg” — between 60th and 90th percentile
  • “LeagueAvg” — between 40th and 60th percentile
  • “Poor” — below 40th percentile.
  • Note: I also classify players who averaged fewer than 5 games per year over their career as “Poor”. This handles players who, due to injury or outlying data, posted high AVperG stats over only a few games before leaving the league. I am not sure 5 games per year is the right threshold, but this is where I have it set now.
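
The classification rules above can be sketched as a single function. The thresholds come from the list. How ties at exactly the 40th, 60th, or 90th percentile are handled is my assumption (each boundary goes to the higher class), as are the function and parameter names.

```python
def player_class(position_pct, games, years, min_games_per_year=5):
    """Classify a player from PositionPct, with a games-per-year floor.

    Players averaging fewer than `min_games_per_year` are "Poor"
    regardless of percentile. Boundary values go to the higher class.
    """
    if years <= 0 or games / years < min_games_per_year:
        return "Poor"
    if position_pct >= 90:
        return "Elite"
    if position_pct >= 60:
        return "AboveAvg"
    if position_pct >= 40:
        return "LeagueAvg"
    return "Poor"


# Made-up examples: (PositionPct, career games, career years).
examples = [(97, 120, 8), (95, 6, 2), (75, 64, 4), (50, 48, 3), (10, 80, 5)]
for pct, games, years in examples:
    print(pct, games, years, "->", player_class(pct, games, years))
```

Checking the games-per-year floor first means a 95th percentile player with 6 games over 2 years is still classed as “Poor”, matching the note above.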