Technology and Data Posts

A site to show draft and player value, performance and trends for GMs, positions, and teams across the draft and free agency. Read more
Following the last post which analyzed offensive positions, this is a review of the top metrics that drive player value on the defense with a specific focus on what matters when watching a game. Read more
A review of the top metrics that drive player value on the offense with a specific focus on what matters when watching a game. Read more
A quick comparison of PFF offensive line blocking grades vs. ESPN's block win rates. Run and pass block win rates are a better picture of offensive line value. Read more
Automating scraping of NFL team data from PFR's site using Python. Output data includes team records, PF/PA, strength of schedule, DVOA, and playoff and Super Bowl teams. Read more
Pro Football Reference's Approximate Value (AV) is a great NFL analytics measure of player value. AV, similar to baseball’s WAR or basketball’s PER metrics, puts a single value on a player’s season. Here I use Python to create an NFL data set. Read more

Introducing my player value analytics website

I started this player value site a couple of years ago but never really “released” or publicized it. But over the past month, I’ve finally made some long-wanted changes and thought I would do a quick post to explain the site in case it is interesting or helpful to others.

Here’s the link to the site: https://phillycovercorner.com/draft/index.html

Why I made the site…

I actually think this may be the best freely available draft and player value site out there.

If you have read my stuff, you will know I often use data like draft hit rates, player value, positional costs and free agency prices. There isn’t a great or easy-to-use place to get this data, and I found myself manually pulling it and working with it in either Excel or Python.

So, I decided to build a site that would give me what I wanted. And as a (personal) bonus, it allows me to stay sharp technically for my real job.


What you can find on the site

I have to give credit to Pro Football Reference for their work on Approximate Value (AV), which is one of the best single metrics of player value across any position. It is the basis for the player value that weaves through so much of the site, although I have continually improved the valuation model on top of it. You can find more detail on AV on their site here.

The analytics site is responsive and mobile friendly, but the sample images included below will be small on a mobile device because I captured them from the desktop site. If you want to see it better, just head to the site and navigate around.

Draft value

Here you can see player values for all drafts since 2000 and filter by year, position, and team.

And beyond player value, this page will show a draft pick’s value vs. expected value given their draft location. In 2020, Chase Young and Jonathan Greenard have similar career value but Young is 13% below expected value for the 2nd overall pick while Greenard was a draft hit, outperforming pick 90’s expected value by 33%.
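The value vs. expected comparison above boils down to a percentage difference. Here is a minimal sketch of that arithmetic; the function name and the sample numbers are hypothetical, and the site's actual expected-value curve by pick is not reproduced here.

```python
def value_vs_expected(actual_value: float, expected_value: float) -> float:
    """Return actual value relative to expected value as a percentage difference."""
    if expected_value <= 0:
        raise ValueError("expected_value must be positive")
    return (actual_value - expected_value) / expected_value * 100.0


# Hypothetical numbers for illustration, not taken from the site
print(round(value_vs_expected(40.0, 30.0), 1))  # a pick that beat its slot
print(round(value_vs_expected(26.1, 30.0), 1))  # a pick that fell short of its slot
```

A positive result is a draft hit relative to the slot and a negative result is a miss, regardless of how high the raw career value is.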


Team performance and trends

This shows teams’ draft performance and free agency spend, including their record trend, draft hit rates, whether they exceeded or fell short of expected draft value, and how much they spend in free agency.

Clicking on a team takes you to a more detailed team page which shows:

  • Positional investment – how they invest in the draft and free agency for each position compared to the rest of the league
  • Draft performance – by year, the total player value obtained in the draft compared to their expected value and league average
  • Free agency spend – by year, what they have added and lost in free agency broken apart by free agency tiers (top, middle, bottom free agents)

Below is an example of the team positional investment view:


General Manager performance and traits

One of my recent – and favorite – adds to the site: you can see how GMs have led their teams, how they have performed in the draft, how they spend in free agency, and their philosophy and what they value.

Like the Team page, clicking on any GM will take you to a more detailed page showing:

  • Philosophy – a summary of what they believe in running their teams including relevant quotes.
  • Draft performance – a similar view as the Teams page, this shows how much value they gained by year in the draft vs. expected, although this will only show years that they were a GM
  • Positional prioritization – which positions they have drafted by round and how that compares to the league
  • Free agency spend – again, a similar view as the Teams page with how they have spent in free agency

Player value

This page allows searching for any drafted player (UDFAs are not loaded yet, but that is something I am working on) and shows various aspects of a player’s value including:

  • Value percentile – a single number showing a player’s overall value percentile
  • Playing-time-adjusted value – value percentile adjusted for playing time, which highlights good players who may have missed time due to injury
  • Draft capital – amount of draft capital used on the player
  • Value vs. expected – a player’s value vs. the expected value for their pick location
  • Career duration – view on how long the player has played
  • Total career value – while value percentile is independent of how long a player has played, Total Career Value measures a player’s cumulative career value

Positional trends

Another page I recently completely re-did and one I love, this shows draft and free agency trend information across positions including draft hit rates and draft capital usage, average free agency prices, and how both draft capital and free agency prices have been changing.

Clicking on any position will take you to a more detailed positional information page which includes:

  • Draft statistics – draft capital used over time and how many elite, above average, average, or poor players came from each class
  • Free agency – how free agency prices for the position have changed compared to the overall salary cap growth rate, both for all free agents and different tiers
  • Team positional investment – if you are curious which teams have invested in a position, here you can see every team’s investment in both the draft and free agency

Below is an example of the team positional investment view from the detail page.


Insights

One-off insight pages on various topics, including how free agency contract prices grow with the salary cap, draft prospects’ ages and trends, punter value, and others.

Below is the view on the salary cap vs. free agent prices which I have used often to project new deals and show that player prices (rightly) keep going up.


How to use

And lastly, there’s a page dedicated to a deeper explanation of the various fields, player value metrics, and the model changes I have made over the years to improve it including positional value adjustments, handling QB overvaluation, and smoothing yearly anomalies.


If you check it out, I hope you enjoy it and find value in it. I will continue to improve it and add to it.

Go Birds.

The Best Metrics to Watch That Drive Player Value – Defense

Last time I ran through offensive metrics (link here) that matter and now I will hit the defense. Defense is a lot harder to simplify as there are fewer true individual metrics and a player’s impact on defense often doesn’t show up in a stat. Correlations to player value (Approximate Value or AV) will be lower and players will be outliers to a bigger degree. As a reminder, I am looking, as far as possible, for metrics that you can see when watching a game and avoiding player “grades”, which don’t mean anything by themselves (grades from PFF, as an example, are really good but there isn’t much value in just relaying them here).

Position: Key metrics
Defensive Line: Pass rush win rate + Run stops
Linebacker: Run stops + Completion % allowed + Pressures
Cornerback: Completion % allowed + Yards per reception + TDs-interceptions + Missed tackle rate
Safety: Forced incompletions + Run stops + Pressures + Turnovers

Defensive Line

Pass Rush Win Rate + Run Stops

The best defensive line metrics are, naturally, the opposite of the offensive line’s metrics:

  • Pass rush win rate (PRWR) – The ability to beat a block in less than 2.5 seconds
  • Run stops* – The ability to force a runner to adjust a running lane or make a tackle within 3 yards of the line of scrimmage

*Run stops are defined as the lineman preventing an offensive player from gaining 40% of yards to go on 1st down, 50% of yards to go on 2nd down, and preventing a first down or touchdown on 3rd and 4th downs. Basically, limiting the offense to less than half the yards they need.
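The footnote's thresholds can be written as a small classifier. This is a sketch under assumptions: the function names and the play tuple layout are hypothetical, the comparisons are taken as strict ("less than" the threshold), and any touchdown is treated as not a stop.

```python
def is_run_stop(down: int, yards_to_go: int, yards_gained: int,
                touchdown: bool = False) -> bool:
    """Classify a rushing play as a run stop using the thresholds in the footnote."""
    if down not in (1, 2, 3, 4):
        raise ValueError("down must be between 1 and 4")
    if touchdown:
        return False  # assumption: a scoring run is never a stop
    if down == 1:
        return yards_gained < 0.40 * yards_to_go
    if down == 2:
        return yards_gained < 0.50 * yards_to_go
    # 3rd and 4th down: a stop means the offense did not convert
    return yards_gained < yards_to_go


def run_stop_rate(plays) -> float:
    """Run stops as a share of rush attempts; plays are (down, to_go, gained, td) tuples."""
    plays = list(plays)
    if not plays:
        return 0.0
    return sum(is_run_stop(*p) for p in plays) / len(plays)


# Hypothetical rushing plays: (down, yards to go, yards gained, touchdown)
plays = [(1, 10, 3, False), (1, 10, 4, False), (2, 7, 3, False),
         (3, 2, 2, False), (4, 1, 0, False)]
print(run_stop_rate(plays))  # 0.6
```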

Pass Rush Win Rate (PRWR) is not a perfect metric because the pass rush is only one piece of a defense’s success against the pass with the secondary being a huge component. To illustrate, below shows PRWR vs. the passer rating allowed with coverage grades shown with the circle size (smaller circle is a worse coverage grade).

DL pass rush win rate vs passer rating allowed
Circle size = coverage grade (larger is better coverage grade)

Here you will see Philadelphia with a relatively good pass rush but a poor passer rating allowed, which was down to its secondary. Conversely, the teams in the bottom right are the teams with the best PRWRs and lowest passer ratings allowed – these all also have good coverage grades. In the bottom left, there are some teams (SFO, DEN, and GNB) with below average PRWRs but still good passer ratings allowed due to very good coverage grades.

Switching to run defense, below shows run stop rate (run stops as a percentage of rush attempts) vs. rush yards per attempt. Using run stops works better than the related Run Stop Win Rate (RSWR), which has some oddities (Houston scores in the top 10 in RSWR but has the lowest rated run defense in the league). The circle size below is the RSWR, with a bigger circle being a higher rated RSWR. You can see HOU as an outlier, allowing 5.2 yards per attempt but having a good RSWR score, but when you use run stop percentage, HOU drops to one of the lower rates in the league.

DL run stops vs rush yards allowed
Circle size = run stop win rate (larger is better win rate)

Neither of these will get really high R2 values because the defensive line can only control the pass and rush so much on its own (run stops have a bit better correlation than pass rush). Both metrics intuitively make sense. A run stop (limiting to fewer than half of the yards to go) shortens drives and reduces chances to score. And defensive linemen beating their blocks in under 2.5 seconds either get to the quarterback for a pressure (offensive DVOA drops by over 100 points under pressure) or force a quicker throw, which will limit depth of target and explosive plays.


Linebacker

Run Stops + Completion % allowed + Pressures

Linebacker is becoming one of the more interesting positions in the NFL (along with safety) as they are the primary position often tasked with dealing with the growing offensive mismatches being used. It’s becoming hard to even list who is a linebacker – “base” defense is only 25% of snaps so at least one linebacker is off the field most of the time. But more and more, hybrid players (converted safeties or undersized but athletic ends) are being used in this role.

To simplify linebacker as much as possible and cover their responsibilities in both run and pass defense, the three key metrics to look at are run stops, forced incompletions, and pressures. These three get an R2 of 0.63, meaning 63% of a linebacker’s value (defined by AV) is explained by them:

  • Run stops* – Average of 6.6 per season across all LBs, 14.6 among starters
  • Forced Incompletions – Average of 1.3 per season across all LBs, 2.9 for starters
  • Pressures – Average of 5.4 across all LBs, 11.9 for starters
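For readers who want to see how an R2 for three inputs is obtained, here is a minimal sketch that fits AV on run stops, forced incompletions, and pressures by ordinary least squares and reports the share of variance explained. It uses only the standard library so it runs anywhere; in practice a library such as NumPy would do the fitting. The data and the relationship inside it are synthetic and made up purely to exercise the functions, so the printed value says nothing about the 0.63 reported above.

```python
import random


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    A = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def r_squared(rows, y):
    """Share of the variance in y explained by the fitted linear model."""
    coef = fit_ols(rows, y)
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Synthetic linebacker seasons: (run stops, forced incompletions, pressures)
random.seed(0)
rows, av = [], []
for _ in range(200):
    stops = random.randint(0, 30)
    forced_inc = random.randint(0, 6)
    pressures = random.randint(0, 25)
    rows.append((stops, forced_inc, pressures))
    # Made-up relationship plus noise, only to exercise the function
    av.append(0.3 * stops + 0.5 * forced_inc + 0.2 * pressures + random.gauss(0, 1.5))

print(round(r_squared(rows, av), 2))
```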

The key outliers in the top right on the below chart are the elite linebackers and seasons (Fred Warner, Darius Leonard, Bobby Wagner, Dont’a Hightower in 2019). One thing to note is almost all are part of defenses that have top-end DVOA scores which may be elevating all defensive player value scores given how AV works (a large part of the AV score includes the starting total value of the defense which is split up by positional allocation). But the combination of stops, incompletions, and pressures has a good correlation to linebacker value.

LB value vs composite metric
Circle size = defensive DVOA (smaller circle is better DVOA)

When you look at the top linebackers from 2020 (those that generated 10 or more AV for the season), you see that while they are all good at both run and pass defense, they have wide ranges across these metrics, which is what makes it impossible to use a single metric.

Top 2020 linebackers

As I explained in the offensive metrics post, the closer these metrics are to true value – which is scoring points on the offense and preventing points scored on the defense – the better they will be. For linebackers, limiting run gains, forcing an incompletion, and disrupting the quarterback are the most important things they can do, especially in the NFL today when they are asked to cover receivers, tight ends, running backs, and increasingly contain mobile quarterbacks.


Cornerback

Completion % allowed + Yards allowed per reception + TDs allowed – Interceptions + Missed tackle rate

Cornerback is another position that is tough to measure easily. The best corners often aren’t targeted, which leaves them short on traditional counting stats (interceptions, forced incompletions) or skews other metrics (catch rate, yards per reception).

Many will look at yards allowed per coverage snap which is a very good stat but there is too much overlap between good and bad cornerbacks as shown below. The top CBs (with AV of 10 or above) are in blue with the rest of the league in red – while the top CBs on average give up fewer yards per coverage snap (1.0 yard vs. 1.3 yards), there is so much overlap.

Cornerback yards per coverage snap

Yards per coverage snap fails because the best corners often have a higher average depth of target as they cover the best receivers, as shown below. The top CBs in the league have an average depth of target of 11.4 yards, a full yard higher than the bottom quartile and 0.7 yards above the average. As an example, in the above chart one of the blue dots further to the right is Jalen Ramsey’s 2018 season where he allowed an above average 1.3 yards per coverage snap but he dealt with one of the highest average depths of target (13.5 vs. a league average of 10.7) as he was in single-coverage vs. WR1s. Justifying his value ranking, he allowed an elite catch rate (87th percentile) and a lower than average QB rating (73.8).

Cornerback average depth of target

The cornerback metrics are more complicated than I was seeking and ultimately you would want to compare corners against rated wide receivers, but the below four metrics do a good job of explaining corner value (and, again, these are metrics you can easily see watching a game):

  • Completion % allowed – How often the CB allowed a receiver to make a catch. Slot receivers or CBs covering short routes can be penalized on completion % alone which is why yards allowed and average depth of target (aDOT) need to be considered next.
  • Yards allowed per reception – Average yards per reception is a good metric but skews slot corners that defend shorter routes and penalizes corners that have to defend deep threats. I tried different adjustments (without needing to pull # of slot vs. outside snaps) and settled on using yards allowed per reception * 10.7/aDOT. The average depth of target across all CBs is 10.7 so this will give a slight adjustment based on the depth that CBs are defending.
  • Touchdowns allowed minus interceptions – As we are looking at metrics that stay as close to true value (the ability to score or prevent a score), TDs and interceptions are absolutely critical. Interceptions are valued at 0.588 of a touchdown (based on prior research explained here). This is the metric that shows the least stability year-to-year and while it reflects CB ball skills, there is an opportunistic or situational component to it as well.
  • Missed tackle rate – This is one of the more stable metrics year-to-year for corners and covers both their value in the run game and in limiting gains in passing.
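Two of the metrics in this list involve a little arithmetic, sketched here. The constants 10.7 and 0.588 come from the text; the function names and the sample inputs are hypothetical, and reading "TDs allowed minus interceptions" as TDs minus 0.588 times interceptions is an assumption.

```python
LEAGUE_AVG_ADOT = 10.7      # average depth of target across all CBs, from the text
INT_TD_EQUIVALENT = 0.588   # one interception valued at 0.588 of a touchdown


def adot_adjusted_yards_per_reception(yards_allowed, receptions_allowed, adot):
    """Yards allowed per reception scaled by league-average aDOT over the CB's aDOT."""
    if receptions_allowed <= 0 or adot <= 0:
        raise ValueError("need at least one reception and a positive aDOT")
    return yards_allowed / receptions_allowed * LEAGUE_AVG_ADOT / adot


def td_minus_int(tds_allowed, interceptions):
    """Touchdowns allowed minus interceptions, in touchdown equivalents (lower is better)."""
    return tds_allowed - INT_TD_EQUIVALENT * interceptions


# Hypothetical corner who defends deep targets: raw 12.0 yards per reception
print(round(adot_adjusted_yards_per_reception(600, 50, 13.5), 2))
print(round(td_minus_int(3, 5), 2))
```

The adjustment pulls a deep-coverage corner's yards per reception down toward what it would look like at a league-average depth, and pushes a short-route slot corner's number up.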

No metric will ever be perfect and result in an un-debatable ordering of players – AV doesn’t do this, PFF is great but has its flaws, and so on. But below shows the top CBs in 2020 by catches, yards, TD-Ints, and missed tackles. The center grey columns show each player’s percentile performance in each and these percentiles are then combined into a composite score, one weighted per 500 coverage snaps (to give more weight to players that played more and weed out low volume players) and a second percentile that is unweighted. I included each player’s AV (column 4) and the PFF rank for the top 10 corners (last column).

2020 Top CB stats
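One way the percentile-and-composite step could be put together is sketched below. This is not the exact calculation behind the table: equal weighting of the four metrics and scaling the composite by coverage snaps divided by 500 are both assumptions, and the players, field names, and numbers are hypothetical.

```python
def percentile_ranks(values):
    """Percentile (0-100) of each value, where lower raw values are better."""
    n = len(values)
    if n < 2:
        return [100.0] * n
    return [100.0 * sum(1 for other in values if other > v) / (n - 1) for v in values]


def composite_scores(players, metrics):
    """Average the per-metric percentiles and also return a snap-weighted version."""
    columns = [percentile_ranks([p[m] for p in players]) for m in metrics]
    results = []
    for i, p in enumerate(players):
        unweighted = sum(col[i] for col in columns) / len(columns)
        weighted = unweighted * p["coverage_snaps"] / 500.0  # assumed weighting
        results.append((p["name"], round(unweighted, 1), round(weighted, 1)))
    return results


METRICS = ["comp_pct", "adj_ypr", "td_minus_int", "missed_tackle_rate"]

# Hypothetical corners; all four metrics are "lower is better"
players = [
    {"name": "CB A", "comp_pct": 52.0, "adj_ypr": 10.1, "td_minus_int": -1.9,
     "missed_tackle_rate": 6.0, "coverage_snaps": 600},
    {"name": "CB B", "comp_pct": 60.0, "adj_ypr": 12.4, "td_minus_int": 1.2,
     "missed_tackle_rate": 11.0, "coverage_snaps": 550},
    {"name": "CB C", "comp_pct": 55.0, "adj_ypr": 11.0, "td_minus_int": 0.0,
     "missed_tackle_rate": 9.0, "coverage_snaps": 386},
]

for row in composite_scores(players, METRICS):
    print(row)
```

Under this weighting a low-volume corner with strong rate stats drops in the weighted column, which is the behavior described for low snap count players.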

There are, of course, differences between this and AV and PFF rankings. The relative weightings of catches vs. yards vs. TD-Ints could be adjusted to value the aspects of a corner’s role more or less, but there were several interesting things when I looked at this:

  • There is pretty good agreement with AV but a little more difference with PFF ranks. One reason is PFF scores regardless of snap count – Bryce Callahan is the key example here, rated 3rd by PFF with elite coverage grades and 2 interceptions and no TDs allowed but he only played 386 coverage snaps. I prefer weighting by snap count because the more a player plays, the greater their value, but it depends what you are trying to accomplish.
  • As mentioned above, interceptions aren’t a very stable metric year-to-year but they are valuable and need to be included (besides a touchdown, turnovers are the highest value play at 4 expected points). But interceptions will skew CBs – the examples of CBs with high interceptions here are Malcolm Butler (5) and Xavien Howard (10). Both are 4-5 spots higher than they would have been if they had interception totals at their career average. But again, interceptions are valuable and both Butler and Howard are known for their ball skills.

Safety

Forced incompletions + Run stops + Pressures + Turnovers generated

Safety turned out to be one of the most difficult positions to simplify because similar to linebacker, what a safety is today is increasingly varied. They have responsibilities in coverage, against the run, as a pass rusher, and often take responsibility covering the tight end and running back.

Because of this, how the top safeties generate value varies widely. The average usage for safeties is 65% in coverage, 32% in the box, and 3% as a pass rusher but the actual usage varies greatly. Below shows safeties plotted by percentage of time in coverage or the box and their AV value and PFF rankings denoted with the circle colors. Players above the dashed line are used in the box more than average and below the dashed line are used in coverage more often.

  • Safety positional usage and AV ranking

Given this, safeties can provide value in greatly different ways and you cannot just look at coverage stats, even though coverage is still two-thirds of their time. The simplest way I believe to look at this is to “count the impacts” of safeties – impacts defined as the discrete plays that you can see including forced incompletions, run stops, quarterback pressures, and turnovers (forced fumbles and interceptions). Other metrics like completion percentage allowed or missed tackle rate slightly improve the tie to value but overcomplicate an already complicated view.

The below shows safeties grouped into sets of 10 players by their 2020 AV compared with the number of forced incompletions, run stops, pressures, and turnovers. The top ten safeties by value generate 28.5 “impacts” vs. 23.5 for the next group of ten with a continual decline shown.

Safety value vs impact count
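Counting impacts and averaging them by AV group can be sketched as follows. The field names and the sample players are hypothetical; the default group size of 10 follows the text.

```python
def impact_count(player):
    """Sum the discrete plays: forced incompletions, run stops, pressures, turnovers."""
    return (player["forced_incompletions"] + player["run_stops"]
            + player["pressures"] + player["forced_fumbles"]
            + player["interceptions"])


def average_impacts_by_group(players, group_size=10):
    """Rank players by AV, split into consecutive groups, and average impacts per group."""
    ranked = sorted(players, key=lambda p: p["av"], reverse=True)
    averages = []
    for start in range(0, len(ranked), group_size):
        group = ranked[start:start + group_size]
        averages.append(sum(impact_count(p) for p in group) / len(group))
    return averages


# Hypothetical safeties, grouped in pairs to keep the example short
safeties = [
    {"av": 11, "forced_incompletions": 8, "run_stops": 14, "pressures": 4,
     "forced_fumbles": 1, "interceptions": 3},
    {"av": 9, "forced_incompletions": 6, "run_stops": 12, "pressures": 5,
     "forced_fumbles": 0, "interceptions": 3},
    {"av": 5, "forced_incompletions": 4, "run_stops": 9, "pressures": 2,
     "forced_fumbles": 1, "interceptions": 1},
    {"av": 3, "forced_incompletions": 3, "run_stops": 7, "pressures": 1,
     "forced_fumbles": 0, "interceptions": 0},
]
print(average_impacts_by_group(safeties, group_size=2))
```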

There is one outlier group, the 81-90 grouping where Miami’s rookie Brandon Jones skewed the numbers as he generated 15 run stops and 6 pressures in only 385 snaps as an almost exclusive box defender. In 2020, Jones only accumulated 2 AV but projecting him out to a full season, his AV would be 6-7 and would have put him in the top 30 safeties. Players with low snap / game counts and little history that stick out in the data are interesting to watch moving forward as their playing time increases.

As with the other metrics, I was seeking relatively simple metrics you can see when watching games that are highly correlated to player value. With safeties, these all make sense – their ability to stop completions and stop runs, to pressure the QB, and to generate turnovers all have great value to a defense.


Note: The source data files will be added here and to Github once they are cleaned up and any non-sharable data is removed.

The Best Metrics to Watch That Drive Player Value – Offense

I’m procrastinating finishing the Eagles draft history analysis because the last two positions to do are tight end and running back and both are pretty uninteresting positions to look at (mostly because the relatively small number of players drafted at each position makes conclusions on the data useless). But that will be finished soon and I have a request to look at the Eagles tendencies which I will include.

One thing I have been working on and wanted to get out was a listing of the best positional metrics that determine player value and are intuitive to track. This last part is an important distinction as there are some really great metrics used, but they aren’t intuitive to see when watching a game. Most of the metrics below are metrics that are out there created by others, but a couple are ones I enhanced or added to (and when I did, I explained why and gave detail on the calculations). My plan is next to take a look at what 2021 could / should look like for the Eagles based on these metrics and what typical improvements and drops we could see. As a summary, following are the offensive metrics that are most important to watch:

Position: Key metrics
Quarterback: Expected Points Added + Completion Percentage Over Expectation + Rushing Yards / 600 Dropbacks
Wide Receiver / Tight End: Avg Depth of Target * Separation + Catch Percentage + Yards After Catch
Running Back: Rush Yards After Contact + Receiving Yards After Completion
Offensive Line: Run Block Win Rate + Pass Block Win Rate

Some initial credits to several people that have done great work here:

  • PFF’s Austin Gayle (@PFF_AustinGayle) was on Fran Duffy’s “Journey to the Draft” podcast which was a great run through the metrics that PFF sees as most stable in projecting players
  • Michigan Football Analytics (@mfbanalytics) – which is becoming one of my favorite sites – does great work on metrics and advances the entire community

My approach and what matters

If you read anything else here, you will see me use PFR’s Approximate Value (AV) in a lot of analyses. The reason I like AV is that the basis of the metric – points per drive – makes sense in determining value. What determines wins in the NFL? Obviously, more points than the other team. The best way to measure value is looking at an offense’s ability to score higher than expected and a defense’s ability to allow scoring at a lower rate than expected. This is what AV does. DVOA is another great metric that uses success on a play vs. expected as its basis and there is good correlation between DVOA and AV.

It is easy to get lost down holes looking for correlations among data that “work” on a chart and mathematically but either don’t make sense or are further distanced from what ultimately determines success. A classic example is Leonard Koppett’s “Super Bowl Indicator” which predicted stock market success – it back-tested but then failed moving forward because it fundamentally makes no sense. For the positional metrics, the closer they are to generating points for the offense or preventing points for the defense, the more useful (correlated with player value) and stable (consistent year over year and across players) the metric will be.

The last thing I will say is that football is so incredibly complicated that any single metric has its flaws and will miss things. A lot of people spend time trying to build predictive models to show what a player will do next year and while that can be fun, I don’t think a pure model will ever be good enough at an individual player level to do that. But now, on to the metrics…


Quarterback:

Composite metric of EPA+CPOE and rushing value

PFF will highlight a few metrics that they see as highly stable, including “clean pocket passer rating” and “average depth of target from a clean pocket”. Every QB’s ability to succeed drops under pressure, so the thinking here is that you best measure a QB on how successful they are while protected, with more value placed on deeper passes (as they make the offense more efficient).

FiveThirtyEight did an analysis here showing a better correlation between Completion Percentage Over Expected (CPOE) and NFL success. And Michigan Football Analytics has an awesome post here where they look at CPOE plus Expected Points Added (EPA) and the ability to predict QB success.

All of these metrics above make intuitive sense as a quarterback’s ability to complete passes at depth is closely connected to an offense’s ability to score points. But I do think a lot of these miss out on a QB’s running impact on the game, which is why I incorporated rushing value. Below are the R2 values for each of these metrics (if you aren’t familiar with R2, most simply it is how well one variable explains changes in another, with a higher R2 meaning the two metrics are better correlated).

Metric | R2 with AV
Avg depth of target on clean pocket | 0.017
Clean pocket passer rating | 0.319
CPOE | 0.340
EPA | 0.584
CPOE+EPA | 0.553
Composite CPOE+EPA and rushing value | 0.702

Below is the chart of quarterback value (normalized to a full season AV) vs. CPOE+EPA+rushing yards per 600 dropbacks (as an aside, that’s Lamar Jackson’s silly 2019 season way out at the top right).

QBValue vs CPOE+EPA and Rushing

And below is a gallery of the other mentioned metrics compared to AV, each of which has a looser correlation.

  • QB Value vs Clean Pocket Passer Rating

The above makes sense – CPOE+EPA is the closest metric to successful drives which translates to points but it ignores a QB’s ability to add value rushing. The calculation I use for the composite CPOE+EPA and rushing metric is:

Composite = 6169.56 * (CPOE+EPA) + Rushing Yards per 600 Dropbacks

Quick explanation on the above:

  • A QB’s passing performance is more important to a team’s success than their rushing performance (the model I used showed rushing had 43% the value of passing). This along with EPA+CPOE being a small number generated the 6169.56 coefficient.
  • The best measure of rushing value I found was rushing yards per 600 dropbacks (the average number of dropbacks for quarterbacks over a full season). Measuring this by dropbacks normalizes for QBs that didn’t play a full season.
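The composite can be sketched in a few lines, reading the formula as the 6169.56 coefficient applied to the CPOE+EPA composite with rushing yards per 600 dropbacks added on top. The coefficient and the 600-dropback normalization come from the post; the function names and the sample inputs are hypothetical.

```python
PASSING_COEFFICIENT = 6169.56   # from the post
DROPBACKS_PER_SEASON = 600      # average dropbacks over a full season, from the post


def rushing_yards_per_600_dropbacks(rushing_yards, dropbacks):
    """Normalize rushing yards to a 600-dropback season."""
    if dropbacks <= 0:
        raise ValueError("dropbacks must be positive")
    return rushing_yards / dropbacks * DROPBACKS_PER_SEASON


def qb_composite(cpoe_epa, rushing_yards, dropbacks):
    """Composite of passing value (CPOE+EPA) and normalized rushing value."""
    return (PASSING_COEFFICIENT * cpoe_epa
            + rushing_yards_per_600_dropbacks(rushing_yards, dropbacks))


# Hypothetical partial season: 300 rushing yards on 450 dropbacks
print(round(rushing_yards_per_600_dropbacks(300, 450), 1))
print(round(qb_composite(0.10, 300, 450), 1))
```

Normalizing by dropbacks means a quarterback who missed games is compared on the same footing as one who played the full season.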

Wide Receiver / Tight End:

Receiver efficiency

I think PFF has this one nailed with Wide Receiver Efficiency (article here) which values receivers based on three things: ability to separate (yards of separation at the time the ball arrived), ability to catch (catch percentage), and ability to generate yards after the catch (YAC). This is also weighted by average depth of target (aDOT) as there is more value generated from deeper receptions. Again, this one makes intuitive sense – a wide receiver needs to get open hopefully deeper down the field, catch the ball, and then add yards to the completion.

I wrote more on this in the Eagles WR draft philosophy post here and showed a similar set of data using Next Gen Stats’ data as inputs:

WR league efficiency stats

Running Back:

Yards After Contact + Receiving Yards After Catch

PFF and Next Gen Stats both have good research on running back value, focusing on a running back’s ability to create more yards than expected for a given rush. Next Gen Stats uses a metric called Rushing Yards Over Expected (RYOE) to identify what a running back created on each rushing play vs. the average (expected) based on where the defenders were, speed, and relative location. Offensive line, scheme, and a back’s own ability to create all factor into their value but you have to separate what a RB controls to get their value.

Yards before contact (YBC) has traditionally been viewed as more of an offensive line stat, and while that is partly true, a running back obviously can avoid contact with vision and speed. Comparing both yards after contact (YAC) and YBC against Rushing Yards Over Expected (RYOE), neither has a particularly high R2 value. YAC does have an R2 almost double YBC’s (0.38 vs 0.21), showing YAC has a relatively larger contribution to a back’s ultimate performance than YBC.

Adding a running back’s pass-catching impact (Yards After Catch) to their ability to create on the ground, you get an R2 of 0.79, meaning 79% of a back’s value is explained by yards after contact and after catch.

RB Value vs. Rushing YAC and Receiving YAC

This is an improvement over other metrics which have much lower correlations to total running back value:

Metric | R2 with AV
% Rushes with Gains Over Expected (ROE%) | 0.069
Rushing Yards Over Expected (RYOE) | 0.167
Broken Tackles | 0.366
Yards Before Contact (YBC) | 0.501

Again, this makes intuitive sense – what separates running backs is, all things equal, their ability to create yards in excess of the rest of the league. On average, contact is made at just under 2 yards per carry and what a back does after that has a relatively larger impact on their ultimate value. The above model slightly improves to an R2 of 0.83 if you do factor in YBC with a coefficient of 0.63 (roughly meaning 63% of yards before contact contribute to a back’s success) but I left that out to not over-complicate it for a small improvement.
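As a rough sketch of the running back composite, the function below adds rushing yards after contact and receiving yards after catch, with yards before contact optionally folded in at the 0.63 coefficient. Weighting the first two inputs equally is an assumption, since only the YBC coefficient is stated; the function name and the sample numbers are hypothetical.

```python
YBC_COEFFICIENT = 0.63  # from the post; only used in the extended variant


def rb_composite(rush_yards_after_contact, rec_yards_after_catch,
                 yards_before_contact=None):
    """Rushing yards after contact plus receiving yards after catch.

    If yards_before_contact is given, it is added at the 0.63 weight.
    Equal weighting of the first two inputs is an assumption.
    """
    total = rush_yards_after_contact + rec_yards_after_catch
    if yards_before_contact is not None:
        total += YBC_COEFFICIENT * yards_before_contact
    return total


# Hypothetical season totals
print(rb_composite(700, 300))
print(rb_composite(700, 300, yards_before_contact=400))
```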


Offensive Line:

Pass Block Win Rate + Run Block Win Rate

ESPN’s Run Block Win Rate (RBWR) and Pass Block Win Rate (PBWR) described here are the metrics most people focus on to judge offensive line success. Offensive line performance is much harder to isolate and link to AV, particularly for pass blocking, because so much depends on the quarterback. In the chart below showing 2020 PBWR vs. passing net yards per attempt, you see negative outliers on the left (PHI, WFT, CHI) where pass blocking was better than the passing offense – each of these teams had awful quarterback play. And on the right side, teams with marginal pass blocking (TEN, HOU, TAM, MIN) outperformed because of good QB play last year.

Offensive line pass block win rate vs. net yards per attempt

RBWR has a bit better link to outcomes but is still impacted by the quality of the running back (and further skewed by mobile quarterbacks) as described above.

Offensive line run block win rate vs. yards per attempt

This is one area where I think the win rate metrics are better than PFF’s blocking ratings and I wrote more about that here – a couple of teams stick out when looking at PFF’s grades. Pittsburgh, for example, had a really poor line in 2020 and scored that way in PBWR but scored higher in PFF grades because Ben Roethlisberger released the ball quicker than any other quarterback by a large margin. Here, PBWR tells the fuller picture.

Again, blocking win rates are metrics that make intuitive sense. Pass Block Win Rate measures how often a lineman holds their block for at least 2.5 seconds. Football Outsiders has good data that shows across the league, a quarterback’s DVOA is 108 points lower when under pressure. Run Block Win Rate measures how often the lineman prevents the defense from forcing the runner to adjust a running lane or making a tackle within 3 yards of the line of scrimmage.
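To make the pass block definition concrete, here is a minimal sketch that treats a win rate as the share of blocks held for at least 2.5 seconds. The hold times are made up and the function name is hypothetical; ESPN builds the real metric from player-tracking data, so this only illustrates the arithmetic.

```python
def pass_block_win_rate(hold_times, threshold=2.5):
    """Share of pass blocks sustained for at least `threshold` seconds."""
    if not hold_times:
        return 0.0
    wins = sum(1 for t in hold_times if t >= threshold)
    return wins / len(hold_times)


# Hypothetical hold times (in seconds) for one lineman's pass-blocking snaps
sample = [3.1, 2.5, 1.8, 2.9, 2.2]
rate = pass_block_win_rate(sample)  # 3 of the 5 blocks reach the threshold
```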


Next, I will use these metrics to look at the Eagles’ 2021 season and what we can expect…

Source data and code

All source files and the Python scripts to visualize the data are in my GitHub repo here: https://github.com/greghartpa/position-analyses

Additionally, the source data is included below. All source data files are XLSX with the calculations used to create the QB composite score or WR efficiency in the Excel files – the calculations are not done in the Python scripts, which are only used at this point for visualization.

2020 Offensive Line Grades – Win Rate vs. PFF

If you don’t follow Brad Congelio, PhD and Professor of Sports Analytics at Kutztown University (@BradCongelio), you should – he posts lots of great sports analytics. A tweet of his got me thinking about different metrics for the offensive line, including PFF grades and ESPN’s win rates, and I wanted to compare them.

Pass and run block PFF grades

First, a scatter of PFF run (x axis) and pass (y axis) grades. Top right are teams graded highly in both; bottom left are teams graded poorly in both. Bottom right are teams with good run block grades but poor pass block grades, and top left are teams with good pass block grades but poor run block grades.

PFF offensive line grades

Pass and run block win rates

Next, the same scatter of teams but using ESPN’s run block win rates (x axis) and pass block win rates (y axis). Pass block win rate measures how often an offensive lineman holds their block for at least 2.5 seconds. Run block win rate is a bit more complicated and measures various outcomes like disrupting the running lane, forcing a change in running lane, and a tackle within 3 yards of the line of scrimmage. For more information, see ESPN’s write-up here.

ESPN offensive line win rates

Differences Between PFF and ESPN OL Win Rates

And lastly, a view on the differences between PFF grades and ESPN’s win rates.

PFF vs. Win Rate grades

Many people hate on PFF’s grades, which I think is misguided – I believe they are largely right but will, like all metrics trying to capture a complicated sport like football, miss some things. The one that stuck out to me was Pittsburgh, which is one of the biggest outliers between the data sets. I am an Eagles fan but lived in Pittsburgh for twenty years and have always liked the team. Also, I had the Steelers’ pick in Brandon Lee Gowton’s 2021 fan-led mock at Bleeding Green Nation – I took tackle Samuel Cosmi and thought the Steelers inevitably taking one of the running backs was going to be a huge mistake because their issues are on the offensive line.

Pittsburgh’s pass blocking graded really well on PFF (4th in the league), but that is really a product of Roethlisberger getting rid of the ball at a historically quick pace – in 2020, he averaged 2.17 seconds to throw, by far the quickest in the league. This resulted in a plodding offense and, watching any tape, the line was just awful last year when it needed to hold up.

A commenter on Brad Congelio’s tweet also questioned Kansas City’s run blocking, which is another case that shows a difference between PFF grades and OL win rates. One thing that stands out to me with Kansas City is the amount of run success that came from Mahomes and Tyreek Hill – they combined for 24% of rushing yards and elevated YPC. Remove Hill and Mahomes and Kansas City’s YPC would be 3.39, putting them near the bottom of the league. Rushing yards are rushing yards, so it isn’t valid to just exclude rushes by non-running backs, but it does point to why one metric scores out differently than another and why watching the Chiefs didn’t feel like watching a strong rushing team.


Code for this quick analysis is in the following GitHub repo:

https://github.com/greghartpa/offensiveline-metrics

Automating Scraping of NFL Team Data With Python

Credit to Steven Morse (@thestevemo or his blog at https://stmorse.github.io/index.html) who had code I used as a basis for my ProFootballReference scraping scripts. He is a good follow for math and analytics work as well.

The full code and supporting files for scraping team and player data from ProFootballReference are located in my GitHub repo here: https://github.com/greghartpa/scrape_pfr_data


Most of the analysis on this site uses data from a few sources, including ProFootballReference as a primary source for team, player, and draft data. Here I am sharing how to automate scraping the team data using Python which will output the following:

Output of scraping PFR team data

Beyond being a team record and DVOA dataset, this output will be used when scraping player data which I will explain in a future post.

Setup and Dependencies

There are only a few package dependencies: Pandas for storing and manipulating the data, requests for fetching the pages, BeautifulSoup for parsing the HTML, and NumPy for some filtering cleanup.

A range of yearly data can be pulled by setting the start and end years. I also pull in team DVOA data which is stored in teams-dvoa.csv and was just manually pulled (some day I will script this out, but it is a small dataset and infrequently changes). If DVOA is not needed, the two lines reading it in and later on merging with the end dataframe can be removed.
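As a rough sketch of that setup, the year range can drive a dictionary of season URLs. The constant names, the example years, and the URL pattern are assumptions for illustration; check the repo for the exact values the script uses.

```python
# Hypothetical values; set these to the range of seasons you want to pull
START_YEAR = 2018
END_YEAR = 2020

# Assumed URL pattern for a PFR season page
BASE_URL = "https://www.pro-football-reference.com/years/{year}/"


def season_urls(start_year, end_year):
    """Return {year: url} for every season in the inclusive range."""
    return {y: BASE_URL.format(year=y) for y in range(start_year, end_year + 1)}


urls = season_urls(START_YEAR, END_YEAR)
```

Each URL would then be fetched inside the main loop, one request per season.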

Scraping Team Data

The main section of code to scrape PFR team data loops through the range of years, makes an HTTP request, and parses the resulting table data from PFR. A few points here:

  • The URL is manually built and then pulled using the requests library.
  • BeautifulSoup is used to parse the returned HTML and keys off of the class “sortable stats_table”. PFR breaks NFC and AFC team data into two tables which need to be separately pulled and stored.
  • There is some data cleanup I perform. For example, PFR appends indicators to the team name to represent division winners (“*”) and wildcard teams (“+”). I move these into a separate column so I have clean team names for later matching and also to make analysis on playoff teams easier. PFR does not indicate Super Bowl winners in these tables (which is odd to me), which I deal with later.
  • I have to check for ties: if a season happened to have no ties, that column doesn’t exist, which will cause an issue. So, if there are no ties, I just add a tie column “T” and populate it with zeros.
  • I also add a column “Scraped” and populate with zeros – this will be used when I scrape player data to control which teams I need to or want to scrape.
  • And lastly in this section, I store the team roster URL which will be used for the player data scraping so I don’t have to re-scrape the URL.
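The cleanup steps in the list above can be sketched without any scraping at all. This is a minimal illustration on plain dictionaries, not the code from the repo: the function names and the “Tm” and “Playoffs” keys are assumptions, while “T” and “Scraped” follow the column names described above.

```python
def split_playoff_marker(raw_name):
    """Split a team name like 'Buffalo Bills*' into (clean name, marker)."""
    name = raw_name.strip()
    if name.endswith(("*", "+")):
        return name[:-1].strip(), name[-1]
    return name, ""


def clean_rows(rows):
    """Clean team names, guarantee a 'T' (ties) key, and add a 'Scraped' flag."""
    cleaned = []
    for row in rows:
        name, marker = split_playoff_marker(row["Tm"])
        new_row = dict(row, Tm=name, Playoffs=marker, Scraped=0)
        new_row.setdefault("T", 0)  # a season without ties has no T column
        cleaned.append(new_row)
    return cleaned


# Illustrative rows shaped like a parsed standings table with no ties column
rows = [
    {"Tm": "Buffalo Bills*", "W": 13, "L": 3},
    {"Tm": "Miami Dolphins", "W": 10, "L": 6},
    {"Tm": "Indianapolis Colts+", "W": 11, "L": 5},
]
teams = clean_rows(rows)
```

Keeping the marker in its own field means the team name can be matched cleanly later, and playoff teams can still be filtered on the marker.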

Clean Up Data and Export

After the main loop to grab all years, I clean up the data with the following:

  • Convert year, record, points for and against, and strength of schedule columns to numeric
  • Calculate a win percentage
  • I manually create a team abbreviation dataframe named tmabrevdf and merge it with the main output dataframe to normalize three-letter team abbreviations. This is done so I can handle team moves (the Rams’, Raiders’, and Chargers’ various moves) or team name changes (Washington Redskins to Washington Football Team).
  • And lastly, I manually create a list of Super Bowl winners since PFR does not indicate that in the team tables. That information is stored in the page header, which I may go back and pull from in a future edit, but for now this data, like DVOA, is a small dataset and only changes annually.
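A minimal pandas sketch of these cleanup steps is below, using two hand-typed rows in place of scraped output. Only tmabrevdf is a name from the post; the other column names, the abbreviations, and the choice to count a tie as half a win are assumptions.

```python
import pandas as pd

# Two illustrative rows; scraped values arrive as strings
teams = pd.DataFrame({
    "Year": ["2019", "2020"],
    "Tm": ["Kansas City Chiefs", "Philadelphia Eagles"],
    "W": ["12", "4"], "L": ["4", "11"], "T": ["0", "1"],
})

# Convert the scraped strings to numbers
num_cols = ["Year", "W", "L", "T"]
teams[num_cols] = teams[num_cols].apply(pd.to_numeric)

# Win percentage; counting a tie as half a win is an assumption
games = teams["W"] + teams["L"] + teams["T"]
teams["WinPct"] = (teams["W"] + 0.5 * teams["T"]) / games

# Manually built lookup to normalize three-letter abbreviations
tmabrevdf = pd.DataFrame({
    "Tm": ["Kansas City Chiefs", "Philadelphia Eagles"],
    "TmAbbrev": ["KAN", "PHI"],
})
teams = teams.merge(tmabrevdf, on="Tm", how="left")

# Manually maintained Super Bowl winners keyed by (season, abbreviation)
sb_winners = {(2019, "KAN")}
teams["SBWinner"] = [
    int((int(y), t) in sb_winners)
    for y, t in zip(teams["Year"], teams["TmAbbrev"])
]
```

A left merge keeps every scraped team row even if an abbreviation is missing from the lookup, which makes gaps easy to spot as empty values.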

Creating an Analytic Data Set of NFL Player Value

ProFootball Reference’s Approximate Value (AV) is a great NFL analytics measure of player value, created by PFR’s founder Doug Drinen. AV, similar to baseball’s WAR or basketball’s PER metrics, puts a single value on a player’s season, and more detail can be found on their website here:

https://www.sports-reference.com/blog/approximate-value/

There are other great posts out there analyzing the NFL draft using ProFootballReference’s Approximate Value (AV) metric for player value, including statsbylopez’s “Approximate value and the NFL draft” and OvertheCap’s “Examining Draft Pick Approximate Value & Team Success in the Rookie Wage Scale Era”.

The basics of how AV is calculated

AV is calculated for every position with the following basic tenets:

  • For both offense and defense, a total “pool” of AV points is calculated by comparing the per drive scoring (or scoring allowed for defense) of a team vs. the league average per drive scoring (scoring allowed).
  • These points are then divided up across positional groups based on modeling. On offense, 45% of the total offensive points are allocated to the OL and the remaining 55% is allocated to skill position players. For defense, 67% of defensive points are allocated to the front 7 and the remaining 33% are allocated to the secondary.
  • Individual positions are further broken down using their contribution compared to the team’s total and a weighting of how much that position impacts the overall team.
  • Offensive positional AV is calculated fairly directly based on the share of a player’s contribution to a team’s per drive scoring. Defensive positions are harder to measure directly and are calculated differently. Defensive AV still ultimately uses per drive points allowed compared to league average per drive points allowed, but a defensive player’s individual AV also pulls in number of games played and started, number of All Pro awards, and defensive stats such as tackles, interceptions, fumble recoveries, and defensive TDs.
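The positional split in the second tenet is simple arithmetic, sketched below with a made-up pool of 100 points. The percentages come from the description above; the names and the pool size are hypothetical, and this is not PFR’s full calculation.

```python
# Shares taken from the description above
OFFENSE_SPLIT = {"OL": 0.45, "Skill": 0.55}
DEFENSE_SPLIT = {"Front7": 0.67, "Secondary": 0.33}


def allocate_pool(total_points, split):
    """Divide a unit's pool of AV points across its positional groups."""
    return {group: total_points * share for group, share in split.items()}


# Hypothetical pools of 100 AV points for each side of the ball
offense = allocate_pool(100.0, OFFENSE_SPLIT)
defense = allocate_pool(100.0, DEFENSE_SPLIT)
```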

A more detailed explanation of the calculation is available on PFR’s site: https://www.sports-reference.com/blog/approximate-value-methodology/

Thoughts on AV as a metric

Is it perfect? No, nothing ever will be especially with the complexity of the NFL. Some questions I have digging into the AV calculation are:

  • Despite the positional weighting, do specific positions skew high based on their opportunity for stats? For example, LBs will have more tackles, driving their AV up, but does that sync with the NFL’s relative view on the value of LBs?
  • Are trends in the NFL shifting positional values, and do NFL teams’ views on positions sync with AV scoring? Two examples where I think AV data skews a bit are RBs and LBs. “Base” defense today is nickel or dime, with teams averaging 70% of snaps in nickel or dime vs. traditional 3–4 or 4–3 base defense. With more defensive backs on the field the majority of the time, does allocating two-thirds of defensive points to the front seven (which is really front five or six now) make sense? And for RBs, AV (rightfully) weights scoring, which gives RBs higher AVs relative to other positions, but this diverges from the lower value teams put on RBs today and the ease of replacing RBs.
  • What if a defensive back is so good that they aren’t thrown at, and therefore opportunities for points based on interceptions and tackles are much fewer?
  • Does AV really reflect OL value when OL play is so nuanced? OL AV uses team scoring as the other positions do, but also pulls in All Pro and Pro Bowl awards and assigns relative values to C, G, and T.

That said, I think AV does a really good job at valuing players, especially at a population (aggregate) level and over time. This is true of most statistics. For example, a person’s BMI is at best directional and at worst irresponsible to look at on an individual level, but it has extreme value across a population. The charts below show AV per game by draft pick location and Career AV by draft round – both show what you would expect: earlier picks have better performance (higher AVs/Career AVs).

AV per game by draft pick
Career AV by draft round

I do believe there are some positional skews in the AV data. The chart below shows the AV per game by position in descending order along with what percentage of that position’s picks were made by round (which is meant to be a proxy for how teams value different positions). QB, as expected, is the top valued position and also the position with the most valuable draft capital used (26% of QB draft picks are made in round 1). Second is Tackle, again with the second most valuable draft capital used (22% of Tackle picks are made in round 1). LBs are the fourth highest valued position by AV per game but only the 8th highest position group in terms of how teams spend draft capital. Other divergences between AVperG and how teams spend top draft picks are Edge and DB: both rate lower in AVperG, but teams are willing to expend greater draft capital on these positions, which are generally accepted as premium positions today.

Picks by round by position

A more rigorous analysis would need to be done to refine AV, but the discrepancies in LB, Edge, and DB make some sense. Since defensive stats (interceptions, tackles, sacks) are bigger parts of defensive AV, LBs will have outsized tackle stats compared to how LBs are used today (with fewer LBs on the field in general). DBs’ true value is in quarterback rating allowed or completion percentage, not interceptions and tackles. Edges are valued higher in the draft than their AV numbers suggest, probably due to the importance of pass rush. Applying an adjustment to AV based on actual draft capital weighting might improve the cross-position valuations.

While I believe certain positions have a bias high or low in the AV calculation, when you look at which players have higher AVs at their position, it is hard to argue that AV does not properly value players relative to their position. For DBs and Edges, the top AVperG players (AVperG > 0.55) over the past 10 drafts are the following:

CB and Edge AV per G leaders

My data set

PFR has a great dataset (linked here) which provides every player drafted, their individual stats, games and years played, and college, along with AV by season and Career AV.

I largely used PFR’s dataset but did enrich the data with the following:

AV per G: I wanted a metric that would normalize player value rather than focusing on Career AV, which weights a player drafted years ago higher than a recent player.

% Years Started: Just a calculation of how many years of the player’s career they started.

PosGroup: The PFR dataset has some positional assignments that I felt were better grouped. First, I labeled positions beginning with “Off” and “Def” to make analysis of offensive and defensive players easier. Second, there are players listed as “DB” and “CB” which I combined into “Def – DB”, and I combined “NT” and “DT” into “Def – IDL”, “RB” and “FB” into “Off – RB”, “G” and “C” into “Off – OL”, and “ILB” and “LB” into “LB”.

PickGrouping: The initial dataset has round and pick, and I added a pick grouping by 10s to make certain analyses easier when round was too rough but pick was too precise. For this, all picks from 1–10 will be grouped into PickGroup 10, 11–20 into PickGroup 20, etc.
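The grouping by 10s comes down to rounding the pick number up to the next multiple of ten. A minimal sketch follows; the function name is hypothetical.

```python
def pick_group(pick):
    """Map an overall pick to its group of ten: 1-10 -> 10, 11-20 -> 20, etc."""
    if pick < 1:
        raise ValueError("pick numbers start at 1")
    # Integer arithmetic rounds up without any floating point edge cases
    return ((pick + 9) // 10) * 10


groups = [pick_group(p) for p in (1, 10, 11, 20, 254)]
```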

PositionPct: This is the important one. Given the above points on potential positional bias in the calculations, a way to normalize data is to compare players within their position group and calculate their percentile rank. This is what this field does – for example, Patrick Peterson at a 97 PositionPct is valued in the top 3% of DBs.
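One way to compute a within-position percentile is a grouped rank in pandas, sketched here on made-up players. It assumes AVperG is Career AV divided by games played and that PositionPct is the percentile rank of AVperG within PosGroup; the exact formulas live in the source files, so treat this as an approximation.

```python
import pandas as pd

# Made-up players; the column names mirror the fields described above
players = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E", "F"],
    "PosGroup": ["DB", "DB", "DB", "DB", "QB", "QB"],
    "CareerAV": [60, 30, 12, 4, 80, 20],
    "G": [100, 80, 60, 40, 100, 50],
})

# Assumed definition: career AV normalized by games played
players["AVperG"] = players["CareerAV"] / players["G"]

# Percentile rank within each position group, scaled to 0-100
players["PositionPct"] = (
    players.groupby("PosGroup")["AVperG"].rank(pct=True) * 100
).round()
```

Ranking inside each group means a DB is only ever compared with other DBs, which sidesteps the cross-position skews discussed earlier.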

PlayerClass: And lastly, to make grouping of players easier for certain analyses, I created a class field and classified players the following way:

  • “Elite” – 90th percentile (top 10%) in AVperG at their position
  • “AboveAvg” – between 60th and 90th percentile
  • “LeagueAvg” – between 40th and 60th percentile
  • “Poor” – below 40th percentile.
  • Note, I also classify players who averaged fewer than 5 games per year in their career as “Poor” to handle players who, either due to injury or outlying data, had high AVperG stats but only over a few games before being out of the league. I’m not sure if 5 games per year is the right threshold, but this is where I have it set now.
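The classification above can be sketched as a small function. The function name is hypothetical, and how the exact boundary values (40, 60, 90) are assigned is an assumption, since the list does not say which side they fall on.

```python
def player_class(position_pct, games_per_year, min_games=5):
    """Classify a player from their positional percentile and availability."""
    # Players with too few games per year are "Poor" regardless of AVperG
    if games_per_year < min_games:
        return "Poor"
    if position_pct >= 90:
        return "Elite"
    if position_pct >= 60:
        return "AboveAvg"
    if position_pct >= 40:
        return "LeagueAvg"
    return "Poor"


# Hypothetical (PositionPct, games per year) pairs
labels = [player_class(p, g) for p, g in [(97, 15), (75, 12), (50, 10), (20, 14), (95, 3)]]
```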