
Are PFF's NFL running-back grades biased?

  • Writer: QuantPunter
  • Jun 24, 2020
  • 5 min read

Updated: Jun 28, 2020

The NFL analytics community has become scathing about the contribution of the running back in recent years. Despite playing one of the most physical, dangerous, injury-prone roles in football, the halfback and fullback are routinely mocked by hordes of geeky nerds from the safety of their homes with the familiar analytics mantra: "Running backs don't matter".


While this statement is probably a little exaggerated, the NFL analytics community, including private companies such as PFF, have established, almost beyond dispute, two important truths about running the football that fly in the face of conventional wisdom:

  1. The passing game is vastly more important to your chances of winning than the running game

  2. The success of the running back (traditionally quantified as yards per carry) is largely a function of factors outside his control, such as the quality of the offensive line, the run scheming of the offensive coordinator, the passing frequency of the offense, the defensive setup, the down and distance, game situation, etc. The running back's actual ability doesn't seem to be particularly important.

PFF running back grades assess the performance of the runner on each play and are made on the basis of process rather than result. If the running back gets tackled for a loss because the guard missed his blocking assignment, PFF will not necessarily punish the RB with a poor grade. In theory, this will produce grades that are more stable and accurate than traditional numerical metrics. But because this evaluation is necessarily subjective, and may be applied inconsistently by different people, it's easy to imagine that the grades could contain bias or inconsistency. Indeed, I suspect the reason PFF don't provide play-by-play grade breakdowns, even in their premium subscription, is because they are worried that the resultant public scrutiny would turn up examples of grades which are indefensible. This would impugn the credibility of the grading process, and thereby undermine the value of the whole business. It's been difficult to test the reliability of PFF grades, though, because the alternative measures were so deeply flawed. If a running back gets more yards per carry than his average grade implies, perhaps it is because he plays with a great offensive line, or takes carries in easier low-value situations.


The recent NFL Big Data Bowl provides a unique opportunity to test the quality of the PFF grades. The competition, which ran in November and December last year, tasked participants with predicting the yards that would be made by the offense on each run play, given the position, direction, velocity and acceleration of all 22 players on the field at the point of hand-off. Incredibly, the competition was won by two European data scientists who, prior to competing, knew very little about American football. They built a surprisingly simple, yet highly effective convolutional neural network that interpreted the interactions of all 22 players on the field and was able to accurately forecast how likely the offense was to gain each of the possible yardage values (from a 99-yard loss through a 99-yard gain).
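For the curious, the sketch below gives the flavour of that kind of architecture. To be clear, this is my own simplified guess at the approach, not the winners' actual model: it assumes each play has been pre-processed into a grid of pairwise defender-blocker interaction features, which is an assumption on my part.

```python
# A simplified sketch (PyTorch) of a CNN over player interactions -- NOT the
# winners' actual architecture. Assumes each play is encoded as a
# (features x 11 defenders x 10 blockers) tensor of pairwise interaction
# features (relative position, velocity, etc.) built from the tracking data.
import torch
import torch.nn as nn

class RunPlayCNN(nn.Module):
    def __init__(self, n_features: int = 10, n_yard_bins: int = 199):
        super().__init__()
        # 1x1 convolutions act like a small MLP applied to every
        # defender-blocker pair independently
        self.pairwise = nn.Sequential(
            nn.Conv2d(n_features, 128, kernel_size=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, n_yard_bins),  # one logit per possible yardage value
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, 11 defenders, 10 offensive players)
        h = self.pairwise(x)
        h = h.amax(dim=3)    # pool over the 10 offensive players
        h = h.mean(dim=2)    # pool over the 11 defenders
        return self.head(h)  # softmax over yardage is applied in the loss
```

The pooling steps are what make the model order-invariant: it shouldn't matter which column a given blocker happens to occupy, only what the set of interactions looks like.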


In the aftermath of the competition, it was pointed out that the unexplained variation - whether the team over- or under-performed the model's yardage prediction - would be a good indicator of the ability of the running back. This isn't strictly true, since the running back's position, direction, velocity and acceleration at the time of hand-off were factored into the model, and these values likely reflect some underlying ability. But it's also true that much of the running back's performance is dictated by what happens after the hand-off. Was he able to evade the first defender or break the first tackle? Was he able to continue his forward momentum a few yards as the tackler brought him to ground? These elements weren't captured by the spatial data fed into the models. We can evaluate the performance of a running back objectively by computing his average yards made relative to expectation, as sketched below. Good running backs will outperform the model more often than not and hence score positive yards above expectation (YAE), while bad running backs will score negative YAE.
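For concreteness, here is a minimal sketch of that calculation. The column names (runner, yards_gained, expected_yards) are hypothetical, and the 50-carry eligibility cut-off is an arbitrary choice of mine.

```python
# A minimal sketch of the YAE calculation, assuming a play-level DataFrame
# with hypothetical columns: 'runner', 'yards_gained' (the actual result)
# and 'expected_yards' (the model's predicted mean for that play).
import pandas as pd

def yards_above_expectation(plays: pd.DataFrame, min_carries: int = 50) -> pd.Series:
    plays = plays.assign(yae=plays["yards_gained"] - plays["expected_yards"])
    per_back = plays.groupby("runner")["yae"].agg(["mean", "count"])
    # Only rank backs with enough carries for the average to be meaningful
    eligible = per_back[per_back["count"] >= min_carries]
    return eligible["mean"].sort_values(ascending=False)
```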


So who are the best running backs in the NFL?


[Chart: 2018 yards above expectation (YAE) rankings by running back]

Kerryon Johnson beats out the field (amongst eligible RBs), averaging 1.6 yards more than the Kaggle-winning model expected. Ken Dixon and Saquon Barkley round out the top three. How reasonable are these ratings? It's difficult for me to say, as I didn't see enough games in 2018 and can barely remember the ones I did. Let's look at the 2019 Pro-Bowl team, which is a good indication of top-tier talent. Starters Todd Gurley (35th) and James Conner (19th) don't fare very well, although reserves Phillip Lindsay (9th), Ezekiel Elliott (16th) and Barkley are much more highly rated. While Gurley is unquestionably good, it has been suggested he benefited disproportionately from the Rams offensive scheme and blockers. His subsequent release from the Rams at the end of the 2019 season would seem to confirm this view. In any case, it seems the Kaggle ratings are plausible (although Ken Dixon is a bit of a head-scratcher), in which case, we can finally get to answering the question posed in the title of this post - are the PFF running grades biased?


Here are the PFF season average running grades for 2018, plotted against the Kaggle-derived yards above expectation (YAE) ratings.

[Chart: PFF 2018 season average run grades vs. Kaggle-derived YAE, with circle size proportional to carries]

At first glance, these results are promising for PFF. There is a clear positive relationship between PFF run grades and yardage out-performance. The r-squared value of 42% sounds good, although without similar analytical studies to compare to, this number is difficult to interpret. On the whole, I'd say this result is a pretty good endorsement of the PFF run grades. Most of the time, they line up pretty well with the quantitative, objective ratings derived from a state-of-the-art spatial model. One possible pattern I do see, though, is that running backs with more carries (larger circles) lie predominantly above the dotted line, whilst the players with fewer carries are mostly below the line. Above the line means PFF rated the player better than can be explained by his Kaggle rating, and below means PFF rated him relatively worse. Does this mean PFF are biased? Maybe.
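To make that eyeball test concrete, here is one way to check it: fit the line, then test whether the residuals correlate with carry volume. This is only a sketch; the input names and the choice of a Spearman correlation are my own assumptions, not anything PFF or the competition provides.

```python
# A sketch of the bias check described above. Inputs are per-back arrays of
# hypothetical values: PFF season grade, Kaggle-derived YAE, and carry counts.
import numpy as np
from scipy import stats

def grade_vs_yae(pff_grade, yae, carries):
    # Fit the line of best fit and report the R^2 of grade on YAE
    slope, intercept, r, p, _ = stats.linregress(yae, pff_grade)
    # Residual = how much better PFF rated the back than his YAE predicts
    residuals = np.asarray(pff_grade) - (intercept + slope * np.asarray(yae))
    # A positive correlation here means high-volume backs sit above the line,
    # i.e. PFF rates them better than their YAE alone can explain
    rho, rho_p = stats.spearmanr(carries, residuals)
    return {"r_squared": r**2, "residual_carry_rho": rho, "rho_p_value": rho_p}
```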


Here's my hypothesis: PFF analysts (or whoever designates the grades) know that high-profile starters such as Saquon Barkley, Kareem Hunt, Marshawn Lynch, Melvin Gordon, Todd Gurley, etc. are supposed to be elite. These guys get paid the big bucks for a reason, so the analysts approach the grading process with preconceived ideas about skill. When they see a great run, their prior beliefs are confirmed ("this guy's a star!") and they will justifiably award him a high grade. When the running back makes a poor play, by contrast, they may inadvertently make excuses for the star player ("the defence did well there" or "his blockers let him down"). Humans struggle to incorporate disconfirming evidence into their worldview rationally, a phenomenon known within the field of behavioural economics as confirmation bias. This is one reason why Bayes' rule, and Bayesian thinking in general, is still such an important tool in the science of decision-making. In any case, if my hypothesis were true, then I'd expect to see a chart much like the one above. The fact that we have observed such a chart doesn't, of course, make it true. But, until someone shows me contrary evidence, I'm going to stick with my hypothesis.
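As a toy illustration of the Bayesian updating I have in mind, the snippet below updates a belief that a back is elite after each graded run. The prior and the likelihood numbers are invented purely for illustration; the point is that a rational grader's belief must move on both good and bad runs.

```python
# A toy illustration of Bayes' rule applied to grading: update the belief
# that a back is "elite" after each run. All probabilities are made up.
def update_elite_belief(prior: float, great_run: bool) -> float:
    # Assumed likelihoods: elite backs produce great runs more often
    p_great_given_elite, p_great_given_average = 0.30, 0.15
    if great_run:
        num = p_great_given_elite * prior
        den = num + p_great_given_average * (1 - prior)
    else:
        num = (1 - p_great_given_elite) * prior
        den = num + (1 - p_great_given_average) * (1 - prior)
    return num / den

# Unlike a grader suffering confirmation bias, the belief moves both ways:
belief = 0.5
belief = update_elite_belief(belief, great_run=True)   # rises to ~0.67
belief = update_elite_belief(belief, great_run=False)  # falls back to ~0.62
```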

2 Comments



Michael Webb
Jun 24, 2020

Outstanding. I agree with your model and feel as you yourself explore the great game some more you will realize Ken Dixon has been an underrated player for both the Ravens and Jets. It won't be long before he is splashed all over the social pipes

