Which stats are most predictable and predictive?

Throughout my season previews and my picks during the season, I refer to certain statistics as being either predictive, predictable, or not. I wanted to discuss what I mean by that and put all my supporting statistical evidence in one place. First, let’s define some terms.

Predictability is probably the easiest term to understand. A statistic is predictable if it can be easily predicted from one year to the next. This is measured by calculating the correlation between a team’s performance in a specific statistic in a specific season with its performance in that same statistic the following season.

Essentially, the higher the correlation the easier it is to predict how a team will perform in a certain statistic based solely on how they performed the season before. This is important because if a statistic can’t be reasonably predicted on a year-to-year basis, it doesn’t provide us much predictive value.
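For readers who want to see the mechanics, here is a minimal sketch of a year-to-year correlation. It assumes the correlation in question is a standard Pearson correlation and that each team-season is paired with the same team's following season. The `(team, season)` dictionary layout, the function names, and the numbers are all hypothetical and only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def year_to_year(stat):
    """Correlate each team-season value with the same team's value a year later.

    `stat` maps (team, season) -> value; this layout is an assumption.
    """
    pairs = [(value, stat[(team, season + 1)])
             for (team, season), value in stat.items()
             if (team, season + 1) in stat]
    this_year, next_year = zip(*pairs)
    return pearson(this_year, next_year)

# Made-up yards-per-play numbers, not real NFL data.
ypp = {
    ("AAA", 2019): 5.9, ("AAA", 2020): 5.7,
    ("BBB", 2019): 5.1, ("BBB", 2020): 5.4,
    ("CCC", 2019): 5.5, ("CCC", 2020): 5.2,
}
print(f"year-to-year correlation: {year_to_year(ypp):.2%}")
```

With a real 10-season sample, every team would contribute nine season pairs instead of one.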

That leads into the second term, predictiveness. A statistic is predictive if it can be used to predict a team’s likelihood of winning. I am going to measure two types of win predictiveness: same-season win predictiveness and next season’s win predictiveness.

Same-season win predictiveness measures the correlation between a certain statistic and a team’s winning percentage in the same season. For example, as a team averages more yards per play, their likelihood of winning goes up, but not at a perfect 1:1 rate and likely not at the exact same rate as other statistics, which have their own statistical relationship with winning percentage. Measuring correlation allows us to see which statistics most closely vary with winning percentage.

That being said, while same-season win predictiveness is definitely worth taking into account, it’s not a particularly useful stat for handicapping purposes because it only works with data from games that have already happened. Once I know how many yards a team gained in a game, I can give you a pretty good guess as to whether or not they won the game, but that isn’t all that useful.

Next season’s win predictiveness is really what we want because we want to be able to take last year’s statistics and use them to most effectively predict future winning. Rather than just measuring the correlation between a statistic and the same season’s winning percentage, we also want to measure the correlation between a statistic and the next season’s winning percentage to see how closely those variables relate.
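The two kinds of win predictiveness differ only in which season's winning percentage the statistic is paired with. The sketch below makes that explicit with a `lag` parameter, again assuming a plain Pearson correlation. The function names, the `(team, season)` data layout, and the numbers are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def win_predictiveness(stat, win_pct, lag=1):
    """Correlate a stat in season N with winning percentage in season N + lag.

    lag=0 is same-season predictiveness, lag=1 is next-season predictiveness.
    Both inputs map (team, season) -> value, which is an assumed layout.
    """
    pairs = [(value, win_pct[(team, season + lag)])
             for (team, season), value in stat.items()
             if (team, season + lag) in win_pct]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Made-up numbers, not real NFL data.
to_margin = {
    ("AAA", 2019): 8, ("AAA", 2020): -3,
    ("BBB", 2019): -5, ("BBB", 2020): 2,
    ("CCC", 2019): 1, ("CCC", 2020): 4,
}
win_pct = {
    ("AAA", 2019): 0.688, ("AAA", 2020): 0.500,
    ("BBB", 2019): 0.313, ("BBB", 2020): 0.563,
    ("CCC", 2019): 0.500, ("CCC", 2020): 0.625,
}
print(f"same season: {win_predictiveness(to_margin, win_pct, lag=0):.2%}")
print(f"next season: {win_predictiveness(to_margin, win_pct, lag=1):.2%}")
```

Passing winning percentage in as the statistic with `lag=0` correlates it with itself, which is why that cell is always 100%.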

If this isn’t making sense yet, hopefully it will when I get into some examples. Let’s start with a common one, turnover margin. All statistics included in this post are over a sample size of the past 10 seasons (2011-2020).

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| TO Margin | 11.26% | 69.67% | 23.73% |

We all know intuitively that winning the turnover battle has a significant impact on winning, but this puts it into context. A team’s turnover margin correlates with same-season winning at close to a 70% rate. However, it is highly unpredictable year-to-year, with a correlation of just about 11%, meaning that, from a statistical standpoint, a team’s turnover margin might as well be random from one season to the next.

As a result, while turnover margin is predictive of same-season winning, it really isn’t predictive of next year’s winning percentage. I will break this down further later, but I wanted to use this as an example right off the bat.

Another good example is winning percentage itself.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| Win % | 25.31% | 100.00% | 25.31% |

Winning percentage correlation is obviously going to be 100% because we are correlating a statistic with itself within the same season, but on a year-to-year basis, winning percentage only correlates with itself at about a 25% rate, meaning winning percentage can’t be used to accurately predict itself on a year-to-year basis. 

It’s well-known the NFL is a parity league that is highly unpredictable every season, but this just puts into context how unpredictable and how tough it is to handicap a team’s future success. Fortunately, there are statistics that are significantly more predictive of future winning percentage than winning percentage itself.

Let’s start with one I’ve already mentioned, yards per play. The below chart breaks out yards per play, yards per play allowed, and yards per play differential. Note: any “allowed” statistics will have a negative correlation with winning percentage because the less a team allows, the more they win.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| YPP | 36.08% | 50.46% | 23.40% |
| YPPA | 42.74% | -31.16% | -15.76% |
| YPPD | 36.10% | 67.06% | 31.97% |

All three statistics are reasonably predictable on a year-to-year basis, with yards per play allowed actually being the most predictable of the three by a modest amount. However, offense correlates with winning at a much higher rate and is much more predictive of future winning than defense. This is a theme we’ll see throughout this analysis: offense being more predictive than defense.

In terms of overall differential, this statistic correlates with same season winning slightly less than turnover margin does, but because it is significantly more predictable, it’s significantly more predictive of future winning, correlating with future winning at about a 32% rate, already a significant increase from the 25% predictiveness we get just from looking at winning percentage.

We can do better than that though. Let’s look at another obvious one that would correlate heavily with winning, points, more specifically points per play, points per play allowed, and points per play differential.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| PPP | 31.57% | 75.24% | 34.46% |
| PPPA | 31.41% | -67.81% | -26.27% |
| PPPD | 35.25% | 89.95% | 38.39% |

Right off the bat, we see this correlates with same season winning at a very high rate, which is to be expected, considering points are what decides games. It’s not a perfect 1:1 correlation as teams can win a high percentage of close games in a single season sample size, and, as a result, would have a better record than their point differential would suggest, but point differential is the most predictive statistic of same season winning that we’re going to find. 

It’s also a good predictor of next season’s winning percentage, as this is the best predictor of future winning that we’ve seen yet by far. However, there are a couple big problems with points per play differential as a statistic. For one, while it is relatively predictive, it’s not all that predictable, predicting itself at just a 35% rate. That gets even worse when you look at points per play and points per play allowed, which each correlate with themselves on a year-to-year basis at only about 31.5%.

That leads into my second big problem with this statistic: it does a relatively poor job of breaking out offense versus defense, which is likely why points per play and points per play allowed are relatively unpredictable statistics. Return touchdowns by special teams or defense count towards a team’s points per play and against its opponent’s points per play allowed, even though neither the scoring team’s offense nor the opponent’s defense was on the field. Field position skews this statistic even more, as good defenses can easily look bad if their offense constantly gives them terrible field position to start, and vice versa.

At first glance, it might seem like a good thing that the gap in predictiveness between points per play and points per play allowed is less than other offensive/defensive statistics, but I think that is a result of neither stat accurately representing the side of the field it is supposed to represent. As we’ll see more going forward, if offense and defense are broken out from each other properly, offense always is significantly more predictive.

The next statistic is a personal favorite of mine, first down rate differential, which includes first down rate and first down rate allowed.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| FDR | 48.42% | 54.65% | 29.01% |
| FDRA | 41.08% | -28.13% | -10.27% |
| FDRD | 43.20% | 71.75% | 33.78% |

Right away what stands out is that first down rate and first down rate differential are significantly more predictable than their yards per play and points per play counterparts and, in fact, first down rate is the most predictable statistic year-to-year. The one exception is first down rate allowed, which is slightly less predictable than yards per play allowed. First down rate also does a great job separating offensive and defensive performance and, unsurprisingly, there is a significant gap between the predictiveness of offensive and defensive performance, more so than any statistic we’ve seen thus far. First down rate correlates with future winning more than yards per play, but yards per play allowed correlates with future winning more than first down rate allowed.

The disappointing thing about first down rate differential is that, while it is significantly more predictable year-to-year and more highly correlated with same-season winning than yards per play differential, it is only slightly more predictive of future winning (about 34% versus 32%), at least over the 10-year sample of this study. On top of that, it is less predictive of future winning than points per play differential, despite the problems with points per play differential.

However, there is still a lot to like with first down rate differential, and there is a key thing that points per play differential takes into account that first down rate differential doesn’t, which likely explains why points per play differential is more predictive. That key thing is special teams, which yards per play differential and first down rate differential both lack. Reliable special teams statistics are hard to come by, but one that does a great job is DVOA, Football Outsiders’ signature statistic.

To illustrate this point, I’ve broken out overall, offensive, defensive, and special teams DVOA.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| DVOA | 41.36% | 87.54% | 38.67% |
| DVOA O | 41.28% | 70.29% | 31.10% |
| DVOA D | 39.86% | -48.32% | -18.27% |
| DVOA ST | 38.73% | 26.18% | 18.62% |

Across the board, DVOA does very well, correlating with next year’s winning at about the same rate as points per play differential, while effectively separating out performance in all three phases of the game. The standout here is special teams though, having year-to-year predictability in line with other phases in DVOA and surprisingly correlating with winning and future winning relatively well, given how small a part of the game special teams is. 

In fact, special teams DVOA is actually slightly more predictive of future winning than defensive DVOA, at least over the course of this 10-year sample. I would take that with a bit of a grain of salt, but it’s clear that special teams performance has a much bigger impact on winning than most, including myself, would expect. Because of this, I am going to go back and factor special teams more significantly into my season previews.

Given that special teams is likely what makes points per play differential more predictive than first down rate differential, I decided to add special teams DVOA to first down rate differential and see what that does to predictiveness. I played around with different allocations of offensive, defensive, and special teams performance, but I found that 45% offense, 30% defense, and 25% special teams was most predictive, which once again reinforces the importance of special teams.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| 45/30/25 | 46.05% | 75.55% | 39.20% |
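The post does not spell out how the three inputs are scaled or combined, so the following is only one plausible reading: a weighted sum in which the defensive input is a rate allowed (lower is better) and is therefore subtracted. The function names, the input scale, and the 5-point search grid are all assumptions made for illustration.

```python
from itertools import product

def hybrid_score(offense, defense_allowed, special_teams,
                 weights=(0.45, 0.30, 0.25)):
    """Weighted blend of offense, defense, and special teams.

    Assumes all three inputs are on a comparable scale (for example,
    relative to league average). `defense_allowed` is subtracted because
    allowing less is better.
    """
    w_off, w_def, w_st = weights
    return w_off * offense - w_def * defense_allowed + w_st * special_teams

def weight_grid(step=5):
    """Yield every offense/defense/special teams split, in `step`-point
    increments, whose three weights sum to 100%."""
    for off, dfn in product(range(0, 101, step), repeat=2):
        st = 100 - off - dfn
        if st >= 0:
            yield off / 100, dfn / 100, st / 100

# Hypothetical inputs for a single team, not real data.
print(hybrid_score(offense=0.02, defense_allowed=-0.01, special_teams=0.03))
print(sum(1 for _ in weight_grid()), "candidate splits to test")
```

Trying different allocations then amounts to scoring every team-season with each candidate split and keeping the split whose scores correlate best with the following season's winning percentage.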

Just by adding special teams to first down rate differential, we get a statistic that is more predictive than anything we’ve seen so far. We can do better than this though. Since we know that yards per play allowed is more predictive than first down rate allowed, let’s see what happens when we swap yards per play allowed into this hybrid statistic. Once again, I found the 45/30/25 split was most predictive.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| 45/30/25 | 47.34% | 75.16% | 41.50% |

This gets us to an impressive number when you consider that winning percentage itself predicts future winning percentage at just a 25% rate. NFL records are very tough to predict year-to-year, but having a statistic that correlates with future winning percentage at a 41.5% rate is a very useful tool for handicapping. 

For the record, I tried swapping in points per play allowed and defensive DVOA and both lowered the predictiveness significantly. Points per play allowed didn’t surprise me because, even though it was predictive, it includes things that the offense is already being given credit for. Defensive DVOA surprised me a little, but it’s not a very predictive statistic year-to-year, so it’s not a huge surprise that including it did not have a positive effect on predictiveness.

Let’s see how each team performed in this metric in 2020.

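| Team | 2020 Hybrid Metric |
|---|---|
| BUF | 2.81% |
| NO | 2.23% |
| KC | 1.68% |
| SEA | 1.55% |
| BAL | 1.44% |
| IND | 1.28% |
| TB | 1.18% |
| ARZ | 0.97% |
| GB | 0.92% |
| NE | 0.73% |
| LAR | 0.61% |
| SF | 0.58% |
| TEN | 0.17% |
| WAS | 0.12% |
| CLE | 0.10% |
| PIT | 0.07% |
| CHI | 0.03% |
| CAR | -0.04% |
| LV | -0.16% |
| DAL | -0.17% |
| MIA | -0.23% |
| DET | -0.45% |
| MIN | -0.46% |
| NYG | -0.65% |
| HOU | -0.79% |
| ATL | -1.10% |
| PHI | -1.15% |
| LAC | -1.65% |
| CIN | -1.91% |
| DEN | -2.14% |
| JAX | -2.32% |
| NYJ | -3.22% |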

Obviously, this can’t be blindly followed, as 41.5% correlation is still not that high and a lot changes for teams from season to season to affect their performance from year-to-year, but this is a much better base point to start with than win/loss record.
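To put "not that high" in more concrete terms, one standard way to read a correlation r is through r², the share of the variance in one variable that a straight-line fit on the other accounts for. The quick calculation below applies that to the two next-year figures quoted in this post. It assumes a simple linear relationship and is not part of the original analysis.

```latex
r^2_{\text{hybrid}} = 0.415^2 \approx 0.172
\qquad
r^2_{\text{win \%}} = 0.2531^2 \approx 0.064
```

Read this way, the hybrid metric accounts for roughly 17% of the variation in next season's winning percentage, compared to roughly 6% for winning percentage alone. That is a clear improvement, but most of the variation is still unexplained.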

I also wanted to show a few other breakdowns. This one shows yards per play differential broken out into pass offense, pass defense, rush offense, and rush defense.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| PYA | 35.26% | 58.72% | 24.34% |
| PYAA | 37.21% | -45.59% | -25.51% |
| RYA | 27.64% | 11.24% | 7.13% |
| RYAA | 21.94% | -9.14% | -24.38% |

Offensive statistics correlate more closely with same-season winning than their defensive counterparts and, perhaps unsurprisingly, pass statistics are significantly more predictable year-to-year than rush statistics.

Let’s take a look further at passing statistics.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| PYA | 35.26% | 58.72% | 24.34% |
| Completion % | 45.08% | 49.02% | 15.91% |
| TD % | 25.31% | 60.52% | 24.10% |
| INT % | 18.82% | -50.43% | -19.86% |

We see that completion percentage is much more predictable year-to-year than any other passing metric here, but passing yards per play (PYA) correlates better than completion percentage with both winning and next year’s winning. Touchdown rate also correlates with winning and next year’s winning, but is tough to predict on a year-to-year basis. Interception rate is tough to predict as well, but it’s notable that it’s significantly more predictable than turnover margin, which brings me to my next chart.

| Statistic | Year to Year | Winning % | Next Year Winning % |
|---|---|---|---|
| INT % | 18.82% | -50.43% | -19.86% |
| Def INT % | 11.73% | 48.71% | 3.65% |
| Fumbles Lost | 2.43% | | |
| Fumbles Recovered | -3.23% | | |

While turnover margin itself is very unpredictable, interception rate seems to at least have some predictive value, which makes sense, given that passing offense is what tends to be most consistent year-to-year. Teams who fare well in turnover margin as a result of having a quarterback who had a low interception rate are more likely to see their turnover success continue than teams reliant on defensive takeaways or avoiding fumbles. For fumbles, I didn’t even bother calculating their relationship to winning because of how unpredictable they are year to year. There is no predictive value to a statistic you can’t reasonably predict and fumbles are a perfect example of that.

| Statistic | Year to Year |
|---|---|
| 1st/2nd | 34.96% |
| 3rd/4th | 38.75% |
| 1st/2nd vs. 3rd/4th differential | 10.52% |
| 1st/2nd allowed | 29.71% |
| 3rd/4th allowed | 30.81% |
| 1st/2nd vs. 3rd/4th allowed differential | 12.22% |

This is the last one I want to show for now. I may add more to this later, but this breaks out the year-to-year predictability of first down rate and first down rate allowed between early downs (1st and 2nd) and later downs (3rd and 4th). I didn’t correlate these statistics with winning because it’s obvious that better success on 3rd and 4th down leads to better results on the scoreboard. It’s worth noting, though, that later downs are only marginally more predictable than early downs. There is also minimal, if any, evidence that teams can consistently outperform their 1st and 2nd down performance on 3rd and 4th down year-to-year, as there is very little year-to-year correlation in the differential between early down and later down performance.
