A projection system is a statistical model that uses historical data to predict future performance. Some well-known projection systems include ZiPS:
The 2021 ZiPS Projections: An Introduction
ZiPS is an advanced projection system for baseball players. Similarly, PFF (Pro Football Focus) is a site to go to for NFL projections.
The most basic projection system needs three things:
- Information about a player's previous performance (data from previous seasons).
- A factor or aging curve that accounts for the fact that younger players are likely to improve while older players are likely to decline.
- Information about the league mean (what the average player will accomplish in any given year/season).
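As a rough illustration of how the league mean and the age factor from the list above can fit together, here is a minimal sketch. The function name, the idea of treating the league mean as a fixed number of extra league-average opportunities, and all input values are assumptions made for this example, not the method of any particular projection system.

```python
def regress_and_age(player_rate, player_opps, league_rate, regression_opps, age_factor):
    """Blend a player's rate with the league mean, then apply an age adjustment.

    All parameter names are hypothetical and chosen for this sketch.
    """
    # Treat the league mean as `regression_opps` extra league-average opportunities,
    # so players with little history are pulled harder toward the league rate.
    blended = (player_rate * player_opps + league_rate * regression_opps) / (
        player_opps + regression_opps
    )
    # Scale up for improving young players (> 1) or down for aging players (< 1).
    return age_factor * blended


# Hypothetical inputs: a 6.0% rate over 600 attempts, a 4.5% league rate,
# 200 attempts' worth of regression, and a 0.96 age factor.
projected_rate = regress_and_age(0.06, 600, 0.045, 200, 0.96)
print(f"Projected rate: {projected_rate:.4f}")
```

The more attempts a player has, the less the league mean moves the result, which is one common way to keep small samples from dominating a projection.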
An advanced projection system would return a range of values based on the likelihood of a player accomplishing each one. If you recall some basic statistics about the normal distribution, the percentile of a statistic is related to the idea of a standard deviation: both describe where a particular value sits relative to the rest of the data. A percentile gives the share of values that fall at or below a given value. For our sports example, the 10th percentile (0.1 or 10%) means that 90% of players outperformed an individual in that statistic.
We would read this graph like so: after running multiple projections, the 90th-percentile projection is around 35, the 10th-percentile projection is 20, and the 50th-percentile projection is 30. So, if you believe a player is likely to underperform their projection, you would look at the 10th-percentile projection. Similarly, if you think they will overperform, look toward the 90th-percentile projection. Most projection systems report the 50th percentile (0.5 or 50%) to be safe. This is the median value computed by the projection system across multiple simulations.
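To make the percentile idea concrete, here is a minimal sketch that simulates many projected values and reads off the 10th, 50th, and 90th percentiles. The normal distribution, its mean of 30, its standard deviation of 6, and the number of simulations are all assumptions for illustration; a real system would generate its simulations from a fitted model.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical: each simulation draws one projected value from a normal distribution.
simulated = [rng.gauss(30, 6) for _ in range(10_000)]

# quantiles(n=10) returns the 9 cut points at the 10th, 20th, ..., 90th percentiles.
deciles = statistics.quantiles(simulated, n=10)
p10, p50, p90 = deciles[0], deciles[4], deciles[8]

print(f"10th: {p10:.1f}, 50th: {p50:.1f}, 90th: {p90:.1f}")
```

The 50th-percentile value is the median of the simulated outcomes, while the 10th and 90th give a pessimistic and an optimistic reading of the same player.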
For today's post, I decided to show some work I did on Kaggle projecting statistics for NFL QBs. Note that projections have value outside of sports: for example, if you were a YouTube creator, you might want to project the number of views your next video will get based on data from your previous five videos.
I implemented a very simple projection system, using only data from the previous five or so seasons of a player's performance. I collected data from https://www.pro-football-reference.com/players/M/MannPe00.htm.
For example, to project the number of touchdowns (TDs) Tom Brady would throw in 2016, I used data from the 2011–2015 seasons and compared my projection with the actual result. I used the following code snippet:
td_brady_16 = 0.960 * (sum(df_brady['weight'] * df_brady['TD%'] * df_brady['Att']))
Here, 0.96 is the age weight for Brady's age-39 season. I then sum the 'weight' column times each previous season's TD percentage and passing attempts (you're less likely to have errors due to outliers such as injuries if you project TD% rather than raw TDs). The weight factor weights the most recent seasons more heavily than older seasons (i.e., the weight for the 2015 season is larger than the weight for the 2014 season, and so on). Finally, I'm also taking into account the average TD rate for that season. Multiplying by the projected Att (passing attempts) for that season gives Tom Brady's projected number of TDs thrown for the 2016 season. The denominator is simply a normalization factor.
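The weighting and normalization described above can be sketched as follows. This is a hedged illustration, not the notebook's exact code: the function name `project_tds`, the choice of the weighted sum of attempts as the normalization denominator, and every number in the example are assumptions. The function accepts any sequences, so DataFrame columns such as `df_brady['TD%']` could be passed in directly.

```python
def project_tds(weights, td_pct, attempts, projected_att, age_factor):
    """Project a TD count from weighted past TD% (hypothetical helper for this sketch)."""
    # Numerator: each season's TD% weighted by recency and by passing attempts.
    numerator = sum(w * p * a for w, p, a in zip(weights, td_pct, attempts))
    # Denominator (assumed): normalizes the weights so the result is again a TD%.
    denominator = sum(w * a for w, a in zip(weights, attempts))
    weighted_td_pct = numerator / denominator  # still expressed in percent
    # Convert percent to a fraction, scale by projected attempts, apply the age factor.
    return age_factor * (weighted_td_pct / 100) * projected_att


# Hypothetical five-season history, oldest first; the most recent season has the largest weight.
weights = [1, 2, 3, 4, 5]
td_pct = [5.0, 5.0, 4.0, 6.0, 6.0]
attempts = [600, 600, 600, 600, 600]

projected = project_tds(weights, td_pct, attempts, projected_att=500, age_factor=0.96)
print(f"Projected TDs: {projected:.2f}")
```

Because the result is a rate multiplied by projected attempts, a season shortened by injury lowers the attempts term but does not drag down the rate itself.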
For this example, my model predicted Tom Brady would throw around 23 TDs, and he ended up throwing 28 that season.
You can use these techniques to similarly project INTs (interceptions), passing yards, etc. Check out the full repository here:
NFL QBs: Marcel Projection System
Thanks for reading!
If you want to see more of my work: