Introduction
The offensive line is generally the only position group in football with no reliable statistics for measuring the value of its players. There are some superficial stats, like sacks allowed or total rushing yardage, but I wanted to build something that depends more directly on the offensive line. This project is an attempt to create such a statistic, at least for run blocking. Pass blocking would need its own measure, and that one would be a lot more difficult to build.
The general idea for the statistic is average rushing yardage, counting only the first 5 yards of each run: a run that goes beyond 5 yards is counted as 5. Also, sack yardage (usually scored as negative rushing yardage) is not counted, since on those plays the line is actually pass blocking, so including it makes no sense.
In my thinking, this is the best way to measure the effectiveness of the offensive line at run blocking. Of course, linemen often release to the second level, so they sometimes contribute beyond 5 yards, but I thought 5 would be a good cutoff. Anything the running back or the downfield blockers do beyond that point is outside the scope of the offensive line. At least, that's how I imagine it.
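To make the definition concrete, here is a minimal Python sketch of the statistic, assuming each play has already been reduced to a (yards, is_sack) pair; that representation and the function names are hypothetical. Runs are capped at the 5-yard cutoff, losses are kept as negative values, and sacks are skipped entirely.

```python
from typing import Iterable, Optional, Tuple

CUTOFF = 5  # yards: only the first 5 yards of a run are credited to the line


def capped_yards(yards: int, cutoff: int = CUTOFF) -> int:
    """Cap a run at the cutoff; losses (negative yardage) are kept as-is."""
    return min(yards, cutoff)


def run_blocking_average(
    plays: Iterable[Tuple[int, bool]], cutoff: int = CUTOFF
) -> Optional[float]:
    """Average capped rushing yardage over (yards, is_sack) pairs, skipping sacks.

    Returns None when there are no non-sack plays to average.
    """
    capped = [capped_yards(yards, cutoff) for yards, is_sack in plays if not is_sack]
    if not capped:
        return None
    return sum(capped) / len(capped)


# Hypothetical sample: four runs and one sack.
sample = [(3, False), (12, False), (-2, False), (-7, True), (5, False)]
print(run_blocking_average(sample))  # 2.75
```

In the sample, the 12-yard run only counts as 5 and the sack is dropped, so the average is (3 + 5 - 2 + 5) / 4 = 2.75.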
Source
To create the statistic, though, you need play-by-play data, which is not freely available anywhere as a ready-made dataset. I'm sure the agencies that keep stats for the league have a database of every play.
When I wanted to learn about web scraping and try it out, this is the project I took on. Using ESPN's play-by-play page for each game in 2011, I was able to gather information on every play in the NFL last year. Doing that manually would have taken a very long time of copying and pasting, so instead I wrote a Python program to do it for me: it automatically reads the HTML of each game page and records the information in a couple of tables.
The program is hosted on Scraperwiki.com, and you can access it here: https://scraperwiki.com/scrapers/espnfootballscoresworking/
If you are familiar with Python, I invite you to play around with it or build on it. I had literally zero experience with Python before I started the project.
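The actual scraper is the one hosted on ScraperWiki. As a rough illustration of just the step of turning a game page's HTML into table rows, here is a minimal sketch using Python's built-in `html.parser`. The `PlayTableParser` class, `parse_plays`, and the sample markup are hypothetical and do not reflect ESPN's real page layout; a real run would first download each game page (for example with `urllib.request.urlopen`) and adapt the tag handling to the actual HTML.

```python
from html.parser import HTMLParser
from typing import List


class PlayTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row.

    Rows with no <td> cells (e.g. header rows made of <th>) are skipped.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td> cell.
        if self._in_cell:
            self._cell.append(data)


def parse_plays(html: str) -> List[List[str]]:
    parser = PlayTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up markup and plays, for illustration only.
SAMPLE = """<table>
<tr><th>Down</th><th>Play</th></tr>
<tr><td>1st and 10 at NO 20</td><td>J.Smith right tackle to NO 24 for 4 yards.</td></tr>
<tr><td>2nd and 6 at NO 24</td><td>Q.Back pass incomplete.</td></tr>
</table>"""

for row in parse_plays(SAMPLE):
    print(row)
```

Each printed row is one play: the situation cell followed by the raw play description, which is left unparsed, as in the original scraper.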
There are a lot of possibilities for different sorts of measures using a play-by-play database like this, and I will probably use the dataset for other projects. As you may have noticed, though, the play description is copied whole from ESPN; no parsing is done in Python.
I parsed the play descriptions in Excel to determine what type of play each one was, whether it was successful, whether there were penalties, whether those were accepted or declined, and so on. This process required A LOT of trial and error, which made Excel a better fit, since you can see your errors in real time.
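I did this parsing in Excel, but for anyone who would rather stay in Python, here is a minimal regex-based sketch of the same idea. It is a swapped-in technique, not what I actually used. The description wording it looks for ("right tackle", "for 4 yards", "for no gain", "sacked", "PENALTY", "declined") is an assumption about ESPN's format, the sample plays and player names are made up, and `ParsedPlay` and `parse_description` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Gap labels in the 1-7 order used in the analysis (1 = left end ... 7 = right end).
GAP_NAMES = ["left end", "left tackle", "left guard", "middle",
             "right guard", "right tackle", "right end"]

GAP_RE = re.compile(r"\b(left end|left tackle|left guard|middle|right guard|right tackle|right end)\b")
# "for 4 yards", "for -2 yards", "for 1 yard", or "for no gain" (assumed wording).
YARDS_RE = re.compile(r"for (-?\d+) yards?\b|for no gain")


@dataclass
class ParsedPlay:
    is_sack: bool
    is_rush: bool
    gap: Optional[int]      # 1-7, or None if no gap label was found
    yards: Optional[int]    # None if no yardage phrase was found
    penalty: bool
    penalty_declined: bool


def parse_description(text: str) -> ParsedPlay:
    lower = text.lower()
    is_sack = "sacked" in lower
    gap_match = GAP_RE.search(lower)
    gap = GAP_NAMES.index(gap_match.group(1)) + 1 if gap_match else None
    yards_match = YARDS_RE.search(lower)
    if yards_match is None:
        yards = None
    elif yards_match.group(1) is None:  # matched "for no gain"
        yards = 0
    else:
        yards = int(yards_match.group(1))
    penalty = "penalty" in lower
    declined = penalty and "declined" in lower
    # Crude rule: a rush names a gap and is neither a sack nor a pass play.
    is_rush = gap is not None and not is_sack and "pass" not in lower
    return ParsedPlay(is_sack, is_rush, gap, yards, penalty, declined)


print(parse_description("J.Smith right tackle to NO 24 for 4 yards (T.Jones)."))
print(parse_description("Q.Back sacked at NO 15 for -9 yards (J.Doe)."))
```

The first call yields a rush at gap 6 for 4 yards; the second is flagged as a sack with no gap, so it would be dropped from the run-blocking average.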
Analysis
The analysis was simple once the play descriptions were parsed. Each rush attempt was broken down into yardage (anything from a loss up to the 5-yard cap) and "gap," indicated as 1) left end, 2) left tackle, 3) left guard, 4) middle, 5) right guard, 6) right tackle, and 7) right end. The gaps are assigned by whoever writes the play descriptions for ESPN.com. I then averaged all these rushing attempts, both as a whole and at each gap.
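The averaging step can be sketched in a few lines of Python. This assumes one offense's rush attempts are already available as (gap, yards) pairs with sacks removed; `gap_averages` and `GAP_LABELS` are hypothetical names, and to rank teams you would run it once per offense.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

GAP_LABELS = {1: "left end", 2: "left tackle", 3: "left guard", 4: "middle",
              5: "right guard", 6: "right tackle", 7: "right end"}


def gap_averages(
    rushes: Iterable[Tuple[int, int]], cutoff: int = 5
) -> Tuple[float, Dict[str, float]]:
    """Return (overall average, {gap label: average}) with yardage capped at cutoff."""
    totals: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for gap, yards in rushes:
        totals[gap] += min(yards, cutoff)  # losses stay negative, long runs count as cutoff
        counts[gap] += 1
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no rushing attempts to average")
    overall = sum(totals.values()) / n
    per_gap = {GAP_LABELS[g]: totals[g] / counts[g] for g in sorted(counts)}
    return overall, per_gap


# Made-up attempts for one offense: (gap, yards gained).
overall, per_gap = gap_averages([(1, 8), (1, 1), (4, 0), (4, -2), (6, 12), (6, 3)])
print(overall)   # 2.0
print(per_gap)   # {'left end': 3.0, 'middle': -1.0, 'right tackle': 4.0}
```

Note that the overall figure is an average over all attempts, not an average of the per-gap averages, so gaps a team runs through more often weigh more heavily.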
Overall run blocking
Please note that the graph is not zero-anchored; it starts at 2.25 to show the differences more clearly.
The Saints OL appears to be the best at run blocking by far. The Panthers separate themselves from the bulk of the league as well. At the opposite end, the Falcons' and especially the Titans' OLs look pretty bad in this analysis. It is interesting to think that the quality of Tennessee's offensive line may have contributed as much to Chris Johnson's off year as anything else.
Per Gap Results
In the table below, the top 5 in the league at each gap are highlighted in blue, and the bottom 5 are highlighted in red.
| Overall | Left End | Left Tackle | Left Guard | Center | Right Guard | Right Tackle | Right End |