Sunday, August 12, 2012

Ranking NFL Offensive Lines 2011

Introduction

The offensive line is generally the only position group in football with no reliable statistics to measure the value of its players. There are some superficial stats like sacks given up or total rushing yardage, but I wanted to build something that depends more reliably on the offensive line. This project is an attempt to create such a statistic, at least for run blocking. Pass blocking would need a separate measure, and one that would be a lot more difficult to build.

The general idea for the statistic is average rush yardage, counting only the first 5 yards of each run. Any yardage a run gains beyond that would not be counted. Also, sack yardage (scored as negative rush yardage in some stat systems) would not be counted, since the line is in pass blocking on those plays and it doesn't make sense to include them in the statistic.
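A minimal sketch of the statistic in Python, assuming that runs longer than 5 yards are capped at 5 rather than dropped (the "negative to 5" yardage range in the Analysis section suggests this reading). The `(play_type, yards)` tuple layout and the `"rush"`/`"sack"` labels are hypothetical, made up for the illustration.

```python
def capped_rush_average(plays, cap=5):
    """Average rush yardage with each run's gain capped at `cap` yards.

    `plays` is a list of (play_type, yards) tuples. The tuple layout and
    the "rush"/"sack" labels are hypothetical, chosen for this sketch.
    """
    # Keep rushes only; sacks are pass-blocking plays and are skipped.
    rushes = [yards for play_type, yards in plays if play_type == "rush"]
    if not rushes:
        return None
    # Losses count in full; gains beyond the cap are truncated to the cap.
    capped = [min(yards, cap) for yards in rushes]
    return sum(capped) / len(capped)


# Made-up plays: a 12-yard run counts as 5, the sack is ignored.
sample = [("rush", 3), ("rush", 12), ("rush", -2), ("sack", -7), ("rush", 5)]
print(capped_rush_average(sample))
```

With the cap in place, one long breakaway run moves the average no more than a routine 5-yard gain does.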

In my thinking, this is the best way to measure the effectiveness of the offensive line at run blocking. Of course, linemen often release to the second level, so they sometimes contribute beyond 5 yards, but I thought 5 would be a good cutoff. Anything the running back or the downfield blockers do beyond that point is outside the scope of the offensive line. Or at least that's how I imagine it.

Source

To create the statistic, though, you need play-by-play data, which is not freely available anywhere. I'm sure the agencies that do stats for the leagues have a database of all plays.

Well, when I wanted to learn about web scraping and try it out, this is the project I took on. Using ESPN's play-by-play pages for each game in 2011, I was able to gather information on every play in the NFL last year. Obviously, it would have taken a very long time to copy and paste everything manually, so instead I wrote a program in Python to do it for me. It automatically reads the HTML of each game page and records the information in a couple of tables.

The program is hosted on Scraperwiki.com, and you can access it here: https://scraperwiki.com/scrapers/espnfootballscoresworking/

If you are familiar with Python, I invite you to play around with it or to build on it. I had literally zero experience with Python before I started the project.
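For readers who have not seen a scraper before, here is a rough sketch of the row-extraction step using only Python's standard library. It is not the hosted program: the sample markup, the `PlayTableParser` class, and the `extract_plays` function are all hypothetical, and the real ESPN page layout is not shown in this post, so it likely differs.

```python
from html.parser import HTMLParser


class PlayTableParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_plays(html):
    """Return one list of cell strings per table row in `html`."""
    parser = PlayTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup, invented for this sketch.
SAMPLE_HTML = """
<table>
  <tr><td>1st and 10 at TEN 20</td><td>C.Johnson left tackle for 3 yards.</td></tr>
  <tr><td>2nd and 7 at TEN 23</td><td>C.Johnson right end for 12 yards.</td></tr>
</table>
"""

for situation, description in extract_plays(SAMPLE_HTML):
    print(situation, "|", description)
```

A real run would download each game page first and feed its HTML to the same kind of parser.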

There are a lot of possibilities for different sorts of measures using a play-by-play database like this, and I will probably use the dataset for other projects. As you may have noticed, though, the play description is copied as a whole from ESPN, and no parsing is done in Python.

I parsed the play description in Excel to determine what type of play it was, whether it was successful, whether there were penalties, whether they were accepted or declined, and so on. This process required A LOT of trial and error, and was better suited to Excel, where you can see your errors in real time.
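The parsing itself was done in Excel, but the same kind of rule can be sketched in Python with a regular expression. The description wording assumed here ("<runner> <gap> ... for N yards" or "for no gain") is a guess at the format, and `parse_rush` is a hypothetical helper, so treat this as an illustration of the idea rather than the rules actually used.

```python
import re

GAPS = ["left end", "left tackle", "left guard", "middle",
        "right guard", "right tackle", "right end"]

# Assumed shape: "<runner> <gap> ... for <N> yard(s)" or "... for no gain".
RUSH_RE = re.compile(
    r"(?P<gap>" + "|".join(GAPS) + r")\b.*?"
    r"\bfor (?:(?P<yards>-?\d+) yards?|(?P<none>no gain))",
    re.IGNORECASE,
)
# Passes and sacks can contain words like "middle", so rule them out first.
NON_RUSH_RE = re.compile(r"\b(?:pass|sacked)\b", re.IGNORECASE)


def parse_rush(description):
    """Return (gap, yards) for a rush description, or None otherwise."""
    if NON_RUSH_RE.search(description):
        return None
    match = RUSH_RE.search(description)
    if match is None:
        return None
    yards = 0 if match.group("none") else int(match.group("yards"))
    return match.group("gap").lower(), yards


# Made-up descriptions in the assumed format.
print(parse_rush("(12:34) C.Johnson left tackle to TEN 23 for 3 yards (J.Smith)."))
print(parse_rush("C.Johnson up the middle to TEN 20 for no gain."))
print(parse_rush("M.Hasselbeck pass short middle to N.Washington for 8 yards."))
```

Real descriptions also carry penalties, fumbles, and other wrinkles, which is where most of the trial and error would go.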

Analysis

The analysis was simple once the play descriptions were parsed. Each rush attempt was broken down into yardage (from negative values up to 5) and "gap", indicated as: 1) left end, 2) left tackle, 3) left guard, 4) middle, 5) right guard, 6) right tackle, and 7) right end. The gap labels were decided on by whoever writes the play descriptions for ESPN.com. I then just averaged all these rushing attempts as a whole and also at each "gap".
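The averaging step can be sketched in a few lines of Python. The input layout (a list of `(gap, yards)` pairs for one team) and the function name `average_by_gap` are assumptions made for this example, and the cap of 5 assumes longer runs are truncated rather than excluded.

```python
from collections import defaultdict


def average_by_gap(rushes, cap=5):
    """Return (overall_average, {gap: average}) for (gap, yards) pairs."""
    totals = defaultdict(lambda: [0, 0])  # gap -> [capped yards, attempts]
    for gap, yards in rushes:
        totals[gap][0] += min(yards, cap)  # truncate gains at the cap
        totals[gap][1] += 1
    per_gap = {gap: s / n for gap, (s, n) in totals.items()}
    all_yards = sum(s for s, _ in totals.values())
    all_attempts = sum(n for _, n in totals.values())
    overall = all_yards / all_attempts if all_attempts else None
    return overall, per_gap


# Made-up rushes for a single team.
team_rushes = [("left end", 7), ("left end", 1), ("middle", 2),
               ("middle", -1), ("right tackle", 4)]
overall, per_gap = average_by_gap(team_rushes)
print(overall)
print(per_gap)
```

Note that the overall figure is an average over all attempts, not an average of the per-gap averages, so gaps with more carries weigh more.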

Results

Overall run blocking


Please note that the graph is not zero-anchored; it starts at 2.25 to show better contrast.

The Saints OL appears to be the best by far at run blocking. The Panthers separate themselves from the bulk of the league as well. On the opposite end, the Falcons and especially the Titans OLs look pretty bad in this analysis. It is interesting to think that the quality of the offensive line in Tennessee may have contributed as much to Chris Johnson's off year as anything else.

Per Gap Results

In the table below, the top 5 in the league at each gap are highlighted in blue, and the bottom 5 are highlighted in red.


Team         Overall  Left End  Left Tackle  Left Guard  Center  Right Guard  Right Tackle  Right End
League Max   3.184    3.72      3.70         3.50        3.10    3.40         3.41          3.83
League Ave.  2.872    3.12      2.85         2.90        2.76    2.84         2.89          2.87
Saints       3.184    3.72      3.13         3.41        2.67    3.40         2.78          2.98
Panthers     3.102    3.69      3.18         2.71        2.73    2.86         2.95          3.83
Patriots     3.000    3.28      2.90         2.96        2.67    3.12         3.41          2.94
Bills        3.013    3.67      3.00         2.76        3.06    3.03         2.32          3.41
Broncos      3.004    3.21      2.60         3.05        2.92    2.83         3.20          3.44
Eagles       2.986    3.61      2.57         2.35        2.83    2.70         2.90          2.91
Steelers     2.978    2.68      3.70         3.25        2.63    2.98         3.13          3.33
Vikings      2.973    3.37      2.97         2.70        2.85    3.11         3.18          2.86
Buccaneers   2.949    2.76      2.23         3.20        2.74    3.27         2.97          3.47
Jets         2.918    2.95      2.08         3.00        3.10    3.13         3.02          2.54
Cowboys      2.916    3.22      3.19         2.91        2.98    2.71         2.62          2.71
Dolphins     2.904    3.23      3.11         3.05        2.81    3.28         2.17          2.56
Jaguars      2.902    2.81      3.07         3.02        2.97    2.70         2.52          2.85
Browns       2.899    2.96      3.04         2.80        2.89    2.69         3.31          2.95
Cardinals    2.882    3.15      2.80         2.95        2.63    3.00         3.24          2.86
Ravens       2.864    3.41      2.92         2.89        2.46    2.74         3.17          2.67
Packers      2.884    2.96      2.91         2.65        2.81    3.11         3.19          2.68
Redskins     2.870    2.81      2.52         3.26        2.71    2.96         3.04          2.97
Texans       2.863    2.67      2.79         2.76        2.82    3.32         2.87          2.89
Bengals      2.832    3.16      2.89         2.73        2.62    2.48         2.84          3.15
Colts        2.818    2.94      2.32         3.06        2.89    2.62         2.89          2.88
Chiefs       2.817    3.32      3.43         3.00        2.52    2.40         2.78          2.74
Rams         2.805    3.50      2.76         2.76        2.42    3.13         2.87          2.88
Seahawks     2.805    2.20      3.06         2.82        3.01    2.16         2.97          2.41
Raiders      2.800    3.08      2.94         2.35        2.71    2.48         3.21          2.61
Bears        2.773    2.73      2.95         3.09        2.86    2.41         2.56          2.82
Chargers     2.787    3.49      3.14         2.50        2.74    2.05         2.62          2.54
Giants       2.749    2.73      2.60         3.47        2.76    2.53         2.49          2.81
49ers        2.746    2.98      3.01         2.06        2.45    2.63         2.87          3.07
Lions        2.708    3.09      2.77         3.50        2.49    3.06         2.59          2.40
Falcons      2.599    2.88      2.25         2.90        2.39    2.20         3.00          2.65
Titans       2.506    2.42      2.62         2.68        2.45    2.54         2.27          2.61

Wednesday, May 16, 2012

Breakdown of NBA Playoff Minutes

I'll just run down the methodology really quickly before presenting the data:

I've seen various breakdowns across different sports where the number of players is compared between certain categories. Here, I wanted to take a different approach and weight each player by his average minutes played (it is also possible to use total minutes played, but then injured players get penalized). I originally wanted to do this with the regular season, but the data becomes unmanageable. I could possibly do it in the future.

Instead, I looked at average minutes played for every player in Round 1 of the 2012 NBA Playoffs. I also gathered some other information about the players. Here are the results.

The numbers in the graphs below are "sum averages": for each category, the average minutes played of every player in that category, added together. It might not make intuitive sense, but keep in mind that it is not total minutes.
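As a small illustration of what a "sum average" is, here is a sketch in Python. The player records, the placeholder school names, and the `sum_of_averages` function are all made up for the example and are not taken from the actual dataset.

```python
from collections import defaultdict


def sum_of_averages(players, key):
    """Add up each player's average minutes within each category of `key`."""
    totals = defaultdict(float)
    for player in players:
        totals[player[key]] += player["avg_minutes"]
    return dict(totals)


# Invented players; "School X" and "School Y" are placeholders.
players = [
    {"name": "Player A", "school": "School X", "avg_minutes": 40.0},
    {"name": "Player B", "school": "School X", "avg_minutes": 12.0},
    {"name": "Player C", "school": "School Y", "avg_minutes": 30.0},
]
print(sum_of_averages(players, "school"))
```

Under this weighting a starter who plays 40 minutes a night counts more than three times as much as a bench player who plays 12, instead of each counting as one player.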

First is a breakdown by college. Here I include only the top 20 schools (out of the 77 represented in total). There were a few borderline cases where a player attended more than one school, and there I tried to use the school where he played the longest. I include High School and International players as well for comparison.


I had already expected UCLA to be high on the list. I really started the project to see if my suspicions were true, but the rest of the info that came out of it is pretty cool. We are also now at the point where the players from the "Straight-From-High-School" era are hitting their prime.

Then I grouped the data into conferences. It should be noted here that I used the most "current" alignments. So Syracuse and Pitt, for example, are listed under the ACC, not the Big East. I just had to choose some sort of rule that I could follow consistently. I also adjusted the numbers to a 10-member conference, simply by dividing each conference's number by its member count and multiplying by 10, so that conferences with fewer schools wouldn't be penalized and conferences with more schools wouldn't benefit simply from having more members.
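The 10-member adjustment is simple arithmetic, sketched here in Python. The function name `adjust_to_ten_members` and the input figures are invented for the illustration, and the formula is my reading of "dividing and multiplying": divide by the member count, then multiply by 10.

```python
def adjust_to_ten_members(conference_total, members, target=10):
    """Scale a conference's sum average to a `target`-member conference."""
    return conference_total / members * target


# Made-up figures: a 12-member conference and an 8-member conference.
print(adjust_to_ten_members(120.0, 12))
print(adjust_to_ten_members(70.0, 8))
```

In this made-up case the 12-member conference has the larger raw number (120 vs. 70), but the gap narrows after adjustment (100 vs. 87.5).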


I also grouped the data by a couple of other variables that I thought would be interesting. The following is a breakdown of American players based on what year they left college (plus straight-from-high-school players).


As a note, the top of the list is dominated by High School and Freshman players (and International players), but as you move into the middle and bottom of the data, it is dominated by sophomores, juniors, and seniors.

Lastly, I broke down the International players (strictly defined as players who did not play college or high school ball in the US). I determined "ethnicity" by the country in which they first played professional ball.