You may be familiar with a post I made a year ago looking at where NBA playoff round 1 minutes "came from". I decided to expand the idea here, mostly to accomplish a couple of things: one, work on Python and web scraping to get more used to grabbing a whole bunch of data from the internet, and two, see if I could make an interactive SVG. I'd say that it worked out for the most part.
If you would like the raw data or code, just ask in the comments and I can send it to you. I used Python's native IDLE text editor though, so my files are sort of all over the place, and they might not make coherent sense.
Unlike my previous post, here I am just going to deal exclusively with total minutes, rather than the sum-average. With this much data, I figured the effects of injuries would just end up leveling themselves out anyway.
The data collection portion worked out pretty well, and I was a little like that picture of a guy holding onto a bunch of limes, except with numbers. It took a little work to decide which data to use and how. I have here more than a decade's worth of data that I gathered.
By School
So here is a similar chart to what I had last time, looking at total minutes across different schools. Note that the image below is INTERACTIVE, at least if you are using a "modern" browser. I can't really know what it will look like in older versions of IE, different mobile browsers, etc.
NOTE: Seasons are coded by the year in which they ENDED. So the 2005-2006 season is described above as 2006.
NOTE: The 2011-2012 season was a shortened season (66 games vs. the usual 82), so everyone should be down on average.
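As a rough illustration of the two notes above, here is a small Python sketch of the season coding and of how a shortened season could be rescaled to an 82-game equivalent. The function names and the `GAMES_PLAYED` table are hypothetical, and no rescaling was applied to the charts in this post; it only shows the arithmetic (66/82 is about 80% of a full schedule).

```python
# Hypothetical helpers; the charts in this post use raw totals, not rescaled ones.
GAMES_PLAYED = {2012: 66}  # seasons shorter than the usual 82 games, keyed by end year


def season_label(span):
    """Code a season like "2005-2006" by the year in which it ended."""
    return int(span.split("-")[1])


def full_season_equivalent(minutes, season, full=82):
    """Scale a minutes total up to what it would be over a full schedule."""
    return minutes * full / GAMES_PLAYED.get(season, full)


label = season_label("2005-2006")            # coded as 2006
scaled = full_season_equivalent(660, 2012)   # 660 minutes over 66 games
unchanged = full_season_equivalent(660, 2006)
```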
That Arkansas-LR you see in the playoffs is basically Derek Fisher single-handedly putting that school in the top 25.
Click around and have fun. I should also note that I relied on this guide as well as a bunch of Google searches. I am actually not too familiar with JavaScript, so that was the most painful part of it. Also, if you want to replicate something like this with a lot of data, I would recommend using a program to write the SVG for you rather than doing it by hand (or even with Inkscape). I again used Python to read the data I had gathered and transform it into an SVG.
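To give a feel for what "using a program to write the SVG" can look like, here is a minimal sketch in Python. It is not the script used for the chart above: the function name, the layout numbers, and the sample data are all made up for illustration. It writes one horizontal bar per school and attaches a `<title>` element to each bar, which most browsers show as a tooltip on hover.

```python
from xml.sax.saxutils import escape


def bars_to_svg(totals, bar_height=18, gap=4, label_width=140, max_bar=300):
    """Return an SVG string with one horizontal bar per (name, minutes) pair."""
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    largest = max((v for _, v in rows), default=0) or 1  # avoid dividing by zero
    height = len(rows) * (bar_height + gap)
    width = label_width + max_bar + 60
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for i, (name, minutes) in enumerate(rows):
        y = i * (bar_height + gap)
        bar = max_bar * minutes / largest  # longest bar fills max_bar pixels
        parts.append(
            f'<text x="0" y="{y + bar_height - 4}" font-size="12">{escape(name)}</text>'
        )
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{bar:.1f}" '
            f'height="{bar_height}" fill="steelblue">'
            f"<title>{escape(name)}: {minutes} min</title></rect>"
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up numbers, for illustration only.
example = {"School A": 12000, "School B": 8000, "School C & D": 4000}
svg_text = bars_to_svg(example)
```

Writing `svg_text` to a file ending in `.svg` gives something a browser can open directly. Escaping the names matters because a stray `&` in a school name would otherwise make the file invalid XML.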
By Conference
Again, I followed up on the same idea as my previous post and also did the numbers by conference. However, I used the CURRENT alignments as of this season of college basketball, e.g., Maryland would still be ACC. Use this ESPN page as a reference if you are confused by all the realignment talk. All non-Division I conferences are grouped together.
By Age
I also had the birth dates, so I tried to see if I could put it to use. Below we have the age distributions of the minutes played by players. I thought it would be an interesting question to ask whether there is an age distribution difference between regular season and playoff games. I made a double histogram to compare. Notice that the axes are different.
You can see that there is a clear difference between the two distributions. I could have run a statistical test here to compare them, but it would involve a little reading to see what sort of model is appropriate, so I left it as is.
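For anyone wanting to reproduce the age tally, here is a hedged sketch of one way to do it in Python. The record layout, the function names, and the choice of 1 February as the date on which age is measured are all assumptions made for this example, not a description of my actual script. Converting each tally to shares of total minutes is one way to put the regular season and playoffs on the same scale.

```python
from collections import Counter
from datetime import date


def age_on(birth, when):
    """Whole years elapsed between birth and when."""
    years = when.year - birth.year
    if (when.month, when.day) < (birth.month, birth.day):
        years -= 1  # birthday has not happened yet in that year
    return years


def minutes_by_age(records):
    """Sum minutes per age; records are (birth_date, season_end_year, minutes)."""
    totals = Counter()
    for birth, season, minutes in records:
        # Assumption: measure age on 1 February of the year the season ended.
        totals[age_on(birth, date(season, 2, 1))] += minutes
    return totals


def shares(totals):
    """Convert raw minutes per age into fractions of all minutes."""
    whole = sum(totals.values())
    return {age: m / whole for age, m in sorted(totals.items())}


# Made-up players, for illustration only.
sample = [
    (date(1985, 1, 15), 2012, 2000),
    (date(1985, 6, 1), 2012, 1000),
    (date(1980, 3, 10), 2012, 1000),
]
age_totals = minutes_by_age(sample)
age_shares = shares(age_totals)
```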
Heat Map
I also wanted to see if I could make a heat map over the US to see where these players came from. I used birth place (maybe not the most accurate thing), mainly because it was easy to get from the players' ESPN profile pages. I did not, however, make this map myself. That would have meant a whole lot more time spent learning how to do something I have no idea how to do. Instead I found that Bing (yes, Microsoft does some cool stuff sometimes) has a Heat Map generator. I had to find a way to convert location names to latitude and longitude, but there again I found a good resource and was able to do it in batches.
These maps don't look all that great since they pretty much look like population density maps, but there are little nuances in there, in case anyone is interested. I should also note that I hate the idea of By-State Heat Maps, which seem totally pointless, especially if not normalized to population sizes. Anyway, these look ok.
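On the normalization point, here is a small sketch of what "normalized to population sizes" could mean in practice: minutes per million residents rather than raw minutes. The function name and all of the numbers are made up for illustration, and nothing like this was applied to the maps above.

```python
def per_million(minutes_by_state, population_by_state):
    """Return minutes per million residents for each state with a known population."""
    rates = {}
    for state, minutes in minutes_by_state.items():
        pop = population_by_state.get(state)
        if not pop:
            continue  # skip states with a missing or zero population figure
        rates[state] = minutes / (pop / 1_000_000)
    return rates


# Placeholder figures, not real data.
minutes = {"State X": 50000, "State Y": 20000, "State Z": 1000}
population = {"State X": 10_000_000, "State Y": 2_000_000}
rates = per_million(minutes, population)
```

In this toy example State X has more raw minutes, but State Y comes out ahead once population is taken into account, which is exactly the kind of thing an unnormalized by-state map hides.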
By Time
I also made some line graphs for some of the top schools (and international and high school players). I was originally going to make another interactive SVG with this data, but I got burned out with the process. It involves a lot of trial and error, and I just wanted to get done with this project already. So here they are. Note Kentucky's funky jump in the last season here. When you consider that 2011-2012 (the last year included) was a shortened season, Kentucky's jump is especially noteworthy. There are some other interesting pieces of information.
Nice work!
If you're interested in making more interactive SVGs, I'd suggest checking out d3.js