Beta release of 2013 Lahman Baseball Database available

Photo by Keith Allison

These guys are excited about the update…

Beta versions of the 2013 database are available now from the download page in three formats: Microsoft Access database, CSV files, and SQL tables. More details are available in the readme file.

I’ve done consistency checks on this version and think it’s in pretty good shape, but there are a few outstanding issues. A handful of awards have not been announced yet, and neither have the Hall of Fame results. I’m still working on adding some data fields to the pitching table (SF, SH, GDP) to help with calculating BABIP and other advanced metrics. I’m hoping that early adopters and hardcore database users can do some field testing and let me know if there are any problems to be resolved. I’ll make a full release in January 2014, after the new HOF class is announced. Please send any feedback to me by email or post your questions/comments/concerns to our discussion group.
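To show why a field like SF matters for that kind of metric, here is a minimal Python sketch of the commonly used BABIP formula, (H - HR) / (AB - SO - HR + SF). The function and argument names are illustrative assumptions, not column names guaranteed by the database, and the sample numbers are invented.

```python
def babip(h, hr, ab, so, sf):
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    balls_in_play = ab - so - hr + sf
    if balls_in_play <= 0:
        return None  # undefined when no balls were put in play
    return (h - hr) / balls_in_play


# Invented season line, for illustration only.
print(round(babip(h=150, hr=20, ab=500, so=100, sf=5), 3))  # 0.338
```

Without sacrifice flies the denominator undercounts balls in play, so the result comes out slightly too high.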

I’m planning two additional data releases in January or thereabouts. One will be a separate set of tables with splits and other supplementary team and player stats derived from Retrosheet data. These will include home/road splits and left/right splits, as well as some others. Some users with advanced data skills are able to access those datasets on their own, but there is an increasing number of open source data sets (Pitch F/X, MLB Gameday, Retrosheet, etc.) that require some programming chops to acquire in a useful format. I’d like to help bring these datasets to a usable form for folks who don’t want to write Python scripts or learn how to parse XML files.

I’m also planning to release a set of supplementary tables that contain lists of things like no-hitters, players who’ve hit for the cycle, hitting streaks, etc. They’ll be tagged with playerIDs and teamIDs that make integrating them easy. I’m releasing these two sets as separate collections so that users can import them on their own if they want the whole kit and caboodle, while the main database doesn’t become unnecessarily cluttered. A number of people have shared lists like these over the years. If you have suggestions about what you’d like to see in these supplementary files or have datasets you’d like to share, please let me know.
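As a rough idea of what integrating a playerID-tagged list could look like, here is a small Python sketch that joins a supplementary list to player names using only the standard library. The layout of the supplementary table and every row below are made-up placeholders; the real files may use different columns.

```python
import csv
import io

# Made-up sample rows standing in for the real CSV files (assumption).
master_csv = """playerID,nameFirst,nameLast
doejo01,John,Doe
roeja01,Jane,Roe
"""
cycles_csv = """playerID,teamID,yearID
doejo01,XYZ,2013
"""


def join_on_player_id(master_rows, extra_rows):
    """Attach first and last names to each supplementary row via playerID."""
    players = {row["playerID"]: row for row in master_rows}
    joined = []
    for row in extra_rows:
        player = players.get(row["playerID"])
        if player is None:
            continue  # skip rows whose playerID is not in the master list
        joined.append({**row,
                       "nameFirst": player["nameFirst"],
                       "nameLast": player["nameLast"]})
    return joined


master = list(csv.DictReader(io.StringIO(master_csv)))
cycles = list(csv.DictReader(io.StringIO(cycles_csv)))
result = join_on_player_id(master, cycles)
print(result)
```

The same lookup works for teamIDs by keying the dictionary on that column instead.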

–Sean Lahman