Friday, April 8, 2011

PMF Data 2009-2011

[Update 4/11/2011: As promised to those of you looking for CSV/Excel formatted data, here it is:]

Here is another quick update to let you know that I have made all of the finalist data from 2009-2011 available in JavaScript Object Notation (JSON) format. If you want it in another format, ask nicely and I can offer that as well.

The data can be retrieved here:

Available fields and descriptions are as follows:

  • label: Either an MD5-hashed version of the original finalist name or, where I didn't have the names available when I imported the data, something like "applicantX", where X is an incrementing number.
  • type: Currently only "finalists" is available, but when I get to it, the other valid values, for which I have rows, will be "semifinalists" and "nominees".
  • year: The PMF class year. Not every record type is available for every year.
  • rank: This is just the database's unique record identifier; you don't really need it for anything.
  • school: The corrected name of the school the individual PMF attended. By corrected, I mean the standardization I undertook as part of the record cleanup.
  • field: Individual's academic field. No effort to standardize or clean these up occurred.
  • latlng: The latitude and longitude of the school, as determined by a separate geocoding script. I expect some percentage of error to have occurred here, but see below for error reporting.
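For anyone who wants to poke at the JSON, here is a minimal Python sketch of counting finalists per school. It assumes the file is a JSON array of objects keyed by the field names above; I haven't restated the real file layout here, so treat that shape as an assumption. The sample records, school names, the "lat,lng" string form, and the helper name finalists_per_school are all made up for illustration.

```python
import json
from collections import Counter

# Hypothetical sample in the ASSUMED shape: a JSON array of objects using the
# field names listed above. Every value here is invented for illustration.
SAMPLE = """
[
  {"label": "applicant1", "type": "finalists", "year": 2011, "rank": 1,
   "school": "Example University", "field": "Public Policy", "latlng": "38.9,-77.0"},
  {"label": "applicant2", "type": "finalists", "year": 2011, "rank": 2,
   "school": "Example University", "field": "Law", "latlng": "38.9,-77.0"},
  {"label": "applicant3", "type": "finalists", "year": 2010, "rank": 3,
   "school": "Sample College", "field": "Economics", "latlng": "40.7,-74.0"}
]
"""


def finalists_per_school(records, year=None):
    """Count records per school, optionally restricted to one class year."""
    counts = Counter()
    for rec in records:
        # int() tolerates the year being stored as either a number or a string.
        if year is not None and int(rec["year"]) != year:
            continue
        counts[rec["school"]] += 1
    return counts


records = json.loads(SAMPLE)
counts = finalists_per_school(records, year=2011)
for school, n in counts.most_common():
    print(school, n)
```

To run it against the real file, swap json.loads(SAMPLE) for json.load() on the opened download, and adjust the keys if the actual layout differs.
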
If you have questions about the data or spot any obvious errors, please let me know in the comments. As stated above, I have the greatest expectation of errors in the latitude and longitude data, but this can be fixed pretty easily if you just tell me which school is wrong, and what the correct lat/long should be.

Also, feel free to use the data however you see fit. If you have anything you're trying to put together, I would be happy to link to it. Similarly, I would be happy to help if you want data that's not currently there (assuming I have it).


    If you want other formats, it might help to let me know exactly WHAT formats.

  5. I should have prefaced the blog post with a note on how useful the data in this format is for non-programmers. Which is to say, it's not. I will update this post with new formats as I make them available.

  7. Do you have stats on how many finalists actually found fellowships?

  8. The data is now available as CSV for anyone who wants to see it in Excel (see the top of the post). I should note that there is an extra field in this one, and it denotes whether the finalist was a veteran or not. It is either Yes or blank. You'll see it if you look for it.
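
     If you would rather script it than open Excel, here is a rough Python sketch of tallying the veteran flag by year. The column names (including "veteran") and the sample rows are assumptions made up for illustration; check the header row of the real CSV and adjust. The only thing taken from the actual data is that the flag is either Yes or blank.

```python
import csv
import io

# Hypothetical sample; the header names and rows are invented for illustration.
SAMPLE_CSV = """label,type,year,school,field,veteran
applicant1,finalists,2011,Example University,Public Policy,Yes
applicant2,finalists,2011,Example University,Law,
applicant3,finalists,2010,Sample College,Economics,Yes
"""


def veterans_by_year(rows, column="veteran"):
    """Return {year: (total_rows, veteran_rows)}; a blank flag means non-veteran."""
    totals = {}
    for row in rows:
        year = row["year"]
        total, vets = totals.get(year, (0, 0))
        is_vet = (row.get(column) or "").strip().lower() == "yes"
        totals[year] = (total + 1, vets + int(is_vet))
    return totals


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = veterans_by_year(rows)
for year in sorted(summary):
    total, vets = summary[year]
    print(year, total, vets)
```
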

