Saturday, April 2, 2011

2011 PMF Data Visualization: Semifinalists vs Finalists

[Also posted here]

I spent a good deal of time gathering and sifting through the lists of 2011 semifinalists and finalists, cleaning up school names and gathering latitude and longitude information for each of the schools I saw represented in the data sets. At present, I have not had a chance to do the same for the nominees list, because it is so much larger than the other two sets of data. Once I do, I will showcase what I find, hopefully presenting it in a nice interactive tool so that you can see the sheer drop-off in numbers, especially as a function of geographic distribution.

In the meantime, what I present here are two graphics extracted from my current visualization efforts, which seek to present this year's PMF program in terms of its geographic distribution. The maps are of course centered on the US, not only because there are far fewer applicants from non-US schools, but also because I needed a starting point for the representation. I will adjust my visualization settings later to show the program's global distribution, which in some ways outperforms its reach into a certain class of schools within the US (in this case HBCUs, or historically black colleges and universities, whose representation in the PMF program has been marginal in the past). All this is to say that more visualizations are forthcoming as soon as I can find meaningful ways to express them.

Now let's get on to some graphics. You will want to open these up to see them full size, since this blog theme limits their visibility considerably.


In this first image you can see the geographic distribution of the 2011 semifinalists. The markers are sized according to the number of semifinalists from each of the nominating schools (see below for more detail on my cleanup approach). The legend below indicates the relative sizes; note that the largest circle is for schools that had 60 or more semifinalists. In all, approximately 280 schools were represented among the 1,530 semifinalists. You will no doubt notice the heavy presence of East Coast schools, especially around DC, which should be no surprise; what may be surprising is the volume of semifinalists at Upper Midwest and West Coast schools.
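For readers wondering how raw counts turn into circle sizes, here is a minimal Python sketch. It is not the code behind these maps; it assumes, hypothetically, that legend classes step by 12 up to the "60 or more" maximum (the 0, 12, 24, 36, 48, 60 sizes a commenter reads off the legend), and it floors counts into classes while never giving a school with at least one semifinalist a class of 0. The names `legend_class`, `LEGEND_STEP`, and the "School A/B/C" inputs are illustrative placeholders.

```python
from collections import Counter

LEGEND_STEP = 12  # assumed spacing between legend classes (hypothetical)
LEGEND_MAX = 60   # the largest circle stands for 60 or more semifinalists


def legend_class(count: int) -> int:
    """Map one school's semifinalist count to a legend marker class."""
    if count < 1:
        raise ValueError("only schools with at least one semifinalist are plotted")
    # Floor to a multiple of LEGEND_STEP and cap at LEGEND_MAX;
    # counts below the first step are bumped up to it instead of becoming 0.
    floored = (count // LEGEND_STEP) * LEGEND_STEP
    return max(LEGEND_STEP, min(floored, LEGEND_MAX))


# Hypothetical input: one (already consolidated) school name per semifinalist
semifinalist_schools = ["School A"] * 65 + ["School B"] * 30 + ["School C"] * 4
counts = Counter(semifinalist_schools)
marker_classes = {school: legend_class(n) for school, n in counts.items()}
print(marker_classes)  # {'School A': 60, 'School B': 24, 'School C': 12}
```

Flooring keeps the top class honest as "60 or more", and the `max` guard is one simple way to avoid stray size-0 circles like the ones mentioned in the comments.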


In this second image, which depicts the schools with finalists, you can see a very noticeable decline in numbers relative to the semifinalists and a less noticeable narrowing of the geographic distribution. Gone is the apparent strength of West Coast and Upper Midwest schools seen in the previous graphic. It is quite obvious that East Coast schools are massively overrepresented in this program (and someone has already done a breakdown of degree programs, so we know what that picture looks like). Since many schools had single-digit numbers of nominees, it is also to be expected that fewer schools would be represented in the finalists data. From 280 schools in the semifinalist round, we drop to 210 schools among 858 finalists. That is, approximately 25% of the schools represented in the semifinalist data failed to put forward a single finalist this year. This is a testament both to the competitiveness of the program and to the long road it still has ahead in marketing itself to every eligible graduate school.
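The school-level drop-off is easiest to state as a set difference: schools present among the semifinalists but absent among the finalists. Below is a minimal Python sketch, assuming both lists have already been consolidated to one canonical name per school (the function name `dropped_schools` and the placeholder school names are hypothetical), followed by the headline arithmetic using the approximate totals quoted above.

```python
def dropped_schools(semifinalist_schools: set[str],
                    finalist_schools: set[str]) -> tuple[set[str], float]:
    """Return schools with semifinalists but no finalists, and their share."""
    dropped = semifinalist_schools - finalist_schools
    return dropped, len(dropped) / len(semifinalist_schools)


# Toy example with placeholder names
dropped, share = dropped_schools(
    {"School A", "School B", "School C", "School D"},
    {"School A", "School B", "School C"},
)
print(sorted(dropped), share)  # ['School D'] 0.25

# Headline figures from this post (school counts are approximate)
print((280 - 210) / 280)       # 0.25 -> about 25% of schools dropped out
print(round(858 / 1530, 3))    # 0.561 finalists per semifinalist
```

Subtracting 210 from 280 matches the set difference only if every finalist school also appears among the semifinalist schools, which should hold as long as names were cleaned the same way in both lists.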

Finally, let me talk a bit about the data. The biggest challenge in an operation like this is that, with so many data points, it is incredibly difficult to conduct 100% quality control. There are errors in the data, and I am aware of a few that I have not corrected yet. Additionally, the PMF Program Office, in conjunction with the schools that feed it their nominees, tends to make what I would consider needless distinctions in school names. For instance, in the lists on the PMF site, you may notice that Harvard appears under four or five distinct names: one for Harvard University, and the rest for units like the law school and the divinity school. I realize that students at these schools, and the schools themselves, often take pride in such distinctions, but I assure you they make data analysis an even greater chore. Where possible, I have consolidated schools under their common university names. Besides, it would be utterly meaningless for me to depict semifinalists and finalists at that granularity, because all you would see is a set of concentric circles centered on the latitude and longitude of Harvard, for instance. In addition to consolidating names, I have also expanded each entry to the full text of the school name, which was a prerequisite for gathering the geolocation information. This will become apparent once I am satisfied with the interactive tools and release them.
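The consolidation step can be sketched as light normalization followed by an alias lookup. This is a minimal Python sketch under stated assumptions: the `ALIASES` entries, the helper names `tidy` and `canonical_school`, and "School B" are hypothetical examples, not the actual cleanup rules used for these maps. A real alias table would need far more entries, plus the full-name expansion described above, before geocoding.

```python
import re
from collections import Counter

# Hypothetical alias table: normalized sub-school labels -> parent university
ALIASES = {
    "harvard law school": "Harvard University",
    "harvard divinity school": "Harvard University",
    "harvard university": "Harvard University",
}


def tidy(name: str) -> str:
    """Collapse runs of whitespace and strip stray trailing periods."""
    return re.sub(r"\s+", " ", name).strip().rstrip(".")


def canonical_school(name: str) -> str:
    """Return the parent-university name if known, else the tidied input."""
    cleaned = tidy(name)
    return ALIASES.get(cleaned.lower(), cleaned)


raw = ["Harvard Law School", "Harvard  Divinity School.", "Harvard University", "School B"]
print(Counter(canonical_school(n) for n in raw))
# Counter({'Harvard University': 3, 'School B': 1})
```

An explicit table is slower to build than a fuzzy-matching rule, but it never silently merges two genuinely different schools.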

I am interested in what you think of what I've presented, both in my approach and in what the data has to say. Also, let me know what other kinds of views you are interested in. My tools are probably capable of generating pretty much anything, so just let me know.

26 comments:

  1. Can you plug the numbers into a stat program and confirm the geographic disparity in finalists' representation?

  2. I'm not exactly sure what you are requesting. If you like I can provide the data I am using.

  3. I don't suppose you happen to have the placement data for last year's class on a similar geographic basis? Thanks for all you've provided on this blog by the way; quite a helpful resource for us budding PMFs.

  4. I haven't seen any 2010 maps, but Aaron Helton did one for 2009.

    http://aaronhelton.wordpress.com/2009/08/03/348/

  5. PMF fellow, would you mind posting the list of finalists from 2010 and 2009? I am interested in contacting Finalists from those years who have secured appointments from my school. Maybe they can help me out.

  6. I have the finalist lists from 2009, 2010, and now 2011 and will post links to them shortly. As far as placement data, @9:30 PM, can you be more specific? Do you mean the breakdown of the agencies that placed finalists?

  7. Outside of the London School of Economics, there are only a handful of Finalists from foreign universities.

    I see one each from: Oxford, Cambridge, University of Geneva, St Andrews, University College Dublin, American University of Cairo, American University of Beirut, McGill University, and Universitat Pompeu Fabra.

    Considering the number of American grad students in Canada, the UK, France, and Australia, this seems low.

  8. On placement data, I was thinking of the actual geographic location of fellows once a placement has been made. So not by agency necessarily, but just where, geographically, PMFs actually end up in a given year. I know the bulk are in DC, but I'm curious about specifics. This information may not be available, but if it is, it would be nice to see.

  9. Yeah, I don't have anything like that. If it exists, OPM would have it. They don't seem to be sharing, but I would love to see it myself.

  10. @pmfellow

    These maps are fascinating! A few thoughts/questions:

    1. Have you considered breaking the maps down into thirds (West, Midwest, East) or quadrants to allow for a higher level of detail (especially in cases where many schools are grouped closely together)? A set of 3 or 4 maps (with scale and key kept consistent) might be a good way to achieve this.
    2. Might it be a bit easier to parse the data if each circle representation (signifying size 0, 12, 24, 36, 48, 60) had a unique color as well as a unique size (0=blue, 12=orange, etc)? I hate to disrupt your data-ink ratio (which is so efficient!), but not all of your readers are as graphically-inclined :D
    3. On that note, what is the significance of the size 0 circles in the above representations? Are these instances that have been rounded down from 1?

    Thank you for these compelling visualizations!

  11. I will be happy to provide greater granularity if you like. The point of these particular maps was to illustrate the sheer disparities in geographic representation. As far as color coding, I will see if my tools can do this; I don't see why not. And the size 0 circles were an oversight.

    Incidentally I do plan to present more data as soon as I work out some of these items.

  12. Could you tell us how you created these maps? It would actually be really helpful for me to be able to create a map like this for my thesis, and I'm having trouble finding a quick, user-friendly way to do so.

  13. Sure, but that way lies madness :). Also, it depends on your particular threshold as to whether you find my method quick or user-friendly.

    I cleaned up the data and inserted it into a MySQL database, then wrote a PHP script to do the geocoding. Another script queries the database to present the data in JSON format so it can be consumed by a JavaScript-based visualization tool called Simile Exhibit.

    If anyone is interested in the code or contributing to any of this, I am always eager to share.

  14. Haha. That was gibberish to me. I was actually supposed to do a day-long training on data mapping and GIS for my thesis, but my in-person assessment was scheduled for the same day. Oh well, better to be a PMF than have a pretty map in my thesis. Sounds like a candidate for some of those 80 hours of professional development, though ;) I have data on the location and size of some key stakeholders I have engaged in a particular city, and I would love to overlay that information on a map of the city to show the representation I got in different regions. But I'm going to reach out to some people I know in the City and Regional Planning department at my school and see if they can help. I just thought maybe there was a way I could use what you did instead. Thanks for letting me know what you did!

  15. When do you need to have it done? Now that I have done a bit of this, it wouldn't be terribly difficult to do it again.

  16. That is so nice of you to offer. I would really hate to impose, so if you don't have time or decide you'd rather not, I would not be offended at all. But it would be a great contribution to my analysis, if you are able and willing. My plan is to finalize the list by Wednesday, and my draft is due April 12th, but I could add it in later if that is too short a turnaround. You can email me at vlaws@berkeley.edu and let me know what you would need.

  17. Hey pmfellow, do you mind posting the finalist lists from 2010 and 2009? Do you have them sorted by school and by placement? Thanks. If you want I'll give you my e-mail and you can just send them to me.

  18. I have both by school, but the agency placement data is a little wonky because the PMF Program Office simply updates the page in place, adding new names as they receive notice of placements. It's not incredibly accurate, so I never could get a handle on it.

    Also, if you are looking for specific names on these lists, you will be disappointed. I have either lost them in the process of aggregation, or I have purposefully obfuscated them. The PMF Program Office may be on firm statutory ground, but I am not as certain about my own publications. In any event, for the things I am doing, I actually don't need the names.

    Give me a couple more days to clean the data up and I will happily post what I have.

  19. @pmfellow

    It just occurred to me that if you did a 3D topographical representation (where steepest peaks corresponded to schools with highest numbers of nominees/semi-finalists/finalists), there might not be a need to break the map into thirds or quadrants (or to resort to color-coding) in order to provide the granularity. Sorry to be persistently nerding out about this, but this data is just so interesting!

  20. Heh, no worries. Got any recommendations for tools that can accomplish this? Preferably in the range of free?

  21. re: April 3 at 7 am, I agree it is low, especially since International Relations/Affairs/Development is 4th or so in terms of degree representation. I'm the finalist from University of Geneva (the Graduate Institute/HEI, to be more specific), and I applied a day before the first deadline on a whim, since I didn't even think I was eligible. Foreign schools don't talk up the prestige of the PMF the way American schools do (understandably), and the OPM guidelines on foreign degrees are unclear. Personally, I'm definitely going to try to get the word out to other Americans in the year below me, so at least people know the PMF exists...

  22. Great job, PMF Fellow! Thanks for the analysis.

  23. Maybe I missed this, but it would be interesting to see a percentage breakdown showing what percent of the total applicants from each school were accepted in the end. I am less interested in the sheer number of finalists and more curious about the overall success percentage from each school. Does that make sense? It would be interesting.

  24. I will see if there is enough data in what I scraped off the pages the last two years to support the analysis, but don't hold your breath.

  25. What I would find even more interesting is the number of non-DC school finalists who nevertheless have DC work experience (i.e., they chose a better/different/other school outside the Beltway).

  26. Yeah, I think it would be interesting to see something like 30 people applied from school x and this percentage are finalists. That is much more telling than just total number of finalists from each school.

    Either way, incredibly cool! The GIS dork in me is loving the maps!
