Cleaning up country code data

The OpenSSL Conference was a great success, and I’m looking through the data we collected from attendees. One data point we asked for was “Country Code”, which produced a wide variety of responses (in no particular order):

Country Code
420
CZ
CZE
Česko
+420
Czech republic
Czechia
Czech Republic
cz

Of course, all of these are valid answers to that vague prompt, and they all point to the same country. Given that the conference was in Prague, it’s not surprising we got so many responses from people living there. I suppose I would quibble with people who entered full names rather than a “code”.
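
One way to collapse these variants is a small lookup table keyed on a cleaned-up version of the response. The following is a minimal Python sketch, not the method actually used on the conference data. The ALIASES table and the normalize_country function are hypothetical names, and the table only covers the Czech responses listed above.

```python
import unicodedata

# Hypothetical alias table: it only covers the Czech variants from the list
# above. A real table would need an entry for every country in the data.
ALIASES = {
    "420": "CZ",
    "cz": "CZ",
    "cze": "CZ",
    "česko": "CZ",
    "czech republic": "CZ",
    "czechia": "CZ",
}


def normalize_country(raw):
    """Return an ISO 3166-1 alpha-2 code, or None if the response is unknown."""
    # Compose accented characters so "Č" matches however it was typed.
    key = unicodedata.normalize("NFC", raw)
    # Trim whitespace, drop a leading "+", and ignore letter case.
    key = key.strip().lstrip("+").casefold()
    # Collapse runs of internal whitespace to a single space.
    key = " ".join(key.split())
    return ALIASES.get(key)


responses = ["420", "CZ", "CZE", "Česko", "+420",
             "Czech republic", "Czechia", "Czech Republic", "cz"]
codes = [normalize_country(r) for r in responses]
```

Anything that comes back as None is left for a human to look at rather than guessed.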

I was surprised by the number of people who entered their calling code until I noticed that the question came right after the one asking for phone numbers. This does present something of a problem, since we have +1 entries, which indicate the North American Numbering Plan (NANP). The NANP covers 25 countries and territories in North America and, incongruously, the Caribbean. The best guess is that these are US people, but we risk miscountrying Canadians if we make that assumption.
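
The ambiguity can be made explicit by mapping each calling code to a set of candidate countries and only accepting an answer when exactly one candidate remains. This is a sketch under assumptions: CALLING_CODES and resolve_calling_code are hypothetical names, and the +1 entry lists only a few NANP members for illustration.

```python
# Illustrative and deliberately incomplete: the NANP (+1) has many more
# members than the four listed here.
CALLING_CODES = {
    "1": {"US", "CA", "JM", "BS"},
    "420": {"CZ"},
}


def resolve_calling_code(raw):
    """Return a country code only when the calling code is unambiguous."""
    code = raw.strip().lstrip("+")
    candidates = CALLING_CODES.get(code, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    # Unknown or shared calling code: do not guess.
    return None


czech = resolve_calling_code("+420")
nanp = resolve_calling_code("+1")
```

Here czech resolves to "CZ", while nanp stays None, so the +1 rows can be flagged for follow-up instead of being silently counted as US.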