Tuesday, June 23, 2009

GRUPP Day 1: Thoughts on data collection

Some preliminary thoughts at lunch:

I'm at the GCamp@RUPP meeting in Phnom Penh right now, hosted by the Royal University of Phnom Penh. The subtitle is "Exploring Emerging Technologies to Address Emerging Infections."

Wow, data collection is hard. I knew this from spending 10 years designing and implementing systems in non-profits in California, but I had no idea what the challenges were in the developing world. Many of us from Google, primarily engineers, came here with some naive notions of how we could help. Here are just some of the issues we've been rocked by:

Lack of connectivity: It is common knowledge that SMS is used the world over, aside from the US, often more than voice connections. But in some emerging countries, it isn't. Think about this: Khmer, the language of Cambodia, isn't yet represented in Unicode. No Unicode! Apparently, they just didn't have people representing Khmer during the meetings that set up Unicode. Klingon and Tolkien's Elvish are in Unicode, but not Khmer. This has a deep impact on technology: SMS doesn't work in Khmer. Plus, no phones are made with Khmer keyboards, and to even produce an app, like a form, you have to render Khmer as an image file. Fortunately, this is being remedied, but it'll be a long time before phones and computers can support it.

Incentives: What incentive do people have to provide information? User-generated content needs to give something back to the people who contribute it, without creating perverse incentives.

Unavailability of data: Many in the open source community want all data to be free. But many governments have incentives not to share. For instance, a disease outbreak can cause a catastrophic drop-off in tourism. So if the government doesn't disclose that there is an outbreak, people are less likely to stay away. See also: Privacy

Anyway, just some thoughts. More later.