Open Data’s Access Problem, and How to Solve It

The recent Gov 2.0 Summit in Washington, D.C. brought several promising announcements that will help government agencies share code and best practices for making public data available to developers.

The idea behind new projects like the FCC’s developer tools and Civic Commons is that giving developers access to data previously stored in dusty filing cabinets will let them build tools that give ordinary citizens greater access to that data.

Unfortunately, not every open data project leads to good things. When open data is made available on the web, it is critical that it be accompanied by some effort to ensure everyone can access it.

We’ve seen an explosion of creative hacks that use this newly available data to provide excellent online resources. Public data sites like EveryBlock, and projects like the Sunlight Foundation’s Design for America contest, have highlighted some of the amazing ways open data can make our lives better. Whether it’s finding crime stats, real estate values, health hazards and business license statuses in your neighborhood, or visualizing how the government is spending your tax dollars through innovative maps, open data, and what you can do with it, is the current hotness among web developers.

Most of the benefits are close to home — in the U.S., just about everyone has access to online government resources thanks to web-enabled computers in free public libraries.

But extend that argument to the rest of the world and the number of people who really have access to the data drops significantly. If you don’t have an easy way to get online, you can’t benefit from open data.

Michael Gurstein, Executive Director of the Center for Community Informatics Research, recently highlighted some of the problems with open data accessibility.

Gurstein points out a number of assumptions about open data that are often overlooked by those most enthusiastic about making such data publicly available.

Worse, he shows how such data can be used against you.

Gurstein’s example of the dark side of open data is Bangalore, India’s digitization of land records, which gives every citizen a way to see who owns what in Bangalore. On the surface it seems like a good thing, but the upper classes and corporations have been using the land records data to gain ownership of land from the unknowing poor.

The data, writes Gurstein, allowed the well-to-do to instruct surveyors and lawyers how to most effectively “challenge titles, exploit gaps in title, take advantage of mistakes in documentation, identify opportunities and targets for bribery” among other things. Details are in this PDF.

It isn’t necessary to go all the way to India to find examples of open data leading to unintended consequences.

In an e-mail exchange, Gurstein told me of a similar case in Nova Scotia, where efforts to make titles, deeds and other land data available led to the very same situation: companies poring over 19th-century deeds, ancient maps and other newly available data, finding oversights, misfiled papers and other means to seize land from its owners.

Of course, unintended consequences aren’t a reason to stop making data available. For Gurstein, the solution is to make sure that open data isn’t just thrown onto the web, but that universal accessibility is built in so it can really benefit everyone.

How that is done will vary considerably by location and the type of data in question, but without such efforts Gurstein worries that “the outcome of ‘open data’ may be quite the opposite to that which is anticipated (and presumably desired) by its strongest proponents.”

It might come as a shock to some of the more enthusiastic open data proponents, but there is more to open data than just dredging it out of the Indiana Jones-style warehouses where it currently gathers dust. Putting it online for “anyone” to access and then walking away isn’t necessarily a recipe for good things.

Gurstein also pointed out several solutions to me, which he lists in a follow-up blog post. These solutions would help ensure that what happened in Nova Scotia and Bangalore doesn’t happen elsewhere. Among the things he believes governments and other data providers need to take into account are:

  • Advocacy: Perhaps the most important of Gurstein’s guidelines is to ensure that everyone knows the data is available, and that a community has sufficient resources to turn the data into some kind of project with local benefits.
  • Internet access: Internet access is the cornerstone of open data, and it is an especially pressing concern in rural areas. Just because data is on the web does not mean everyone can get to it. And if not everyone has access, then your data isn’t “open.”
  • Content and formatting: If the data is just a raw GIS database that most people won’t understand, then even internet access doesn’t matter, because only those with specific skills (or the money to hire them) will be able to do anything with the data.
  • Computer/software skills: Closely related to content and formatting is access to GIS tools and other specialty software, along with the skills to use them. As Gurstein says, “techies know how to do the visualization stuff, university and professional types know how to use the analytical software but ordinary community people might not know how to do either.”

It’s also worth pointing out that Gurstein cites several examples of open data being used in constructive ways. He isn’t arguing that we shouldn’t put government and other data online, just that we should keep in mind that the data isn’t necessarily useful to everyone in its rawest form.

As Tim O’Reilly notes in conjunction with Gurstein’s post, “we need to think deeply about the future” — to consider all the consequences of open data, not just the ones we’d like to see.

Punchcard scan by Steve Collins/Flickr/CC

