A Two-Way Data Pipeline for Governance

Originally posted for the OpenUp? program hosted by the Omidyar Network, November, 2014.

A Two-Way Data Pipeline for Governance: Opening Up the Data Government Holds About the Companies It Regulates

Beth Noveck, The GovLab, New York University, and Jed Miller, TABridge

Many governments are now publicly releasing swaths of new data each month. A growing number of private companies are doing the same, notwithstanding tensions over issues like mandatory disclosure and privacy. Used well, this windfall of raw material can do much to inform policymaking and public debate. But it can do even more if it can be combined with information from other sources, including crowdsourcing, to improve how government regulates. The practical challenge, however, is that far too much valuable data still sits in silos, gathering dust, unavailable and unhelpful even to the regulators who collect it.

Imagine, for instance, if an inspector at an environmental agency had ready access to data on a company’s compliance with regulatory rules (on clean water, air, and chemical disposal, for example) from across her agency. Imagine if she also had access to data from other agencies about corporate compliance with workplace safety and financial regulatory rules. She could then plan her regional inspections to increase the chance of discovering bad corporate behavior. At the very least, it is worth finding out whether better systematic data about the companies government regulates, and about their compliance with laws and regulations, could improve the effectiveness and efficiency of regulation.

This is hard to do. In many cases different agencies—or even agencies within the same ministry—use different ID numbers to refer to facilities under their jurisdiction. Even if the air division of the U.S. Environmental Protection Agency has a file on a regional factory operated by “ABC Industries,” the water division may group that facility under ABC’s parent company, “XYZ Corp.” With different ID numbers and different compliance records, these distinct entities might not be linked by their respective data footprints. Compounding these “latent obstacles” to data-driven knowledge are more traditional political, bureaucratic, and cultural obstacles, such as poor communication between different levels of government, or inconsistencies in the monitoring of corporate activity among different sectors.

To open up data-driven policymaking, four key priorities will be important:

Knowing what we have: To understand how we can use open data to govern better, we must first inventory the data government holds, the formats used, and the identifier systems employed, so that we can develop hypotheses about which data sets will enable more effective compliance. Only by looking at what data agencies collect about the companies they regulate can we judge whether, for example, visualizing a company’s environmental track record would be useful for predicting its workplace safety track record.

Making data comparable: For successful data analysis, governments, the private sector, and citizen groups need to compare “apples to apples,” not apples to oranges or apples to apple trees. This will only be possible through the use of widely accepted Legal Entity Identifiers that enable regulators, investors, and advocates to track company activities or follow the money between government, multilateral, and private actors. Groups working to establish greater comparability include OpenCorporates, the Open Contracting Data Standard project, and governments committed to the G8 Open Data Charter, among others.
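To make the comparability problem concrete, here is a minimal sketch of how a shared identifier could link records that two divisions file under different IDs and company names. Everything in it is an assumption for illustration: the record fields, the facility IDs, the "ENTITY-42" identifier, and the crosswalk table are hypothetical, not drawn from any agency's actual data.

```python
from collections import defaultdict

# Hypothetical records: each division uses its own facility ID and company name.
air_records = [
    {"facility_id": "AIR-001", "name": "ABC Industries", "violations": 2},
]
water_records = [
    {"facility_id": "WTR-778", "name": "XYZ Corp.", "violations": 1},
]

# Hypothetical crosswalk from each local ID to one shared entity identifier.
crosswalk = {
    "AIR-001": "ENTITY-42",
    "WTR-778": "ENTITY-42",
}


def link_by_shared_id(sources, crosswalk):
    """Group records from several sources under a shared identifier.

    Records whose local ID has no crosswalk entry are returned separately
    so that gaps in the identifier system stay visible.
    """
    linked = defaultdict(list)
    unmatched = []
    for source_name, records in sources.items():
        for record in records:
            shared_id = crosswalk.get(record["facility_id"])
            if shared_id is None:
                unmatched.append((source_name, record))
            else:
                linked[shared_id].append((source_name, record))
    return dict(linked), unmatched


linked, unmatched = link_by_shared_id(
    {"air": air_records, "water": water_records}, crosswalk
)

# Combined violation count for the one underlying entity.
total = sum(record["violations"] for _, record in linked["ENTITY-42"])
print(total)
```

Without the crosswalk, the two records share neither an ID nor a name and cannot be joined. The hard part in practice is building and maintaining that mapping, which is the job a widely accepted identifier is meant to do.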

Making open data two-way: Rather than look to the data collected and held by governments as the only “big data” to inform how we govern, we can explore whether crowdsourcing, already in use by citizens and NGOs to monitor governments and companies, can help fill data gaps and improve data quality. The public actually knows a lot about companies, and more can be gleaned from analyzing social media. In fact, to learn where a company operates, who it owns and who owns it, as well as details about its practices, the best place to start may be with its employees, customers, and insurance and banking partners, not just with activists. Such two-way data gathering, however, is not yet embedded in regulatory practice and faces legal and cultural hurdles to getting there.

In one example of crowdsourced data being combined with more traditional sources, IBM has just embarked on a collaboration with the Open Government Initiative of Sierra Leone, telecom provider Airtel, and other groups to help officials “collect, analyze, and disseminate information” about the Ebola outbreak via “a citizen engagement and analytics system.” Starting from public reports sent via SMS and voice calls, this platform uses the anonymized geographical data to create “heat maps” to help identify areas where Ebola is starting to appear. An unrelated tool, HealthMap.org, used aggregated crowd and social data to surface early indications of Ebola in Guinea almost 10 days before the WHO officially announced the outbreak. HealthMap may have succeeded as an early warning in the Guinea example, but aggregating news and social chatter generates “a lot of noise,” as this World Bank commentary notes.

It takes high-touch analysis by software and analysts to yield “more valuable signals” from the social stream. It will take a great deal of work and learning to develop the tools, practices, and rules for combining crowdsourced data with official data sources to inform how we govern. At GovLab, we are undertaking new work in 2015 to improve our understanding of what kinds of information crowdsourcing can yield about companies, their asset flows, and economic impacts. We will focus on information about extractive sector companies, given their deep economic and environmental impact and the international transparency standards that could help provide a base of data.

Finally, the best start on this path is to foster new models of trust between governments and citizens. Citizens have traditionally been petitioners, not partners, in policymaking. They have “extensive experience in monitoring implementation of public programs, but little exposure to sustained engagement when it comes to planning and budgeting with local governments,” as The Enhancing Transparency Impact (ETI) Project in the Philippines reflected on the OGP website in March. Governments at all levels will need to collaborate with citizen groups to define new models of sustained engagement if citizens are to be enlisted to help enhance what we know.
