
The Data Daily

Bringing The US Census Into The Data Age


I recently attended a Virtual Wine Night hosted by Splunk and moderated by Splunk CEO Doug Merritt. He was joined by two honored guests: Rumman Chowdhury, PhD, Responsible AI Lead at Accenture, and Stephen Buckner, Assistant Director of Communications for the US Census Bureau. The discussion centered on the moral and ethical issues surrounding how organizations use data, and specifically on how to extract value from data collected by the US Census Bureau without crossing any “Big Brother” lines or violating privacy regulations.

Doug kicked things off with an overview of the shift from the Age of Big Data to the Data Age, and of the state of data today. Cheap storage, combined with billions of connected devices, means petabytes of new data are generated every day. Doug noted that much of this data is transactional: customer relationship management, financial, or human resources data.

He pointed out that the vast majority of existing data is “dark data,” meaning data that is unknown and largely ignored. Doug also stressed that humans are not very good at understanding or interpreting data. “We continue to focus on stories and narratives instead of the facts of the data.”

Splunk recently published the Data Age Report, which contains a number of interesting insights into data and how organizations manage it. One key finding is that most businesses are not ready for the coming influx of data. More than 4 out of 5 survey respondents say data is very or extremely valuable to their organization, yet only 14% of the organizations surveyed say they are fully prepared for the imminent wave of new data. A third are currently preparing, which leaves 53% who are not yet preparing, are aware of the rising volume of data but have not yet considered its implications, are unaware of it, or are not sure.

Rumman talked about companies like Facebook, Google, and LinkedIn, which were born in the cloud and are swimming in a sea of data. She noted, though, that organizations of every size and in every industry now embrace the value of data, and she stressed the need for them to consider the ethical implications of how data is aggregated, analyzed, and used.

One significant source of new data is the US Census. Every 10 years, the United States conducts a comprehensive count of everyone living in the country. The census is mandated by Article I, Section 2 of the US Constitution so that seats in the House of Representatives can be apportioned by population, and its results also guide how federal resources are allocated. By its very nature, the census is about collecting data, but what that data means and how it can be viewed and used takes on new meaning as technology evolves.

Stephen shared that he has been with the US Census Bureau for 23 years, which makes the 2020 Census the third one he has participated in. He explained that the Bureau does provide data access to academic institutions, but that it is keenly aware of the privacy implications of that data.

He also discussed database reconstruction theory: the idea that you can reconstruct the individual records behind a database using only publicly available data. He explained that the Bureau ran an exercise to reconstruct the database of the 2010 US Census and was able to do so with a high degree of accuracy using only tables that were publicly available online. That result was a concern, and the Bureau used the exercise to drive changes, such as new disclosure avoidance techniques to prevent this type of reconstruction.
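The core idea behind database reconstruction can be illustrated with a toy brute-force search. The sketch below is not the Bureau's method; it assumes a hypothetical census block of three residents and a handful of invented published statistics (population, median age, mean age, and an adult/under-18 breakdown), and simply enumerates every combination of ages that agrees with them. The names `candidates`, `MAX_AGE`, and the statistic constants are all hypothetical.

```python
from itertools import combinations_with_replacement

# Hypothetical published statistics for one tiny census block.
# These values are invented for illustration; they are not real Census data.
POPULATION = 3
MEDIAN_AGE = 30
TOTAL_AGE = 78          # published mean age of 26 * 3 residents
UNDER_18 = 1            # residents younger than 18
ADULT_TOTAL_AGE = 70    # published mean adult age of 35 * 2 adults
MAX_AGE = 99            # assumed upper bound on age for the search


def candidates(use_adult_tables: bool) -> list[tuple[int, ...]]:
    """Return every age multiset consistent with the chosen published tables."""
    matches = []
    # Tuples come out sorted, so each multiset of ages is tried exactly once.
    for ages in combinations_with_replacement(range(MAX_AGE + 1), POPULATION):
        # The median of an odd-sized sorted tuple is its middle element.
        if ages[POPULATION // 2] != MEDIAN_AGE:
            continue
        if sum(ages) != TOTAL_AGE:
            continue
        if use_adult_tables:
            adults = [a for a in ages if a >= 18]
            if POPULATION - len(adults) != UNDER_18:
                continue
            if sum(adults) != ADULT_TOTAL_AGE:
                continue
        matches.append(ages)
    return matches


if __name__ == "__main__":
    print(len(candidates(use_adult_tables=False)))  # basic tables only
    print(candidates(use_adult_tables=True))        # adult breakdown added
```

With only the population, median, and mean, 19 age combinations fit the published numbers; adding the adult breakdown leaves exactly one, (8, 30, 40), which would reveal every resident's age. Real attacks work on far larger systems of tables with constraint solvers, but the principle of stacking aggregates until only one underlying dataset remains is the same.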

Census data can yield a variety of valuable insights. Its core purpose is to track population shifts, but digging deeper might support more specific conclusions about the makeup of the US population or about patterns related to particular regions or career fields. There are limits to how much detail can be derived, though, because of privacy concerns and the need to anonymize and sanitize results.
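For the 2020 Census, the Bureau adopted a disclosure avoidance system based on differential privacy, which adds calibrated random noise to published statistics. The sketch below shows only the textbook Laplace mechanism applied to a single count; it is a simplified illustration, not the Bureau's production algorithm, which is considerably more elaborate. The function name `noisy_count`, the privacy parameter values, and the assumption that one person can change a count by at most 1 are all illustrative choices.

```python
import random
import statistics


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Release a count with Laplace noise, assuming one person changes it by at most 1."""
    # The difference of two i.i.d. exponentials with rate epsilon is
    # Laplace-distributed with scale 1/epsilon (sensitivity 1 divided by epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    # Round and clamp so the published value still looks like a count.
    # This is post-processing, so it does not weaken the privacy guarantee.
    return max(0, round(true_count + noise))


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the demo is repeatable
    for eps in (1.0, 0.1):
        draws = [noisy_count(1000, eps, rng) for _ in range(5000)]
        print(eps, statistics.mean(draws), statistics.pstdev(draws))
```

Smaller epsilon values mean more noise: stronger privacy, less accuracy. At epsilon = 1 the average error on a count is about one person, while at epsilon = 0.1 it is about ten. Because noisy tables no longer satisfy exact equations, they give a reconstruction attack far less to work with, which is the trade-off between detail and privacy described above.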

That said, the rise of the Data Age and the explosion of technology and data open up new opportunities, both for the US Census Bureau and for private companies of all sizes. The companies that are prepared to leverage and visualize that data will have a strategic advantage over those that are not.
