3 Big Problems with Big Data and How to Solve Them

Read original article here

We discuss some of the negatives of using big data, including false equivalences and bias, vulnerability to security breaches, protecting against unauthorized access and the lack of international standards for data privacy regulations.

Big Data is unique in its size and scale. Add machine learning and Data Science, and this sheer volume will make it possible to reachunprecedentedlevels of accuracy and scope in predictions. When dealing with Big Data, there’s no need to worry about insufficient sample sizes or test group results—because the sample size is no less than everything. So it’s easy to think that with the famous 3 Vs (or5, 6, 7 Vs) every single piece of possible input is at your service and disposal, meaning total control over complex, foolproof, endlessly scalable systems and services.

That’s the theory, at least.

However, when such vast and all-encompassing data amounts are processed automatically, numerous issues are bound to surface. Some of them are inherited from the ‘small data’ times; some are new and unique to the Big Data format. Here are three of the most prominent issues, along with the suggested solutions.

With extremely large data quantities, the automated process of drawing conclusions (think black-box machine learning models) can be a bit of a gamble. What if in the process of learning, like in the often brought-upUCI study case, your AI chooses to make a shortcut and finds a link when there is none? Mistaking coincidence or causation for correlation and vice versa is a prominent problem with real-life consequences.

One of such consequences is customer dissatisfaction, when, for example, a recommendation engine fails to provide users with products they’re actually interested in. Or, in a far more alarming example, when “machine-learned” biasapplies to mortgage rates calculationsresulting in higher charges for minorities and the least protected groups. Credit history, court sentencing, job search, medical insurance—the more reliant on Big Data we become, the more challenging and crucial it is to avoid the false equivalencies and bias it can lead to.

What are the ways to tackle this issue? According to Yael Eisenstat, ex-head of Facebook elections integrity operations,cognitive bias training is the keyalong with time, better Data Science and bigger, cleaner input data. Implementing algorithms and analytics without accounting for enough variables is what causes this issue. So ongoing critical evaluation, quality control and course correction (plus not over-relying on technology) go a long way in keeping Big Data on the right track here.

Aside from the “garbage in, garbage out” issues leading to biased and incomplete conclusions, another weakness (also, a priority) is Big Data security. A slew of big online leaks and wide-scale breaches in recent years (US Citizen Records leakandYahoo! emails, to name a few) demonstrate that when dealing with Big Data, the usual methods of security protection aren’t enough.

Big Data and the algorithms that operate on it are very complex, and the more complex a system, the more potential weak spots that can be exploited. Here are a few examples:

It’s still a long way to go before there are clear regulations on data collection and breach aftermath proceedings. Meanwhile, the experts froma1qarecommend organizations to relentlessly test all the software based on and working with Big Data.

Such testing should be aimed at finding weak spots, verifying if open-source software like Hadoop is breach-proof, employing attribute-based encryption and other measures of digital protection. Altogether, this should be par for the course on the way to ensure the cybersecurity of an enterprise’s Big Data.

One step away from potential attacks and breaches of security, the question of Big Data privacy—personal data privacy in particular—has never been more relevant. Sincenobody reads terms of service, billions of users voluntarily provide app makers, device manufacturers, social networks, and various businesses around the globe with unending and uncontrolled streams of personal information. While extremely beneficial for marketing and research purposes, this also creates opportunity for intrusive, unethical use of Big Data—by those who collect and process it.

This is a serious concern for governments and general public alike. So for the past couple of years, new regulations and data privacy lawshave been emerging around the globe. The trouble, however, is that these laws and initiatives are:

This leads to an avalanche of questions of the legal and ethical nature. Where should we draw the line between user data privacy and corporate use? Will the region-specific Data Regulations fracture and change the Internet as we know it? Will it then lead to the unprecedented dominance of region-locked content, software and hardware? Will smaller international businesses be able to keep up? These and other concerns are at the forefront of today’s Big Data legal discussions.

At large, theemerging data privacy laws aim to provide a more ethical environment and much-needed transparency to user data processing. But it’s important to understand that they are a work in progress. There’s a lengthy path of trial and error ahead where businesses are concerned; and legislators are likely to miss the mark on their first few tries.

Those who collect and process Big Data should have boundaries, obviously. But these boundaries shouldn’t be restrictive nor should they leave businesses (especially SMEs) no choice but to quit a particular regional market altogether.

Creation of a global standard with internationally agreed-upon core principles would also be an immense boon. This should also go together with taking into consideration existing conditions and nuances,having clear-cut guidelines, and allowing sufficient transition periods to apply the needed changes.

When it comes to Big Data, it’s easy to get overwhelmed with its endless exciting possibilities. Nevertheless, critical assessment, the understanding of shortcomings and vulnerabilities (technological, ethical and legal), as well as strategies to address them should be at the core of any Big Data implementation.

Bio: Elena Yakimova is the Head of Web Testing Department at software testing company a1qa. She started her career in QA in 2008. Now Elena’s in-house QA team consists of 115 skilled engineers who have successfully completed more than 250 projects in telecom, retail, e-commerce, and other verticals.

Images Powered by Shutterstock

The Data Daily

3 Big Problems with Big Data and How to Solve Them