Security, Through the Lens of Data Science
May 3, 2018 · Big Data Zone
If you are a business leader, you are well aware that the challenges of information security have never been more daunting. Security remains one of the top unresolved challenges for businesses today. And the problem is only growing.
Hacking today is much more complex than just scanning a network and penetrating it via a known vulnerability, and the traditional tools most companies rely on are often inadequate against it.
When business leaders ask me for advice on cybersecurity, here are the four tenets I tell them they must follow.
1. Use Data Science to Identify Abnormalities
Data science is all about taking disparate types of data and giving them structure, organization, and labels so that machine learning and deep learning algorithms can run pattern analysis on them. Whether it's buyer sentiment analysis, facial recognition, or modeling the spread of malware through a network, it's the same basic data science. What changes is simply the type of pattern you detect and how you build that detection into your incident response process.
The most commonly understood data science involves identifying "normal" activities to determine patterns of behavior. In retail, you might apply these insights to analyze customer sentiment, buying preferences, peak activity times, and so on. In a case like this, the focus is on identifying the patterns; the abnormal or edge cases are disregarded.
With cybersecurity, it's the opposite. Security vendors and the security professionals in your organization must apply the same data and the same algorithms, but with the focus reversed: the abnormal cases are what matter. With the right technology, data analytics can help you quickly identify behavior that runs counter to normal patterns.
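To make the "opposite focus" concrete, here is a minimal sketch that learns what is normal for one account and then flags the outliers instead of discarding them. It assumes a simple baseline of hourly login counts and a z-score threshold of 3; the counts, the threshold, and the function names are illustrative only, and production systems use far richer behavioral models.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Return (mean, sample standard deviation) of historical hourly counts."""
    return mean(history), stdev(history)


def is_anomalous(count, baseline, threshold=3.0):
    """Flag a count more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return count != mu
    return abs(count - mu) / sigma > threshold


# Hypothetical hourly login counts for a single user account.
history = [4, 5, 3, 6, 5, 4, 5, 4]
baseline = build_baseline(history)

print(is_anomalous(5, baseline))   # False: a typical hour
print(is_anomalous(40, baseline))  # True: a sudden burst worth investigating
```

A retail analysis would average the burst away as noise; here the burst is the whole point.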
2. Don't Filter Your Data: Get It Raw, Use It All
"Use all the data" is a fundamental tenet we learn as data scientists that may not be so obvious for security professionals. You need to detect all of the behavioral changes and run machine learning algorithms against raw activity, not a pre-filtered event stream or subset from one tool or another.
You cannot build analytical models and a behavioral profile capable of detecting abnormal activity if you cannot observe raw behavior in the first place.
It's therefore important to consider how any security analytics solution collects data, what it collects, and whether it provides a truly raw, unfiltered feed of activity that gives a comprehensive view of the relevant data.
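The cost of pre-filtering can be illustrated with a small sketch. It assumes a hypothetical event format of (host, event type, severity) and a hypothetical upstream tool that forwards only events at severity 3 or above; the same volume-based check finds a suspicious host in the raw feed but sees nothing in the filtered one.

```python
from collections import Counter
from statistics import median

# Hypothetical raw events: (host, event_type, severity on a 1-5 scale).
raw_events = (
    [("web-01", "dns_query", 1)] * 5
    + [("web-02", "dns_query", 1)] * 6
    + [("db-01", "dns_query", 1)] * 250  # unusual burst of low-severity queries
    + [("web-01", "failed_login", 4)] * 2
)


def prefilter(events, min_severity=3):
    """Mimic a tool that forwards only events at or above a severity level."""
    return [e for e in events if e[2] >= min_severity]


def noisy_hosts(events, factor=10):
    """Return hosts whose event volume exceeds `factor` times the median host volume."""
    counts = Counter(host for host, _, _ in events)
    if not counts:
        return set()
    typical = median(counts.values())
    return {host for host, n in counts.items() if n > factor * typical}


print(noisy_hosts(raw_events))             # {'db-01'}
print(noisy_hosts(prefilter(raw_events)))  # set(): the burst was filtered out
```

Each individual DNS query is harmless and low-severity, so a severity filter is reasonable in isolation; the anomaly only exists in the aggregate of the raw activity.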
3. Choose a Comprehensive Solution
To be truly effective, a modern cybersecurity solution must be both sophisticated and comprehensive. But what does that mean?
In short, you need:
Usability: Advanced technology is essential, but just as important is your user interface. Your security team needs to be able to assess threats and prescribe a response from a centralized, organized, and easily understood single view of all relevant data.
High-speed ingestion: With the rate of potential incidents ever-increasing, security telemetry needs to be immediately collected, normalized, and stored for easy access.
Real-time processing: As with ingestion, speed is critical. Streaming data feeds with real-time enrichment are essential to quickly understanding and addressing potential threats.
Scalability: Your data store solution needs to be cost-effective not only for initial capture but also for future access.
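The ingestion and real-time processing requirements above boil down to mapping every source onto one schema and adding context as events stream past. The sketch below assumes two hypothetical log formats (a firewall emitting epoch seconds and a web proxy emitting ISO 8601 timestamps) and a hypothetical asset-ownership table for enrichment; a real pipeline would run this logic inside a streaming platform rather than a Python generator.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical common schema for normalized security telemetry."""
    timestamp: datetime
    source_ip: str
    action: str
    asset_owner: str = "unknown"


# Hypothetical asset inventory used for enrichment.
ASSET_OWNERS = {"10.0.0.5": "finance", "10.0.0.9": "engineering"}


def normalize_firewall(record: dict) -> Event:
    """Map an assumed firewall log format (epoch seconds) to the common schema."""
    return Event(
        timestamp=datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        source_ip=record["src"],
        action=record["verdict"].lower(),
    )


def normalize_proxy(record: dict) -> Event:
    """Map an assumed web-proxy log format (ISO 8601 with offset) to the common schema."""
    return Event(
        timestamp=datetime.fromisoformat(record["time"]),
        source_ip=record["client_ip"],
        action=record["status"].lower(),
    )


def enrich(event: Event) -> Event:
    """Attach asset ownership so analysts see context without a separate lookup."""
    event.asset_owner = ASSET_OWNERS.get(event.source_ip, "unknown")
    return event


NORMALIZERS = {"firewall": normalize_firewall, "proxy": normalize_proxy}


def process(stream):
    """Normalize and enrich each (source, record) pair as it arrives."""
    for source, record in stream:
        yield enrich(NORMALIZERS[source](record))


stream = [
    ("firewall", {"ts": 1525348800, "src": "10.0.0.5", "verdict": "DENY"}),
    ("proxy", {"time": "2018-05-03T12:00:00+00:00", "client_ip": "10.0.0.7", "status": "allowed"}),
]
events = list(process(stream))
```

Because `process` is a generator, each record is normalized and enriched as soon as it is read, which is the property the real-time requirement asks for.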
4. Finally: Automate, Automate, Automate
The problem in many organizations is that there is too much security alert data coming too fast. Many companies are generating hundreds of thousands of alerts per second.
Automated response is where an analytics-driven rules engine really shines. Without automation, the vast majority of alerts remain untouched, which helps explain industry statistics showing that compromises can run for an average of 300 days before anyone notices.
Don't lose sight of the role automation plays in addressing the core business issue that keeps you from finding a hack. Top businesses need automated detection and response that triggers automated workflows and can cut handling time from 30 minutes per event down to just a few seconds.
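A rules engine that triggers automated workflows can be sketched as an ordered list of (condition, action) pairs. Everything here is an assumption for illustration: the alert shape, the score thresholds, and the two placeholder playbooks (`isolate_host`, `open_ticket`) that in practice would call real endpoint, network, or ticketing APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Alert:
    """Hypothetical alert shape produced by upstream analytics."""
    kind: str
    host: str
    score: float  # assumed risk score between 0 and 1


@dataclass
class Rule:
    """A condition paired with the automated action to run when it matches."""
    name: str
    condition: Callable[[Alert], bool]
    action: Callable[[Alert], str]


def isolate_host(alert: Alert) -> str:
    # Placeholder: a real playbook would call an endpoint or network API here.
    return f"isolated {alert.host}"


def open_ticket(alert: Alert) -> str:
    # Placeholder: a real playbook would create a case in a ticketing system.
    return f"ticket opened for {alert.kind} on {alert.host}"


# Rules are checked in order, so the most specific, most urgent rule goes first.
RULES: List[Rule] = [
    Rule("contain likely malware", lambda a: a.kind == "malware" and a.score >= 0.9, isolate_host),
    Rule("escalate medium risk", lambda a: a.score >= 0.5, open_ticket),
]


def respond(alert: Alert, rules: List[Rule] = RULES) -> Optional[str]:
    """Run the first matching rule's action; return None if no rule matches."""
    for rule in rules:
        if rule.condition(alert):
            return rule.action(alert)
    return None


print(respond(Alert("malware", "web-01", 0.95)))   # isolated web-01
print(respond(Alert("phishing", "mail-01", 0.6)))  # ticket opened for phishing on mail-01
print(respond(Alert("scan", "web-02", 0.1)))       # None
```

First-match ordering keeps the behavior predictable: a high-confidence malware alert is contained immediately instead of also generating a ticket, and low-scoring alerts fall through for later batch review.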
The Solution Is Out There
The connected world creates a rate and volume of streaming cybersecurity data that is unprecedented, and attacks are increasingly sophisticated and multifaceted. And while trillions of dollars have been spent on security technology over the last three decades, hackers seem to be more successful than ever.
The good news is that the battle over cybersecurity is one that forward-looking leaders can win. Most existing security tools use only a few facets of cybersecurity data, and better solutions exist. We offer our own solutions, of course, and we also recommend that any leader who is serious about this issue become familiar with the expertise offered by the nonprofit Apache Software Foundation.
Whichever path you choose, I urge you to adhere to the four tenets outlined above. It is paramount that your organization take every measure to safeguard its data and, with it, its business continuity, its brand name, and, most importantly, its employees and customers.