As we start talking about Hadoop or, more specifically, an enterprise data hub, you’ll notice that many of the benefits of an EDH are offset by some interesting security side effects.
With an EDH, you can have a single platform for all of your data. But you’re also now combining data and audiences that used to be siloed into separate, secure systems. Hadoop offers a rich, flexible ecosystem of tools and utilities, but you want to be sure that it doesn’t come with an equally abundant set of authentication and access controls. You don’t want to manage a unique set of permissions for every tool, because that becomes unwieldy very quickly.
Hadoop allows you to ingest data of any type, very quickly. But this means you don’t always know when sensitive data is coming in and who is accessing it.
Lastly, active archive is a key benefit of an EDH, providing much lower storage costs than legacy systems. But you also realize that existing systems, while expensive, have a lot of compliance controls built into them, which raises the question: how do you get those same compliance and privacy controls inside the new environment?
Hadoop provides a lot of flexibility, but it’s important to find a platform that maintains this flexibility while still providing the necessary security controls.
Another key aspect of security is that there are multiple stakeholders, all concerned about the security of the system and what you can and cannot do.
The Business Manager is interested in using the EDH to run high-value workloads inside the cluster, answer new questions, and gain new insights. They want to put sensitive data into the cluster so that the business can reap the benefits. They also want to be able to quickly adopt new innovations within the Hadoop ecosystem and take full advantage of all the capabilities.
The InfoSec Team is fully supportive of this, but also has established internal rules for how new technologies can be adopted, as well as existing policies and procedures around how systems are authenticated and how people access sensitive data. While Hadoop may be a great advancement for the business, the InfoSec team will not change their policies just for one new system. Additionally, in some environments, the system and data must maintain external compliance to meet HIPAA, PCI, etc.
Lastly, for IT/Ops, this isn’t the first system they have needed to secure, and they have already invested in security tools such as Active Directory, Kerberos, SIEMs, etc. They want to leverage this existing infrastructure as much as possible for any new systems being introduced. They also want a system that can be set up without too much end-user support and that automates the security configuration.
So, we need to balance not only the security concerns introduced with Hadoop/Big Data, but also the viewpoints of all the stakeholders.
What does it mean for your data to be secure?
To be fully secure, we have to address perimeter security, data security, access
Cloudera encryption and key management
RecordService offering fine-grained access control
Separation of business roles and duties.
Business Impact:
Higher value applications with secured data
Lower software TCO [vs. 3rd party solutions for encryption, governance]
Lower compliance/breach risk
Focus Questions:
Navigator Encrypt/Key Trustee: What is the impact of an information leak from intermediate MR results, log files, etc.?
Sentry/RecordService: How are you planning to secure access to sensitive data across Hive and Spark?
Navigator: Do your governance needs extend beyond Hive?
Manager: How will you keep end users from damaging your production environments?
Could reduce storage costs
Diverse data sets increase the value of insights
Link to account record in SFDC (valid for Cloudera employees only): https://na6.salesforce.com/0018000000wN6YU
A global casino operator gains 360-degree customer view across global properties -- online and offline -- to maximize marketing ROI.
Background: The gaming and casino industry is a competitive one. This organization has casinos on four continents, and also manages some online gambling properties. While revenues continue to grow, they aren’t accelerating at the rate that the company would like.
Challenge: The organization recognized that better customer understanding and a more robust client loyalty program would be key to achieving the growth they wanted, but they were already spending about $15 million every year on the legacy data warehouse environment, comprising a Teradata EDW and data marts from Oracle and Microsoft. They wanted to be able to ingest and act upon unstructured data, such as feeds from gaming, and combine data from their casino games, restaurants, shops, shows, hotels, and online gaming to get a holistic view of each customer in order to tailor their communications and marketing offers.
Solution: After a competitive evaluation against Hortonworks, the casino operator decided to move forward with Cloudera, largely due to our product roadmap and technical differentiation through offerings such as Impala, Search, and Navigator. They also value the ease of management offered by Cloudera Manager. The organization has sent 60 of its employees to Cloudera training to get them up to speed on the new environment quickly. The casino operator is currently using Cloudera to ingest, process, and analyze data for their loyalty program. Data volumes on Cloudera amount to 400 TB across 100 Dell nodes. Data sources feeding the system include their player loyalty cards, credit cards, and marketing systems. They’re using BDR and Navigator to secure their Cloudera cluster, and they use Cloudera Manager to manage it. They’re evaluating Impala and HBase as well. The company uses Tableau and SAS today for BI and analytics, and they’ve been evaluating Revolution Analytics. They run Informatica for ETL. Other use cases that the casino chain is considering migrating to Cloudera are management of its employee healthcare options and offloading workloads to optimize performance of the Teradata EDW.
Results: This casino operator expects that the multi-channel, 360-degree customer view enabled by Cloudera will drive more loyal customers who spend more. Having a more complete view of each customer will also allow the casino operator to identify customers with profiles and preferences similar to those of their big spenders, so that they can send them more compelling cross-sell and upsell offers. In addition, by offloading workloads from legacy systems to Cloudera, the casino operator will optimize the efficiency of its data management infrastructure, resulting in millions saved overall on IT spend.
Today we are going to talk about one company that is using the Internet of Things to build a platform and next-generation services for their customers.
Opower is a utility analytics company that provides 360-degree views into energy usage patterns and similar-household comparisons to help consumers save energy.
Challenge: With the advent of smart meters and ever-growing utility data streams, Opower recognized the need to capture, store and analyze this data in order to help consumers save energy.
Solution: Opower built an analytical application on Cloudera Enterprise, leveraging Apache HBase, to bring together utility consumption data along with weather data, consumer behavior data, and other disparate sources of information.
Benefit: By pulling together, processing, analyzing, and then presenting information to homeowners, Opower is helping more than four million homes save hundreds of millions of dollars on their energy bills.
At Intel we can envision a world where we need to scale to trillions of events a day as the boundaries of the traditional enterprise continue to extend (e.g. cloud). Analyzing those events simply wasn’t possible on traditional systems.
Besides scale, correlation-based systems weren’t an option, as they trigger too many false positives and only detect known, unsophisticated attacks. We needed to run anomaly detection algorithms against larger data sets.
ArcSight broke on us.
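To make the contrast between correlation-based detection and anomaly detection concrete, here is a minimal Python sketch. It is illustrative only and is not Intel's or any vendor's implementation: the event format, the function names, and the z-score threshold are all assumptions made for this example, and a simple z-score on per-host event counts stands in for whatever algorithms a real deployment would run at scale.

```python
from statistics import mean, stdev

def rule_based_alerts(events, known_bad_ips):
    """Correlation/signature style: flag only events that match a known indicator.

    `events` is a list of dicts with a hypothetical "src_ip" key.
    """
    return [e for e in events if e["src_ip"] in known_bad_ips]

def anomaly_alerts(events, threshold=2.0):
    """Anomaly style: flag sources whose event volume is far above the norm.

    Uses a plain z-score over per-source event counts (an assumption chosen
    for brevity, not a recommendation for production use).
    """
    counts = {}
    for e in events:
        counts[e["src_ip"]] = counts.get(e["src_ip"], 0) + 1

    values = list(counts.values())
    if len(values) < 2:
        return []  # not enough sources to estimate a baseline

    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # every source behaves identically; nothing stands out

    return sorted(ip for ip, c in counts.items() if (c - mu) / sigma > threshold)
```

The rule-based function can only find what is already on the watch list, while the anomaly function can surface a previously unseen source simply because it behaves differently from its peers. The trade-off is that it needs enough history to establish a baseline, which is where storing large volumes of data matters.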
Tom, having been the CEO of ArcSight, how have you seen Apache Hadoop disrupt the cybersecurity space?
Traditional SIEMs…
Y-axis: Detect known threats using search and correlation-based techniques. They are not meant for querying or advanced analytics.
X-axis: The data that you store in the SIEM needs to be structured and limited in volume. Large, high-volume data sources are not being stored or processed.
Z-axis: The volume of information is limited. SIEMs only store 90 to 120 days’ worth of data.
Apache Hadoop-based cybersecurity solutions:
Y-axis: Allows a wide range of access to the data depending on analytics technique. Not just a search or SQL engine.
X-axis: Any volume or type of data can be stored on the enterprise data hub.
Z-axis: Months or years’ worth of data can remain accessible for threat responders to access and analysts to analyze.
But Alan, you know all of this. You have done some work in the open source community to help put this new technology in the hands of cybersecurity professionals.