1. The Big Data Cloud:
Are You Ready for the Zettabyte?
Steven C. Markey, MSIS, PMP, CISSP, CIPP, CISM, CISA, STS-EV, CCSK, CompTIA Cloud
Essentials
Principal, nControl, LLC
Adjunct Professor
President, Cloud Security Alliance – Delaware Valley Chapter (CSA-DelVal)
2. Big Data Cloud
• Presentation Overview
– Why Should You Care?
– Cloud Overview
– Big Data Overview
– Cloud-Based Big Data Offerings
– Securing Cloud-Based DB Solutions
3. Big Data Cloud
• Why Should You Care
– Organizational Cost Reduction Requirements
• Justify Investments
• Improve Efficiencies (Productivity, Time to Market)
– Digital Information – ~60% Annual Growth Rate (AGR)
– Data Storage – 15-20% AGR Capital Expense (CapEx)
– Categorization, Classification & Retention Magnify
• Compliance, Legal & Privacy Regulations
– Prevalent & Interconnected Business Ecosystems
• Supply Chains
• Business Process Outsourcers (BPO)
• Information Technology Outsourcers (ITO)
• Vendor’s Vendors
Source: IDC
8. Big Data Cloud
• Big Data Overview
– Aggregated Data from the Following Sources
• Traditional
• Source
• Social
9. Big Data Cloud
• Traditional Data
– Database Management Systems
• Relational Database Management Systems (RDBMS)
• Object-Oriented Database Management Systems (OODBMS)
• Non-Relational, Distributed DB Management Systems (NRDBMS)
• Mobile Databases (SQLite, Oracle Lite)
– Online Transaction Processing (OLTP)
• Real-Time Data Warehousing
– Online Analytical Processing (OLAP)
• Operational Data Stores (ODS)
• Enterprise Data Warehouse (EDW)
10. Big Data Cloud
• Traditional Data
– OLAP
• Business Intelligence (BI)
– Data Mining
– Reporting
– OLAP (Continued)
» Relational OLAP (ROLAP)
» Multi-Dimensional OLAP (MOLAP)
» Hybrid OLAP (HOLAP)
OLTP → ODS → EDW (Data Marts) → BI (Data Mining)
OLTP → ODS → EDW (Data Marts) → BI (Reporting)
OLTP → ODS → EDW (Data Marts) → BI (OLAP)
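The OLTP → ODS → EDW (Data Marts) → BI flow can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database. The table names, columns and sample rows are assumptions made up for illustration and are not from the presentation. A production pipeline would use separate systems for each stage.

```python
import sqlite3


def build_pipeline(conn):
    """Load sample transactions and derive ODS and data-mart tables."""
    cur = conn.cursor()
    # OLTP: one row per business transaction (hypothetical schema).
    cur.execute(
        "CREATE TABLE oltp_orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    cur.executemany(
        "INSERT INTO oltp_orders (region, amount) VALUES (?, ?)",
        [("east", 100.0), ("east", 50.0), ("west", 75.0)],
    )
    # ODS: a lightly cleaned, consolidated copy of the operational data.
    cur.execute(
        "CREATE TABLE ods_orders AS "
        "SELECT order_id, UPPER(region) AS region, amount FROM oltp_orders"
    )
    # EDW data mart: pre-aggregated by a reporting dimension.
    cur.execute(
        "CREATE TABLE mart_sales_by_region AS "
        "SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue "
        "FROM ods_orders GROUP BY region"
    )
    conn.commit()


def report(conn):
    """BI (Reporting): read from the data mart, not from OLTP."""
    return conn.execute(
        "SELECT region, orders, revenue "
        "FROM mart_sales_by_region ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_pipeline(conn)
rows = report(conn)
print(rows)
```

The point of the layering is that reporting queries touch only the data mart, so analytical load stays off the transactional tables.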
20. Big Data Cloud
• Cloud-Based Big Data Solutions
– PaaS
• DBaaS
– Amazon Web Services (AWS)
» DynamoDB
» SimpleDB
» Relational Database Service (RDS): Oracle 11g / MySQL
– Google App Engine
» Datastore
– Microsoft SQL Azure
– Oracle Public Cloud: 11g
• Processing
– AWS Elastic MapReduce (EMR)
– Google App Engine MapReduce: Mapper API
– Microsoft: Apache Hadoop for Azure
– IBM SmartCloud Enterprise on IBM InfoSphere BigInsights Basics
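The processing offerings listed above are built around the MapReduce model. The sketch below shows that model in plain Python on a single machine, as an illustration of the map, shuffle and reduce phases only. It does not use the API of EMR, App Engine or Hadoop, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (key, value) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = run_job(["big data cloud", "big data", "cloud"])
print(counts)
```

In a hosted service the map and reduce calls run in parallel across many nodes. The framework handles the shuffle step and node failures.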
34. Big Data Cloud
• Securing Cloud-Based NRDBMS Solutions
– General
• Focus on Application / Middleware-Level Security
– SQL Injections Are Still Possible
– Leverage Application IAM for NRDBMS User Rights Mgmt (URM)
– Leverage Application & System Logging for Authentication,
Authorization & Accounting (AAA)
• Segregation of Duties
– Read / Write Namespaces
– Read-Only Namespaces
– Specific
• Document
– Consistency Assurance
• Key / Value
– Ensure Referential Integrity
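Because many non-relational stores offer limited built-in user rights management, the controls above often live in the application tier. A minimal sketch of that idea follows, assuming an in-memory key/value store. The role names, namespaces and the `NamespacedStore` class are hypothetical. It shows read-only versus read/write namespaces with every authorization decision logged.

```python
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("aaa")

# Hypothetical grants: role -> namespace -> mode ("r" or "rw").
GRANTS = {
    "analyst": {"reports": "r"},
    "etl_service": {"reports": "rw", "staging": "rw"},
}


class NamespacedStore:
    """Key/value store wrapper that enforces per-namespace access rights."""

    def __init__(self):
        self._data = {}

    def _authorize(self, role, namespace, needed):
        mode = GRANTS.get(role, {}).get(namespace, "")
        allowed = needed in mode
        # Accounting: record every decision, whether allowed or denied.
        audit_log.info(
            "role=%s namespace=%s op=%s allowed=%s",
            role, namespace, needed, allowed,
        )
        if not allowed:
            raise PermissionError(f"{role} lacks '{needed}' on {namespace}")

    def get(self, role, namespace, key):
        self._authorize(role, namespace, "r")
        return self._data.get((namespace, key))

    def put(self, role, namespace, key, value):
        self._authorize(role, namespace, "w")
        self._data[(namespace, key)] = value


store = NamespacedStore()
store.put("etl_service", "reports", "q1_revenue", 42)
print(store.get("analyst", "reports", "q1_revenue"))
```

Here the analyst role can read the reports namespace but any attempt to write to it raises `PermissionError`, which keeps duties separated even when the underlying store has no notion of users.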
36. Big Data Cloud
• Securing Big Data in the Cloud
– Identity & Access Management (IAM)
• Security Assertion Markup Language (SAML)
• Representational State Transfer (REST)
– AWS IAM
– Windows Azure Access Control Service (ACS)
• Web Services Trust Language (WS-Trust)
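As an illustration of least-privilege access with AWS IAM, the sketch below builds a read-only policy document for a single DynamoDB table as a Python dictionary and serializes it to JSON. The region, account ID and table name are placeholders and not values from the presentation. The policy is an assumption about what a read-only grant might look like, and attaching it to a user or role is out of scope here.

```python
import json


def read_only_table_policy(region, account_id, table_name):
    """Return an IAM policy document allowing only reads on one table."""
    # Placeholder ARN built from caller-supplied values.
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:BatchGetItem",
                    "dynamodb:Query",
                    "dynamodb:Scan",
                ],
                "Resource": table_arn,
            }
        ],
    }


# Hypothetical values for illustration only.
policy = read_only_table_policy("us-east-1", "123456789012", "ExampleTable")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Listing explicit read actions, instead of a wildcard, means write and delete operations stay denied by default.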
46. Big Data Cloud
• Securing Big Data in the Cloud
– Electronic Discovery (eDiscovery)
• eDiscovery Reference Model (EDRM)
• Legal Holds
• Litigation Response
– Records & Information Management (RIM)
• Generally Accepted Recordkeeping Principles (GARP®)
• Information Governance Reference Model (IGRM)
• Information Lifecycle Management (ILM)
• MIKE2.0
48. Big Data Cloud
• Privacy & Data Protection for Big Data Clouds
– Jurisdictions*
• Regional: EU DPA
• National: PIPEDA, GLBA, HIPAA / HITECH, COPPA, Safe Harbor
• Statutory: Bavarian, CA SB 1386 / 24, MA 201 CMR 17, NV SB 227
– Data Flow & Jurisdictional Adherence
• Data Sharing with Third Parties
– Pseudonymization / De-Identification
• Consent & Notices
– Contract Clauses
• Model Contracts
– Privacy Best Practices
• Generally Accepted Privacy Principles (GAPP)
* Not all inclusive.
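One common way to pseudonymize data before sharing it with third parties is to replace direct identifiers with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library. The field names, sample record and key are hypothetical, and whether this technique satisfies any particular regulation listed above is a legal question the code does not answer.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace a value with a keyed, deterministic token (HMAC-SHA256)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record, direct_identifiers, secret_key):
    """Return a copy of the record with direct identifiers pseudonymized."""
    return {
        field: pseudonymize(str(value), secret_key)
        if field in direct_identifiers
        else value
        for field, value in record.items()
    }


# Hypothetical key and record; a real key would come from a secrets manager.
key = b"example-key-kept-by-the-data-owner"
record = {"email": "pat@example.com", "region": "EU", "purchases": 7}
shared = de_identify(record, {"email"}, key)
print(shared)
```

The same input always maps to the same token, so the third party can still join records, while only the holder of the key can recompute the mapping from known identifiers.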
49. Big Data Cloud
• Presentation Take-Aways
– Big Data in the Cloud is Here to Stay
– It Has to be Secure
– Segregation of Data
– Access Controls
– Separation / Segregation of Duties
– Federated Identities
– Logging
http://qugstart.com/blog/amazon-web-services/how-to-set-up-db-server-on-amazon-ec2-with-data-stored-on-ebs-drive-formatted-with-xfs/
Here’s the procedure I decided on. It involves symlinking MySQL config files and data directories onto the EBS volume. Another trick I used, because I needed to migrate about 20 GiB of data to get started, was to initially set up an “X-tra large” instance with 10 GiB of RAM to handle the data import. After the data was migrated and imported into my database, I simply terminated my X-Large instance and spun up a small instance connected to the same EBS volume! All the databases were preserved nicely and I did not have to waste money paying for an X-Large instance anymore. This exemplifies the value of thinking in the “cloud” mindset – where you can spin up and down servers in a matter of seconds! Hope this article helps someone else out there!