SlideShare une entreprise Scribd logo
1  sur  19
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 1
Building an enterprise big data platform
from ground up (Lessons Learned)
Roopesh Varier Srinivas Nimmagadda
Director, CPE Technical Director, CPE
Agenda
Introduction1
Platform Objectives2
Decision Criteria & PoC3
Key Lessons4
Q & A5
2Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 3
CPE Overview
• CPE Charter
– Build a consolidated cloud infrastructure that offers platform
services to host Symantec cloud applications
• Symantec big data platforms already hosting diverse
(security, data management) workloads
– Analytics – Reputation based security, Managed Security
Services, Fraud Detection
– Financial – Consumer and Enterprise transactions
– Dial Home – Appliance logs
• We are building a central big data platform that
provides batch and real time services
– Security, Scalability & Multi-tenancy are core objectives for all
services
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 4
Core Services
CPE Platform Architecture
5
Compute Networking Storage
CLIs ScriptsCloud Applications
Big Data Messaging
Identity &
Access
(Keystone)
Supporting
Services
Authn
Roles
User Mgmt
Tenancy
Quotas
Logging
Metering
Monitoring
Deployment
Compute
(Nova)
Image
(Glance)
SDN (Neutron)
Load
Balancing
DNS SQL
Batch
Analytics
Stream
Processing
Msg Queue
Mem Cache
Email Relay
SSL
K/V Store
Web Portal
Object Store
REST/JSON API
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda
Agenda
Introduction1
Platform Objectives2
Decision Criteria & PoC3
Key Lessons4
Q & A5
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 6
Platform Objectives
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 7
• Shared central platform
– Generic Architecture with central deployment, monitoring and
logging framework
• Multiple workloads
– Batch Processing
– Interactive Queries
• Multi-tenancy
– Security
– Resource Tenancy
• Scalability
• Reliability
– Active – Active
– Data Ingestion (Teeing)
• Self healing
• Effective Resource Utilization
• Performance
Agenda
Introduction1
Platform Objectives2
Decision Criteria & PoC3
Key Lessons4
Q & A5
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 8
Decision Criteria
• Management
– Deployment
– Operational Ease
– Add/Remove Nodes
– Add/Remove Packages
– Monitoring
– Software Upgrade
• Multiple workloads
– Opensource tools (Presto, Shark etc)
– SQL query support
• Performance
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 9
Decision Criteria (contd)
• Security
– Authentication
– Audit of access control
– Kerberos integration
• Scalability
– Support for different file format
• Reliability & Self Healing
– SPoF
– DR Support
• Failure alerts
• Self healing
– Operational Management
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 10
128 GB, 48 TB NL-SAS (7.2K) 4 Racks 40Gb to Spine
2x10Gb LACP/Server
Proof of Concept
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 11
Vendor 1
Data Nodes
Vendor 2
Data Nodes
Vendor 3
Data Nodes
Name Nodes &
Client Nodes
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda
Agenda
Introduction1
Platform Objectives2
Decision Criteria & PoC3
Key Lessons4
Q & A5
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 13
Key Lessons
• Platform objectives should be defined up front
– Security
– Scale
• Workload, Long term capacity needs
– Fault Tolerance/Availability
• DR vs Active-Active
• Understand your use cases well
• Hardware Selection – Don’t agonize over it
– Go mainstream
• SATA good enough
– 48 TB/node (12 Spindles)
• Data node sizing
– Cost vs Capacity (Sweet Spot)
• CPU to Disk Ratio (Load based)
– Core/TB Ratio - 2:3 (32 cores with HT)
• RAM – 128GB
• Uniform SKU
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 14
Key Lessons
• Distro Selection (Technology drivers)
– Define the principles
• Opensource vs Proprietary, Maturity vs Cutting Edge
– Features to support use cases
• Batch Processing vs In-memory processing
• POSIX compliance
– Management needs (Build vs Buy)
• Operational ease
• Logging/Monitoring
– Security
• Workflow needs
– Open-source tools
• Customizations/Development efforts required
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 15
Key Lessons
• Distro Selection (Business drivers)
– Strategic Partnership
• OEM deals
– Long term viability
– Cost
• 2 Year TCO
• 5 Year TCO
– Template driven selection
• Driving adoption
– Early sandboxes
– Documentation
– Reference application
– Keep resources aside for bootstrapping
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 16
Key Lessons
• Most valuable insight
– Big Data problems comes with big problems and
big ROI
• Challenges
– Development work
– Dev/Ops mind set
– Cultural Change
– Evangelization
• Opportunities
– Customer insights
– New revenue streams
– Start experimentation
• Paralysis by analysis
– Have fun!!!
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 17
“If everything seems under control, you're not going fast
enough”
- Mario Andretti,
Formula One World Champion
18Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda
Thank you!
Copyright © 2012 Symantec Corporation. All rights reserved.
Thank you!
19
Roopesh Varier Srinivas Nimmagadda
Director, CPE Technical Director, CPE
Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda

Contenu connexe

Tendances

The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...spinningmatt
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...DataStax
 
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...DataStax
 
New Directions for Mahout
New Directions for MahoutNew Directions for Mahout
New Directions for MahoutTed Dunning
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetupWei Ting Chen
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on dockerWei Ting Chen
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...DataStax
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014Sergey Lukjanov
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Databricks
 
Atlanta OpenStack Summit: The State of OpenStack Data Processing: Sahara, Now...
Atlanta OpenStack Summit: The State of OpenStack Data Processing: Sahara, Now...Atlanta OpenStack Summit: The State of OpenStack Data Processing: Sahara, Now...
Atlanta OpenStack Summit: The State of OpenStack Data Processing: Sahara, Now...Sergey Lukjanov
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsDataWorks Summit
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeongYousun Jeong
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015Yousun Jeong
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureDataWorks Summit
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experienceVitaliy Bashun
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Sparkrhatr
 
Exponea - Kafka and Hadoop as components of architecture
Exponea  - Kafka and Hadoop as components of architectureExponea  - Kafka and Hadoop as components of architecture
Exponea - Kafka and Hadoop as components of architectureMartinStrycek
 

Tendances (20)

The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
 
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
 
New Directions for Mahout
New Directions for MahoutNew Directions for Mahout
New Directions for Mahout
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
 
Atlanta OpenStack Summit: The State of OpenStack Data Processing: Sahara, Now...
Atlanta OpenStack Summit: The State of OpenStack Data Processing: Sahara, Now...Atlanta OpenStack Summit: The State of OpenStack Data Processing: Sahara, Now...
Atlanta OpenStack Summit: The State of OpenStack Data Processing: Sahara, Now...
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experience
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
Exponea - Kafka and Hadoop as components of architecture
Exponea  - Kafka and Hadoop as components of architectureExponea  - Kafka and Hadoop as components of architecture
Exponea - Kafka and Hadoop as components of architecture
 

En vedette

Continuous Monitoring Deck
Continuous Monitoring DeckContinuous Monitoring Deck
Continuous Monitoring DeckBrian Fennimore
 
закон взаємозв’язку маси і енергії2
закон взаємозв’язку маси і енергії2закон взаємозв’язку маси і енергії2
закон взаємозв’язку маси і енергії2Yury Fedorchenko
 
Streaming real time data with Vibe Data Stream
Streaming real time data with Vibe Data StreamStreaming real time data with Vibe Data Stream
Streaming real time data with Vibe Data StreamInformaticaMarketplace
 
What is Information Architecture and How Can It Help My Website?
What is Information Architecture and How Can It Help My Website?What is Information Architecture and How Can It Help My Website?
What is Information Architecture and How Can It Help My Website?Jimmy Smith
 
Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationGeorge Long
 
Insight Data Engineering: Open source data ingestion
Insight Data Engineering: Open source data ingestionInsight Data Engineering: Open source data ingestion
Insight Data Engineering: Open source data ingestionTreasure Data, Inc.
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Sean Roberts
 
The Power of Data Visualization
The Power of Data VisualizationThe Power of Data Visualization
The Power of Data VisualizationGautham Pallapa
 
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Data Con LA
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Pat Patterson
 
John Kret Resume Data Analyst
John Kret Resume Data  AnalystJohn Kret Resume Data  Analyst
John Kret Resume Data AnalystJohn Kret
 
Information architecture and SharePoint
Information architecture and SharePointInformation architecture and SharePoint
Information architecture and SharePointLulu Pachuau
 
Big Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data PlatformBig Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data PlatformNavneet Gupta
 

En vedette (18)

Aplica tic
Aplica ticAplica tic
Aplica tic
 
Beyond the Spreadsheet
Beyond the SpreadsheetBeyond the Spreadsheet
Beyond the Spreadsheet
 
La salud - Los espartacos
La salud - Los espartacosLa salud - Los espartacos
La salud - Los espartacos
 
Internship Wso2
Internship Wso2Internship Wso2
Internship Wso2
 
Continuous Monitoring Deck
Continuous Monitoring DeckContinuous Monitoring Deck
Continuous Monitoring Deck
 
закон взаємозв’язку маси і енергії2
закон взаємозв’язку маси і енергії2закон взаємозв’язку маси і енергії2
закон взаємозв’язку маси і енергії2
 
Streaming real time data with Vibe Data Stream
Streaming real time data with Vibe Data StreamStreaming real time data with Vibe Data Stream
Streaming real time data with Vibe Data Stream
 
What is Information Architecture and How Can It Help My Website?
What is Information Architecture and How Can It Help My Website?What is Information Architecture and How Can It Help My Website?
What is Information Architecture and How Can It Help My Website?
 
Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & Manipulation
 
Platform big data 2015 sept
Platform big data 2015 septPlatform big data 2015 sept
Platform big data 2015 sept
 
Insight Data Engineering: Open source data ingestion
Insight Data Engineering: Open source data ingestionInsight Data Engineering: Open source data ingestion
Insight Data Engineering: Open source data ingestion
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015
 
The Power of Data Visualization
The Power of Data VisualizationThe Power of Data Visualization
The Power of Data Visualization
 
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
John Kret Resume Data Analyst
John Kret Resume Data  AnalystJohn Kret Resume Data  Analyst
John Kret Resume Data Analyst
 
Information architecture and SharePoint
Information architecture and SharePointInformation architecture and SharePoint
Information architecture and SharePoint
 
Big Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data PlatformBig Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data Platform
 

Similaire à Lessons Learned from Building an Enterprise Big Data Platform from the Ground up in Months

Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...DataWorks Summit
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全Jianwei Li
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014John Berns
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryDataWorks Summit/Hadoop Summit
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise ArchitectureMapR Technologies
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...MapR Technologies
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data LakesKiran Kamreddy
 
M|18 How We Made the Move to MariaDB at FNI
M|18 How We Made the Move to MariaDB at FNIM|18 How We Made the Move to MariaDB at FNI
M|18 How We Made the Move to MariaDB at FNIMariaDB plc
 
Architecting Your Own DBaaS in a Private Cloud with EM12c
Architecting Your Own DBaaS in a Private Cloud with EM12cArchitecting Your Own DBaaS in a Private Cloud with EM12c
Architecting Your Own DBaaS in a Private Cloud with EM12cGustavo Rene Antunez
 
Fast, Flexible Application Development with Oracle Database Cloud Service
Fast, Flexible Application Development with Oracle Database Cloud ServiceFast, Flexible Application Development with Oracle Database Cloud Service
Fast, Flexible Application Development with Oracle Database Cloud ServiceGustavo Rene Antunez
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!DataWorks Summit
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureMapR Technologies
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiFelicia Haggarty
 
6 GigaSpaces Principles to Survive Black Friday
6 GigaSpaces Principles to Survive Black Friday6 GigaSpaces Principles to Survive Black Friday
6 GigaSpaces Principles to Survive Black FridayAli Hodroj
 
Does Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus WebinarDoes Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus WebinarImpetus Technologies
 
Using MySQL in the Cloud
Using MySQL in the CloudUsing MySQL in the Cloud
Using MySQL in the CloudMatt Lord
 
Capital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformCapital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformDataStax Academy
 
Zeta architecture - Hive London May15
Zeta architecture - Hive London May15Zeta architecture - Hive London May15
Zeta architecture - Hive London May15MapR Technologies
 

Similaire à Lessons Learned from Building an Enterprise Big Data Platform from the Ground up in Months (20)

Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data Discovery
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise Architecture
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data Lakes
 
M|18 How We Made the Move to MariaDB at FNI
M|18 How We Made the Move to MariaDB at FNIM|18 How We Made the Move to MariaDB at FNI
M|18 How We Made the Move to MariaDB at FNI
 
Architecting Your Own DBaaS in a Private Cloud with EM12c
Architecting Your Own DBaaS in a Private Cloud with EM12cArchitecting Your Own DBaaS in a Private Cloud with EM12c
Architecting Your Own DBaaS in a Private Cloud with EM12c
 
Fast, Flexible Application Development with Oracle Database Cloud Service
Fast, Flexible Application Development with Oracle Database Cloud ServiceFast, Flexible Application Development with Oracle Database Cloud Service
Fast, Flexible Application Development with Oracle Database Cloud Service
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data Architecture
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
 
6 GigaSpaces Principles to Survive Black Friday
6 GigaSpaces Principles to Survive Black Friday6 GigaSpaces Principles to Survive Black Friday
6 GigaSpaces Principles to Survive Black Friday
 
Does Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus WebinarDoes Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus Webinar
 
Using MySQL in the Cloud
Using MySQL in the CloudUsing MySQL in the Cloud
Using MySQL in the Cloud
 
Capital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformCapital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting Platform
 
Zeta architecture - Hive London May15
Zeta architecture - Hive London May15Zeta architecture - Hive London May15
Zeta architecture - Hive London May15
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 

Dernier

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Dernier (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Lessons Learned from Building an Enterprise Big Data Platform from the Ground up in Months

  • 1. Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 1 Building an enterprise big data platform from ground up (Lessons Learned) Roopesh Varier Srinivas Nimmagadda Director, CPE Technical Director, CPE
  • 2. Agenda Introduction1 Platform Objectives2 Decision Criteria & PoC3 Key Lessons4 Q & A5 2Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda
  • 3. Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 3
  • 4. CPE Overview • CPE Charter – Build a consolidated cloud infrastructure that offers platform services to host Symantec cloud applications • Symantec big data platforms already hosting diverse (security, data management) workloads – Analytics – Reputation based security, Managed Security Services, Fraud Detection – Financial – Consumer and Enterprise transactions – Dial Home – Appliance logs • We are building a central big data platform that provides batch and real time services – Security, Scalability & Multi-tenancy are core objectives for all services Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 4
  • 5. Core Services CPE Platform Architecture 5 Compute Networking Storage CLIs ScriptsCloud Applications Big Data Messaging Identity & Access (Keystone) Supporting Services Authn Roles User Mgmt Tenancy Quotas Logging Metering Monitoring Deployment Compute (Nova) Image (Glance) SDN (Neutron) Load Balancing DNS SQL Batch Analytics Stream Processing Msg Queue Mem Cache Email Relay SSL K/V Store Web Portal Object Store REST/JSON API Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda
  • 6. Agenda Introduction1 Platform Objectives2 Decision Criteria & PoC3 Key Lessons4 Q & A5 Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 6
  • 7. Platform Objectives Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 7 • Shared central platform – Generic Architecture with central deployment, monitoring and logging framework • Multiple workloads – Batch Processing – Interactive Queries • Multi-tenancy – Security – Resource Tenancy • Scalability • Reliability – Active – Active – Data Ingestion (Teeing) • Self healing • Effective Resource Utilization • Performance
  • 8. Agenda Introduction1 Platform Objectives2 Decision Criteria & PoC3 Key Lessons4 Q & A5 Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 8
  • 9. Decision Criteria • Management – Deployment – Operational Ease – Add/Remove Nodes – Add/Remove Packages – Monitoring – Software Upgrade • Multiple workloads – Opensource tools (Presto, Shark etc) – SQL query support • Performance Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 9
  • 10. Decision Criteria (contd) • Security – Authentication – Audit of access control – Kerberos integration • Scalability – Support for different file format • Reliability & Self Healing – SPoF – DR Support • Failure alerts • Self healing – Operational Management Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 10
  • 11. 128 GB, 48 TB NL-SAS (7.2K) 4 Racks 40Gb to Spine 2x10Gb LACP/Server Proof of Concept Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 11 Vendor 1 Data Nodes Vendor 2 Data Nodes Vendor 3 Data Nodes Name Nodes & Client Nodes
  • 12. Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda
  • 13. Agenda Introduction1 Platform Objectives2 Decision Criteria & PoC3 Key Lessons4 Q & A5 Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 13
  • 14. Key Lessons • Platform objectives should be defined up front – Security – Scale • Workload, Long term capacity needs – Fault Tolerance/Availability • DR vs Active-Active • Understand your use cases well • Hardware Selection – Don’t agonize over it – Go mainstream • SATA good enough – 48 TB/node (12 Spindles) • Data node sizing – Cost vs Capacity (Sweet Spot) • CPU to Disk Ratio (Load based) – Core/TB Ratio - 2:3 (32 cores with HT) • RAM – 128GB • Uniform SKU Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 14
  • 15. Key Lessons • Distro Selection (Technology drivers) – Define the principles • Opensource vs Proprietary, Maturity vs Cutting Edge – Features to support use cases • Batch Processing vs In-memory processing • POSIX compliance – Management needs (Build vs Buy) • Operational ease • Logging/Monitoring – Security • Workflow needs – Open-source tools • Customizations/Development efforts required Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 15
  • 16. Key Lessons • Distro Selection (Business drivers) – Strategic Partnership • OEM deals – Long term viability – Cost • 2 Year TCO • 5 Year TCO – Template driven selection • Driving adoption – Early sandboxes – Documentation – Reference application – Keep resources aside for bootstrapping Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 16
  • 17. Key Lessons • Most valuable insight – Big Data problems comes with big problems and big ROI • Challenges – Development work – Dev/Ops mind set – Cultural Change – Evangelization • Opportunities – Customer insights – New revenue streams – Start experimentation • Paralysis by analysis – Have fun!!! Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda 17
  • 18. “If everything seems under control, you're not going fast enough” - Mario Andretti, Formula One World Champion 18Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda
  • 19. Thank you! Copyright © 2012 Symantec Corporation. All rights reserved. Thank you! 19 Roopesh Varier Srinivas Nimmagadda Director, CPE Technical Director, CPE Hadoop Summit 2014 – Roopesh Varier & Srinivas Nimmagadda

Notes de l'éditeur

  1. Introduction to CPE – Quick recap of the CPE charter/mission
  2. Why hadoop? Make sure you tie the lessons
  3. - Given a sense of diversity workloads and scope of the platform. This not just a single application or OpenStack configured to be optimized for single app. - Out focus here is not just addressing scale related issues in various openstack environments, but to have the right framework for us to be able to identify and quickly remediate these issues. E.g SDN testing – rapid creation of control plane elements – VMs, networks, routers and tenant etc allowed us to identify scale issues in Neutron and fix some of them. - Architecture goals Secure, scalable and reliable OpenStack based cloud platform
  4. we are following a RESTful SOA model - closely resembles OpenStack design philosophy - a collection of loosely coupled independent services that expose a REST/JSON endpoint. This enables two things – federated development model that allow services to independently designed, developed, tested and deployed to production (2) applications don’t view the platform as rigid framework, but a menu of services can be used to "compose" an application ( not an "all or nothing" proposition) - keystone as a core IAC service, this means all other CPE services with integrate with keystone for Authn and Authz - all CPE services exposed thru a REST/JSON API. - secure multi tenancy a core objective for all CPE services (you might want to touch on multiple levels of tenancy here - domains and projects)
  5. Introduction to CPE – Quick recap of the CPE charter/mission
  6. ----- Meeting Notes (6/2/14 23:05) ----- Central Platform for - Audit purpose - No re-learn every time a new service is deployed - Seemless integration between the services ----- Meeting Notes (6/3/14 00:24) ----- Scalability Legacy technologies had limitations when it came to scale out Reliability Any part of the infrastructure could fail Platform has to be highly available as we have external customers who are dependent on this critical infrastructure Self healing Platform should be able to seamlessly migrate jobs & workloads to other DCs
  7. Introduction to CPE – Quick recap of the CPE charter/mission
  8. Authentication (Integration with LDAP or AD for Authentication, Other modes of authentication supported like PAM) ----- Meeting Notes (6/3/14 00:24) ----- We wanted to share the core PoCs and Test plans we executed as we evaluated the different distros
  9. Authentication (Integration with LDAP or AD for Authentication, Other modes of authentication supported like PAM) Operational Management – API Support (User Management, Job Management, Data Ingestion)
  10. ----- Meeting Notes (6/3/14 00:24) ----- Apple-to-Apple comparison Management testing, Scalability & Reliability testing, Used real life workloads that we migrated from our existing platforms North South, East West
  11. Weighted matrix Customer workloads have been taken into consideration and use that to define your matrix and rating
  12. Introduction to CPE – Quick recap of the CPE charter/mission
  13. Network to Spindle – 20:12 (125 MB – 1 Gbps) CPU – 100%, Memory – 44%, Disk – 71.5% Network – (East West – 23 GBps, North South - - 27 GBps) More memory to make sure that we provision for future Namenode – 100 million files (128 GB) (1 million files – 1 GB) ----- Meeting Notes (6/3/14 00:24) ----- Personal lesson - Agonizing over SATA, NL-SAS, SAS etc
  14. For Symantec – Security was not an afterthought. ----- Meeting Notes (6/3/14 00:24) ----- If you are planning to use open source tools - make sure you understand the investment on the development efforts needed in the short/long term.
  15. ----- Meeting Notes (6/3/14 00:24) ----- crossing the chasm - early adopters, early majority, late majority