SlideShare une entreprise Scribd logo
1  sur  30
How To Implement Hadoop
Successfully!
Based on: Avinash Kaushik
By Adir Sharabi
24%
of Hadoop projects are
actually in production.
Only
By Rainstor
About Conduit
Over 250 million active end users
More than 260,000 publishers
Over 3 billion monthly user interactions
Deployed in 120 countries
Founded in 2005
Acquired Wibiya in 2011
Product Offering
B2B B2C
Agg.
Files
Usage
Files
Usage Records
Hadoop
Hbase
HDFS
DWH
Product
Optimization Engine
Insights
Hive
MySQL
Hue
Integration Services
Reporting Services
Business Objects
R
Mahout
Oozie
Conduit’s Data Platform
Business
Streaming
Kafka WEPs
Real Time
Monitoring
Tip #1
Don't buy the hype of
‘big data’ and throw
millions of dollars away,
but don’t stand still.
Tip #1
 Select 1 well defined use case
 Small super-smart team
 Experiment on the cloud
 Quantify the effort and value for your organization
 ‘fail faster while failing forward’
Conduit’s initial use case
Merge Extract
Users Pings Users Table Daily
Installations
50M 600M
7 Hour 1 Hour
Before: 8-10 Hours
Merge Extract
Users Pings Users Table Daily
Installations
120M 2.2B
Today: 30 Minutes!
0
20
40
60
80
100
120
140
160
180
200
220
240
260
280
300
320
340
360
380
400
420
440
460
data size (TB) # of Nodes
Conduit’s Big Data Growth (5TB to 500TB)
Jan 2009
DWH Launched
Mar 2010
Hadoop Launched
on cloud (8 nodes)
Feb 2011
Hadoop Deployed
on conduit’s data center
(72 nodes)
Jan & Oct 2012
Procurement
(105/120 nodes)
Sep 2013
Procurement – DR
Conduit’s Data Platform in Numbers
• Hardware:
125 Nodes (+70 after DR) on 6 racks
500TB Used/1.2 PB Total
• Daily processed data:
50,000 files
500,000,000 records
700 GB
• Daily jobs submitted: Over 5,000
• Data freshness: 60 minutes
Tip #2
Data is turning challenges
into business opportunities.
8%
8%
9%
9%
10%
11%
13%
15%
19%
0% 5% 10% 15% 20%
analyze complete rather than partial data sets
other
Customer intelligence for more targeted
marketing
Include more semi-structure/unstructured info
into decision making
Improve scientific research
ETL
log analysis
Reduce cost of data analysis
Mine data for business intelligence
Use Cases
Business Model Maturity Index
Business
Insights
Business
Optimization
Business
Monitoring
Data
Monetization
Business
Metamorphosis
Monitoring
business
performance to
flag areas of
interest
Integrate insights
&
recommendations
into existing
business processes
Embed analytics
to optimize
business
processes
Leverage insights
to identify new
revenue
opportunities
Transform
customer and
product insights
to move into
new markets
© Copyright 2013 EMC Corporation. All rights reserved
But…
 Hadoop in the Enterprise Eco System – lot of the features
Enterprises need or want are put on the back seat
 Hadoop is NOT cheap (H/W & operations cost) – Make
sure company’s decision makers are on board
 Hadoop is still rough on the edges – tooling may not be
as mature as Enterprises are used to
 Data access is batch oriented
Tip #3
The 10/90 rule for magnificent
data success.
Tip #3
 Nurture your ‘big brains’
 Hadoop cutting edge technology – Investment in related
skills and training is crucial
 Good Data Scientists are “unicorns”
 Embrace the Open Source culture it will payoff
 BI team is essential for connecting the dots
Data Roles @ Conduit
Product
Mobile
Data Infra Team
Data BI Team
Data Science Team
Wibiya Quick Launch
Toolbar
BI
Scientist Scientist Scientist Scientist
BI BI BI
Other
Scientist
BI
Tip #4
Shoot for right time data,
not real time data.
Tip #4
 Complex decision making is time consuming therefore
unable to react in real time
 Real time is expensive!
 Taylor the right solution to accommodate the required data
freshness
 Focus on big things!
Data Maturity vs. Freshness @Conduit
0 10 60
Low
Medium
High
Real Time
Monitoring
Hue/Hive
Reporting
Service
Advanced
Analytics
Models
Business
Objective
Advanced
Analytics
Models
Reporting
Service
Freshness
Data Maturity
(Structured,
cleansed &
completed(
Hadoop
DWH
Kafka
Tip #5
Data quality sucks,
just get over it!
Tip #5
 Data will be dirty, schema-less, no foreign keys
 And yet, we are standing on a mountain of gold!
 Make your best and know when to shift to data analysis
 Tune your algorithms to tolerate data deficiencies then
hunt for insights
 Big data is not Data Warehouse
Tip #6
Democratize the data.
Tip #6
Tip #6
Tip #6
Tip #6
Tip #6
 Break down barriers preventing our users/applications from
using their valuable data in more effective ways to glean
meaningful insights
 Provide your users advanced self service tools to access the
data
 Hadoop ecosystem evolving as we speak
 Your performance is measured by the tools effectiveness
and ease of use
To Summarize…
• Start small
• Identify the opportunities
• Invest in people & related skills
• Adjust processes to the organization needs
• Know your data limits
• Self Service Tools are extremely important
Q&A
il.linkedin.com/pub/adir-sharabi/3b/6ab/510/

Contenu connexe

Tendances

Forrester’s View on Accelerating Analytics and Insights with Data Prep
Forrester’s View on Accelerating Analytics and Insights with Data PrepForrester’s View on Accelerating Analytics and Insights with Data Prep
Forrester’s View on Accelerating Analytics and Insights with Data PrepDatawatchCorporation
 
Gartner peer forum sept 2011 orbitz
Gartner peer forum sept 2011   orbitzGartner peer forum sept 2011   orbitz
Gartner peer forum sept 2011 orbitzRaghu Kashyap
 
The Journey to Success with Big Data
The Journey to Success with Big DataThe Journey to Success with Big Data
The Journey to Success with Big DataCloudera, Inc.
 
Moving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in HealthcareMoving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in HealthcarePerficient, Inc.
 
Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Raghu Kashyap
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera, Inc.
 
Getting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersGetting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersDatameer
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big dataDr. Wilfred Lin (Ph.D.)
 
Transforming Business for the Digital Age (Presented by Microsoft)
Transforming Business for the Digital Age (Presented by Microsoft)Transforming Business for the Digital Age (Presented by Microsoft)
Transforming Business for the Digital Age (Presented by Microsoft)Cloudera, Inc.
 
Informatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake EcosystemInformatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake EcosystemCapgemini
 
Accelerate Data Warehousing Projects with Automation and Data Replication
Accelerate Data Warehousing Projects with Automation and Data ReplicationAccelerate Data Warehousing Projects with Automation and Data Replication
Accelerate Data Warehousing Projects with Automation and Data ReplicationWhereScape
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaCloudera, Inc.
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
 
Analytics Solutions from SAP
Analytics Solutions from SAPAnalytics Solutions from SAP
Analytics Solutions from SAPSAP Analytics
 
The Big Picture: Real-time Data is Defining Intelligent Offers
The Big Picture: Real-time Data is Defining Intelligent OffersThe Big Picture: Real-time Data is Defining Intelligent Offers
The Big Picture: Real-time Data is Defining Intelligent OffersCloudera, Inc.
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
 
Optimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsOptimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsCloudera, Inc.
 
Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Datameer
 
Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer
 
The Five Markers on Your Big Data Journey
The Five Markers on Your Big Data JourneyThe Five Markers on Your Big Data Journey
The Five Markers on Your Big Data JourneyCloudera, Inc.
 

Tendances (20)

Forrester’s View on Accelerating Analytics and Insights with Data Prep
Forrester’s View on Accelerating Analytics and Insights with Data PrepForrester’s View on Accelerating Analytics and Insights with Data Prep
Forrester’s View on Accelerating Analytics and Insights with Data Prep
 
Gartner peer forum sept 2011 orbitz
Gartner peer forum sept 2011   orbitzGartner peer forum sept 2011   orbitz
Gartner peer forum sept 2011 orbitz
 
The Journey to Success with Big Data
The Journey to Success with Big DataThe Journey to Success with Big Data
The Journey to Success with Big Data
 
Moving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in HealthcareMoving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in Healthcare
 
Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
 
Getting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersGetting Started with Big Data for Business Managers
Getting Started with Big Data for Business Managers
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big data
 
Transforming Business for the Digital Age (Presented by Microsoft)
Transforming Business for the Digital Age (Presented by Microsoft)Transforming Business for the Digital Age (Presented by Microsoft)
Transforming Business for the Digital Age (Presented by Microsoft)
 
Informatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake EcosystemInformatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake Ecosystem
 
Accelerate Data Warehousing Projects with Automation and Data Replication
Accelerate Data Warehousing Projects with Automation and Data ReplicationAccelerate Data Warehousing Projects with Automation and Data Replication
Accelerate Data Warehousing Projects with Automation and Data Replication
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Analytics Solutions from SAP
Analytics Solutions from SAPAnalytics Solutions from SAP
Analytics Solutions from SAP
 
The Big Picture: Real-time Data is Defining Intelligent Offers
The Big Picture: Real-time Data is Defining Intelligent OffersThe Big Picture: Real-time Data is Defining Intelligent Offers
The Big Picture: Real-time Data is Defining Intelligent Offers
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Optimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsOptimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analytics
 
Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User
 
Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2
 
The Five Markers on Your Big Data Journey
The Five Markers on Your Big Data JourneyThe Five Markers on Your Big Data Journey
The Five Markers on Your Big Data Journey
 

Similaire à How to implement hadoop successfuly

2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
Operationalizing Data Analytics
Operationalizing Data AnalyticsOperationalizing Data Analytics
Operationalizing Data AnalyticsVMware Tanzu
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with HadoopPrecisely
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldCA Technologies
 
Automate Hadoop Jobs with Real World Business Impact
Automate Hadoop Jobs with Real World Business ImpactAutomate Hadoop Jobs with Real World Business Impact
Automate Hadoop Jobs with Real World Business ImpactCA Technologies
 
Create your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseCreate your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseJeff Kelly
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Inside Analysis
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...ModusOptimum
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunitiesBigdata Meetup Kochi
 
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...Cloudera, Inc.
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaCloudera, Inc.
 
Hadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata CompanyHadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata CompanyDataWorks Summit
 
A better business case for big data with Hadoop
A better business case for big data with HadoopA better business case for big data with Hadoop
A better business case for big data with HadoopAptitude Software
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsJane Roberts
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 

Similaire à How to implement hadoop successfuly (20)

2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Operationalizing Data Analytics
Operationalizing Data AnalyticsOperationalizing Data Analytics
Operationalizing Data Analytics
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven World
 
Automate Hadoop Jobs with Real World Business Impact
Automate Hadoop Jobs with Real World Business ImpactAutomate Hadoop Jobs with Real World Business Impact
Automate Hadoop Jobs with Real World Business Impact
 
Create your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseCreate your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouse
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunities
 
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
Hadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata CompanyHadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata Company
 
A better business case for big data with Hadoop
A better business case for big data with HadoopA better business case for big data with Hadoop
A better business case for big data with Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 

Dernier

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Dernier (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

How to implement hadoop successfuly

  • 1. How To Implement Hadoop Successfully! Based on: Avinash Kaushik By Adir Sharabi
  • 2. 24% of Hadoop projects are actually in production. Only By Rainstor
  • 3. About Conduit Over 250 million active end users More than 260,000 publishers Over 3 billion monthly user interactions Deployed in 120 countries Founded in 2005 Acquired Wibiya in 2011
  • 5. Agg. Files Usage Files Usage Records Hadoop Hbase HDFS DWH Product Optimization Engine Insights Hive MySQL Hue Integration Services Reporting Services Business Objects R Mahout Oozie Conduit’s Data Platform Business Streaming Kafka WEPs Real Time Monitoring
  • 6. Tip #1 Don't buy the hype of ‘big data’ and throw millions of dollars away, but don’t stand still.
  • 7. Tip #1  Select 1 well defined use case  Small super-smart team  Experiment on the cloud  Quantify the effort and value for your organization  ‘fail faster while failing forward’
  • 8. Conduit’s initial use case Merge Extract Users Pings Users Table Daily Installations 50M 600M 7 Hour 1 Hour Before: 8-10 Hours Merge Extract Users Pings Users Table Daily Installations 120M 2.2B Today: 30 Minutes!
  • 9. 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340 360 380 400 420 440 460 data size (TB) # of Nodes Conduit’s Big Data Growth (5TB to 500TB) Jan 2009 DWH Launched Mar 2010 Hadoop Launched on cloud (8 nodes) Feb 2011 Hadoop Deployed on conduit’s data center (72 nodes) Jan & Oct 2012 Procurement (105/120 nodes) Sep 2013 Procurement – DR
  • 10. Conduit’s Data Platform in Numbers • Hardware: 125 Nodes (+70 after DR) on 6 racks 500TB Used/1.2 PB Total • Daily processed data: 50,000 files 500,000,000 records 700 GB • Daily jobs submitted: Over 5,000 • Data freshness: 60 minutes
  • 11. Tip #2 Data is turning challenges into business opportunities.
  • 12. 8% 8% 9% 9% 10% 11% 13% 15% 19% 0% 5% 10% 15% 20% analyze complete rather than partial data sets other Customer intelligence for more targeted marketing Include more semi-structure/unstructured info into decision making Improve scientific research ETL log analysis Reduce cost of data analysis Mine data for business intelligence Use Cases
  • 13. Business Model Maturity Index Business Insights Business Optimization Business Monitoring Data Monetization Business Metamorphosis Monitoring business performance to flag areas of interest Integrate insights & recommendations into existing business processes Embed analytics to optimize business processes Leverage insights to identify new revenue opportunities Transform customer and product insights to move into new markets © Copyright 2013 EMC Corporation. All rights reserved
  • 14. But…  Hadoop in the Enterprise Eco System – lot of the features Enterprises need or want are put on the back seat  Hadoop is NOT cheap (H/W & operations cost) – Make sure company’s decision makers are on board  Hadoop is still rough on the edges – tooling may not be as mature as Enterprises are used to  Data access is batch oriented
  • 15. Tip #3 The 10/90 rule for magnificent data success.
  • 16. Tip #3  Nurture your ‘big brains’  Hadoop cutting edge technology – Investment in related skills and training is crucial  Good Data Scientists are “unicorns”  Embrace the Open Source culture it will payoff  BI team is essential for connecting the dots
  • 17. Data Roles @ Conduit Product Mobile Data Infra Team Data BI Team Data Science Team Wibiya Quick Launch Toolbar BI Scientist Scientist Scientist Scientist BI BI BI Other Scientist BI
  • 18. Tip #4 Shoot for right time data, not real time data.
  • 19. Tip #4  Complex decision making is time consuming therefore unable to react in real time  Real time is expensive!  Taylor the right solution to accommodate the required data freshness  Focus on big things!
  • 20. Data Maturity vs. Freshness @Conduit 0 10 60 Low Medium High Real Time Monitoring Hue/Hive Reporting Service Advanced Analytics Models Business Objective Advanced Analytics Models Reporting Service Freshness Data Maturity (Structured, cleansed & completed( Hadoop DWH Kafka
  • 21. Tip #5 Data quality sucks, just get over it!
  • 22. Tip #5  Data will be dirty, schema-less, no foreign keys  And yet, we are standing on a mountain of gold!  Make your best and know when to shift to data analysis  Tune your algorithms to tolerate data deficiencies then hunt for insights  Big data is not Data Warehouse
  • 28. Tip #6  Break down barriers preventing our users/applications from using their valuable data in more effective ways to glean meaningful insights  Provide your users advanced self service tools to access the data  Hadoop ecosystem evolving as we speak  Your performance is measured by the tools effectiveness and ease of use
  • 29. To Summarize… • Start small • Identify the opportunities • Invest in people & related skills • Adjust processes to the organization needs • Know your data limits • Self Service Tools are extremely important

Notes de l'éditeur

  1. Hadoop in the Enterprise Eco System Hadoop is designed to solve Big Data problems encountered by Web and Social companies. In doing so, lot of the features Enterprises need or want are put on the back seat. For example HDFS does not offer native support for security and authentication. Hadoop is NOT cheap Hardware Cost - lets say a Hadoop node is $5000. A 100 node cluster would be $500,000 for hardware. IT and Operations costs - teams like : Network Admins, IT, Security Admins, System Admins. Also one needs to think about operational costs like Data Center expenses : cooling, electricity ..etc Hadoop is still rough on the edges The development and admin tools for Hadoop are still pretty new. Companies like Cloudera, Horton Works, MapR and Karmasphere have been working on this issue. How ever the tooling may not be as mature as Enterprises are used to (say Oracle Admin ..etc)