Transforming Mobile Marketing & Advertising™




                        Harnessing Hadoop for Big Data
                        Analytics

                                                                   Jobin Wilson
                                                                   jobin.wilson@flytxt.com




                                                                                             Confidential
               Copyright © 2010 Flytxt B.V. All rights reserved.
Who am I?

   • Architect @ Flytxt (Big Data Analytics & Automation)

   • Passionate about data, distributed computing, machine learning

   • Previously

        • Virtualization & Cloud Lifecycle Management (BMC)

               • Designed and implemented Cloud Lifecycle Management Interface for BMC

        • Large Scale Data Centre Automation (AOL)

               • Implemented Centralized Data Center Management Framework for AOL

        • Workflow Systems & Automation (Accenture)

               • Implemented Service Management Suite for various customers




Session Agenda!

• Data – What's the big deal?

• What is Hadoop (& what it is not)

• Map-Reduce Model & HDFS

• Hadoop Ecosystem & Tools

• Let's get started!

• Q&A




Five computers & a 640k ;-)


      Moore’s Law


       "I think there is a world market for about five computers"

                          Attributed to
                          Thomas Watson, 1943, Chairman of the board of IBM


       "640k ought to be enough for anybody"

                          Attributed to
                          Bill Gates in 1981.




Data Explosion!




Do I also know what you might do next summer?


                                        •     Does your travel company know you visited Goa &
                                              Cochin twice in the last two years?

                                        •     Collaborative Filtering




                                        •     Lots of Data + Statistics = WOW!!!

                                        •     BTW, don’t worry about the equation :-)
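
As a rough illustration of the collaborative filtering idea on this slide, here is a minimal user-based sketch in Python. The travellers, destinations and visit counts are invented for illustration, and cosine similarity is only one common choice of similarity measure; a real travel recommender would likely be more elaborate.

```python
from math import sqrt

# Hypothetical visit counts per traveller (made-up data for illustration).
visits = {
    "asha":  {"Goa": 2, "Cochin": 2},
    "ravi":  {"Goa": 2, "Cochin": 1, "Munnar": 3},
    "meera": {"Shimla": 2, "Manali": 1},
}

def cosine(a, b):
    """Cosine similarity between two sparse visit-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, data):
    """Score places the user has not visited by similar users' visits."""
    scores = {}
    for other, places in data.items():
        if other == user:
            continue
        sim = cosine(data[user], places)
        for place, count in places.items():
            if place not in data[user]:
                scores[place] = scores.get(place, 0.0) + sim * count
    # Highest score first; places with a zero score are dropped.
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [place for place, score in ranked if score > 0]

print(recommend("asha", visits))
```

Because "asha" and "ravi" both visited Goa and Cochin, the place only "ravi" has visited is suggested to "asha"; "meera" shares no destinations with anyone, so she gets no suggestions. More data gives the statistics more overlap to work with.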




Don't throw away data just because it doesn't 'fit'


 •   Relational tuples, log files, semi-structured textual data (e.g., e-mail), pictures,
     videos

 •   User generated data & System generated data

 •   Applications need more than structured data

 •   My application is not “Dumb” any more!!

 •   “I keep saying that the sexy job in the next 10 years will be
      statisticians, and I’m not kidding.” - Hal Varian (Google’s chief economist)




Let's get to business!!

What is Apache Hadoop?

•   Apache Hadoop is an open-source system to
    reliably store and process extremely large data sets
    across many commodity computers.

•   Originally developed to support the Nutch search engine
    project.

•   Scales linearly with data size or analysis complexity

•   Scale-out, shared-nothing architecture

•   Inspired by Google's MapReduce and Google File
    System (GFS) papers




Basics of Hadoop


 •   Two Core Components – HDFS & Map-Reduce

 •   Machines are unreliable

 •   Separates distributed fault-tolerant computing code from application
     logic.

 •   No need to worry about the identity of a machine

 •   Lets you interact with a cluster, not a bunch of machines.

 •   Analysis workloads span multiple machines

 •   Runs as a cloud (cluster) & possibly on a cloud (EC2)




Lead Actors


•   Name Node – Bookkeeping metadata server

•   Secondary Name Node – Assistant to Name Node (periodic checkpointing of metadata, not a hot standby)

•   Job Tracker – Scheduler

•   Task Tracker - Task execution

•   Data Node - Block storage




                                                                    Confidential
                Copyright © 2010 Flytxt B.V. All rights reserved.
HDFS Write Model
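
A toy sketch of the write path in Python, assuming the usual HDFS design in which a file is split into fixed-size blocks, each block is stored on several Data Nodes, and the Name Node keeps only the block-to-node mapping. The function names, the tiny block size and the round-robin placement are assumptions made for brevity; real HDFS uses much larger blocks and rack-aware placement.

```python
from itertools import cycle

def write_file(data: bytes, data_nodes, block_size=4, replication=3):
    """Split data into blocks and place each block on `replication` nodes.

    Returns (name_node_metadata, data_node_storage). Round-robin placement
    is a simplification; it is not how real HDFS chooses replica targets.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for requested replication")
    storage = {node: {} for node in data_nodes}   # node -> {block_id: bytes}
    metadata = []                                 # ordered (block_id, [nodes])
    ring = cycle(data_nodes)
    for block_id, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        targets = [next(ring) for _ in range(replication)]
        for node in targets:
            storage[node][block_id] = block
        metadata.append((block_id, targets))
    return metadata, storage

def read_file(metadata, storage, failed=()):
    """Reassemble the file, skipping data nodes listed in `failed`."""
    out = b""
    for block_id, nodes in metadata:
        live = [n for n in nodes if n not in failed]
        out += storage[live[0]][block_id]
    return out

nodes = ["dn1", "dn2", "dn3", "dn4"]
meta, store = write_file(b"hello hadoop", nodes)
print(read_file(meta, store, failed={"dn1", "dn2"}))
```

With three replicas per block, the file can still be read back after two of the four hypothetical Data Nodes are marked as failed, which is the point of replicating blocks across unreliable machines.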




Map-Reduce Model
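
The model can be sketched in plain Python with the classic word-count example. This is a single-process illustration, not Hadoop's Java API: the function names are hypothetical, and the grouping step stands in for the shuffle that the framework performs between the map and reduce phases.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle and reduce over a list of input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = run_job(["big data big deal", "data explosion"])
print(counts)
```

The application author writes only `map_phase` and `reduce_phase`; everything else is the framework's job, which is how the distributed plumbing stays separate from application logic.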




Map-Reduce Execution Flow
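
A simplified sketch of the flow in Python, assuming the commonly described stages: the input is divided into splits, one map task runs per split, map output is partitioned by key so that each key reaches exactly one reduce task, and each reduce task processes its keys in sorted order. All names here are hypothetical, and the sketch runs in one process; on a real cluster the tasks would be scheduled across machines.

```python
from collections import defaultdict
from zlib import crc32

def partition(key, num_reducers):
    """Pick the reducer for a key using a stable hash."""
    return crc32(key.encode("utf-8")) % num_reducers

def execute(splits, mapper, reducer, num_reducers=2):
    # 1. One map task per input split; output goes into per-reducer buckets.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for record in split:
            for key, value in mapper(record):
                buckets[partition(key, num_reducers)][key].append(value)
    # 2. Each reduce task walks its keys in sorted order and emits results.
    outputs = []
    for bucket in buckets:
        outputs.append([(key, reducer(key, bucket[key])) for key in sorted(bucket)])
    return outputs

def word_mapper(line):
    """Emit (word, 1) for each word in a line."""
    return [(word, 1) for word in line.split()]

def sum_reducer(key, values):
    """Add up the counts collected for one word."""
    return sum(values)

splits = [["big data", "big deal"], ["data explosion"]]
parts = execute(splits, word_mapper, sum_reducer)
for index, part in enumerate(parts):
    print("reducer", index, part)
```

Each inner list corresponds to one reducer's output file. Which words land on which reducer depends on the hash, but every word appears in exactly one partition.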




Hadoop Ecosystem
•   Oozie – Open-source workflow/coordination
    service to manage data processing jobs for Apache
    Hadoop™ - Developed at Yahoo!

•   HBase – Column-oriented database modeled on
    Google’s Bigtable. Holds extremely large data sets
    (petabytes)

•   Hive – SQL-based data warehousing app with
    features for analyzing very large data sets -
    Developed at Facebook

•   ZooKeeper – Distributed coordination service
    providing leader election, service
    discovery, distributed locking / mutual exclusion

•   Pig - Platform for analyzing large data sets that
    consists of a high-level language for expressing
    data analysis steps

•   Ganglia - A scalable distributed monitoring system
    for high-performance computing systems such as
    clusters and grids
Hadoop is not a “Holy Grail”

•   Not a substitute for a database

•   MapReduce is not always the best algorithm

•   HDFS is not a substitute for a
    High Availability SAN-hosted FS

•   HDFS is not a POSIX file system

•   Not a place to learn Java programming

•   Not a place to learn Unix/Linux system administration

•   Not a place to learn basics of networking




Notable Users of Hadoop
(Source: http://en.wikipedia.org/wiki/Hadoop)



     • A9.com                               • Meebo
     • AOL                                  • Metaweb
     • eHarmony                             • The New York Times
     • eBay                                 • Rackspace
     • Facebook                             • StumbleUpon
     • Fox Interactive Media                • Twitter
     • IBM                                  • Yahoo
     • Last.fm                              • Amazon
     • LinkedIn




Q&A




                                                    www.flytxt.com
THANK YOU
      Contact us: dev2dev@flytxt.com / jobin.wilson@flytxt.com





Contenu connexe

En vedette

20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 e20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 eZeeshan Huq
 
2011 p5_and_p6_principal's_dialogue_collated_for_uploading
2011  p5_and_p6_principal's_dialogue_collated_for_uploading2011  p5_and_p6_principal's_dialogue_collated_for_uploading
2011 p5_and_p6_principal's_dialogue_collated_for_uploadingalanpillay79
 
Cl introduction of p1_&_p2
Cl introduction of p1_&_p2Cl introduction of p1_&_p2
Cl introduction of p1_&_p2alanpillay79
 
Recommendation engines : Matching items to users
Recommendation engines : Matching items to usersRecommendation engines : Matching items to users
Recommendation engines : Matching items to usersjobinwilson
 
P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011alanpillay79
 
20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 d20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 dZeeshan Huq
 
TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011alanpillay79
 
Building apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon BostonBuilding apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon Bostonamansk
 
Brightwater Engineering General Presentation
Brightwater Engineering General PresentationBrightwater Engineering General Presentation
Brightwater Engineering General Presentationfletcher_mat
 
Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01Mukesh Thakur
 
Pharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence ReportPharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence ReportViedoc
 
Program Komuniti Tone Plus
Program Komuniti Tone PlusProgram Komuniti Tone Plus
Program Komuniti Tone PlusVun Chee Vui
 
Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010Viedoc
 
IT & Big Data 2012 Report
IT & Big Data 2012 ReportIT & Big Data 2012 Report
IT & Big Data 2012 ReportViedoc
 
Mauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea DecalogoMauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea DecalogoMauricio Escalante
 
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence ReportCFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence ReportViedoc
 
20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 d20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 dZeeshan Huq
 

En vedette (20)

20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 e20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 e
 
2011 p5_and_p6_principal's_dialogue_collated_for_uploading
2011  p5_and_p6_principal's_dialogue_collated_for_uploading2011  p5_and_p6_principal's_dialogue_collated_for_uploading
2011 p5_and_p6_principal's_dialogue_collated_for_uploading
 
Monavie Presentation
Monavie PresentationMonavie Presentation
Monavie Presentation
 
Cl introduction of p1_&_p2
Cl introduction of p1_&_p2Cl introduction of p1_&_p2
Cl introduction of p1_&_p2
 
Recommendation engines : Matching items to users
Recommendation engines : Matching items to usersRecommendation engines : Matching items to users
Recommendation engines : Matching items to users
 
P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011
 
Viral marketing
Viral marketingViral marketing
Viral marketing
 
20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 d20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 d
 
TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011
 
Building apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon BostonBuilding apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon Boston
 
Brightwater Engineering General Presentation
Brightwater Engineering General PresentationBrightwater Engineering General Presentation
Brightwater Engineering General Presentation
 
Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01
 
Budjettikone
BudjettikoneBudjettikone
Budjettikone
 
Pharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence ReportPharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence Report
 
Program Komuniti Tone Plus
Program Komuniti Tone PlusProgram Komuniti Tone Plus
Program Komuniti Tone Plus
 
Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010
 
IT & Big Data 2012 Report
IT & Big Data 2012 ReportIT & Big Data 2012 Report
IT & Big Data 2012 Report
 
Mauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea DecalogoMauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea Decalogo
 
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence ReportCFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
 
20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 d20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 d
 

Similaire à Harnessing hadoop for big data analytics v0.1

Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stackFlytxt
 
HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)Peter Lubbers
 
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...Taras Filatov
 
Putting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data StoresPutting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data StoresDATAVERSITY
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Adam Muise
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldSean Roberts
 
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)AI4BD GmbH
 
SharePoint from the Forms-Eye View
SharePoint from the Forms-Eye ViewSharePoint from the Forms-Eye View
SharePoint from the Forms-Eye ViewSteve Weissman
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Building a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev DayBuilding a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev Dayjavier ramirez
 
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Cloudera, Inc.
 
Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11Adrian Treacy
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorialmarkgrover
 
Alex Wade, Digital Library Interoperability
Alex Wade, Digital Library InteroperabilityAlex Wade, Digital Library Interoperability
Alex Wade, Digital Library Interoperabilityparker01
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of ThingsVisualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of ThingsMia Yuan Cao
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 KeynotePeter Wang
 

Similaire à Harnessing hadoop for big data analytics v0.1 (20)

Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stack
 
HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)
 
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
 
Html5 Flyover
Html5 FlyoverHtml5 Flyover
Html5 Flyover
 
Putting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data StoresPutting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data Stores
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
 
SharePoint from the Forms-Eye View
SharePoint from the Forms-Eye ViewSharePoint from the Forms-Eye View
SharePoint from the Forms-Eye View
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Building a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev DayBuilding a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev Day
 
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
 
Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorial
 
IBM Watson
IBM WatsonIBM Watson
IBM Watson
 
Alex Wade, Digital Library Interoperability
Alex Wade, Digital Library InteroperabilityAlex Wade, Digital Library Interoperability
Alex Wade, Digital Library Interoperability
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Plug 20110217
Plug   20110217Plug   20110217
Plug 20110217
 
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of ThingsVisualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 

Dernier

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Dernier (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Finology Group – Insurtech Innovation Award 2024

Harnessing Hadoop for Big Data Analytics v0.1

  • 1. Harnessing Hadoop for Big Data Analytics
      • Transforming Mobile Marketing & Advertising™
      • Jobin Wilson, jobin.wilson@flytxt.com
      • Confidential. Copyright © 2010 Flytxt B.V. All rights reserved.
  • 2. Who am I?
      • Architect @ Flytxt (Big Data Analytics & Automation)
      • Passionate about data, distributed computing, and machine learning
      • Previously
          • Virtualization & Cloud Lifecycle Management (BMC): designed and implemented the Cloud Lifecycle Management interface for BMC
          • Large-Scale Data Centre Automation (AOL): implemented a centralized data center management framework for AOL
          • Workflow Systems & Automation (Accenture): implemented a service management suite for various customers
  • 3. Session Agenda!
      • Data: what's the big deal?
      • What is Hadoop (& what it is not)
      • Map-Reduce Model & HDFS
      • Hadoop Ecosystem & Tools
      • Let's get started!
      • Q&A
  • 4. Five computers & 640k ;-)
      • "I think there is a world market for about five computers" (Thomas Watson, Chairman of the board of IBM, 1943)
      • "640k ought to be enough for anybody" (attributed to Bill Gates, 1981)
      • Moore's Law
  • 5. Data Explosion!
  • 6. Do I also know what you might do next summer?
      • Does your travel company know you visited Goa & Cochin twice in the last two years?
      • Collaborative filtering
      • Lots of data + statistics = WOW!!!
      • BTW, don't worry about the equation :-)
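The collaborative filtering idea on this slide can be illustrated with a small sketch. This is not the method from the talk; it is a minimal co-visit counter, and the traveller names, trip data, and function names are all hypothetical. It suggests destinations that other travellers with overlapping histories have also visited.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical travel histories: traveller -> destinations visited.
trips = {
    "asha": {"Goa", "Cochin", "Munnar"},
    "ravi": {"Goa", "Cochin"},
    "meera": {"Goa", "Munnar"},
    "john": {"Cochin", "Munnar"},
}

def co_visit_counts(histories):
    """Count how many travellers visited each pair of destinations."""
    counts = defaultdict(lambda: defaultdict(int))
    for places in histories.values():
        for a, b in combinations(sorted(places), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(user, histories, top_n=1):
    """Rank unseen destinations by how often they co-occur with seen ones."""
    counts = co_visit_counts(histories)
    seen = histories[user]
    scores = defaultdict(int)
    for place in seen:
        for other, n in counts[place].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [place for place, _ in ranked[:top_n]]

print(recommend("ravi", trips))
```

With real data the pair counting is the expensive step, which is why this kind of analysis is a natural fit for a cluster.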
  • 7. Don't throw away data just because it doesn't 'fit'
      • Relational tuples, log files, semi-structured textual data (e.g., e-mail), pictures, videos
      • User-generated data & system-generated data
      • Applications need more than structured data
      • My application is not "dumb" any more!!
      • "I keep saying that the sexy job in the next 10 years will be statisticians, and I'm not kidding." - Hal Varian (Google's chief economist)
  • 8. Let's get to business! What is Apache Hadoop?
      • Apache Hadoop is an open-source system to reliably store and process extremely large data sets across many commodity computers.
      • Originally developed to support the Nutch search engine project
      • Scales linearly with data size or analysis complexity
      • Scale-out, shared-nothing architecture
      • Inspired by Google's MapReduce and Google File System (GFS) papers
  • 9. Basics of Hadoop
      • Two core components: HDFS & Map-Reduce
      • Machines are unreliable
      • Separates distributed, fault-tolerant computing code from application logic
      • No need to worry about the identity of a machine
      • Lets you interact with a cluster, not a bunch of machines
      • Analysis workloads span multiple machines
      • Runs as a cloud (cluster) and possibly on a cloud (EC2)
  • 10. Lead Actors
      • Name Node: bookkeeping metadata server
      • Secondary Name Node: assistant to the Name Node
      • Job Tracker: scheduler
      • Task Tracker: task execution
      • Data Node: block storage
  • 11. HDFS Write Model
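The slide diagram is not preserved in this transcript, so the following is only a rough sketch of the bookkeeping side of an HDFS write: the file is split into fixed-size blocks, and each block is assigned to several Data Nodes. The 64 MB block size and replication factor of 3 are assumed here as the common Hadoop defaults of that period. The round-robin placement and the `plan_blocks` name are simplifications for illustration; real HDFS uses a rack-aware placement policy.

```python
def plan_blocks(file_size, data_nodes, block_size=64 * 1024 * 1024, replication=3):
    """Return a list of (block_id, [data nodes holding a replica]).

    Hypothetical helper: mimics the metadata a Name Node would keep,
    using naive round-robin placement rather than HDFS's real policy.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for requested replication")
    num_blocks = -(-file_size // block_size)  # ceiling division
    plan = []
    for block_id in range(num_blocks):
        # Pick `replication` distinct nodes, starting at a rotating offset.
        replicas = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
        plan.append((block_id, replicas))
    return plan

nodes = ["dn1", "dn2", "dn3", "dn4"]
for block_id, replicas in plan_blocks(200 * 1024 * 1024, nodes):
    print(block_id, replicas)
```

Because every block lives on more than one Data Node, losing a single machine does not lose data, which is the point behind "machines are unreliable" on the earlier slide.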
  • 12. Map-Reduce Model
  • 13. Map-Reduce Execution Flow
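The Map-Reduce slides are diagrams that did not survive extraction. As a stand-in, here is a single-process sketch of the three stages (map, shuffle, reduce) using the usual word count example. It does not use the Hadoop API; the function names are illustrative, and on a real cluster the Job Tracker would schedule the map and reduce steps as tasks across Task Trackers.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

print(word_count(["big data", "Big deal"]))
```

The map and reduce functions hold only application logic; grouping, distribution, and retries are the framework's job, which is the separation described under "Basics of Hadoop".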
  • 14. Hadoop Ecosystem
      • Oozie: open-source workflow/coordination service to manage data processing jobs for Apache Hadoop™. Developed at Yahoo!
      • HBase: column-store database based on Google's Bigtable. Holds extremely large data sets (petabytes)
      • Hive: SQL-based data warehousing application with features for analyzing very large data sets. Developed at Facebook
      • ZooKeeper: distributed consensus engine providing leader election, service discovery, and distributed locking / mutual exclusion
      • Pig: platform for analyzing large data sets, built around a high-level language for expressing data analysis steps
      • Ganglia: scalable distributed monitoring system for high-performance computing systems such as clusters and grids
  • 15. Hadoop is not a "Holy Grail"
      • Not a substitute for a database
      • MapReduce is not always the best algorithm
      • HDFS is not a substitute for a high-availability SAN-hosted file system
      • HDFS is not a POSIX file system
      • Not a place to learn Java programming
      • Not a place to learn Unix/Linux system administration
      • Not a place to learn the basics of networking
  • 16. Notable Users of Hadoop (Source: http://en.wikipedia.org/wiki/Hadoop)
      • A9.com, Meebo, AOL, Metaweb, eHarmony, The New York Times, eBay, Rackspace, Facebook, StumbleUpon, Fox Interactive Media, Twitter, IBM, Yahoo, Last.fm, Amazon, LinkedIn
  • 17. Q&A (www.flytxt.com)
  • 18. Thank you
      • Contact us: dev2dev@flytxt.com / jobin.wilson@flytxt.com
      • www.flytxt.com