COMBINING HADOOP & VERTICA FOR LARGE SCALE ANALYTICS
Hadoop Summit 2012




Shilpa Lawande
VP Engineering, Vertica, an HP Company
    ©2011 Hewlett-Packard Development Company, L.P.
    The information contained herein is subject to change without notice
The Big Data Problem




The (popular) Solution




    Vertica’s Real-time Analytics
                  +
    Scale & Flexibility of Hadoop
What is Vertica?
    •    SQL database for real-time analytics
    •    Runs on x86 hardware
    •    MPP columnar architecture - scales to PBs!
    •    Extensible analytics capabilities
    •    Easy to set up and use
    •    Elastic - grow/shrink as needed

    [Word cloud: Speed, Services, Cloud, Monetize, Real Time, Better Decisions, Statistics, Mobile, Individual, Analysis, Simplicity]
What Analytics can Vertica do?




     SQL
     •  Window functions
     •  Graph
     •  Monte Carlo
     •  Statistical
     •  Geospatial

     Extended SQL
     •  Sessionization
     •  Time series
     •  Pattern matching
     •  Event series joins

     SDKs
     •  C++
     •  R
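Sessionization, listed above under Extended SQL, groups each user's events into sessions separated by idle gaps. The sketch below shows the idea in plain Python, not in Vertica SQL. The function name `sessionize`, the 30-minute timeout, and the sample click data are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def sessionize(
    events: Iterable[Tuple[str, float]], timeout: float = 1800.0
) -> List[Tuple[str, float, int]]:
    """Tag each (user, timestamp) event with a per-user session number.

    A gap longer than `timeout` seconds between two consecutive events
    of the same user starts a new session.
    """
    last_seen = {}   # user -> timestamp of that user's previous event
    session_id = {}  # user -> current session counter
    out = []
    # Process events per user in time order.
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        if user not in last_seen:
            session_id[user] = 0
        elif ts - last_seen[user] > timeout:
            session_id[user] += 1
        last_seen[user] = ts
        out.append((user, ts, session_id[user]))
    return out


# Hypothetical click stream: (user, seconds since start of day)
clicks = [("alice", 0.0), ("alice", 600.0), ("alice", 4000.0), ("bob", 100.0)]
sessions = sessionize(clicks)
```

Here the third `alice` event falls more than 30 minutes after the second, so it opens a new session.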




    Check out: https://github.com/vertica/Vertica-Extension-Packages
Who uses Vertica?

    600+ customers worldwide

    “… by partnering with Vertica we’re able to provide operators the tools they need to confidently interpret customer experience…”
        Steve Kish, Director, Product Management, Empirix

    “…being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing…”
        Dan McCaffrey, Director of Analytics, Zynga
What's different – Hadoop vs Vertica?




    Vertica
    •  Designed for performance
    •  SQL
    •  Interactive analytics

    Both
    •  Purpose-built scalable analytics platforms

    Hadoop
    •  Designed for fault tolerance
    •  Map-Reduce
    •  Batch analytics




        Read: http://www.vertica.com/2011/09/21/counting-triangles/
Getting the best of both worlds!

    [Architecture diagram. Labels: SQL / ODBC / JDBC; Extensions in C++, R; Vertica Engine; Native; Vertica Storage; External Tables; User-defined Loads; Hadoop/MR Connector. Legend: New in Vertica 6]

Joint Use Cases

Hadoop for ETL, Vertica for Analytics
    •  Log parsing / tagging / filtering
    •  Convert JSON into relational tuples
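
The JSON-to-tuples step above can be sketched as follows. This is a minimal illustration in plain Python, not a Hadoop job. The column list `COLUMNS`, the helper names, the `|` delimiter, and the sample records are all assumptions, not part of the original talk.

```python
import json

# Hypothetical target schema for the relational table.
COLUMNS = ("user_id", "event", "ts")


def json_to_tuple(line, columns=COLUMNS):
    """Parse one JSON log line into a fixed-width tuple.

    Missing keys become None so every row has the same arity.
    """
    record = json.loads(line)
    return tuple(record.get(col) for col in columns)


def to_delimited(row, sep="|"):
    """Render a tuple as one delimited text line; None becomes an empty field."""
    return sep.join("" if value is None else str(value) for value in row)


# Hypothetical raw input, one JSON document per line.
raw = [
    '{"user_id": 7, "event": "click", "ts": "2012-06-13T10:00:00"}',
    '{"user_id": 8, "event": "view"}',
]
rows = [json_to_tuple(line) for line in raw]
lines = [to_delimited(row) for row in rows]
```

In a MapReduce setting, a function like `json_to_tuple` would plausibly sit in the map step, emitting rows that a bulk loader can ingest.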

HDFS for data storage, Vertica + Hadoop for Analytics
    •  Real-time analytics on Vertica (needs speed)
    •  Long-running / exploratory analytics on Hadoop (needs fault tolerance)
    •  Load from HDFS directly to Vertica (needs Vertica 6)
    •  SQL access to HDFS (needs Vertica 6)

Vertica for data storage, Hadoop as a multi-purpose tool
    •  Hadoop as a scheduler / load-balancer
    •  Hadoop to convert to formats for other tools (e.g. STATA)
    •  Hadoop for Backup via Sqoop


Customer Stories




Accelerating Drug Discovery




     The problem
     •  Analyzing gene variants using SNPs and Microarray data

     The solution
     •  Hadoop to find the variants between a sample sequence and a reference genome
     •  Vertica to determine oncology targets
     •  Tools: Pipeline Pilot, Spotfire, R

     The value
     •  Queries went from 5 hours to 5 minutes
     •  Scale to 100s of TB of data
     •  More experiments => faster discoveries!
Digital Consumer Insights


     •  HDFS to store raw input behavioral data
     •  Hadoop / MR to find conversions (regexp processing)

     •  Vertica to store & operationalize high value biz data
     •  Reporting & analytics via Tableau and R
     •  Custom ETL

     Faster insights delivered more consistently with less administrative overhead, and cheaper hardware!!
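
The regexp-based conversion step mentioned above might look roughly like the sketch below. It is plain Python, not a MapReduce job. The log format, the `/checkout/complete` URL pattern, and the function name `find_conversions` are hypothetical, since the talk does not describe the actual patterns used.

```python
import re

# Hypothetical pattern: a conversion is a hit on the order-confirmation URL.
CONVERSION = re.compile(r'"GET /checkout/complete\?order=(\d+)')


def find_conversions(log_lines):
    """Yield the order id of every log line that matches the conversion pattern."""
    for line in log_lines:
        match = CONVERSION.search(line)
        if match:
            yield int(match.group(1))


# Hypothetical raw behavioral log lines.
logs = [
    '10.0.0.1 - - "GET /home HTTP/1.1" 200',
    '10.0.0.2 - - "GET /checkout/complete?order=42 HTTP/1.1" 200',
]
orders = list(find_conversions(logs))
```

Only the matching lines survive this step, so the high-value rows loaded into the database are a small fraction of the raw input.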




On a Privacy Assurance Mission


     1. Collect user privacy reporting requests into HDFS
     2. Use MR to process and structure the data into Vertica (ETL)
     3. Use Vertica to analyze stats for every 3rd party tag on a website.




     For Consumers:
     A free browser plugin that can tell you who’s tracking you!

     For Advertisers:
     Provide greater transparency to end-users (look for [...] on an a
     Understand impact of 3rd party tags on website performance

Social Video
      Social Video Analytics / Social Video Advertising

      ▫  Video analytics - 100+ leading pubs
      ▫  Campaign measurement - 100+ major brands
      ▫  Industry-wide charts

      •  Hadoop for batch processing of logs and ETL into Vertica
      •  Vertica for ad-hoc analytics and interactive dashboards
      •  Redis KV store for serving low-latency data needs

      100s of millions of events collected and processed daily on petabyte-scale infrastructure!

Try Vertica for free!



     Community Edition

     Up to 1 TB limit, 3 nodes!

     Check out Vertica
       Extensions on Github!




References and Other Info …


 Website:    www.vertica.com

 Community Edition: http://www.vertica.com/community/

 Github:    https://github.com/vertica/Vertica-Extension-Packages

 Questions or Comments: shilpa@vertica.com

 Jobs: resumes@vertica.com (Awesome new location in Cambridge, MA!)

 Follow us on Twitter: @slawande, @verticacorp


Combining Hadoop RDBMS for Large-Scale Big Data Analytics

  • 1. COMBINING HADOOP & VERTICA FOR LARGE SCALE ANALYTICS. Hadoop Summit 2012. Shilpa Lawande, VP Engineering, Vertica, an HP Company. ©2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  • 2. The Big Data Problem
  • 3. The (popular) Solution: Vertica’s Real-time Analytics + Scale & Flexibility of Hadoop
  • 4. What is Vertica? SQL Database for Real-time Analytics; Runs on x86 hardware; MPP Columnar Architecture, scales to PBs!; Extensible analytics capabilities; Easy to set up and use; Elastic: grow/shrink as needed. (Graphic labels: Speed, Simplicity, Cloud, Services, Mobile, Monetize, Real Time, Better Decisions, Statistics, Individual, Analysis.)
  • 5. What Analytics can Vertica do? SQL: Window functions, Graph, Monte Carlo, Statistical, Geospatial. Extended SQL: Sessionization, Time series, Pattern matching, Event series joins. SDKs: C++, R. Check out: https://github.com/vertica/Vertica-Extension-Packages
  • 6. Who uses Vertica? 600+ customers worldwide. “… by partnering with Vertica we’re able to provide operators the tools they need to confidently interpret customer experience…” (Steve Kish, Director, Product Management, Empirix). “…being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing…” (Dan McCaffrey, Director of Analytics, Zynga).
  • 7. What's different – Hadoop vs Vertica? Vertica: Designed for Performance; SQL Analytics; Interactive Analytics. Hadoop: Designed for Fault-tolerance; Map-Reduce; Batch Analytics. Both: Purpose-built Scalable Platforms. Read: http://www.vertica.com/2011/09/21/counting-triangles/
  • 8. Getting the best of both worlds! (Diagram labels: SQL / ODBC / JDBC; Extensions in C++, R; Vertica Engine; Vertica Storage; External Tables; Native / User-defined Loads; Hadoop/MR Connector. New in Vertica 6.)
  • 9. Joint Use Cases. Hadoop for ETL, Vertica for Analytics: Log parsing / tagging / filtering; Convert JSON into relational tuples. HDFS for data storage, Vertica + Hadoop for Analytics: Real-time analytics on Vertica (needs speed); Long-running / exploratory analytics on Hadoop (needs fault tolerance); Load from HDFS directly to Vertica (needs Vertica 6); SQL access to HDFS (needs Vertica 6). Vertica for data storage, Hadoop as a multi-purpose tool: Hadoop as a scheduler / load-balancer; Hadoop to convert to formats for other tools (e.g. STATA); Hadoop for Backup via Sqoop.
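The "Convert JSON into relational tuples" step from slide 9 can be sketched roughly as follows. This is a minimal illustration, not the pipeline from the talk: the column names (`user_id`, `event`, `ts`), the pipe delimiter, and both function names are assumptions made up for the example. In a Hadoop job, logic like this would typically sit in the map phase and emit delimited rows for a later bulk load.

```python
import json

# Hypothetical column order of the target relational table (an assumption).
COLUMNS = ("user_id", "event", "ts")


def json_line_to_tuple(line, columns=COLUMNS):
    """Parse one JSON log line and project it onto a fixed column order.

    Keys missing from the record become None, so every tuple has the
    same width regardless of which fields a given event carried.
    """
    record = json.loads(line)
    return tuple(record.get(col) for col in columns)


def to_delimited(row, sep="|"):
    """Render a tuple as one delimited text row; None becomes an empty field."""
    return sep.join("" if value is None else str(value) for value in row)
```

For example, `to_delimited(json_line_to_tuple('{"user_id": 7, "event": "click"}'))` yields a row with an empty third field, which keeps the output rectangular even when the input is not.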
  • 11. Accelerating Drug Discovery. The problem: Analyzing gene variants using SNPs and Microarray data; Scale to 100s of TB of sequence data. The solution: Hadoop to find the variants between a sample and a reference genome; Vertica to determine oncology targets; Tools: Pipeline Pilot, Spotfire, R. The value: Queries went from 5 hours to 5 minutes; More experiments => faster discoveries!
  • 12. Digital Consumer Insights. HDFS to store raw input behavioral data; Hadoop / MR to find conversions (regexp processing); Custom ETL. Vertica to store & operationalize high value biz data; Reporting & analytics via Tableau and R. Faster insights delivered more consistently with less administrative overhead, and cheaper hardware!!
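The "find conversions (regexp processing)" step from slide 12 might look something like the map/reduce pair below. It is only a sketch under stated assumptions: the tab-separated `user_id<TAB>url` log layout, the `/checkout/complete` URL pattern, and the function names are all hypothetical and do not come from the talk.

```python
import re
from collections import Counter

# Hypothetical definition of a "conversion": a hit on the order-complete page.
CONVERSION_URL = re.compile(r"/checkout/complete(\?|$)")


def map_line(line):
    """Map phase: emit (user_id, 1) for each log line that is a conversion.

    Assumes tab-separated lines of the form "user_id<TAB>url";
    lines that do not fit that shape are skipped.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:
        return
    user_id, url = parts
    if CONVERSION_URL.search(url):
        yield user_id, 1


def reduce_counts(pairs):
    """Reduce phase: sum the emitted counts per user."""
    totals = Counter()
    for user_id, count in pairs:
        totals[user_id] += count
    return dict(totals)
```

The per-user totals are the kind of small, high-value result that the slide describes loading into Vertica for reporting, while the raw behavioral logs stay in HDFS.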
  • 13. On a Privacy Assurance Mission. Collect user privacy reporting requests into HDFS; Use MR to process and structure the data into Vertica (ETL); Use Vertica to analyze stats for every 3rd party tag on a website. For Consumers: A free browser plugin that can tell you who’s tracking you! For Advertisers: Understand impact of 3rd party tags on website performance; Provide greater transparency to end-users (look for on an a
  • 14. Social Video. Social Video Analytics; Social Video Advertising. Video analytics: 100+ Leading Pubs; Campaign Measurement: 100+ major brands; Industry-Wide Charts. Hadoop for batch processing of logs and ETL into Vertica; Vertica for ad-hoc analytics and interactive dashboards; Redis KV store for serving low-latency data needs. 100s of millions of events collected and processed daily on Petabyte scale infrastructure!
  • 15. Try Vertica for free! Community Edition: up to 1 TB limit, 3 nodes! Check out Vertica Extensions on Github!
  • 16. References and Other Info. Website: www.vertica.com. Community Edition: http://www.vertica.com/community/. Github: https://github.com/vertica/Vertica-Extension-Packages. Questions or Comments: shilpa@vertica.com. Jobs: resumes@vertica.com (Awesome new location in Cambridge, MA!). Follow us on Twitter: @slawande, @verticacorp
  • 17. Sessions will resume at 2:25pm