CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
5 June 2012

Agenda

•  Why Hadoop and HBase?
•  Social Media Monitoring
   •  Prospective Search and Coprocessors
•  Challenges & Lessons Learned
•  Resources to get started

About me

Software Architect @ sentric
Co-founder and organizer of the Swiss HUG

Contact:
 christian.guegi@sentric.ch
 http://www.sentric.ch
 @chrisgugi

About sentric

•  Spin-off of MeMo News AG, the leading provider for Social Media
   Monitoring & Analytics in Switzerland
•  Big Data expert, focused on Hadoop, HBase and Solr
•  Objective: Transforming data into insights

CC 2.0 by Pete Reed | http://flic.kr/p/KS9kf

Why Hadoop and HBase?

Social Media Monitoring Process

•  Information Gathering
•  Information Processing
•  Analysis & Interpretation
•  Insight Presentation

Requirements

SMM (Social Media Monitoring):
•  Cost effective
•  Highly scalable
•  RT (real-time) alerting
•  Analytical capabilities
•  Reliable

Technology Stack

Storage                      HBase / HDFS
Search                       Solr
Analytics                    Hadoop, Mahout
Event mechanism (MQ)         HBase RowLog
Real-time alerting           Prospective search

CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf

Social Media Monitoring

Overview

•  Inputs: Downloaded Articles, Search Agents
•  Step: match?
•  Output: Web-UI, Reports, RT Alerts

Icons by http://dryicons.com
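
The "match?" step inverts ordinary search: the queries (search agents) are stored, and each newly downloaded article is tested against them. The following is a minimal sketch in plain Java, assuming an agent is just a set of keywords that must all occur in the article. The class names `SearchAgent` and `ProspectiveMatcher` are hypothetical and are not taken from the talk or from HBasePS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration: a search agent is a stored query whose
// keywords must ALL occur in an article for the agent to match.
class SearchAgent {
    final String name;
    final Set<String> requiredTerms = new HashSet<>();

    SearchAgent(String name, String... terms) {
        this.name = name;
        for (String term : terms) {
            requiredTerms.add(term.toLowerCase(Locale.ROOT));
        }
    }
}

class ProspectiveMatcher {
    private final List<SearchAgent> agents = new ArrayList<>();

    void register(SearchAgent agent) {
        agents.add(agent);
    }

    // Returns the names of all registered agents that match the article text.
    List<String> match(String articleText) {
        // Tokenize once per article, then test every stored agent against it.
        Set<String> tokens = new HashSet<>(
            Arrays.asList(articleText.toLowerCase(Locale.ROOT).split("\\W+")));
        List<String> hits = new ArrayList<>();
        for (SearchAgent agent : agents) {
            if (tokens.containsAll(agent.requiredTerms)) {
                hits.add(agent.name);
            }
        }
        return hits;
    }
}
```

Testing every agent against every article is linear in the number of agents. A production matcher would likely index the agents, which this sketch does not attempt.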

Solution Architecture

[Diagram components: n Crawler, REST, HBase, RowLog, Coprocessor, Web-UI, MySQL, Solr, RT Alerts]

Icons by http://dryicons.com

Short Primer on Coprocessors

Overview

•  Inspired by Google Bigtable coprocessors
•  HBase version 0.92
•  Embed code directly into server processes
•  High-level call interface for clients
•  Automatic scaling, load balancing, request routing

Observer Classes

•  Like a database trigger
      •  Provides event-based hooks
•  Concrete implementations
      •  RegionObserver
           •    CRUD or DML type operations
      •  MasterObserver
           •    DDL or metadata operations and cluster administration
      •  WALObserver
           •    Write-ahead-log appending and restoration

Observer Execution

    Client: Get()
      RegionServer:
        CP1:preGet()    CP2:preGet()    CP3:preGet()
        HRegion:Get()
        CP1:postGet()   CP2:postGet()   CP3:postGet()
    client response
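
The call order in the diagram (every pre-hook, then the region operation, then every post-hook) can be simulated in a few lines of plain Java. This is a sketch only: `GetObserver`, `TracingObserver` and `RegionSketch` are hypothetical stand-ins, not the HBase `RegionObserver` interface, and the hook signatures are simplified assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an observer coprocessor; NOT the HBase interface.
interface GetObserver {
    void preGet(String rowKey, List<String> trace);
    void postGet(String rowKey, List<String> trace);
}

// An observer that only records when its hooks are invoked.
class TracingObserver implements GetObserver {
    private final String name;

    TracingObserver(String name) {
        this.name = name;
    }

    public void preGet(String rowKey, List<String> trace) {
        trace.add(name + ":preGet");
    }

    public void postGet(String rowKey, List<String> trace) {
        trace.add(name + ":postGet");
    }
}

// Simulated region that wraps its get operation with the loaded observers.
class RegionSketch {
    private final List<GetObserver> observers = new ArrayList<>();

    void load(GetObserver observer) {
        observers.add(observer);
    }

    // Runs all pre-hooks, then the operation itself, then all post-hooks,
    // and returns the recorded order of calls.
    List<String> get(String rowKey) {
        List<String> trace = new ArrayList<>();
        for (GetObserver observer : observers) {
            observer.preGet(rowKey, trace);
        }
        trace.add("HRegion:get");
        for (GetObserver observer : observers) {
            observer.postGet(rowKey, trace);
        }
        return trace;
    }
}
```

Because every hook runs inside the request path, a slow observer delays the client response for every call on that region.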

Endpoint Classes

•  Comparable to stored procedures
      •  Custom RPC protocol, used between client and region server
•  Loaded in the region server
•  Clients call the API over a single row or a row range
      •  Framework translates row keys to region locations
      •  Parallel execution

Endpoint Call Routine

 Client code

      Batch.Call<CountProtocol, Integer>
        Integer call(CountProtocol p) {
             return p.getRowCount();
        }

      HTable.coprocessorExec()

      Result: Map<byte[], Integer> countsByRegion

 Region Server 1
      table,,12345678      CountProtocol
      table,bbb,12345678   CountProtocol

 Region Server 2
      table,ccc,12345678   CountProtocol
      table,ddd,12345678   CountProtocol
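
The routine above fans one call out to every region in the requested range and collects one result per region. Below is a minimal plain-Java simulation of that pattern, assuming in-memory region stubs instead of a cluster. `RegionStub` and `EndpointCallSketch` are hypothetical names, and the result map is keyed by `String` here for readability, whereas the slide shows `byte[]` keys.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory region; the row keys are made-up sample data.
class RegionStub {
    final String name;
    private final List<String> rowKeys;

    RegionStub(String name, List<String> rowKeys) {
        this.name = name;
        this.rowKeys = rowKeys;
    }

    // Plays the role of CountProtocol.getRowCount() on one region.
    int getRowCount() {
        return rowKeys.size();
    }
}

class EndpointCallSketch {
    // Invokes getRowCount() on every region in parallel and returns
    // one count per region name.
    static Map<String, Integer> countsByRegion(List<RegionStub> regions) {
        Map<String, Integer> counts = new TreeMap<>();
        if (regions.isEmpty()) {
            return counts;
        }
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            Map<String, Future<Integer>> pending = new TreeMap<>();
            for (RegionStub region : regions) {
                Callable<Integer> call = region::getRowCount;
                pending.put(region.name, pool.submit(call));
            }
            for (Map.Entry<String, Future<Integer>> entry : pending.entrySet()) {
                counts.put(entry.getKey(), entry.getValue().get());
            }
            return counts;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("region call failed", ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

The client still has to combine the per-region values itself, for example by summing them to get a table-wide row count.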

Use Cases

•  HBase Security (Version 0.94)
•  Aggregate operations avg(), sum()
      •  AggregateProtocol
•  HBASE-3529: Embedded search
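
For the aggregate use case, each region can only see its own rows, so a table-wide avg() has to be assembled from per-region partial results. The sketch below shows the arithmetic in plain Java, assuming each region returns a sum and a row count. `RegionPartial` and `AggregateSketch` are hypothetical names and do not reproduce the HBase aggregation classes.

```java
import java.util.List;

// Hypothetical per-region partial result for one numeric column.
class RegionPartial {
    final long sum;
    final long count;

    RegionPartial(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }
}

class AggregateSketch {
    // Table-wide sum is the sum of the per-region sums.
    static long sum(List<RegionPartial> partials) {
        long total = 0;
        for (RegionPartial partial : partials) {
            total += partial.sum;
        }
        return total;
    }

    // Table-wide average is total sum divided by total count; averaging
    // the per-region averages would weight small regions too heavily.
    static double avg(List<RegionPartial> partials) {
        long totalCount = 0;
        for (RegionPartial partial : partials) {
            totalCount += partial.count;
        }
        if (totalCount == 0) {
            throw new IllegalArgumentException("no rows to average");
        }
        return (double) sum(partials) / totalCount;
    }
}
```

With partials (sum 10, count 2) and (sum 20, count 3) the combined average is 30 / 5 = 6.0, whereas the mean of the two regional averages (5.0 and about 6.67) would be about 5.83.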

Social Media Monitoring

Prospective Search with Coprocessors

[Diagram labels: Put operations, HRegion inside HRegionServer, Processing, Prospective Search, RT Alerts]

Icons by http://dryicons.com
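
One possible way to wire prospective search to put operations is to keep the hook on the write path cheap and do the matching elsewhere. The talk does not say whether matching ran inline or asynchronously, so the following plain-Java sketch is an assumption throughout: `PutHookSketch`, its bounded queue, and the predicate-based agents are hypothetical and are not HBase classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Predicate;

// Hypothetical sketch; none of these types are HBase classes.
class PutHookSketch {
    private final Map<String, Predicate<String>> agents = new LinkedHashMap<>();
    private final BlockingQueue<String> pending;
    private final List<String> alerts = new ArrayList<>();

    PutHookSketch(int queueCapacity) {
        this.pending = new ArrayBlockingQueue<>(queueCapacity);
    }

    void addAgent(String name, Predicate<String> query) {
        agents.put(name, query);
    }

    // Called on the write path after a put: only enqueues, never matches.
    // Returns false when the queue is full so the caller can count drops.
    boolean postPut(String articleText) {
        return pending.offer(articleText);
    }

    // Called off the write path (for example by a worker thread):
    // matches queued articles against all agents and records alerts.
    // Returns the number of articles processed.
    int drainAndMatch() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        for (String article : batch) {
            for (Map.Entry<String, Predicate<String>> agent : agents.entrySet()) {
                if (agent.getValue().test(article)) {
                    alerts.add(agent.getKey() + " matched: " + article);
                }
            }
        }
        return batch.size();
    }

    List<String> alerts() {
        return alerts;
    }
}
```

The bounded queue makes overload visible as rejected offers instead of slower writes; whether dropping or blocking is the right policy depends on the alerting guarantees required.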

Testing Setup

•  Standard, virtualized test cluster: 4 RegionServers/DataNodes (RS/DN),
   1 HMaster (HM), 1 NameNode (NN), 3 ZooKeeper nodes (ZK)
•  Test dataset created from 2h of live index (1GB)
•  Drive load on RS/DN

Test Results

[Chart: Writes/sec (y-axis, 0 to 1800) vs. # of agents (x-axis: 0, 10, 50, 100, 200, 400, 800)]
CC 2.0 by Sean Maurik | http://flic.kr/p/JUduu

Challenges & Lessons Learned

Challenges

•  Everyone is still learning
•  Some issues only appear at scale
•  Production cluster configuration
      •  Hardware issues
      •  Tuning cluster configuration to our workloads
•  HBase stability
•  Monitoring health of HBase

Lessons

•  Be careful with expensive operations in coprocessors
•  At scale, nothing works as advertised
•  Monitoring/Operational tooling is most important
•  Play with all the configurations and benchmark for tuning

Resources to get started

•  https://blogs.apache.org/hbase/entry/coprocessor_introduction
•  http://hbase.apache.org/apidocs/index.html
•  http://www.lilyproject.org/lily/about/playground/hbaserowlog.html
•  http://www.github.com/sentric/HBasePS

Questions?

Christian Gügi
christian.guegi@sentric.ch

Berlin Buzzwords 2012

Thank you!

Hadoop and HBase for Social Media Monitoring