CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3

5 June 2012

•  Why Hadoop and HBase?
•  Social Media Monitoring
   •  Prospective Search and Coprocessors
•  Challenges & Lessons Learned
•  Resources to get started

Agenda

Software Architect @ sentric

Co-founder and organizer of the Swiss HUG

Contact:
 christian.guegi@sentric.ch
 http://www.sentric.ch
 @chrisgugi

About me

•  Spin-off of MeMo News AG, the leading provider for Social Media
   Monitoring & Analytics in Switzerland
•  Big Data expert, focused on Hadoop, HBase and Solr
•  Objective: Transforming data into insights

About sentric

CC 2.0 by Pete Reed | http://flic.kr/p/KS9kf

Information Gathering → Information Processing → Analysis & Interpretation → Insight Presentation

Why Hadoop and HBase?

Social Media Monitoring Process

[Diagram: SMM in the center, surrounded by its requirements]
•  Cost effective
•  Highly scalable
•  Reliable
•  Analytical capabilities
•  Real-time (RT) alerting

Why Hadoop and HBase?

Requirements

Storage                  HBase / HDFS
Search                   Solr
Analytics                Hadoop, Mahout
Event mechanism (MQ)     HBase RowLog
Real-time alerting       Prospective search

Why Hadoop and HBase?

Technology Stack
CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf

[Diagram: Downloaded Articles are checked ("match?") against Search Agents; Output goes to Web-UI, Reports and RT Alerts]

Icons by http://dryicons.com

Social Media Monitoring

Overview

[Diagram components: n Crawler, REST, HBase, RowLog, Coprocessor, Web-UI, MySQL, Solr, RT Alerts]

Icons by http://dryicons.com

Social Media Monitoring

Solution Architecture

•  Inspired by Google Bigtable coprocessors
•  Available since HBase version 0.92
•  Embed code directly into server processes
•  High-level call interface for clients
•  Automatic scaling, load balancing, request routing

Short Primer on Coprocessors

Overview

•  Like a database trigger
      •  Provides event-based hooks
•  Concrete Implementations
      •  RegionObserver
           •    CRUD or DML type operations
      •  MasterObserver
           •    DDL or metadata operations and cluster administration
      •  WALObserver
           •    Write-ahead-log appending and restoration

Short Primer on Coprocessors

Observer Classes

[Diagram, inside a RegionServer: Client:Get() → CP1:preGet() → CP2:preGet() → CP3:preGet() → HRegion:Get() → CP1:postGet() → CP2:postGet() → CP3:postGet() → client response]

Short Primer on Coprocessors

Observer Execution
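
The call order in the diagram can be sketched as a toy model in plain Java. This is only an illustration under stated assumptions: GetObserver, NamedObserver, ToyRegion and ObserverChainDemo are hypothetical names invented here and are not HBase classes; a real observer would be written against the RegionObserver interface of the HBase version in use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hook interface for this sketch; NOT the HBase RegionObserver API.
interface GetObserver {
    // Runs before the region serves the read.
    void preGet(String rowKey, List<String> trace);
    // Runs after the region produced a value and may return a modified value.
    String postGet(String rowKey, String value, List<String> trace);
}

// An observer that only records when it was called.
class NamedObserver implements GetObserver {
    private final String name;

    NamedObserver(String name) {
        this.name = name;
    }

    @Override
    public void preGet(String rowKey, List<String> trace) {
        trace.add(name + ":preGet");
    }

    @Override
    public String postGet(String rowKey, String value, List<String> trace) {
        trace.add(name + ":postGet");
        return value;
    }
}

// Toy stand-in for a region: an in-memory map plus registered observers.
class ToyRegion {
    private final Map<String, String> rows = new HashMap<>();
    private final List<GetObserver> observers = new ArrayList<>();

    void register(GetObserver observer) {
        observers.add(observer);
    }

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    String get(String rowKey, List<String> trace) {
        // All pre-hooks run in registration order before the actual read.
        for (GetObserver o : observers) {
            o.preGet(rowKey, trace);
        }
        trace.add("HRegion:get");
        String value = rows.get(rowKey);
        // All post-hooks run in registration order after the read.
        for (GetObserver o : observers) {
            value = o.postGet(rowKey, value, trace);
        }
        return value;
    }
}

class ObserverChainDemo {
    static List<String> run() {
        ToyRegion region = new ToyRegion();
        region.put("row1", "hello");
        region.register(new NamedObserver("CP1"));
        region.register(new NamedObserver("CP2"));
        region.register(new NamedObserver("CP3"));

        List<String> trace = new ArrayList<>();
        String value = region.get("row1", trace);
        trace.add("client response: " + value);
        return trace;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running ObserverChainDemo prints the hooks in the same order as the diagram: three preGet calls, the region read, three postGet calls, then the response to the client.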

•  Comparable to stored procedures
      •  Custom RPC protocol, used between client and region server
•  Loaded in region server
•  Clients call APIs over a single row or a row range
      •  Framework translates row keys to region location
      •  Parallel execution

Short Primer on Coprocessors

Endpoint Classes

Client code:

      Batch.Call<CountProtocol, Integer>

        Integer call(CountProtocol p) {
             return p.getRowCount();
        }

      HTable.coprocessorExec() runs the call against each region in the row range:

      Region Server 1
           table,,12345678      CountProtocol
           table,bbb,12345678   CountProtocol
      Region Server 2
           table,ccc,12345678   CountProtocol
           table,ddd,12345678   CountProtocol

      Result: Map<byte[], Integer> countsByRegion

Short Primer on Coprocessors


Endpoint Call Routine
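
The endpoint call routine above is a scatter-gather: one call per region, executed in parallel, with one result per region returned to the client. A minimal sketch in plain Java, assuming an in-memory stand-in for regions: ToyCountRegion and EndpointDemo are hypothetical names, the CountProtocol below is a local interface that only borrows the name from the slide, and String start keys replace the byte[] keys of the slide's result map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Local stand-in named after the slide's protocol; NOT an HBase interface.
interface CountProtocol {
    int getRowCount();
}

// Toy region: identified by its start key, holding the row keys it serves.
class ToyCountRegion implements CountProtocol {
    final String startKey;
    private final List<String> rowKeys;

    ToyCountRegion(String startKey, List<String> rowKeys) {
        this.startKey = startKey;
        this.rowKeys = new ArrayList<>(rowKeys);
    }

    @Override
    public int getRowCount() {
        return rowKeys.size();
    }
}

class EndpointDemo {
    // Runs the call once per region in parallel and gathers one result per region,
    // keyed by region start key (a simplification of the byte[] keys on the slide).
    static <R> Map<String, R> coprocessorExec(List<ToyCountRegion> regions,
                                              Function<CountProtocol, R> call) {
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            Map<String, Future<R>> pending = new TreeMap<>();
            for (ToyCountRegion region : regions) {
                Callable<R> task = () -> call.apply(region);
                pending.put(region.startKey, pool.submit(task));
            }
            Map<String, R> results = new TreeMap<>();
            for (Map.Entry<String, Future<R>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            pool.shutdown();
        }
    }

    static Map<String, Integer> run() {
        List<ToyCountRegion> regions = Arrays.asList(
            new ToyCountRegion("", Arrays.asList("aaa", "abc")),
            new ToyCountRegion("bbb", Arrays.asList("bbb", "bcd", "bzz")),
            new ToyCountRegion("ccc", Arrays.asList("ccc")),
            new ToyCountRegion("ddd", Collections.<String>emptyList()));
        return coprocessorExec(regions, CountProtocol::getRowCount);
    }

    public static void main(String[] args) {
        Map<String, Integer> countsByRegion = run();
        int total = 0;
        for (int count : countsByRegion.values()) {
            total += count; // client-side aggregation of the per-region results
        }
        System.out.println(countsByRegion + " total=" + total);
    }
}
```

The per-region work stays next to the data and only one small number per region travels back; the client does the final sum itself.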

•  HBase Security (Version 0.94)
•  Aggregate operations avg(), sum()
      •  AggregatorProtocol
•  HBASE-3529: Embedded search

Short Primer on Coprocessors

Use Cases

[Diagram components: Put operations, HRegion (inside HRegionServer), Prospective Search, Processing, RT Alerts]

Icons by http://dryicons.com

Social Media Monitoring

Prospective Search with Coprocessors
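
Prospective search inverts ordinary search: the queries (search agents) are stored, and each newly written document is matched against them. The sketch below shows the idea with a hook that runs after every put. It is a toy model under stated assumptions, not the sentric implementation: SearchAgent, ProspectiveIndex, ToyArticleRegion and ProspectiveSearchDemo are hypothetical names, matching is a plain "all terms present" check, and agent terms are assumed to be lowercase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// A stored query: matches a text that contains all of its (lowercase) terms.
class SearchAgent {
    final String id;
    final Set<String> terms;

    SearchAgent(String id, String... terms) {
        this.id = id;
        this.terms = new HashSet<>(Arrays.asList(terms));
    }
}

// Holds the registered agents and matches one document against all of them.
class ProspectiveIndex {
    private final List<SearchAgent> agents = new ArrayList<>();

    void register(SearchAgent agent) {
        agents.add(agent);
    }

    List<String> match(String text) {
        Set<String> tokens = new HashSet<>(
            Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
        List<String> hits = new ArrayList<>();
        for (SearchAgent agent : agents) { // linear scan: cost grows with agent count
            if (tokens.containsAll(agent.terms)) {
                hits.add(agent.id);
            }
        }
        return hits;
    }
}

// Toy region whose put() triggers a post-put hook, as an observer would.
class ToyArticleRegion {
    private final Map<String, String> rows = new TreeMap<>();
    private final ProspectiveIndex index;
    final List<String> alerts = new ArrayList<>();

    ToyArticleRegion(ProspectiveIndex index) {
        this.index = index;
    }

    void put(String rowKey, String text) {
        rows.put(rowKey, text);
        postPut(rowKey, text);
    }

    // Hook executed after the write: match the new article and record alerts.
    private void postPut(String rowKey, String text) {
        for (String agentId : index.match(text)) {
            alerts.add(agentId + " matched " + rowKey);
        }
    }
}

class ProspectiveSearchDemo {
    static List<String> run() {
        ProspectiveIndex index = new ProspectiveIndex();
        index.register(new SearchAgent("a1", "hbase", "coprocessor"));
        index.register(new SearchAgent("a2", "solr"));

        ToyArticleRegion region = new ToyArticleRegion(index);
        region.put("doc1", "HBase coprocessor primer");
        region.put("doc2", "Searching with Solr and HBase");
        region.put("doc3", "Weather report");
        return region.alerts;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch every put scans all agents, so the work per write grows with the number of registered agents.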

•  Standard, virtualized test cluster: 4 RS/DN (RegionServer/DataNode),
   1 HM (HMaster), 1 NN (NameNode), 3 ZK (ZooKeeper)
•  Test dataset created from 2h of live index (1GB)
•  Drive load on RS/DN

Social Media Monitoring

Testing Setup

[Chart: Writes/sec (y-axis, 0 to 1800) against # of agents (x-axis: 0, 10, 50, 100, 200, 400, 800); the plotted values are not recoverable from this text]

Social Media Monitoring

Test Results

CC 2.0 by Sean Maurik | http://flic.kr/p/JUduu

•  Everyone is still learning
•  Some issues only appear at scale
•  Production cluster configuration
      •  Hardware issues
      •  Tuning cluster configuration to our workloads
•  HBase stability
•  Monitoring health of HBase

Challenges & Lessons Learned

Challenges

•  Be careful with expensive operations in coprocessors
•  At scale, nothing works as advertised
•  Monitoring/Operational tooling is most important
•  Play with all the configurations and benchmark for tuning

Challenges & Lessons Learned

Lessons

•  https://blogs.apache.org/hbase/entry/coprocessor_introduction
•  http://hbase.apache.org/apidocs/index.html
•  http://www.lilyproject.org/lily/about/playground/hbaserowlog.html
•  http://www.github.com/sentric/HBasePS

Resources to get started

Questions?
Christian Gügi
christian.guegi@sentric.ch

Berlin Buzzwords 2012

Thank you!

Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords - June 2012

  • 1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
  • 2. Agenda (June 5, 2012): Why Hadoop and HBase?; Social Media Monitoring (Prospective Search and Coprocessors); Challenges & Lessons Learned; Resources to get started
  • 3. About me: Software Architect @ sentric; co-founder and organizer of the Swiss HUG. Contact: christian.guegi@sentric.ch, http://www.sentric.ch, @chrisgugi
  • 4. About sentric: Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland; Big Data expert, focused on Hadoop, HBase and Solr; Objective: Transforming data into insights
  • 5. CC 2.0 by Pete Reed | http://flic.kr/p/KS9kf
  • 6. Why Hadoop and HBase? Social Media Monitoring Process: Information Gathering → Information Processing → Analysis & Interpretation → Insight Presentation
  • 7. Why Hadoop and HBase? Requirements for SMM: Cost effective; Highly scalable; Reliable; Analytical capabilities; RT alerting
  • 8. Why Hadoop and HBase? Technology Stack: Storage: HBase/HDFS; Search: Solr; Analytics: Hadoop, Mahout; Event mechanism (MQ): HBase RowLog; Real-time alerting: Prospective search
  • 9. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
  • 10. Social Media Monitoring, Overview (diagram): Downloaded Articles are checked ("match?") against Search Agents; Output: Web-UI, Reports, RT Alerts. Icons by http://dryicons.com
  • 11. Social Media Monitoring, Solution Architecture (diagram components): n Crawler, REST, HBase, RowLog, Coprocessor, Web-UI, MySQL, Solr, RT Alerts. Icons by http://dryicons.com
  • 12. Short Primer on Coprocessors, Overview: Inspired by Google Bigtable coprocessors; HBase version 0.92; Embed code directly into server processes; High-level call interface for clients; Automatic scaling, load balancing, request routing
  • 13. Short Primer on Coprocessors, Observer Classes: Like a database trigger; Provides event-based hooks; Concrete implementations: RegionObserver (CRUD or DML type operations), MasterObserver (DDL or metadata operations and cluster administration), WALObserver (write-ahead-log appending and restoration)
  • 14. Short Primer on Coprocessors, Observer Execution (inside the RegionServer): Client:Get() → CP1:preGet() → CP2:preGet() → CP3:preGet() → HRegion:Get() → CP1:postGet() → CP2:postGet() → CP3:postGet() → client response
  • 15. Short Primer on Coprocessors, Endpoint Classes: Comparable to stored procedures; Custom RPC protocol, used between client and region server; Loaded in the region server; Client calls APIs over a single row or a row range; Framework translates row keys to region locations; Parallel execution
  • 16. Short Primer on Coprocessors, Endpoint Call Routine (diagram): Client code defines Batch.Call<CountProtocol,int> with int call(CountProtocol p) { return p.getRowCount(); } and invokes it through HTable coprocessorExec(); Region Server 1 hosts regions table,,12345678 and table,bbb,12345678; Region Server 2 hosts regions table,ccc,12345678 and table,ddd,12345678; each region exposes CountProtocol; the client receives Map<byte[], Integer> countsByRegion
  • 17. Short Primer on Coprocessors, Use Cases: HBase Security (Version 0.94); Aggregate operations avg(), sum(); AggregatorProtocol; HBASE-3529: Embedded search
  • 18. Social Media Monitoring, Prospective Search with Coprocessors (diagram): Processing Put operations; Prospective Search; HRegion; HRegionServer; RT Alerts. Icons by http://dryicons.com
  • 19. Social Media Monitoring, Testing Setup: Standard, virtualized test cluster: 4 RS/DN, 1 HM, 1 NN, 3 ZK; Test dataset created from 2h of live index (1GB); Drive load on RS/DN
  • 20. Social Media Monitoring, Test Results (chart): Writes/sec (y-axis, 0 to 1800) plotted against # of agents (x-axis: 0, 10, 50, 100, 200, 400, 800)
  • 21. CC 2.0 by Sean Maurik | http://flic.kr/p/JUduu
  • 22. Challenges & Lessons Learned, Challenges: Everyone is still learning; Some issues only appear at scale; Production cluster configuration; Hardware issues; Tuning cluster configuration to our workloads; HBase stability; Monitoring health of HBase
  • 23. Challenges & Lessons Learned, Lessons: Be careful with expensive operations in coprocessors; At scale, nothing works as advertised; Monitoring/operational tooling is most important; Play with all the configurations and benchmark for tuning
  • 24. Resources to get started: https://blogs.apache.org/hbase/entry/coprocessor_introduction; http://hbase.apache.org/apidocs/index.html; http://www.lilyproject.org/lily/about/playground/hbaserowlog.html; http://www.github.com/sentric/HBasePS
  • 25. Questions? Christian Gügi, christian.guegi@sentric.ch. Berlin Buzzwords 2012. Thank you!
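
The slide on prospective search with coprocessors describes matching stored search agents against each article as it is written (a Put), with the matching done inside the region server and hits surfacing as real-time alerts. A minimal sketch of that idea follows, in plain Java with no HBase dependency. Everything in it is an assumption made for illustration: `SearchAgent`, `PutObserver`, `AlertingObserver`, and `Region` are hypothetical stand-ins, the "all terms must occur" matching rule is a simplification, and the hook is only loosely modeled on the role a RegionObserver post-put hook plays. It is not the implementation from the talk or from the HBasePS repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Plain-Java illustration only; none of these types are HBase classes. */
class ProspectiveSearchSketch {

    /** A stored query ("search agent"): matches when every term occurs in the text. */
    static final class SearchAgent {
        final String id;
        final List<String> terms = new ArrayList<>();

        SearchAgent(String id, String... terms) {
            this.id = id;
            for (String t : terms) {
                this.terms.add(t.toLowerCase(Locale.ROOT));
            }
        }

        boolean matches(String text) {
            // Split on anything that is not a letter or digit, then compare lowercased words.
            String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
            Set<String> words = new HashSet<>(Arrays.asList(tokens));
            return words.containsAll(terms);
        }
    }

    /** Hypothetical hook invoked after a row has been written. */
    interface PutObserver {
        void postPut(String rowKey, String articleText);
    }

    /** Runs every registered agent against the new article and records one alert per hit. */
    static final class AlertingObserver implements PutObserver {
        final List<SearchAgent> agents;
        final List<String> alerts = new ArrayList<>();

        AlertingObserver(List<SearchAgent> agents) {
            this.agents = agents;
        }

        @Override
        public void postPut(String rowKey, String articleText) {
            for (SearchAgent agent : agents) {
                if (agent.matches(articleText)) {
                    alerts.add(agent.id + ":" + rowKey);
                }
            }
        }
    }

    /** Stand-in for a region: stores the row, then calls each observer synchronously. */
    static final class Region {
        final Map<String, String> rows = new LinkedHashMap<>();
        final List<PutObserver> observers = new ArrayList<>();

        void put(String rowKey, String articleText) {
            rows.put(rowKey, articleText);
            for (PutObserver observer : observers) {
                observer.postPut(rowKey, articleText);
            }
        }
    }

    /** Writes three made-up articles and returns the alerts they trigger. */
    static List<String> demo() {
        AlertingObserver observer = new AlertingObserver(Arrays.asList(
                new SearchAgent("hbase-agent", "hbase", "coprocessor"),
                new SearchAgent("solr-agent", "solr")));
        Region region = new Region();
        region.observers.add(observer);

        region.put("article-1", "HBase 0.92 ships coprocessor support");
        region.put("article-2", "Weather report for Berlin");
        region.put("article-3", "Solr and HBase coprocessor integration");
        return observer.alerts;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the hook in this sketch runs synchronously in the write path, each write pays for one match attempt per registered agent. That is one way to read the lesson about expensive operations in coprocessors, and it is presumably why the test results plot writes/sec against the number of agents.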