SlideShare une entreprise Scribd logo
1  sur  17
Télécharger pour lire hors ligne
Hadoop For OpenStack Log Analysis
Havana Summit, April 16th, 2013,

Mike Pittaro
Principal Architect, Big Data Solutions


@pmikeyp            Freenode: mikeyp
michael_pittaro@dell.com
The Problem : Operating OpenStack at Scale




2                                     Revolutionary Cloud Team
The Search for the Holy Grail of OpenStack Operations

    • Imagine if we could follow a request …
       –   Through the entire system …
       –   Across compute, storage, network …
       –   Independent of physical nodes …
       –   With timestamps …
       –   Correlated with events outside OpenStack …




3                                                       Revolutionary Cloud Team
It’s Easy !




    Collect
              Analyze ?      Sleep !
     Logs




4                                      Revolutionary Cloud Team
OpenStack Log Analysis is a Big Data Problem

      Big Data is when the data itself is part of the
                       problem.


                         Volume
    • A large amount of data, growing at large rates
                        Velocity
    • The speed at which the data must be processed
                         Variety
      • The range of data types and data structure


5                                                       Revolutionary Cloud Team
Initial Focus and Scope of Our Efforts

                        • The Operators
                           – Assist in running OpenStack
                           – Not for tenants
                        • The Data
                           – Load all detail into Hadoop
                           – Extract and index significant fields
                           – Enable future analysis
                        • The Patterns
                           – What works
                           – What is repeatable across installs
                           – How can we collaborate



6                                          Revolutionary Cloud Team
Hadoop 101 – Simplified Block Diagram

                                             HUE               Mahout
                             Oozie                            (Machine
                           (Workflow)      (Web UI)           Learning)
    Sqoop

                                      Pig           Hive
                API’s
    Flume
                                    (Batch        (SQL-ish
                                  Language)        query)
    JDBC

                                               MapReduce Distributed
                        HBase Database
                                                   Processing

                             HDFS Distributed Storage


7                                                       Revolutionary Cloud Team
The Big Pieces

    • Log Collection
       – Continuous streaming of log data into Hadoop from OpenStack
    • Intelligent Log parsing, Indexing and Search
          – Should ‘know’ about OpenStack

    • Well Defined Storage Organization
          – Defined schema for the data
          – Predefined queries for high level status - dashboard

    • Straightforward implementation pattern
          – Add as little complexity as possible

    • Ability to perform deeper analysis
          – Hadoop enables this



8                                                                  Revolutionary Cloud Team
OpenStack Log Analysis Block Diagram
           OpenStack Nodes                                                       Search

       Python
                        Syslog
       Logging
                                                                             Solr Cloud
                                        Query
               FlumeNG
                 Agent
                                               Pig          Hive
                                                                                Lucene
                                                                                Indexes

    Nagios +
    Ganglia          Avro
                                        Avro                                MapReduce
                  Sqoop                              HDFS                     Jobs
                  /Flume         Avro



9                                                                  Revolutionary Cloud Team
Current Development Status

     • Batch Only, no Flume Collection
     • Converting logs to AVRO format
     • First cut of schema in place
     • Loading into Hadoop
     • Processing into SOLR indexes
     • Starting to look at data
        – Solr Searches
        – Pig scripts




10                                                     Revolutionary Cloud Team
Schema Thoughts
     2013-03-26 11:57:41 WARNING nova.db.sqlalchemy.session

     [req-ace2ccc0-919e-4fd1-9f3a-671c0c87d28f None None] Got mysql server has gone
     away: (2006, 'MySQL server has gone away')
                                         {"namespace": "logfile.openstack",
                                          "type": "record",
                                          "name": "logentry",
                                          "fields": [
                                              {"name": "hostname", "type": "string"},
                                              {"name": "date", "type": "string"},
                                              {"name": "time",    "type": "string"},
                                              {"name": "level",    "type": "string"},
                                              {"name": "module",   "type": "string"},
                                              {"name": "request_id1",   "type": ["string", "null"]},
                                              {"name": "request_id2",   "type": ["string", "null"]},
                                              {"name": "request_id3",   "type": ["string", "null"]},
                                              {"name": "data",    "type": "string"}
                                          ]
                                         }


11                                                                        Revolutionary Cloud Team
Demo: Where we are today

     • Solr Indexing and Search
        – Example of indexed fields and searching.
     • Pig for batch analysis
        – Reconstruct a sequence of messages related to an API request




12                                                                 Revolutionary Cloud Team
Data Collection Thoughts

     • Sources
        –   OpenStack subsystems
        –   Syslog files
        –   Nagios and Ganglia
        –   General Infrastructure Data
             – Network Switches / Routers

     • Input Formats
        – Mostly Semi-structured text
        – Subsystem, timestamp, hostname, severity and error level are important
     • Output Formats
        – Avro
        – Thrift
        – Protocol Buffers

13                                                                  Revolutionary Cloud Team
Log Collection Thoughts

     • Well understood patterns
        – Evolving best practices
     • Commonly Used Tools
        – Kafka
        – Scribe
        – Flume and FlumeNG
     • Key Requirements
        –   Distributed
        –   Reliable
        –   Aggregators – consolidate streams
        –   Store and Forward – when links are down



14                                                      Revolutionary Cloud Team
Storage Organization Thoughts

     • File Organization within Hadoop
        – Naming Convention
        – Directory Organization


     • Data Lifecycle
        –   Input and Staging
        –   ‘Hot’ and ‘Cold’ Data
        –   Tiered Indexes
        –   Compression
        –   Archival and Deletion




15                                                   Revolutionary Cloud Team
What should we do next ?

     • Document the basic patterns so far ?
     • Are there any related efforts ?
     • Begin deeper discussions
        – Lots of decisions to make
        – Need community input and suggestions
     • Collaborate on Schema Design
     • What upstream OpenStack changes are needed ?
     • Get sample logs in hands of Hadoopians
        – Cleansed reference log sets would be useful



16                                                      Revolutionary Cloud Team
References

     • The unified logging infrastructure for data analytics at Twitter
     • Building LinkedIn’s Real-time Activity Data Pipeline
     • Advances and challenges in log analysis
     • BP: Ceilometer HBase Storage Backend
     • BP: Cross Service Request ID
     • Log Everything All the Time


     • Holy Grail, Gangam Style



17                                                                Revolutionary Cloud Team

Contenu connexe

Tendances

Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceDataWorks Summit
 
Spark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in JapanSpark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in JapanTaro L. Saito
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solrthelabdude
 
xPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetupxPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetupRadu Chilom
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Ryu Kobayashi
 
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Lucidworks
 
A deeper-understanding-of-spark-internals
A deeper-understanding-of-spark-internalsA deeper-understanding-of-spark-internals
A deeper-understanding-of-spark-internalsCheng Min Chi
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화Henry Jeong
 
Apache cassandra in 2016
Apache cassandra in 2016Apache cassandra in 2016
Apache cassandra in 2016Duyhai Doan
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REXdidmarin
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceHortonworks
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsthelabdude
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaTed Dunning
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Spark Summit
 
Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Jon Haddad
 
Spark Programming
Spark ProgrammingSpark Programming
Spark ProgrammingTaewook Eom
 
Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraRobbie Strickland
 

Tendances (20)

Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
Spark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in JapanSpark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in Japan
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
Drill dchug-29 nov2012
Drill dchug-29 nov2012Drill dchug-29 nov2012
Drill dchug-29 nov2012
 
xPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetupxPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetup
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
 
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
 
A deeper-understanding-of-spark-internals
A deeper-understanding-of-spark-internalsA deeper-understanding-of-spark-internals
A deeper-understanding-of-spark-internals
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화
 
Apache cassandra in 2016
Apache cassandra in 2016Apache cassandra in 2016
Apache cassandra in 2016
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REX
 
Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduce
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with Katta
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
 
Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Intro to py spark (and cassandra)
Intro to py spark (and cassandra)
 
Spark Programming
Spark ProgrammingSpark Programming
Spark Programming
 
Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and Cassandra
 

En vedette

Hadoop on OpenStack
Hadoop on OpenStackHadoop on OpenStack
Hadoop on OpenStackSandeep Raju
 
Hadoop on OpenStack - Trove Day 2014
Hadoop on OpenStack - Trove Day 2014Hadoop on OpenStack - Trove Day 2014
Hadoop on OpenStack - Trove Day 2014Tesora
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Hortonworks
 
Hadoop and OpenStack - Hadoop Summit San Jose 2014
Hadoop and OpenStack - Hadoop Summit San Jose 2014Hadoop and OpenStack - Hadoop Summit San Jose 2014
Hadoop and OpenStack - Hadoop Summit San Jose 2014spinningmatt
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentOpenStack Foundation
 
Savanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackSavanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackMirantis
 
Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014spinningmatt
 

En vedette (8)

Hadoop on OpenStack
Hadoop on OpenStackHadoop on OpenStack
Hadoop on OpenStack
 
Hadoop on OpenStack - Trove Day 2014
Hadoop on OpenStack - Trove Day 2014Hadoop on OpenStack - Trove Day 2014
Hadoop on OpenStack - Trove Day 2014
 
2012 09-08-josug-jeff
2012 09-08-josug-jeff2012 09-08-josug-jeff
2012 09-08-josug-jeff
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 
Hadoop and OpenStack - Hadoop Summit San Jose 2014
Hadoop and OpenStack - Hadoop Summit San Jose 2014Hadoop and OpenStack - Hadoop Summit San Jose 2014
Hadoop and OpenStack - Hadoop Summit San Jose 2014
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environment
 
Savanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackSavanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStack
 
Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014
 

Similaire à Hadoop For OpenStack Log Analysis

Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
From oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsFrom oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsGuy Harrison
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchMapR Technologies
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillDataWorks Summit
 
Big data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareBig data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareAdvanto Software
 
Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoMapR Technologies
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Datacwensel
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlySarah Guido
 
Swiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillSwiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillMapR Technologies
 
Upcoming services in OpenStack
Upcoming services in OpenStackUpcoming services in OpenStack
Upcoming services in OpenStackCisco DevNet
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyRohit Kulkarni
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014NoSQLmatters
 
Big data week presentation
Big data week presentationBig data week presentation
Big data week presentationJoseph Adler
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemCloudera, Inc.
 

Similaire à Hadoop For OpenStack Log Analysis (20)

Data Science
Data ScienceData Science
Data Science
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
From oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsFrom oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other tools
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 March
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache Drill
 
Big data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareBig data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto software
 
Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of Twingo
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Swiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillSwiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache Drill
 
Upcoming services in OpenStack
Upcoming services in OpenStackUpcoming services in OpenStack
Upcoming services in OpenStack
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
Big data week presentation
Big data week presentationBig data week presentation
Big data week presentation
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
 

Plus de OpenStack Foundation

Sponsor Webinar - OpenStack Summit Vancouver 2018
Sponsor Webinar  - OpenStack Summit Vancouver 2018Sponsor Webinar  - OpenStack Summit Vancouver 2018
Sponsor Webinar - OpenStack Summit Vancouver 2018OpenStack Foundation
 
OpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For AttendeesOpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For AttendeesOpenStack Foundation
 
OpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community PresentationOpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community PresentationOpenStack Foundation
 
OpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group PartiesOpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group PartiesOpenStack Foundation
 
Liberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messagesLiberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messagesOpenStack Foundation
 
OpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing PlanOpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing PlanOpenStack Foundation
 
OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar OpenStack Foundation
 
Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition OpenStack Foundation
 
Searchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionSearchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionOpenStack Foundation
 
Congress Updates - Liberty Edition
Congress Updates - Liberty EditionCongress Updates - Liberty Edition
Congress Updates - Liberty EditionOpenStack Foundation
 
Release Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty EditionRelease Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty EditionOpenStack Foundation
 
OpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use CasesOpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use CasesOpenStack Foundation
 

Plus de OpenStack Foundation (20)

Sponsor Webinar - OpenStack Summit Vancouver 2018
Sponsor Webinar  - OpenStack Summit Vancouver 2018Sponsor Webinar  - OpenStack Summit Vancouver 2018
Sponsor Webinar - OpenStack Summit Vancouver 2018
 
OpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For AttendeesOpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For Attendees
 
OpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community PresentationOpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community Presentation
 
OpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group PartiesOpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group Parties
 
Liberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messagesLiberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messages
 
OpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing PlanOpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing Plan
 
OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar
 
Cinder Updates - Liberty Edition
Cinder Updates - Liberty Edition Cinder Updates - Liberty Edition
Cinder Updates - Liberty Edition
 
Glance Updates - Liberty Edition
Glance Updates - Liberty EditionGlance Updates - Liberty Edition
Glance Updates - Liberty Edition
 
Heat Updates - Liberty Edition
Heat Updates - Liberty EditionHeat Updates - Liberty Edition
Heat Updates - Liberty Edition
 
Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition
 
Nova Updates - Liberty Edition
Nova Updates - Liberty EditionNova Updates - Liberty Edition
Nova Updates - Liberty Edition
 
Sahara Updates - Liberty Edition
Sahara Updates - Liberty EditionSahara Updates - Liberty Edition
Sahara Updates - Liberty Edition
 
Searchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionSearchlight Updates - Liberty Edition
Searchlight Updates - Liberty Edition
 
Trove Updates - Liberty Edition
Trove Updates - Liberty EditionTrove Updates - Liberty Edition
Trove Updates - Liberty Edition
 
OpenStack: five years in
OpenStack: five years inOpenStack: five years in
OpenStack: five years in
 
Swift Updates - Liberty Edition
Swift Updates - Liberty EditionSwift Updates - Liberty Edition
Swift Updates - Liberty Edition
 
Congress Updates - Liberty Edition
Congress Updates - Liberty EditionCongress Updates - Liberty Edition
Congress Updates - Liberty Edition
 
Release Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty EditionRelease Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty Edition
 
OpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use CasesOpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use Cases
 

Dernier

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Hadoop For OpenStack Log Analysis

  • 1. Hadoop For OpenStack Log Analysis Havana Summit, April 16th, 2013, Mike Pittaro Principal Architect, Big Data Solutions @pmikeyp Freenode: mikeyp michael_pittaro@dell.com
  • 2. The Problem : Operating OpenStack at Scale 2 Revolutionary Cloud Team
  • 3. The Search for the Holy Grail of OpenStack Operations • Imagine if we could follow a request … – Through the entire system … – Across compute, storage, network … – Independent of physical nodes … – With timestamps … – Correlated with events outside OpenStack … 3 Revolutionary Cloud Team
  • 4. It’s Easy ! Collect Analyze ? Sleep ! Logs 4 Revolutionary Cloud Team
  • 5. OpenStack Log Analysis is a Big Data Problem Big Data is when the data itself is part of the problem. Volume • A large amount of data, growing at large rates Velocity • The speed at which the data must be processed Variety • The range of data types and data structure 5 Revolutionary Cloud Team
  • 6. Initial Focus and Scope of Our Efforts • The Operators – Assist in running OpenStack – Not for tenants • The Data – Load all detail into Hadoop – Extract and index significant fields – Enable future analysis • The Patterns – What works – What is repeatable across installs – How can we collaborate 6 Revolutionary Cloud Team
  • 7. Hadoop 101 – Simplified Block Diagram HUE Mahout Oozie (Machine (Workflow) (Web UI) Learning) Sqoop Pig Hive API’s Flume (Batch (SQL-ish Language) query) JDBC MapReduce Distributed HBase Database Processing HDFS Distributed Storage 7 Revolutionary Cloud Team
  • 8. The Big Pieces • Log Collection – Continuous streaming of log data into Hadoop from OpenStack • Intelligent Log parsing, Indexing and Search – Should ‘know’ about OpenStack • Well Defined Storage Organization – Defined schema for the data – Predefined queries for high level status - dashboard • Straightforward implementation pattern – Add as little complexity as possible • Ability to perform deeper analysis – Hadoop enables this 8 Revolutionary Cloud Team
  • 9. OpenStack Log Analysis Block Diagram OpenStack Nodes Search Python Syslog Logging Solr Cloud Query FlumeNG Agent Pig Hive Lucene Indexes Nagios + Ganglia Avro Avro MapReduce Sqoop HDFS Jobs /Flume Avro 9 Revolutionary Cloud Team
  • 10. Current Development Status • Batch Only, no Flume Collection • Converting logs to AVRO format • First cut of schema in place • Loading into Hadoop • Processing into SOLR indexes • Starting to look at data – Solr Searches – Pig scripts 10 Revolutionary Cloud Team
  • 11. Schema Thoughts 2013-03-26 11:57:41 WARNING nova.db.sqlalchemy.session [req-ace2ccc0-919e-4fd1-9f3a-671c0c87d28f None None] Got mysql server has gone away: (2006, 'MySQL server has gone away') {"namespace": "logfile.openstack", "type": "record", "name": "logentry", "fields": [ {"name": "hostname", "type": "string"}, {"name": "date", "type": "string"}, {"name": "time", "type": "string"}, {"name": "level", "type": "string"}, {"name": "module", "type": "string"}, {"name": "request_id1", "type": ["string", "null"]}, {"name": "request_id2", "type": ["string", "null"]}, {"name": "request_id3", "type": ["string", "null"]}, {"name": "data", "type": "string"} ] } 11 Revolutionary Cloud Team
  • 12. Demo: Where we are today • Solr Indexing and Search – Example of indexed fields and searching. • Pig for batch analysis – Reconstruct a sequence of messages related to an API request 12 Revolutionary Cloud Team
  • 13. Data Collection Thoughts • Sources – OpenStack subsystems – Syslog files – Nagios and Ganglia – General Infrastructure Data – Network Switches / Routers • Input Formats – Mostly Semi-structured text – Subsystem, timestamp, hostname, severity and error level are important • Output Formats – Avro – Thrift – Protocol Buffers 13 Revolutionary Cloud Team
  • 14. Log Collection Thoughts • Well understood patterns – Evolving best practices • Commonly Used Tools – Kafka – Scribe – Flume and FlumeNG • Key Requirements – Distributed – Reliable – Aggregators – consolidate streams – Store and Forward – when links are down 14 Revolutionary Cloud Team
  • 15. Storage Organization Thoughts • File Organization within Hadoop – Naming Convention – Directory Organization • Data Lifecycle – Input and Staging – ‘Hot’ and ‘Cold’ Data – Tiered Indexes – Compression – Archival and Deletion 15 Revolutionary Cloud Team
  • 16. What should we do next ? • Document the basic patterns so far ? • Are there any related efforts ? • Begin deeper discussions – Lots of decisions to make – Need community input and suggestions • Collaborate on Schema Design • What upstream OpenStack changes are needed ? • Get sample logs in hands of Hadoopians – Cleansed reference log sets would be useful 16 Revolutionary Cloud Team
  • 17. References • The unified logging infrastructure for data analytics at Twitter • Building LinkedIn’s Real-time Activity Data Pipeline • Advances and challenges in log analysis • BP: Ceilometer HBase Storage Backend • BP: Cross Service Request ID • Log Everything All the Time • Holy Grail, Gangam Style 17 Revolutionary Cloud Team