Analysis Farm:
A Cloud-based Scalable Aggregation and
Query Platform for Network Log Analysis

Jianwen Wei, Yusu Zhao, Kaida Jiang, Rui Xie, Yaohui Jin
School of Electronic Information and Electrical Engineering, SJTU
Network and Information Center, SJTU
Shanghai Jiaotong University

wei.jianwen@gmail.com
Dec 12th, 2011

The 2011 International Workshop on Data Cloud (D-CLOUD 2011), Hong Kong
Outline

• Background
• Design and Implementation
• Experimental Results
• Summary
Background
Motivation: An Overview of SJTU Networks
                   • Serving 50,000 people
                   • 10Gb WDM, MPLS
                   • Network Monitoring
Background
Motivation: An Overview of SJTU Networks (cont.)
• Applications such as BitTorrent (BT) use too much BORDER bandwidth!
Background
Deployment of the Network Log Analysis System

Border Router
   | mirrored traffic (raw data), 6 Gbps
   v
DPI
   | syslog (plain text), ~3 MBytes/s
   v
Syslog Collector
   | 5000 per sec
   v
Analysis Farm
Background
   Network Log Analysis System: Border Router


• Handles all incoming and outgoing traffic
• Connects to multiple ISPs
• Traffic at 6 Gbps



Background
     Network Log Analysis System: DPI Engine

• Input: 6 Gbps of raw network traffic
• Output: 3 MBytes/s of syslog messages
• Runs on an x86 server
• Analyzes every network session



Background
 Network Log Analysis System: Syslog Collector

• Syslog collector written in Java
• Runs on a virtual machine
• Insertion rate: 5000/s on average, 12000/s at peak




Background
   Network Log Analysis System: Analysis Farm


• Stores the logs
• Analyzes the logs




Background
Log Analysis Tasks

• Aggregating
  • Get the overall usage of the network border
• Querying
  • Inspect network activities

Sample log record:
    http.tcp
    1320155721-1320155731
    202.120.2.102:54285-8.8.4.4:80
    374 24021
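As an illustration of how such a record might be turned into queryable fields, here is a minimal Python sketch. It assumes the record is whitespace-separated in the order shown above. The field names `app`, `start_t`, `end_t`, `src_IP`, and `dst_IP` follow the index listed later in the deck; `src_port`, `dst_port`, `counter_a`, and `counter_b` are hypothetical names, and the slide does not say what the last two numbers measure.

```python
from dataclasses import dataclass

SAMPLE = "http.tcp 1320155721-1320155731 202.120.2.102:54285-8.8.4.4:80 374 24021"


@dataclass
class LogRecord:
    app: str
    start_t: int      # session start, Unix seconds
    end_t: int        # session end, Unix seconds
    src_IP: str
    src_port: int     # hypothetical field name
    dst_IP: str
    dst_port: int     # hypothetical field name
    counter_a: int    # meaning not stated on the slide
    counter_b: int    # meaning not stated on the slide


def parse_record(text: str) -> LogRecord:
    """Parse one whitespace-separated DPI log record (assumed layout)."""
    app, times, endpoints, a, b = text.split()
    start, end = times.split("-")
    src, dst = endpoints.split("-")
    src_ip, src_port = src.rsplit(":", 1)
    dst_ip, dst_port = dst.rsplit(":", 1)
    return LogRecord(app, int(start), int(end),
                     src_ip, int(src_port), dst_ip, int(dst_port),
                     int(a), int(b))


record = parse_record(SAMPLE)
print(record)
```

The sketch handles IPv4 endpoints only, since an IPv6 address would need a different delimiter convention than the one assumed here.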
Background
             Log Analysis Tasks




400 million log records per day (350 GBytes)!
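A quick back-of-the-envelope check of these figures, sketched in Python. It assumes decimal gigabytes and a uniform arrival rate over the day; neither is stated on the slide.

```python
RECORDS_PER_DAY = 400_000_000
BYTES_PER_DAY = 350 * 10**9      # assumption: 1 GByte = 10**9 bytes
SECONDS_PER_DAY = 24 * 60 * 60

# Average arrival rate if records were spread evenly over the day.
avg_records_per_sec = RECORDS_PER_DAY / SECONDS_PER_DAY

# Average size of one log record.
avg_bytes_per_record = BYTES_PER_DAY / RECORDS_PER_DAY

print(f"{avg_records_per_sec:.0f} records/s, {avg_bytes_per_record:.0f} bytes/record")
```

Under these assumptions the daily volume works out to roughly 4,600 records per second, in line with the collector's stated average insertion rate of 5000/s, and to 875 bytes per record.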
Background
        Research Challenges



• Storage Scalability
• Computation Scalability
• Query Agility
Background
                Related Work

• loggly.com
  • “Logging as a Service”
• Yottaa.com
  • Log-based website performance analysis
• They use cloud-based solutions for scalability
Design and Implementation
Our Approach: Cloud Computing + NoSQL


• Cloud Computing
 • Manageable, scalable, on-demand resources
 • OpenStack: an open-source toolset for building clouds
• NoSQL (Not Only SQL)
 • Weakens ACID to improve performance
 • MongoDB: a document-oriented distributed database
Design and Implementation
The Architecture of Analysis Farm

[Figure: layered architecture. Users send requests to the application layer.]

• Application Layer: mongos, configuration server, and four mongod instances
• IaaS Layer: four VMs
• Hardware Resource Pool: CPU, memory, iSCSI storage, network
Design and Implementation
How do we tackle the three challenges?

• Storage Scalability
  • Online storage expansion
• Computation Scalability
  • MongoDB scale-out
• Query Agility
  • MongoDB handles ad hoc queries effectively
Design and Implementation
Addressing Storage Scalability

Online Storage Expansion

 1. The application servers ask the IaaS layer for more disk space.
 2. The IaaS layer asks the hardware resource pool to attach new block devices.
 3. The application servers execute online filesystem expansion.

No service interruption
Design and Implementation
Addressing Computation Scalability

MongoDB Scale-out

 1. The IaaS layer provides a new server to the cluster.
 2. The MongoDB cluster rebalances data automatically.

No service interruption

[Figure: a MapReduce request arrives at mongos, which runs a combiner; each of the four mongod instances runs a mapper and a combiner.]
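The mapper/combiner layout in the figure can be mimicked in plain Python. This is a simulation of the data flow only, not MongoDB's MapReduce API: each inner list stands in for one mongod shard, the `bytes` field and the `bt.tcp` application name are hypothetical, and the 600-second window matches the 10-minute aggregation used in the experiments.

```python
from collections import Counter

WINDOW = 600  # seconds, i.e. a 10-minute aggregation window


def mapper(record: dict):
    """Emit ((window start, app), bytes) for one log record."""
    window_start = record["start_t"] - record["start_t"] % WINDOW
    return (window_start, record["app"]), record["bytes"]


def combiner(pairs) -> Counter:
    """Sum values per key; runs on each shard and again on the router."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def aggregate(shards: list) -> Counter:
    """Map and combine on every shard, then merge the partial results."""
    partials = [combiner(map(mapper, shard)) for shard in shards]
    final = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds counts together
    return final


shards = [
    [{"start_t": 1320155721, "app": "http.tcp", "bytes": 374}],
    [{"start_t": 1320155725, "app": "http.tcp", "bytes": 100},
     {"start_t": 1320156400, "app": "bt.tcp", "bytes": 50}],
]
totals = aggregate(shards)
print(dict(totals))
```

Because each shard pre-combines its own records, only one small partial result per shard has to travel to the router, which is why adding shards can raise aggregation throughput.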
Design and Implementation
Addressing Query Agility

MongoDB handles ad hoc queries effectively

  • Expressive data model
  • Building blocks for compound queries
  • Aggregating tools such as Group and MapReduce
  • Effective optimization methods, such as indexes
Experimental Results
       Aggregating and Querying




• Aggregating Log
• Ad hoc Querying
        SPEED is our primary focus.
Experimental Results
Experimental Setup for Aggregating

• Method
 • Aggregate 10 minutes of log records with MongoDB MapReduce
• Dataset
 • One day’s log records, ~400 million records
• Configurations for Comparison
 • 1x farm: 4 mongod threads on a single server
 • 4x farm: 4 mongod threads on four servers
 • 8x farm: 8 mongod threads on eight servers
Experimental Results
Experimental Results for Aggregating

Type   Records Processed   Time    Rate (records/s)
1x     3201454             523 s   6119
4x     3103742             200 s   15568
8x     3317013             111 s   29883

Experimental Results for 10-minute Log Aggregating
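To read the scaling behaviour off the table, the reported rates can be compared against the 1x farm. The short Python sketch below uses only the numbers from the table; taking the 1x rate as the baseline is a choice made here, not a metric from the slides.

```python
# (records processed, time in seconds, reported rate in records/s)
RESULTS = {
    "1x": (3201454, 523, 6119),
    "4x": (3103742, 200, 15568),
    "8x": (3317013, 111, 29883),
}

baseline_rate = RESULTS["1x"][2]

# Speedup of each configuration relative to the single-server farm.
speedup = {name: rate / baseline_rate for name, (_, _, rate) in RESULTS.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x the 1x rate")
```

By this measure the 4x farm runs at roughly 2.5 times the single-server rate and the 8x farm at roughly 4.9 times, so throughput grows with the number of servers but less than linearly.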
Experimental Results
Experimental Setup for ad hoc Querying

• Method
 • Execute ad hoc querying
• Dataset
 • One day’s log records, ~400 million records
• Index
 • (start_t, end_t, src_IP, dst_IP, app)
• Configuration for Analysis Farm
 • 8x farm: 8 mongod threads on eight servers
Experimental Results
Experimental Setup for ad hoc Querying (cont.)

  • Query Types
   • IP-initial Query: src_IP == IP
   • IP-engaging Query: src_IP == IP OR dst_IP == IP
   • IP-pair Query: the IP pair engages AND app == HTTP
  • Time Scopes
   • 10 minutes, 30 minutes, 60 minutes
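The three query types could be written as MongoDB filter documents along the following lines. This is a sketch under assumptions: the slides do not show the actual queries, restricting the time scope through `start_t` is a guess, and the default `app` value `"http.tcp"` is taken from the sample log record rather than from this slide. The operators `$gte`, `$lt`, and `$or` are standard MongoDB query operators, and the resulting dictionaries are plain Python objects that a driver call such as pymongo's `Collection.find` would accept.

```python
def time_scope(start: int, minutes: int) -> dict:
    """Restrict to records starting within [start, start + minutes)."""
    return {"start_t": {"$gte": start, "$lt": start + minutes * 60}}


def ip_initial(ip: str, start: int, minutes: int) -> dict:
    """IP-initial query: src_IP == IP."""
    return {**time_scope(start, minutes), "src_IP": ip}


def ip_engaging(ip: str, start: int, minutes: int) -> dict:
    """IP-engaging query: src_IP == IP OR dst_IP == IP."""
    return {**time_scope(start, minutes),
            "$or": [{"src_IP": ip}, {"dst_IP": ip}]}


def ip_pair(ip_a: str, ip_b: str, start: int, minutes: int,
            app: str = "http.tcp") -> dict:
    """IP-pair query: both IPs take part in the session AND app matches."""
    return {**time_scope(start, minutes),
            "app": app,
            "$or": [{"src_IP": ip_a, "dst_IP": ip_b},
                    {"src_IP": ip_b, "dst_IP": ip_a}]}


query = ip_initial("202.120.2.102", 1320155400, 10)
print(query)
```

Keeping the filters as small composable functions makes it easy to vary the time scope (10, 30, or 60 minutes) without rewriting the IP conditions.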
Experimental Results
Experimental Results for IP-initial Query (src_IP == IP)

Time Scope   Execution Time   Records Scanned   Rate (records/s)
10 min       3.085 s          227581            73770
30 min       8.816 s          643259            72965
60 min       18.517 s         1370443           73795
Experimental Results
Experimental Results for IP-engaging Query (src_IP == IP OR dst_IP == IP)

Time Scope   Execution Time   Records Scanned   Rate (records/s)
10 min       18.012 s         1234582           68542
30 min       54.708 s         3673304           67144
60 min       119.034 s        7912644           66474
Experimental Results
Experimental Results for IP-pair Query (the IP pair engages AND app == HTTP)

Time Scope   Execution Time   Records Scanned   Rate (records/s)
10 min       5.670 s          296772            52340
30 min       6.267 s          324813            51829
60 min       19.327 s         1027513           53165
Summary

• Analysis Farm is built on OpenStack and MongoDB
• Analysis Farm is easy to manage and easy to scale out
• Feasibility in aggregating and querying is verified
• We use Analysis Farm to analyze 400 million log records (350 GB) every day
Acknowledgement


• 973 Program and NSFC
• My partners at Shanghai Jiaotong Univ.
• Dr. Lin Gu at HKUST
• Workshop organizers and reviewers
Analysis Farm:
A Cloud-based Scalable Aggregation and Query Platform for Network Log Analysis
Shanghai Jiaotong University
wei.jianwen@gmail.com        @JianwenWEI

Thank you!

The 2011 International Workshop on Data Cloud (D-CLOUD 2011), Hong Kong

Contenu connexe

Dernier

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Dernier (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

En vedette

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 

En vedette (20)

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

D-Cloud 2011 A Cloud-based Scalable Aggregation and Query Platform for Network Log Analysis

  • 1. Analysis Farm: A Cloud-based Scalable Aggregation and Query Platform for Network Log Analysis Jianwen Wei,Yusu Zhao, Kaida Jiang, Rui Xie,Yaohui Jin School of Electronic Information and Electrical Engineering, SJTU Network and Information Center, SJTU Shanghai Jiaotong University wei.jianwen@gmail.com Dec 12th, 2011 The 2011 International Workshop on Data Cloud (D-CLOUD 2011), Hong Kong
  • 2. Outline • Background • Design and Implementation • Experimental Results • Summary
  • 3. Outline • Background • Design and Implementation • Experimental Results • Summary
  • 4. Background Motivation: An Overview of SJTU Networks • Serving 50,000 people • 10Gb WDM, MPLS • Network Monitoring
  • 5. Background Motivation: An Overview of SJTU Networks (cont.) Applications such as BT etc. use too much BORDER bandwidth!
  • 6. Background Deployment of the Network Log Analysis System 6Gbps ~3MBytes/s 5000 per sec Syslog Mirrored Traffic (plain text) (Raw Data) Border Router DPI Syslog Collector Analysis Farm
  • 7. Background Network Log Analysis System 6Gbps ~3MBytes/s 5000 per sec Syslog Mirrored Traffic (plain text) (Raw Data) Border Router DPI Syslog Collector Analysis Farm
  • 8. Background Network Log Analysis System: Border Router • Handle all incoming and outgoing traffic • Connecting to multiple ISPs • Traffic at 6Gbps 6Gbps ~3MBytes/s 5000 per sec Syslog Mirrored Traffic (plain text) (Raw Data) Border Router DPI Syslog Collector Analysis Farm
  • 9. Background Network Log Analysis System: DPI Engine • Input: 6Gbps raw network traffic • Output: 3MBytes/s syslog messages • Running on an x86 server • Analyze every network session 6Gbps ~3MBytes/s 5000 per sec Syslog Mirrored Traffic (plain text) (Raw Data) Border Router DPI Syslog Collector Analysis Farm
  • 10. Background Network Log Analysis System: Syslog Collector • Java-written syslog collector • Running on a virtual machine • Insertion rate: 5000/s on average, 12000/s at peak 6Gbps ~3MBytes/s 5000 per sec Syslog Mirrored Traffic (plain text) (Raw Data) Border Router DPI Syslog Collector Analysis Farm
  • 11. Background Network Log Analysis System: Analysis Farm • Store log • Analyze log 6Gbps ~3MBytes/s 5000 per sec Syslog Mirrored Traffic (plain text) (Raw Data) Border Router DPI Syslog Collector Analysis Farm
  • 12. Background Log Analysis Tasks • Aggregating • Get the overall usage of network border • Querying • Inspect network activities http.tcp 1320155721-1320155731 202.120.2.102:54285-8.8.4.4:80 374 24021
  • 13. Background Log Analysis Tasks • 400 million log records per day (350 GBytes)!
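As a rough sanity check, the daily volume can be converted into an average arrival rate and an average record size. This is plain arithmetic on the two figures from the slide; it assumes decimal gigabytes, and the slide does not say whether 350 GBytes is raw or stored size.

```python
RECORDS_PER_DAY = 400_000_000  # from the slide
BYTES_PER_DAY = 350 * 10**9    # 350 GBytes, assuming decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60

avg_records_per_sec = RECORDS_PER_DAY / SECONDS_PER_DAY
avg_bytes_per_record = BYTES_PER_DAY / RECORDS_PER_DAY

print(f"{avg_records_per_sec:.0f} records/s on average")
print(f"{avg_bytes_per_record:.0f} bytes per record on average")
```

The average of about 4,630 records per second is in line with the collector's stated insertion rate of 5000/s.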
  • 14. Background Research Challenges • Storage Scalability • Computation Scalability • Query Agility
  • 15. Background Related Work • loggly.com: “Logging as a Service” • Yottaa.com: log-based website performance analysis • Both use cloud-based solutions for scalability
  • 16. Outline • Background • Design and Implementation • Experimental Results • Summary
  • 17. Design and Implementation Our Approach: Cloud Computing + NoSQL • Cloud Computing: manageable, scalable, on-demand resources • OpenStack: open-source toolset for building clouds • NoSQL (Not Only SQL): weakens ACID guarantees to improve performance • MongoDB: document-oriented distributed database
  • 18. Design and Implementation The Architecture of Analysis Farm • Users send requests to mongos, which works with a configuration server • Application Layer: four mongod instances • IaaS Layer: four VMs • Hardware Resource Pool: CPU, memory, network, iSCSI storage
  • 19. Design and Implementation How do we tackle the three challenges? • Storage Scalability: online storage expansion • Computation Scalability: MongoDB scale-out • Query Agility: MongoDB handles ad hoc queries effectively
  • 20. Design and Implementation Addressing Storage Scalability: Online Storage Expansion 1. The application servers ask the IaaS layer for more disk space. 2. The IaaS layer asks the hardware resource pool to attach new block devices. 3. The application servers perform online filesystem expansion. • No service interruption
  • 21. Design and Implementation Addressing Computation Scalability: MongoDB Scale-out 1. The IaaS provides a new server to the cluster. 2. The MongoDB cluster rebalances data automatically. • No service interruption • Diagram: a MapReduce request goes to mongos (combiner), which fans out to four mongod instances (mapper and combiner on each)
  • 22. Design and Implementation Addressing Query Agility: MongoDB handles ad hoc queries effectively • Expressive data model • Building blocks for compound queries • Aggregation tools such as group and MapReduce • Effective optimization methods, such as indexes
  • 23. Outline • Background • Design and Implementation • Experimental Results • Summary
  • 24. Experimental Results Aggregating and Querying • Aggregating logs • Ad hoc querying • SPEED is our primary focus.
  • 25. Experimental Results Experimental Setup for Aggregating • Method: aggregate 10 minutes of log records with MongoDB MapReduce • Dataset: one day’s log records, ~400 million records • Configurations for comparison: 1x farm (4 mongod threads on a single server); 4x farm (4 mongod threads on four servers); 8x farm (8 mongod threads on eight servers)
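The dataflow behind a 10-minute aggregation can be illustrated in plain Python: each shard maps its own records and pre-sums them locally, and a final step merges the partial results, mirroring the mapper/combiner layout on the scale-out slide. This is not MongoDB's MapReduce API and not the authors' job; the `bytes` field, the per-application key, and the sample records are all made up for illustration.

```python
from collections import Counter

BUCKET_SECONDS = 600  # 10-minute aggregation window


def mapper(record):
    """Emit ((bucket_start, app), byte_count) for one record."""
    bucket_start = record["start_t"] - record["start_t"] % BUCKET_SECONDS
    return (bucket_start, record["app"]), record["bytes"]


def map_and_combine(shard_records):
    """Run on each shard: map every record, then pre-sum per key locally."""
    partial = Counter()
    for record in shard_records:
        key, value = mapper(record)
        partial[key] += value
    return partial


def final_combine(partials):
    """Run once at the router: merge the per-shard partial sums."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts per key
    return total


# Two hypothetical shards holding made-up records.
shard_a = [
    {"start_t": 1320155721, "app": "http", "bytes": 24021},
    {"start_t": 1320155800, "app": "bt", "bytes": 500000},
]
shard_b = [
    {"start_t": 1320155900, "app": "http", "bytes": 1000},
    {"start_t": 1320156100, "app": "http", "bytes": 2000},
]

usage = final_combine(map_and_combine(s) for s in (shard_a, shard_b))
for (bucket_start, app), total_bytes in sorted(usage.items()):
    print(bucket_start, app, total_bytes)
```

Pre-summing on each shard means only one small partial result per key crosses the network, which is why adding shards can raise the aggregation rate.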
  • 26. Experimental Results Experimental Results for 10-minute Log Aggregating
      Type | Records Processed | Time | Rate (records/s)
      1x   | 3201454           | 523s | 6119
      4x   | 3103742           | 200s | 15568
      8x   | 3317013           | 111s | 29883
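The reported rates can be turned into speedups relative to the 1x farm. The snippet below only does arithmetic on the three rates from the results above; the per-server "efficiency" figure is an added convenience metric, not something the slides report.

```python
# Rates (records/s) as reported in the 10-minute aggregation results.
rates = {1: 6119, 4: 15568, 8: 29883}

baseline = rates[1]
speedups = {}
for servers, rate in rates.items():
    speedup = rate / baseline
    speedups[servers] = speedup
    efficiency = speedup / servers  # added metric: speedup per server
    print(f"{servers}x farm: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

By these numbers the 4x and 8x farms run roughly 2.5 and 4.9 times faster than the 1x farm. The 1x farm already runs 4 mongod threads on one server, so the 1x to 4x step adds servers rather than threads.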
  • 27. Experimental Results Experimental Setup for ad hoc Querying • Method: execute ad hoc queries • Dataset: one day’s log records, ~400 million records • Index: (start_t, end_t, src_IP, dst_IP, app) • Configuration for Analysis Farm: 8x farm (8 mongod threads on eight servers)
  • 28. Experimental Results Experimental Setup for ad hoc Querying (cont.) • Query Types • IP-initial Query: src_IP == IP • IP-engaging Query: src_IP == IP OR dst_IP == IP • IP-pair Query: the IP pair engages AND app == HTTP • Time Scopes: 10 minutes, 30 minutes, 60 minutes
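The three query types can be written as MongoDB-style filter documents using the real query operators `$or`, `$gte`, and `$lt`. The sketch below builds them as plain Python dicts over the indexed field names and checks them with a tiny in-memory matcher, so it runs without a database. Several points are assumptions: that the time scope filters on start_t, that "IP pair engages" means the two addresses talk to each other in either direction, and the helper names themselves, which are hypothetical.

```python
def time_scope(t0, minutes):
    """Records whose start_t falls in [t0, t0 + minutes)."""
    return {"start_t": {"$gte": t0, "$lt": t0 + minutes * 60}}


def ip_initial(ip, t0, minutes):
    """src_IP == IP"""
    return {**time_scope(t0, minutes), "src_IP": ip}


def ip_engaging(ip, t0, minutes):
    """src_IP == IP OR dst_IP == IP"""
    return {**time_scope(t0, minutes),
            "$or": [{"src_IP": ip}, {"dst_IP": ip}]}


def ip_pair(ip_a, ip_b, t0, minutes, app="http"):
    """The two IPs talk to each other (either direction) AND app matches."""
    return {**time_scope(t0, minutes),
            "app": app,
            "$or": [{"src_IP": ip_a, "dst_IP": ip_b},
                    {"src_IP": ip_b, "dst_IP": ip_a}]}


def matches(record, query):
    """Tiny evaluator for the subset of operators used above."""
    for field, cond in query.items():
        if field == "$or":
            if not any(matches(record, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = record[field]
            if "$gte" in cond and not value >= cond["$gte"]:
                return False
            if "$lt" in cond and not value < cond["$lt"]:
                return False
        elif record[field] != cond:
            return False
    return True


sample = {"start_t": 1320155721, "end_t": 1320155731,
          "src_IP": "202.120.2.102", "dst_IP": "8.8.4.4", "app": "http"}
t0 = 1320155400  # start of a hypothetical 10-minute window

print(matches(sample, ip_initial("202.120.2.102", t0, 10)))
print(matches(sample, ip_engaging("8.8.4.4", t0, 10)))
print(matches(sample, ip_pair("8.8.4.4", "202.120.2.102", t0, 10)))
```

With a driver such as pymongo, dicts of this shape would be passed as the filter argument to `find`; the matcher here exists only to make the example self-checking.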
  • 29. Experimental Results Experimental Results for IP-initial Query (src_IP == IP)
      Time Scope | Execution Time | Records Scanned | Rate (records/s)
      10min      | 3.085s         | 227581          | 73770
      30min      | 8.816s         | 643259          | 72965
      60min      | 18.517s        | 1370443         | 73795
  • 30. Experimental Results Experimental Results for IP-engaging Query (src_IP == IP OR dst_IP == IP)
      Time Scope | Execution Time | Records Scanned | Rate (records/s)
      10min      | 18.012s        | 1234582         | 68542
      30min      | 54.708s        | 3673304         | 67144
      60min      | 119.034s       | 7912644         | 66474
  • 31. Experimental Results Experimental Results for IP-pair Query (the IP pair engages AND app == http)
      Time Scope | Execution Time | Records Scanned | Rate (records/s)
      10min      | 5.670s         | 296772          | 52340
      30min      | 6.267s         | 324813          | 51829
      60min      | 19.327s        | 1027513         | 53165
  • 32. Outline • Background • Design and Implementation • Experimental Results • Summary
  • 33. Summary • Analysis Farm is built on OpenStack and MongoDB • Analysis Farm is easy to manage and easy to scale out • Feasibility in aggregating and querying is verified • We use Analysis Farm to analyze 400 million log records (350 GBytes) every day
  • 34. Acknowledgement • 973 program and NSFC • My partners in Shanghai Jiaotong Univ. • Dr. Lin Gu in HKUST • Workshop organizers and reviewers
  • 35. Analysis Farm: A Cloud-based Scalable Aggregation and Query Platform for Network Log Analysis Shanghai Jiaotong University wei.jianwen@gmail.com @JianwenWEI Thank you! The 2011 International Workshop on Data Cloud (D-CLOUD 2011), Hong Kong