SlideShare une entreprise Scribd logo
1  sur  24
Télécharger pour lire hors ligne
1.5 Million Log Lines per Second 
Big Data Everywhere Chicago 2014 
Mike Keane 
mkeane@conversant.com 
Building and maintaining Flume flows at Conversant
•Quicker insight into production data 
•Reduce complexity of administering/managing new servers, data centers, etc. 
•Scalable 
•No data loss or duplication 
•Replace TSV files with Avro objects 
•Able to be monitored by Network Operations Center (NOC) 
•Able to recover from downtime quickly 
R 
SLA for Event Driven Logging with Flume
•A Flume Flow is a series of flume agents data follows from origination to final destination 
•Data on a Flume Flow is packaged in FlumeEvent Avro objects 
•A FlumeEvent is composed of 
•Headers – A map of string value pairs 
•Body – A byte array 
•A FlumeEvent is an atomic unit of data 
•FlumeEvents are sent in batches 
•When a batch of FlumeEvents only partially makes it to the next flume agent in the flow, the entire batch is resent resulting in duplicates 
R 
Simplistic Flume Overview
R 
Simplistic Flume Overview 
Flume Agent
R 
Simplistic Flume Overview 
EmbeddedAgent 
Compressor 
Agent 
Landing 
Agent
Overview of existing network topology 
•3 data centers divided into 12 lanes participating in the OpenRTB market 
•6 lanes in the east coast data center 
•4 lanes in the west coast data center 
•2 lanes in the European data center 
•Each lane has approximately 75 servers handling OpenRTB operations. 
•30 different logs 
•Over 60,000,000,000 log lines per day
Overview of existing network topology.
•2 Server Flume Flow from East Coast (IAD) to Chicago (ORD) with over 250K TSV lines per second 
•No Data Loss 
•Failover 
•Compression performance 
P.O.C. Can Flume handle our log volume reliably?
P.O.C. Overview
P.O.C. passes 
•Larger Batch sizes helped, but could not reach 250K per second 
•Multiple TSV lines Per FlumeEvent hits over 360K per second 
•Failover passed with duplicates 
•Compression passed but needed to parallelize 7X sinks
Taking Flume to Production 
•Embedding the EmbeddedAgent in existing servers 
•Modify EmbeddedAgent 
•Properties from existing infrastructure 
•Implement Monitoring 
•Create “Flume”Implementation of proprietary logging interface 
•Replace POJO to TSV with Avro to AvroDataFile 
•Preventing duplicates, not removing 
•Add LogType header
Taking Flume to Production 
•Custom Sink for AvroDataFile body (based on HDFSEventSink) 
•Check if UUID header is in HBase 
•Yes – increment duplicate count metric 
•No 
•Write AvroDataFile body to HDFS using Custom Writer 
•Put UUID to HBase
Taking Flume to Production 
•Custom Selector based on MultiplexingChannelSelector 
•Route FlumeEvents to channels by log type or groups of log types 
•Bifurcate to multiple locations each log and each location with its own percentage of data to bifurcate
Configuring Flume Flows 
•Configuring Flume can be tedious, use a templating engine 
•In Q2 2014 Conversant expanded from 7 lanes in 2 data centers to 12 lanes in 3 data centers (~400 more servers to configure). 
•Static headers useful for tracking flows 
•15 minutes to configure all Q2 expansion CompressorLane('iad6', [CompressorAgent("dtiad06flm01p"), CompressorAgent("dtiad06flm02p"), CompressorAgent("dtiad06flm03p")]) compressor.list = dtiad06flm01p, dtiad06flm02p,dtiad06flm03p
Monitoring the Flume Flows 
•Flume metrics are available by JMX or Json over HTTP 
•Metrics to monitor 
•ChannelFillPercentage 
•Rate of change on EventDrainSuccessCount on failover sinks 
•FLUME-2307 – File channel deletes fail after timeout (fixed 1.5) 
•Publishing metrics to TSDB provides great visual insight
Monitoring the Flume Flows 
ChannelFillPercentage
Monitoring the Flume Flows 
Rate of taking events off “Critical Logs” file channel
Monitoring the Flume Flows 
Rate of Flume Events by data center East Coast, West Coast, Europe
Monitoring the Flume Flows 
Monitoring by Groups
Benefits of migrating to Flume 
•Business has insight into data in under 10 minutes 
•Configuring expansion trivial 
•Failover enables automatic recovery from down time 
•Bifurcation 
•enables scaled constant regression lane(s) 
•Subset of data to analytics development cluster
Benefits of migrating to Flume 
5 minute aggregations to business within 10 minutes
Gotchas… 
•Scaling for Compression 
•Auto reloading of properties inconsistent 
•“It is recommended (though not required) to use a separate disk for the File Channel checkpoint.” RAID-6 raid array, Force Write Back 
•Bad configurations not easy to see, not always clear in log file. 
•NetcatSource – Not too useful beyond trivial usage
Gotchas… 
•POM file edits 
•JUnits are not deterministic 
•Hadoop jars added to classpath by startup script – IDE 
•Avoiding cost of Avro schema evolution
What is next 
•Upgrade to Flume 1.5 
•Bifurcate to micro batch (Storm? Spark?) 
•Disable sink switch

Contenu connexe

Dernier

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 

Dernier (20)

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 

En vedette

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

En vedette (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Big Data Everywhere Chicago: 1.5 Million Log Lines Per Second: Building and Maintaining Flume Flows at Coversant (Conversant)

  • 1. 1.5 Million Log Lines per Second Big Data Everywhere Chicago 2014 Mike Keane mkeane@conversant.com Building and maintaining Flume flows at Conversant
  • 2. •Quicker insight into production data •Reduce complexity of administering/managing new servers, data centers, etc. •Scalable •No data loss or duplication •Replace TSV files with Avro objects •Able to be monitored by Network Operations Center (NOC) •Able to recover from downtime quickly R SLA for Event Driven Logging with Flume
  • 3. •A Flume Flow is a series of flume agents data follows from origination to final destination •Data on a Flume Flow is packaged in FlumeEvent Avro objects •A FlumeEvent is composed of •Headers – A map of string value pairs •Body – A byte array •A FlumeEvent is an atomic unit of data •FlumeEvents are sent in batches •When a batch of FlumeEvents only partially makes it to the next flume agent in the flow, the entire batch is resent resulting in duplicates R Simplistic Flume Overview
  • 4. R Simplistic Flume Overview Flume Agent
  • 5. R Simplistic Flume Overview EmbeddedAgent Compressor Agent Landing Agent
  • 6. Overview of existing network topology •3 data centers divided into 12 lanes participating in the OpenRTB market •6 lanes in the east coast data center •4 lanes in the west coast data center •2 lanes in the European data center •Each lane has approximately 75 servers handling OpenRTB operations. •30 different logs •Over 60,000,000,000 log lines per day
  • 7. Overview of existing network topology.
  • 8. •2 Server Flume Flow from East Coast (IAD) to Chicago (ORD) with over 250K TSV lines per second •No Data Loss •Failover •Compression performance P.O.C. Can Flume handle our log volume reliably?
  • 10. P.O.C. passes •Larger Batch sizes helped, but could not reach 250K per second •Multiple TSV lines Per FlumeEvent hits over 360K per second •Failover passed with duplicates •Compression passed but needed to parallelize 7X sinks
  • 11. Taking Flume to Production •Embedding the EmbeddedAgent in existing servers •Modify EmbeddedAgent •Properties from existing infrastructure •Implement Monitoring •Create “Flume”Implementation of proprietary logging interface •Replace POJO to TSV with Avro to AvroDataFile •Preventing duplicates, not removing •Add LogType header
  • 12. Taking Flume to Production •Custom Sink for AvroDataFile body (based on HDFSEventSink) •Check if UUID header is in HBase •Yes – increment duplicate count metric •No •Write AvroDataFile body to HDFS using Custom Writer •Put UUID to HBase
  • 13. Taking Flume to Production •Custom Selector based on MultiplexingChannelSelector •Route FlumeEvents to channels by log type or groups of log types •Bifurcate to multiple locations each log and each location with its own percentage of data to bifurcate
  • 14. Configuring Flume Flows •Configuring Flume can be tedious, use a templating engine •In Q2 2014 Conversant expanded from 7 lanes in 2 data centers to 12 lanes in 3 data centers (~400 more servers to configure). •Static headers useful for tracking flows •15 minutes to configure all Q2 expansion CompressorLane('iad6', [CompressorAgent("dtiad06flm01p"), CompressorAgent("dtiad06flm02p"), CompressorAgent("dtiad06flm03p")]) compressor.list = dtiad06flm01p, dtiad06flm02p,dtiad06flm03p
  • 15. Monitoring the Flume Flows •Flume metrics are available by JMX or Json over HTTP •Metrics to monitor •ChannelFillPercentage •Rate of change on EventDrainSuccessCount on failover sinks •FLUME-2307 – File channel deletes fail after timeout (fixed 1.5) •Publishing metrics to TSDB provides great visual insight
  • 16. Monitoring the Flume Flows ChannelFillPercentage
  • 17. Monitoring the Flume Flows Rate of taking events off “Critical Logs” file channel
  • 18. Monitoring the Flume Flows Rate of Flume Events by data center East Coast, West Coast, Europe
  • 19. Monitoring the Flume Flows Monitoring by Groups
  • 20. Benefits of migrating to Flume •Business has insight into data in under 10 minutes •Configuring expansion trivial •Failover enables automatic recovery from down time •Bifurcation •enables scaled constant regression lane(s) •Subset of data to analytics development cluster
  • 21. Benefits of migrating to Flume 5 minute aggregations to business within 10 minutes
  • 22. Gotchas… •Scaling for Compression •Auto reloading of properties inconsistent •“It is recommended (though not required) to use a separate disk for the File Channel checkpoint.” RAID-6 raid array, Force Write Back •Bad configurations not easy to see, not always clear in log file. •NetcatSource – Not too useful beyond trivial usage
  • 23. Gotchas… •POM file edits •JUnits are not deterministic •Hadoop jars added to classpath by startup script – IDE •Avoiding cost of Avro schema evolution
  • 24. What is next •Upgrade to Flume 1.5 •Bifurcate to micro batch (Storm? Spark?) •Disable sink switch