Hadoop @ eBuddy
eBuddy
Web-based chat (started in 2003)
● Initially no statistics, MSN only
● Started basic logging in 2004
● Today
  ○ 34,467,010,693 login records (~34 × 10⁹)
  ○ Selecting them all takes about 40 minutes
XMS (launched May 23, 2011)
● Today
  ○ 1,334,794,121 records (~1.3 × 10⁹)
Website (Google Analytics)
Banners (OpenX)
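A quick back-of-the-envelope check on the login figures. It assumes "about 40 minutes" is the wall-clock time of one full scan on the cluster (the slide does not say how the select was run). Under that assumption the scan averages roughly 14 million records per second.

```python
# Average scan rate implied by the slide's figures.
# Assumption: "about 40 minutes" is the wall-clock time of one full scan.
LOGIN_RECORDS = 34_467_010_693
SCAN_MINUTES = 40


def records_per_second(records: int, minutes: float) -> float:
    """Average throughput of a full scan, in records per second."""
    return records / (minutes * 60)


rate = records_per_second(LOGIN_RECORDS, SCAN_MINUTES)
print(f"{rate:,.0f} records/s")  # roughly 14.4 million records per second
```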
Warehousing needs
● Product owners
  ○ Comparing product versions
     ■ average duration
     ■ messages sent/received
  ○ Churn analysis
  ○ Feature analysis
● Marketing
  ○ Which countries should we focus on?
  ○ Which people should we target?
● Sales
  ○ Selling banners per country/product
● Operations/Dev
  ○ Helping solve bugs
  ○ Detecting blocks by country/provider
Interesting to know
● Developers are Java-centric
● Hosting is in the US, but the BI people are in Amsterdam
● 18 Hadoop nodes, each with
  ○ 16 cores
  ○ 24 GB RAM
  ○ 4 × 400 GB hard disks
● We make money with banners
  ○ So don't expect deep pockets
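The node specification adds up to the cluster totals below. The replication factor of 3 is an assumption: it is the HDFS default (`dfs.replication`), and the slide does not say what eBuddy used. The "usable" figure is simply raw disk divided by replication, so it ignores space for the OS, logs and intermediate MapReduce output.

```python
# Aggregate capacity of the cluster described on the slide.
NODES = 18
CORES_PER_NODE = 16
RAM_GB_PER_NODE = 24
DISKS_PER_NODE = 4
DISK_GB = 400
REPLICATION = 3  # assumption: HDFS default dfs.replication


def cluster_totals(replication: int = REPLICATION) -> dict:
    """Sum per-node resources; usable_gb is raw disk divided by replication."""
    raw_gb = NODES * DISKS_PER_NODE * DISK_GB
    return {
        "cores": NODES * CORES_PER_NODE,
        "ram_gb": NODES * RAM_GB_PER_NODE,
        "raw_disk_gb": raw_gb,
        "usable_gb": raw_gb / replication,
    }


print(cluster_totals())
```

With these inputs the cluster has 288 cores, 432 GB of RAM and 28,800 GB of raw disk, or about 9.6 TB of HDFS capacity at 3× replication.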
Warehouse timeline
● Traditional RDBMS (2004)
● Custom MapReduce code (2008)
  ○ Joining two files (merge join or map join?)
  ○ Repetitive code
  ○ Considering abstraction
  ○ Changing data means changing code?
● Pig scripts (2008/2009)
  ○ Much simpler to read, but domain-specific
● Hive (2009)
  ○ Generic SQL, but with some limitations
  ○ Existing tools can be used
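The "merge join or map join?" question for hand-written MapReduce comes down to two classic strategies. The sketch below is an in-memory illustration only, with plain Python lists standing in for the two files; it is not Hadoop code. A map-side (hash) join loads the smaller input into a lookup table and streams the larger one past it, while a merge join walks two inputs that are already sorted on the join key.

```python
from collections import defaultdict


def map_join(big, small):
    """Hash join: build a lookup table from the small input, stream the big one."""
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    return [(k, v, w) for k, v in big for w in lookup.get(k, [])]


def merge_join(left, right):
    """Sort-merge join: both inputs must already be sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side...
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # ...and pair every left row with that key against the whole run.
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], r[1]) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


logins = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
users = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
print(sorted(map_join(logins, users)))
print(merge_join(sorted(logins), sorted(users)))
```

Both calls print the same four joined rows. The trade-off mirrors the slide: the map join needs one side small enough to fit in memory, the merge join needs both sides sorted on the key.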
Hive
● Hey, I already know this:
select *
from table1 t1
  left outer join table2 t2 on (t1.id = t2.id)
where t2.id is null;


● Java programmers will like this:
  ○ Spring JdbcTemplates
  ○ Existing JDBC tools (SQuirreL)
  ○ Syntax highlighting
  ○ Code completion
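The query above is the standard left-outer-join anti-join: it returns the rows of table1 that have no match in table2. Because it is plain SQL, the pattern can be tried locally. The sketch below runs the same statement against an in-memory SQLite database from Python; SQLite only stands in for Hive here, it selects `t1.id` instead of `*`, and the sample ids are made up.

```python
import sqlite3


def unmatched_ids(left_ids, right_ids):
    """Return ids present in table1 but absent from table2 (anti-join)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table table1 (id integer)")
        con.execute("create table table2 (id integer)")
        con.executemany("insert into table1 values (?)", [(i,) for i in left_ids])
        con.executemany("insert into table2 values (?)", [(i,) for i in right_ids])
        rows = con.execute(
            """
            select t1.id
            from table1 t1
              left outer join table2 t2 on (t1.id = t2.id)
            where t2.id is null
            order by t1.id
            """
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        con.close()


print(unmatched_ids([1, 2, 3, 4], [2, 4, 5]))  # [1, 3]
```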
Present
● App servers log to MySQL
  ○ Brittle, but it works
● Hive
  ○ SQL (most developers know it)
  ○ Partition pruning issues
  ○ No ROLLUP queries
● ETL
  ○ Star schema
  ○ Fair scheduling (ETL vs. BI)
     ■ Reserved pool for ETL
     ■ Don't start reducers until 90% of mappers are done
  ○ LZO on all jobs
● MicroStrategy (ODBC)
● SQuirreL (JDBC)
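The ETL scheduling and compression choices can be expressed as per-session Hive settings. The sketch below only builds the `SET` statements as strings. It assumes Hadoop 1.x property names (`mapred.fairscheduler.pool`, `mapred.reduce.slowstart.completed.maps`, `mapred.compress.map.output`, `mapred.map.output.compression.codec`, `mapred.output.compression.codec`), Hive's `hive.exec.compress.output`, and the hadoop-lzo codec class `com.hadoop.compression.lzo.LzopCodec`; the pool name `etl` and the helper names are hypothetical. Check the property names against your Hadoop and Hive versions before relying on them.

```python
# Hypothetical helpers: Hive session settings for an ETL job.
# Property names assume Hadoop 1.x / Hive 0.x; the pool name is made up.
LZO_CODEC = "com.hadoop.compression.lzo.LzopCodec"  # from the hadoop-lzo project


def etl_session_settings(pool: str = "etl", slowstart: float = 0.90) -> list:
    """Return Hive SET statements that put a job in the ETL pool with LZO output."""
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0 and 1")
    settings = {
        "mapred.fairscheduler.pool": pool,  # run in the reserved ETL pool
        # Reducers are not scheduled until this fraction of maps has finished.
        "mapred.reduce.slowstart.completed.maps": f"{slowstart:.2f}",
        "mapred.compress.map.output": "true",  # compress intermediate map output
        "mapred.map.output.compression.codec": LZO_CODEC,
        "hive.exec.compress.output": "true",  # compress the final job output
        "mapred.output.compression.codec": LZO_CODEC,
    }
    return [f"SET {key}={value};" for key, value in settings.items()]


def reducers_may_start(completed_maps: int, total_maps: int, slowstart: float = 0.90) -> bool:
    """Simplified slow-start rule: reducers launch once enough maps are done."""
    if total_maps <= 0:
        return True  # nothing to wait for
    return completed_maps / total_maps >= slowstart


for line in etl_session_settings():
    print(line)
```

Holding reducers back until 90% of the maps are done keeps them from occupying reduce slots, which BI queries could otherwise use, while they sit idle waiting for map output.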
Future
● Look at users from A to Z
  ○ Website logs
  ○ Banners
● Cassandra handler for Hive
  ○ Looking at contact lists (not just their size)
● Streaming ETL
  ○ Flume
      ■ No more MySQL and scripts
      ■ Write directly into the correct partition
  ○ Avro
      ■ Fewer schema-related problems
  ○ Snappy
      ■ Lightweight compression
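Writing directly into the correct partition means deriving the target directory from each event instead of staging it in MySQL first. A minimal sketch follows. It assumes Hive's default warehouse root `/user/hive/warehouse`, UTC epoch timestamps, and a hypothetical `login_date` partition column (the deck's actual tables partition on a calendar id). Only the `column=value` directory naming is Hive's own convention.

```python
from datetime import datetime, timezone
from posixpath import join

WAREHOUSE = "/user/hive/warehouse"  # Hive's default warehouse dir; adjust to your setup


def partition_dir(table: str, event_ts: float, column: str = "login_date") -> str:
    """Map an event's epoch timestamp (seconds, UTC) to its Hive partition directory."""
    day = datetime.fromtimestamp(event_ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return join(WAREHOUSE, table, f"{column}={day}")


# 1339632000 is 2012-06-14 00:00:00 UTC
print(partition_dir("chatsessions", 1339632000))
# /user/hive/warehouse/chatsessions/login_date=2012-06-14
```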
Questions?
Hive partition pruning
● Won't prune (the date filter reaches chatsessions only through the join)
select count(*)
from chatsessions cs
  inner join calendar c on (c.cldr_id = cs.login_cldr_id)
where c.iso_date = '2012-06-14';


● Will prune (resolve the calendar id first; 1234 stands for the id it returns)
select cldr_id from calendar where iso_date = '2012-06-14';
select count(*) from chatsessions where login_cldr_id in (1234);
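The slide implies that Hive at the time chose which partitions to read at compile time, from literal predicates on the partition column. A date filter that reaches chatsessions only through the join to calendar therefore prunes nothing. The toy model below counts the partitions each strategy reads and shows how the second statement could be generated from the first query's result. It uses plain Python dictionaries with made-up ids and rows, not Hive's planner, and `in_list_query` is a hypothetical helper.

```python
# Toy model of partition pruning: plain dicts, NOT Hive's query planner.
# chatsessions is "partitioned" by login_cldr_id (one list of rows per id).
calendar = {1233: "2012-06-13", 1234: "2012-06-14", 1235: "2012-06-15"}
chatsessions = {1233: ["s1", "s2"], 1234: ["s3", "s4", "s5"], 1235: ["s6"]}


def count_with_join(iso_date):
    """Join-style query: the filter sits on calendar, so every partition is read."""
    partitions_read, total = 0, 0
    for cldr_id, rows in chatsessions.items():
        partitions_read += 1
        if calendar.get(cldr_id) == iso_date:
            total += len(rows)
    return total, partitions_read


def count_two_step(iso_date):
    """Resolve calendar ids first, then read only the matching partitions."""
    ids = [cid for cid, day in calendar.items() if day == iso_date]  # step 1
    partitions_read, total = 0, 0
    for cid in ids:  # step 2: literal IN-list on the partition column
        if cid in chatsessions:
            partitions_read += 1
            total += len(chatsessions[cid])
    return total, partitions_read


def in_list_query(ids):
    """Build the second Hive statement from the ids returned by the first one."""
    if not ids:
        raise ValueError("an empty IN-list is not valid SQL")
    return ("select count(*) from chatsessions where login_cldr_id in ("
            + ", ".join(str(i) for i in ids) + ");")


print(count_with_join("2012-06-14"))  # (3, 3): 3 sessions, 3 partitions read
print(count_two_step("2012-06-14"))   # (3, 1): 3 sessions, 1 partition read
print(in_list_query([1234]))
```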
Left outer join in Pig
A = LOAD 'file1' USING PigStorage(',') AS (a1:int,a2:chararray);
B = LOAD 'file2' USING PigStorage(',') AS (b1:int,b2:chararray);
C = COGROUP A BY a1, B BY b1 OUTER;
X = FILTER C BY IsEmpty(B);
Z = FOREACH X GENERATE flatten(A.a2);
DUMP Z;
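In the script above, COGROUP produces one record per key, holding a bag of matching A tuples and a bag of matching B tuples. Keeping only the records whose B bag is empty and flattening A.a2 yields the left-side values with no match, the same result as the Hive anti-join. Below is a rough Python emulation of those three steps; the rows are made up and stand in for file1 and file2.

```python
from collections import defaultdict


def cogroup(a_rows, b_rows):
    """Emulate COGROUP A BY a1, B BY b1: key -> (bag of A tuples, bag of B tuples)."""
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        groups[row[0]][0].append(row)
    for row in b_rows:
        groups[row[0]][1].append(row)
    return dict(groups)


def left_anti_values(a_rows, b_rows):
    """FILTER C BY IsEmpty(B), then FOREACH ... GENERATE flatten(A.a2)."""
    result = []
    for a_bag, b_bag in cogroup(a_rows, b_rows).values():
        if not b_bag:
            result.extend(a2 for _, a2 in a_bag)
    return result


file1 = [(1, "alice"), (2, "bob"), (3, "carol"), (3, "dave")]
file2 = [(2, "x"), (4, "y")]
print(sorted(left_anti_values(file1, file2)))  # ['alice', 'carol', 'dave']
```

Key 4 appears only in file2, so its A bag is empty while its B bag is not; the IsEmpty(B) filter drops it, just as the Pig script does.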
● Avro & Hive: https://issues.apache.org/jira/browse/HIVE-895

● Flume: https://cwiki.apache.org/FLUME/
