Ruby for the soul of BigData Nerds
Who Am I?
● Engineering Team Lead, Analytics & Data Platforms @ Viki.com
● Founder of http://BigData.SG
● Contributor to fluentd, pfeed, cartographer, watir
BigData & Its Challenges
"big data" is when the size of the data itself becomes part of the problem
    - Mike Loukides

● Twitter produces over 230 million tweets per day
● Wal-Mart logs one million transactions per hour
● Facebook creates over 30 billion pieces of content, ranging from web links, news, and blogs to photos
Everyone has a big data problem
Evolving Trends
● Batch Processing: Hadoop, HPCC, Google BigQuery
● Stream Processing: STORM (Twitter) & S4 (Yahoo)
Common Engineering Challenges
● Data Collection
● Filtering / Segmentation
● Data Storage
● Analysis
● Visualization
● Prediction / Extrapolation
Data Collection + Filtering / Segmentation
http://fluentd.org/
Data Collection + Filtering / Segmentation
You send events as:
    http://domain:8080/namespace?key1=value1&key2=value2
Fluentd forwards the data as:
    <timestamp> <namespace> {key1:value1,key2:value2}
http://fluentd.org/
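The mapping from an event URL to a forwarded record can be sketched in plain Ruby. This is only a model of the two formats shown on the slide and does not talk to a Fluentd server; the helper names `event_url` and `forwarded_record` are hypothetical, and rendering the record as JSON with an integer timestamp is an assumption.

```ruby
require "uri"
require "json"

# Hypothetical helper: builds an event URL in the shape shown on the slide,
# http://domain:8080/namespace?key1=value1&key2=value2
def event_url(host, port, namespace, fields)
  URI::HTTP.build(
    host: host,
    port: port,
    path: "/#{namespace}",
    query: URI.encode_www_form(fields)
  ).to_s
end

# Hypothetical helper: models the forwarded form
# <timestamp> <namespace> {key1:value1,key2:value2}
# The record is rendered as JSON here, which is an assumption.
def forwarded_record(url, time = Time.now)
  uri = URI.parse(url)
  namespace = uri.path.sub(%r{\A/}, "")
  record = URI.decode_www_form(uri.query.to_s).to_h
  "#{time.to_i} #{namespace} #{JSON.generate(record)}"
end

url = event_url("domain", 8080, "namespace",
                { "key1" => "value1", "key2" => "value2" })
puts url
puts forwarded_record(url)
```

The namespace in the URL path becomes the tag of the record, and the query parameters become its key/value payload.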
Screencast:
http://www.bigdata.sg/videos/fluentd/
Storage
● Hadoop HDFS
● OpenTSDB (http://opentsdb.net)
● SciDB (DMAS)
Analysis
● Hadoop Streaming (Ruby)
● Hadoop Hive (Using rbhive)
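Hadoop Streaming runs any executable that reads lines on standard input and writes tab-separated key/value lines on standard output, so a Ruby script can serve as both mapper and reducer. The sketch below averages bandwidth per country; the input layout (`country<TAB>user<TAB>kbps`) and the function names are assumptions made for illustration, not something taken from the slides.

```ruby
# Mapper step: one input line -> zero or more "key\tvalue" lines.
# Assumed input layout (hypothetical): country \t user \t kbps
def map_line(line)
  country, _user, kbps = line.chomp.split("\t")
  return [] if kbps.nil?
  ["#{country}\t#{kbps}"]
end

# Reducer step: lines arrive sorted by key, so one pass over the input
# is enough to emit the average value for each key.
def reduce_lines(lines)
  out = []
  current = nil
  sum = 0.0
  count = 0
  lines.each do |line|
    key, value = line.chomp.split("\t", 2)
    if key != current
      out << "#{current}\t#{sum / count}" if current
      current = key
      sum = 0.0
      count = 0
    end
    sum += value.to_f
    count += 1
  end
  out << "#{current}\t#{sum / count}" if current
  out
end

# Run as "ruby job.rb map" or "ruby job.rb reduce".
if __FILE__ == $PROGRAM_NAME
  case ARGV.first
  when "map"
    $stdin.each_line { |line| puts map_line(line) }
  when "reduce"
    puts reduce_lines($stdin.each_line)
  end
end
```

Such a script would typically be passed to the streaming jar through its `-mapper` and `-reducer` options (for example `-mapper "ruby job.rb map"`); the location of the jar depends on the Hadoop installation. Locally, the same flow can be checked with `cat input.tsv | ruby job.rb map | sort | ruby job.rb reduce`.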
Visualization
● Custom Dashboard (Rails + Google Charts / d3.js)
● Some Hosted Services: tableaupublic.com, geckoboard.com, splunkstorm.com
Stream Computing
What is STORM?
STORM terminology
● Streams
● Spouts
● Bolts
● Topologies
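In Storm, a stream is an unbounded sequence of tuples, a spout is a source that emits tuples, a bolt consumes tuples and may emit new ones, and a topology is the graph that wires spouts to bolts. The toy below mimics those roles in plain Ruby to make the terms concrete. It is not the Storm or RedStorm API: every class and method name is hypothetical, and a finite array stands in for an unbounded stream.

```ruby
# Spout: the source of tuples. A finite Enumerator stands in for an
# unbounded stream of [country, kbps] samples (hypothetical data shape).
def bandwidth_spout(samples)
  samples.to_enum
end

# Bolt: consumes one tuple at a time and emits a new tuple, here the
# running average bandwidth for the tuple's country.
class AverageBolt
  def initialize
    @sum = Hash.new(0.0)
    @count = Hash.new(0)
  end

  def execute(tuple)
    country, kbps = tuple
    @sum[country] += kbps
    @count[country] += 1
    [country, @sum[country] / @count[country]]
  end
end

# Topology: the wiring between components. This one is a straight line,
# a single spout feeding a single bolt.
def run_topology(spout, bolt)
  spout.map { |tuple| bolt.execute(tuple) }
end

samples = [["SG", 1000], ["US", 800], ["SG", 1200]]
run_topology(bandwidth_spout(samples), AverageBolt.new).each do |country, avg|
  puts "#{country} #{avg}"
end
```

A real topology distributes spouts and bolts across a cluster and never terminates; this sketch only shows how the four terms relate to each other.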
RedStorm (https://github.com/colinsurprenant/redstorm)

$ rvm use jruby-1.6.3
$ bundle install    # with the redstorm gem listed in the Gemfile
$ bundle exec redstorm install
Visualizing average bandwidth experienced by users while watching videos on viki.com across the globe.
Thank you!

Let's stay in touch :)
● Sign up for my newsletter at http://parolkar.com
● Visit the BigData.SG Meetup in Singapore.
