The Big Data Revolution is an Evolution
               Eric Lubow

               @elubow

               elubow@simplereach.co
Overview
•   Evolution

•   SimpleReach

•   Data Stores / Languages

•   Architecture Implementation

                  Big Data Revolution is an Evolution   Eric Lubow  @elubow  #NYCassandra2013
We're in the midst of an
evolution, not a revolution.
The 2 Truths




The Real Truth
Even with the right tools, 80% of
the work of building a big data
system is acquiring and refining

30m plays/day + 4m user ratings + 75k movies metadata + 24.4m use
metadata =




    David Fincher + Kevin Spacey + British House of Cards
    Mitch Hurwitz + Will Arnett + Jason Bateman + Arrested Development
BRING IT TOGETHER

revolution: Insufficient Capabilities, Scale/Need Changes
evolution: New Products, Development & Integration




SimpleReach
•   Millions of URLs per day

•   Over 1 billion pageviews per month

•   250m events per day (~3k events/second)

•   Auto-scale 90-130 machines depending on traffic
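
The per-second figure follows from the daily volume. A quick check, assuming (for illustration only) that events are spread evenly over a 24-hour day, whereas real traffic will peak above the average:

```python
# Back-of-the-envelope check of the slide's throughput figure.
# Assumption: events arrive evenly across a 24-hour day.
EVENTS_PER_DAY = 250_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Return the mean event rate for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


rate = average_events_per_second(EVENTS_PER_DAY)
print(f"{rate:,.0f} events/second on average")  # prints: 2,894 events/second on average
```

The average lands just under the ~3k events/second quoted on the slide.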


HUMBLE BEGINNINGS




Scale


AND THEN...



 C*


Cassandra (C*)
•   Large data volume ingestion at high velocity

•   Really fast writes to many locations (eventual
    consistency)

•   Query by column groups within rows (slicing)

•   TTLs for small group aggregation

•   Wrote Helenus, Node.js driver for Cassandra

MongoDB
•   Fast atomic increments (Node.js is native JSON)

•   Sharding

•   Solid ORM for Rails (Mongoid)

•   B-Tree Indexes

•   Document based via JSON

•   TTLs for ephemeral data

Redis
•   Supports hundreds of thousands of transactions per
    second

•   Great caching engine

•   Supports useful variable types like sets, sorted sets, and
    lists

•   Everything is guaranteed to be in memory

Infobright
•   Works with standard MySQL driver

•   Column Stores for ad-hoc analytics queries
    in SQL

•   Heavy compression of data (avg 12:1)
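
To make the compression figure concrete, here is a small piece of arithmetic. The 12:1 ratio is the slide's stated average; the 6 TB raw size is a hypothetical example, not a SimpleReach number:

```python
# Illustrative arithmetic only. The 12:1 ratio is the slide's average;
# the raw size passed in below is a made-up example.
AVG_COMPRESSION_RATIO = 12.0


def compressed_size_gb(raw_gb: float, ratio: float = AVG_COMPRESSION_RATIO) -> float:
    """Estimate on-disk size for a given raw size and compression ratio."""
    return raw_gb / ratio


print(compressed_size_gb(6000))  # 6 TB raw -> prints 500.0 (GB on disk)
```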




The c0dez
•   Polyglottany doesn’t only apply to data stores

•   Each language has its own benefit to each stack
    layer

•   Each language has its own individual benefits

•   Each language has its own development benefits



Cons
•   Redis - Can only utilize a single core. SerDe price.

•   Infobright - DELETE/UPDATEs are VERY expensive

•   Cassandra - No B-tree indexes or probabilistic counters

•   Mongo - Indexes must fit in memory. Forced Replica ping times

•   Python - Whitespace. Community

•   Ruby - Not high performance enough for our standards
Evolution Takes Work
•   Service Oriented Architecture (Internal API)

•   Data accuracy checks: visual and programmatic

•   Built framework for testing out engines (Storage,
    Queueing, etc)

•   Access to many toolsets (for all languages, DBs, Engines)
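
One way to picture an internal API sitting in front of several engines is a thin routing layer. The following is a minimal sketch, not SimpleReach's implementation: the `Store` interface, `InMemoryStore`, `InternalAPI`, and the "events"/"counters" kinds are all hypothetical names, and the in-memory class stands in for real engines.

```python
from __future__ import annotations

from typing import Protocol


class Store(Protocol):
    """Minimal interface every backing engine must satisfy (hypothetical)."""

    def put(self, key: str, value: dict) -> None: ...
    def get(self, key: str) -> dict | None: ...


class InMemoryStore:
    """Stand-in for a real engine such as Cassandra, MongoDB, or Redis."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value

    def get(self, key: str) -> dict | None:
        return self._data.get(key)


class InternalAPI:
    """Single entry point: callers name a kind of data, not an engine."""

    def __init__(self) -> None:
        self._routes: dict[str, Store] = {}

    def register(self, kind: str, store: Store) -> None:
        # Re-registering a kind swaps the engine without touching callers.
        self._routes[kind] = store

    def write(self, kind: str, key: str, value: dict) -> None:
        self._routes[kind].put(key, value)

    def read(self, kind: str, key: str) -> dict | None:
        return self._routes[kind].get(key)


api = InternalAPI()
api.register("events", InMemoryStore("cassandra-stand-in"))
api.register("counters", InMemoryStore("redis-stand-in"))
api.write("events", "url:42", {"pageviews": 3})
print(api.read("events", "url:42"))  # prints: {'pageviews': 3}
```

Because callers only ever see `read` and `write`, trying out a different storage engine means registering a new `Store` for a kind rather than changing every consumer.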




[Diagram labels: Service (Solr, C*), Real-time (C*), Internal API]


Path of a Packet
[Diagram labels, left to right: Internet; Fire Hos, EP, API, SC; Queue; Consumers; Internal API; Solr, C*, Mongo, Redis, IB]

Architecture Distribution
US-EAST-1a            US-EAST-1b            US-EAST-1e

CASSANDRA-0001        CASSANDRA-0002        CASSANDRA-0003
CASSANDRA-0010        CASSANDRA-0011        CASSANDRA-0012
REDIS-0001A           REDIS-0001B
INFOBRIGHT-0001                             INFOBRIGHT-0002
MONGO-SHARD-0000-A                          MONGO-SHARD-0000-B
MONGO-SHARD-0001-B    MONGO-SHARD-0001-A
                      MONGO-SHARD-0002-B    MONGO-SHARD-0002-A
iAPI-0001             iAPI-0002             iAPI-0003
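
The layout above can be checked programmatically. This sketch assumes the `-A`/`-B` suffixes mark the two members of one Mongo shard (the slide does not say so explicitly) and that the host-to-zone placement is read correctly from the slide's columns. It verifies that no shard has both members in the same availability zone:

```python
from collections import defaultdict

# Layout transcribed from the slide: availability zone -> hosts.
# The zone assignment is a reading of the slide's column positions.
LAYOUT = {
    "us-east-1a": ["CASSANDRA-0001", "CASSANDRA-0010", "REDIS-0001A",
                   "INFOBRIGHT-0001", "MONGO-SHARD-0000-A",
                   "MONGO-SHARD-0001-B", "iAPI-0001"],
    "us-east-1b": ["CASSANDRA-0002", "CASSANDRA-0011", "REDIS-0001B",
                   "MONGO-SHARD-0001-A", "MONGO-SHARD-0002-B", "iAPI-0002"],
    "us-east-1e": ["CASSANDRA-0003", "CASSANDRA-0012", "INFOBRIGHT-0002",
                   "MONGO-SHARD-0000-B", "MONGO-SHARD-0002-A", "iAPI-0003"],
}


def zones_per_shard(layout):
    """Map each Mongo shard id to the set of zones holding one of its members."""
    zones = defaultdict(set)
    for zone, hosts in layout.items():
        for host in hosts:
            if host.startswith("MONGO-SHARD-"):
                shard_id = host.rsplit("-", 1)[0]  # drop the -A / -B suffix
                zones[shard_id].add(zone)
    return dict(zones)


for shard, zones in sorted(zones_per_shard(LAYOUT).items()):
    # Each shard should span two distinct zones so a zone outage leaves one member.
    assert len(zones) == 2, f"{shard} has both members in one zone"
    print(shard, sorted(zones))
```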

The Schrute of the Problem




Evolving Amazon Tools
•   Full Featured API

•   Simple Queue Service (SQS)

•   Data Pipeline

•   OpsWorks

•   CloudFormation

•   Redshift Analytics

•   CloudSearch

•   Elastic Beanstalk

•   Elastic MapReduce

•   Simple Workflow Service (SWF)

•   S3 / Glacier
DevOps Wizardry
•   Extensive use of AWS

•   Monitor: Nagios, StatsD, and Graphite

•   Manage: Chef, OpsWorks, csshX

•   Deployments




Summary
•   Solutions Require Evolution

•   Build, Use, and Integrate Tools

•   Abstraction

•   Distribution

•   Monitoring & Automation



Evolution Takes Time
A revolution only lasts fifteen
years, a period which
coincides with the


We’re
(Ask us about Food Coma Fridays)
Questions are guaranteed in life.
Answers aren’t.
                                      Eric Lubow

                                      @elubow

                                      elubow@simplereach.co
                                      Thank you.

