The Big Data Revolution is an Evolution
Eric Lubow
@elubow
elubow@simplereach.co
Overview
•   Evolution

•   SimpleReach

•   Data Stores / Languages

•   Architecture Implementation

                  Big Data Revolution is an   Eric Lubow  @elubow
                  Evolution                   #NYCassandra2013
We're in the midst of an
evolution, not a revolution.
The 2 Truths




The Real Truth
Even with the right tools, 80% of
the work of building a big data
system is acquiring and refining

30m plays/day + 4m user ratings + 75k movies metadata + 24.4m use metadata =

David Fincher + Kevin Spacey + British House of Cards
Mitch Hurwitz + Will Arnett + Jason Bateman + Arrested Development
BRING IT TOGETHER

revolution: Insufficient Capabilities, Scale/Need Changes
evolution: New Products, Development & Integration




SimpleReach
•   Millions of URLs per day

•   Over 1 billion pageviews per month

•   250m events per day (~3k events/second)

•   Auto-scale 90-130 machines depending on traffic
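
As a quick sanity check on the figures above, the per-second rate follows from dividing the daily event count by the number of seconds in a day. The sketch below is only illustrative: the function name is made up for this example, and it assumes events arrive evenly across the day, which real traffic does not.

```python
# Back-of-the-envelope check of the "~3k events/second" figure.
# Assumption: events are spread evenly across the day (real traffic peaks).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_day: int) -> float:
    """Return the mean number of events per second for a daily event count."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_rate(250_000_000)
    print(f"{rate:,.0f} events/second on average")
```

The average lands a little under 2,900 events per second, which is consistent with the rounded "~3k" on the slide; peak rates would be higher than this mean.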


HUMBLE BEGINNINGS




Scale


AND THEN...



 C*


Cassandra (C*)
•   Large data volume ingestion at high velocity

•   Really fast writes to many locations (eventual
    consistency)

•   Query by column groups within rows (slicing)

•   TTLs for small group aggregation

•   Wrote Helenus, Node.js driver for Cassandra
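
To make "slicing" and TTL-based aggregation concrete, here is a toy in-memory model of a single wide row. It is not the Cassandra or Helenus API; the class, its methods, and the per-minute column names are all assumptions made for illustration. Columns are kept sorted by name so a contiguous range can be read in one pass, and a column written with a TTL stops being returned once it has expired.

```python
import bisect
import time


class WideRow:
    """Toy in-memory model of one wide row (not the Cassandra or Helenus API)."""

    def __init__(self):
        self._names = []   # column names, kept sorted so ranges can be sliced
        self._cells = {}   # column name -> (value, expiry timestamp or None)

    def insert(self, name, value, ttl=None, now=None):
        """Write a column; with ttl (seconds) it is no longer readable after expiry."""
        now = time.time() if now is None else now
        if name not in self._cells:
            bisect.insort(self._names, name)
        self._cells[name] = (value, None if ttl is None else now + ttl)

    def slice(self, start, finish, now=None):
        """Return live (name, value) pairs with start <= name <= finish, in order."""
        now = time.time() if now is None else now
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        live = []
        for name in self._names[lo:hi]:
            value, expires_at = self._cells[name]
            if expires_at is None or expires_at > now:
                live.append((name, value))
        return live


if __name__ == "__main__":
    row = WideRow()
    # Hypothetical per-minute buckets for one URL, each kept for an hour.
    row.insert("2013-03-20T12:00", 41, ttl=3600, now=0)
    row.insert("2013-03-20T12:01", 57, ttl=3600, now=60)
    row.insert("2013-03-20T12:02", 38, ttl=3600, now=120)
    recent = row.slice("2013-03-20T12:01", "2013-03-20T12:02", now=180)
    print(sum(value for _, value in recent))
```

Because column names sort lexicographically, timestamp-style names make "the last N minutes" a single contiguous slice, and the TTL keeps the small aggregation window from growing without bound.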

MongoDB
•   Fast atomic increments (Node.js is native JSON)

•   Sharding

•   Solid ODM for Rails (Mongoid)

•   B-Tree Indexes

•   Document based via JSON

•   TTLs for ephemeral data
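
The "fast atomic increments" point can be sketched as the update document such a counter bump might use. MongoDB's `$inc` operator increments fields in place on the server; the document layout here (`url`, `day`, `total`, `hours.<HH>`) and the function name are hypothetical, not SimpleReach's actual schema. The code only builds the filter and update documents, so it runs without a server.

```python
from datetime import datetime, timezone


def build_pageview_increment(url: str, when: datetime, count: int = 1):
    """Return (filter, update) documents for an atomic per-hour counter bump.

    The field names are illustrative assumptions, not a real schema.
    """
    day = when.strftime("%Y-%m-%d")
    hour = when.strftime("%H")
    query = {"url": url, "day": day}
    # $inc applies both increments atomically within the matched document.
    update = {"$inc": {"total": count, f"hours.{hour}": count}}
    return query, update


if __name__ == "__main__":
    when = datetime(2013, 3, 20, 14, 5, tzinfo=timezone.utc)
    query, update = build_pageview_increment("http://example.com/a", when)
    print(query)
    print(update)
    # With a live server and pymongo installed, these would be passed as:
    #   collection.update_one(query, update, upsert=True)
```

Keeping one document per URL per day with nested hourly counters means a single upsert both creates the document on first sight and increments it afterwards.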

Redis
•   Supports hundreds of thousands of transactions per second

•   Great caching engine

•   Supports useful data types like sets, sorted sets, and lists

•   Everything is guaranteed to be held in memory

Infobright
•   Works with standard MySQL driver

•   Column store for ad-hoc analytics queries in SQL

•   Heavy compression of data (avg 12:1)




The c0dez
•   Polyglottany doesn’t only apply to data stores

•   Each language has its own benefit to each stack
    layer

•   Each language has its own individual benefits

•   Each language has its own development benefits



Cons
•   Redis - Can only utilize a single core. SerDe price.

•   Infobright - DELETE/UPDATEs are VERY expensive

•   Cassandra - No B-tree indexes or probabilistic counters

•   Mongo - Indexes must fit in memory. Forced Replica ping times

•   Python - Whitespace. Community

•   Ruby - Not high performance enough for our standards
Evolution Takes Work
•   Service Oriented Architecture (Internal API)

•   Data accuracy checks: visual and programmatic

•   Built framework for testing out engines (Storage,
    Queueing, etc)

•   Access to many toolsets (for all languages, DBs, Engines)
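
One way to picture the internal API and the engine-testing framework together is a small interface that every candidate storage engine implements, plus a check that runs unchanged against any of them. This is a minimal sketch under assumptions: `CounterStore`, `InMemoryCounterStore`, `InternalAPI`, and `check_engine` are hypothetical names, and the in-memory backend stands in for a real Cassandra, MongoDB, or Redis adapter.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class CounterStore(ABC):
    """Interface a storage engine must satisfy to sit behind the internal API."""

    @abstractmethod
    def incr(self, key: str, amount: int = 1) -> int:
        """Add amount to the counter at key and return the new value."""

    @abstractmethod
    def get(self, key: str) -> int:
        """Return the current value of the counter at key (0 if unset)."""


class InMemoryCounterStore(CounterStore):
    """Stand-in backend; a real adapter would wrap a database client."""

    def __init__(self):
        self._counts = defaultdict(int)

    def incr(self, key, amount=1):
        self._counts[key] += amount
        return self._counts[key]

    def get(self, key):
        return self._counts.get(key, 0)


class InternalAPI:
    """Callers talk to this layer and never to a data store directly."""

    def __init__(self, store: CounterStore):
        self._store = store

    def record_pageview(self, url: str) -> int:
        return self._store.incr(f"pageviews:{url}")

    def pageviews(self, url: str) -> int:
        return self._store.get(f"pageviews:{url}")


def check_engine(store: CounterStore) -> bool:
    """Run the same basic accuracy check against any candidate engine."""
    api = InternalAPI(store)
    for _ in range(3):
        api.record_pageview("http://example.com/a")
    return api.pageviews("http://example.com/a") == 3


if __name__ == "__main__":
    print(check_engine(InMemoryCounterStore()))
```

Because callers depend only on the interface, swapping or trialling an engine is a matter of writing another adapter and running the same checks against it.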




Service
[Diagram labels: Solr, C*, Real-time, C*, Internal API]


Path of a Packet
[Diagram, left to right: Internet; Firehose, EP, API, SC; Queue; Consumers; Internal API; Solr, C*, Mongo, Redis, IB]

Architecture Distribution
    US-EAST-1a                  US-EAST-1b               US-EAST-1e

  CASSANDRA-0001            CASSANDRA-0002             CASSANDRA-0003

  CASSANDRA-0010            CASSANDRA-0011             CASSANDRA-0012

    REDIS-0001A                REDIS-0001B

  INFOBRIGHT-0001                                      INFOBRIGHT-0002

MONGO-SHARD-0000-A                                  MONGO-SHARD-0000-B

MONGO-SHARD-0001-B       MONGO-SHARD-0001-A

                         MONGO-SHARD-0002-B         MONGO-SHARD-0002-A

     iAPI-0001                   iAPI-0002                 iAPI-0003
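
The layout above places the two members of each Redis and Mongo pair in different availability zones. A placement like this can be checked programmatically. The sketch below is an assumption-laden illustration: the host-to-zone mapping is read off the slide's column layout, only the Redis and Mongo pairs are included, and the naming convention (trailing `A`/`B` marks the members of one replicated unit) is inferred rather than stated in the talk.

```python
from collections import defaultdict

# Host -> availability zone, as read from the slide's column layout.
PLACEMENT = {
    "REDIS-0001A": "us-east-1a",
    "REDIS-0001B": "us-east-1b",
    "MONGO-SHARD-0000-A": "us-east-1a",
    "MONGO-SHARD-0000-B": "us-east-1e",
    "MONGO-SHARD-0001-B": "us-east-1a",
    "MONGO-SHARD-0001-A": "us-east-1b",
    "MONGO-SHARD-0002-B": "us-east-1b",
    "MONGO-SHARD-0002-A": "us-east-1e",
}


def replica_group(host: str) -> str:
    """Map a host to the unit it replicates (naming convention is assumed)."""
    if host.startswith("MONGO-SHARD-"):
        return host.rsplit("-", 1)[0]   # MONGO-SHARD-0001-A -> MONGO-SHARD-0001
    if host.startswith("REDIS-"):
        return host[:-1]                # REDIS-0001A -> REDIS-0001
    return host                         # anything else is treated as standalone


def colocated_replicas(placement):
    """Return lists of hosts that share both a replica group and a zone."""
    by_group_and_zone = defaultdict(list)
    for host, zone in placement.items():
        by_group_and_zone[(replica_group(host), zone)].append(host)
    return [hosts for hosts in by_group_and_zone.values() if len(hosts) > 1]


if __name__ == "__main__":
    problems = colocated_replicas(PLACEMENT)
    print("no co-located replicas" if not problems else problems)
```

With the placement as transcribed, no pair shares a zone, so losing any single zone leaves one member of every pair running.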

The Schrute of the Problem




Evolving Amazon Tools
•   Full Featured API

•   Simple Queue Service

•   Data Pipeline

•   OpsWorks

•   CloudFormation

•   Redshift Analytics

•   CloudSearch

•   Elastic Beanstalk

•   Elastic MapReduce

•   Simple Workflow Service

•   S3 / Glacier
DevOps Wizardry
•   Extensive use of AWS

•   Monitor: Nagios, StatsD, and Graphite

•   Manage: Chef, OpsWorks, csshX

•   Deployments
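
For the monitoring side, a StatsD counter is just a small UDP datagram of the form `name:value|c`, which a StatsD daemon aggregates and typically forwards to Graphite. The sketch below uses only the standard library; the metric name is hypothetical, and the host and port assume a StatsD daemon on its conventional default of localhost:8125.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a StatsD counter datagram: '<name>:<value>|c'."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> bytes:
    """Fire-and-forget a counter increment over UDP and return the payload."""
    payload = format_counter(name, value)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))
    except OSError:
        # Metrics should never take the application down; drop on failure.
        pass
    return payload


if __name__ == "__main__":
    # Hypothetical metric name for events handled by a queue consumer.
    print(send_counter("consumers.events_processed"))
```

UDP keeps instrumentation cheap: the sender does not wait for a reply, so a slow or absent metrics daemon does not slow the event pipeline.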




Summary
•   Solutions Require Evolution

•   Build, Use, and Integrate Tools

•   Abstraction

•   Distribution

•   Monitoring & Automation



Evolution Takes Time
A revolution only lasts fifteen years, a period which coincides with the


We’re
(Ask us about Food Coma Fridays)
Questions are guaranteed in life.
Answers aren’t.
Thank you.

Eric Lubow
@elubow
elubow@simplereach.co
