SlideShare une entreprise Scribd logo
1  sur  14
Storing Time Series Metrics

         Implementing Multi-Dimensional
           Aggregate Composites with
           Counters For Reporting
         /*
         Joe Stein
         http://www.linkedin.com/in/charmalloc
         @allthingshadoop
         @cassandranosql
         @allthingsscala
         @charmalloc
         */

         Sample code project up at
           https://github.com/joestein/apophis



                      1
Medialets

What we do




     2
Medialets
• Largest deployment of rich media ads for mobile devices
• Over 300,000,000 devices supported
• 3-4 TB of new data every day
• Thousands of services in production
• Hundreds of Thousands of simultaneous requests per second
• Keeping track of what is and was going on when and where
  used to be difficult before we started using Cassandra
• What do I do for Medialets?
   – Chief Architect and Head of Server Engineering
     Development & Operations.




                             3
What does the schema look like?

CREATE COLUMN FAMILY ByDay                                   Column Families hold
WITH default_validation_class=CounterColumnType              your rows of data. Each
AND key_validation_class=UTF8Type AND comparator=UTF8Type;   row within each column
                                                             family will be equal to the
CREATE COLUMN FAMILY ByHour                                  time period you are
WITH default_validation_class=CounterColumnType              dealing with. So an
AND key_validation_class=UTF8Type AND comparator=UTF8Type;
                                                             “event” occurring at
                                                             10/20/2011 11:22:41 will
CREATE COLUMN FAMILY ByMinute
WITH default_validation_class=CounterColumnType              become 4 rows
AND key_validation_class=UTF8Type AND comparator=UTF8Type;
                                                             BySecond = 20111020112141
                                                             ByMinute= 201110201122
CREATE COLUMN FAMILY BySecond                                ByHour= 2011102011
WITH default_validation_class=CounterColumnType              ByDay=20111020
AND key_validation_class=UTF8Type AND comparator=UTF8Type;




                                            4
Why multiple column families?
http://www.datastax.com/docs/1.0/configuration/storage_configuration




                                 5
Ok now how do we keep track of what?
             Lets setup a quick example data set first

• The Animal Logger – fictitious logger of the world around us
  – animal
  – food
  – sound
  – home

• YYYY/MM/DD HH:MM:SS GET /sample?animal=X&food=Y
  – animal=duck&sound=quack&home=pond
  – animal=cat&sound=meow&home=house
  – animal=cat&sound=meow&home=street
  – animal=pigeon&sound=coo&home=street



                                 6
Now what?
      Columns babe, columns make your aggregates work

• Setup your code for columns you want aggregated
  – animal=
  – animal#sound=
  – animal#home=
  – animal#food=
  – animal#food#home=
  – animal#food#sound=
  – animal#sound#home=
  – food#sound=
  – home#food=
  – sound#animal=



                             7
Inserting data
                   Column aggregate concatenated with values
      2011/10/29 11:22:43 GET /sample?animal=duck&home=pond&sound=quack
•   mutator.insertCounter(“20111029112243, “BySecond”,
    HFactory.createCounterColumn(“animal#sound#home=duck#quack#pond”), 1))
•   mutator.insertCounter(“20111029112243, “BySecond”,
    HFactory.createCounterColumn(“animal#home=duck#pond”), 1))
•   mutator.insertCounter(“20111029112243, “BySecond”, HFactory.createCounterColumn(“animal=duck”), 1))

•   mutator.insertCounter(“201110291122, “ByMinute”,
    HFactory.createCounterColumn(“animal#sound#home=duck#quack#pond”), 1))
•   mutator.insertCounter(“201110291122, “ByMinute”,
    HFactory.createCounterColumn(“animal#home=duck#pond”), 1))
•   mutator.insertCounter(“201110291122, “ByMinute”, HFactory.createCounterColumn(“animal=duck”), 1))

•   mutator.insertCounter(“2011102911, “ByHour”, HFactory.createCounterColumn(“animal#home=duck#pond”),
    1))
•   mutator.insertCounter(“2011102911, “ByHour”,
    HFactory.createCounterColumn(“animal#sound#home=duck#quack#pond”), 1))
•   mutator.insertCounter(“2011102911, “ByHour”, HFactory.createCounterColumn(“animal=duck”), 1))

•   mutator.insertCounter(“20111029, “ByDay”,
    HFactory.createCounterColumn(“animal#sound#home=duck#quack#pond”), 1))
•   mutator.insertCounter(“20111029, “ByDay”, HFactory.createCounterColumn(“animal#home=duck#pond”), 1))
•   mutator.insertCounter(“20111029, “ByDay”, HFactory.createCounterColumn(“animal=duck”), 1))




                                                     8
The implementation, its functional
       kind of like “its electric” but without the boogie woogieoogie

def r(columnName: String): Unit = {
aggregateKeys.foreach{tuple:(ColumnFamily, String) => {
val(columnFamily,row) = tuple
         if (row !=null &&row.size> 0)
                   rows add (columnFamily -> row has columnName inc) //increment the counter
         }
  }
}

def ccAnimal(c: (String) => Unit) = {
c(aggregateColumnNames("Animal") + animal)
}

//rows we are going to write too
aggregateKeys(KEYSPACE  "ByDay") = day
aggregateKeys(KEYSPACE  "ByHour") = hour
aggregateKeys(KEYSPACE  "ByMinute") = minute

aggregateColumnNames("Animal") = "animal=”

ccAnimal(r)


                                                 9
Retrieving Data
                      MultigetSliceCounterQuery

•   setColumnFamily(“ByDay”)
•   setKeys("20111029")
•   setRange(”animal#sound=","animal#sound=~",false,1000)
•   We will get all animals and all of their sounds and counts for
    that day

• setRange(”sound#animal=purr#",”sound#animal=purr#~",false
  ,1000)
• We will get all animals that purr and their count


• What is with the tilde?


                                  10
Sort for success
Not magic, just Cassandra




           11
What it looks like in Cassandra
valsample1: String = "10/12/2011 11:22:33   GET   /sample?animal=duck&sound=quack&home=pond”
valsample4: String = "10/12/2011 11:22:33   GET   /sample?animal=cat&sound=purr&home=house”
valsample5: String = "10/12/2011 11:22:33   GET   /sample?animal=lion&sound=purr&home=zoo”
valsample6: String = "10/12/2011 11:22:33   GET   /sample?animal=dog&sound=woof&home=street"

[default@FixtureTestApophis] get ByDay[20111012];
=> (counter=animal#sound#home=cat#purr#house, value=70)
=> (counter=animal#sound#home=dog#woof#street, value=20)
=> (counter=animal#sound#home=duck#quack#pond, value=98)
=> (counter=animal#sound#home=lion#purr#zoo, value=70)
=> (counter=animal#sound=cat#purr, value=70)
=> (counter=animal#sound=dog#woof, value=20)
=> (counter=animal#sound=duck#quack, value=98)
=> (counter=animal#sound=lion#purr, value=70)
=> (counter=animal=cat, value=70)
=> (counter=animal=dog, value=20)
=> (counter=animal=duck, value=98)
=> (counter=animal=lion, value=70)
=> (counter=sound#animal=purr#cat, value=42)
=> (counter=sound#animal=purr#lion, value=42)
=> (counter=sound#animal=quack#duck, value=43)
=> (counter=sound#animal=woof#dog, value=20)
   (counter=total=, value=258)

https://github.com/joestein/apophis


                                                   12
A few more things about retrieving data

• You need to start backwards from here.
• If you want to-do things adhoc then map/reduce is better
• Sometimes more rows is better allowing more nodes to-dowork
   – If you need to look at 100,000 metrics it is better to pull this out
     of 100 rows than out of 1
   – Don’t be afraid to make CF and composite keys out of Time+
     Aggregate data
       • 20111023#animal=duck
       • This could be the row that holds ALL of the animal duck
         information for that day, if you want to look at 100 animals at
         once with 1000 metrics for each per time period, this is the
         way to go




                                    13
Q&A



Medialets
The rich media
adplatform for mobile.
            connect@medialets.com
            www.medialets.com/showcase




      14

Contenu connexe

Similaire à Storing Time Series Metrics With Cassandra and Composite Columns

jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011Joe Stein
 
Improve Your Salesforce Efficiency: Formulas for the Everyday Admin
Improve Your Salesforce Efficiency: Formulas for the Everyday AdminImprove Your Salesforce Efficiency: Formulas for the Everyday Admin
Improve Your Salesforce Efficiency: Formulas for the Everyday AdminEve Lyons-Berg
 
Improve Your Salesforce Efficiency: Formulas for the Everyday Admin
Improve Your Salesforce Efficiency: Formulas for the Everyday AdminImprove Your Salesforce Efficiency: Formulas for the Everyday Admin
Improve Your Salesforce Efficiency: Formulas for the Everyday AdminAggregage
 
Architectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based SoftwareArchitectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based SoftwarePooyan Jamshidi
 
Westie - Um Framework canino em prol do Zabbix
Westie - Um Framework canino em prol do ZabbixWestie - Um Framework canino em prol do Zabbix
Westie - Um Framework canino em prol do ZabbixLuiz Sales
 
Very basic functional design patterns
Very basic functional design patternsVery basic functional design patterns
Very basic functional design patternsTomasz Kowal
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.David Tollmyr
 
Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Profess...
Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Profess...Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Profess...
Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Profess...Daniel Katz
 
It Probably Works - QCon 2015
It Probably Works - QCon 2015It Probably Works - QCon 2015
It Probably Works - QCon 2015Fastly
 
Pragmatic Patterns (and Pitfalls) for Event Streaming in Brownfield Environme...
Pragmatic Patterns (and Pitfalls) for Event Streaming in Brownfield Environme...Pragmatic Patterns (and Pitfalls) for Event Streaming in Brownfield Environme...
Pragmatic Patterns (and Pitfalls) for Event Streaming in Brownfield Environme...HostedbyConfluent
 
Stress Testing at Twitter: a tale of New Year Eves
Stress Testing at Twitter: a tale of New Year EvesStress Testing at Twitter: a tale of New Year Eves
Stress Testing at Twitter: a tale of New Year EvesHerval Freire
 
Performance Tuning of .NET Application
Performance Tuning of .NET ApplicationPerformance Tuning of .NET Application
Performance Tuning of .NET ApplicationMainul Islam, CSM®
 
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016Phil Leggetter
 
Xomia_20220602.pptx
Xomia_20220602.pptxXomia_20220602.pptx
Xomia_20220602.pptxLonghow Lam
 
Scylla Summit 2017: Managing 10,000 Node Storage Clusters at Twitter
Scylla Summit 2017: Managing 10,000 Node Storage Clusters at TwitterScylla Summit 2017: Managing 10,000 Node Storage Clusters at Twitter
Scylla Summit 2017: Managing 10,000 Node Storage Clusters at TwitterScyllaDB
 
Exact Real Arithmetic for Tcl
Exact Real Arithmetic for TclExact Real Arithmetic for Tcl
Exact Real Arithmetic for Tclke9tv
 
The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processingnathanmarz
 
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion Future of Data Meetup
 

Similaire à Storing Time Series Metrics With Cassandra and Composite Columns (20)

jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011
 
Improve Your Salesforce Efficiency: Formulas for the Everyday Admin
Improve Your Salesforce Efficiency: Formulas for the Everyday AdminImprove Your Salesforce Efficiency: Formulas for the Everyday Admin
Improve Your Salesforce Efficiency: Formulas for the Everyday Admin
 
Improve Your Salesforce Efficiency: Formulas for the Everyday Admin
Improve Your Salesforce Efficiency: Formulas for the Everyday AdminImprove Your Salesforce Efficiency: Formulas for the Everyday Admin
Improve Your Salesforce Efficiency: Formulas for the Everyday Admin
 
Architectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based SoftwareArchitectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based Software
 
Eugene goostman the bot
Eugene goostman the botEugene goostman the bot
Eugene goostman the bot
 
Westie - Um Framework canino em prol do Zabbix
Westie - Um Framework canino em prol do ZabbixWestie - Um Framework canino em prol do Zabbix
Westie - Um Framework canino em prol do Zabbix
 
Very basic functional design patterns
Very basic functional design patternsVery basic functional design patterns
Very basic functional design patterns
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.
 
Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Profess...
Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Profess...Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Profess...
Quantitative Methods for Lawyers - Class #14 - R Boot Camp - Part 1 - Profess...
 
It Probably Works - QCon 2015
It Probably Works - QCon 2015It Probably Works - QCon 2015
It Probably Works - QCon 2015
 
Pragmatic Patterns (and Pitfalls) for Event Streaming in Brownfield Environme...
Pragmatic Patterns (and Pitfalls) for Event Streaming in Brownfield Environme...Pragmatic Patterns (and Pitfalls) for Event Streaming in Brownfield Environme...
Pragmatic Patterns (and Pitfalls) for Event Streaming in Brownfield Environme...
 
700 Tons of Code Later
700 Tons of Code Later700 Tons of Code Later
700 Tons of Code Later
 
Stress Testing at Twitter: a tale of New Year Eves
Stress Testing at Twitter: a tale of New Year EvesStress Testing at Twitter: a tale of New Year Eves
Stress Testing at Twitter: a tale of New Year Eves
 
Performance Tuning of .NET Application
Performance Tuning of .NET ApplicationPerformance Tuning of .NET Application
Performance Tuning of .NET Application
 
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
 
Xomia_20220602.pptx
Xomia_20220602.pptxXomia_20220602.pptx
Xomia_20220602.pptx
 
Scylla Summit 2017: Managing 10,000 Node Storage Clusters at Twitter
Scylla Summit 2017: Managing 10,000 Node Storage Clusters at TwitterScylla Summit 2017: Managing 10,000 Node Storage Clusters at Twitter
Scylla Summit 2017: Managing 10,000 Node Storage Clusters at Twitter
 
Exact Real Arithmetic for Tcl
Exact Real Arithmetic for TclExact Real Arithmetic for Tcl
Exact Real Arithmetic for Tcl
 
The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processing
 
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
 

Plus de Joe Stein

Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1Joe Stein
 
Get started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosGet started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosJoe Stein
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache MesosJoe Stein
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaJoe Stein
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache MesosJoe Stein
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Joe Stein
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on MesosJoe Stein
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosJoe Stein
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloJoe Stein
 
Building and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosBuilding and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosJoe Stein
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosJoe Stein
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaJoe Stein
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache KafkaJoe Stein
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache KafkaJoe Stein
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache MesosJoe Stein
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaJoe Stein
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaJoe Stein
 

Plus de Joe Stein (20)

Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Get started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosGet started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache Mesos
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache Mesos
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
Building and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosBuilding and Deploying Application to Apache Mesos
Building and Deploying Application to Apache Mesos
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on Mesos
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
 

Dernier

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 

Dernier (20)

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Storing Time Series Metrics With Cassandra and Composite Columns

  • 1. Storing Time Series Metrics Implementing Multi-Dimensional Aggregate Composites with Counters For Reporting /* Joe Stein http://www.linkedin.com/in/charmalloc @allthingshadoop @cassandranosql @allthingsscala @charmalloc */ Sample code project up at https://github.com/joestein/apophis 1
  • 3. Medialets • Largest deployment of rich media ads for mobile devices • Over 300,000,000 devices supported • 3-4 TB of new data every day • Thousands of services in production • Hundreds of Thousands of simultaneous requests per second • Keeping track of what is and was going on when and where used to be difficult before we started using Cassandra • What do I do for Medialets? – Chief Architect and Head of Server Engineering Development & Operations. 3
  • 4. What does the schema look like? CREATE COLUMN FAMILY ByDay Column Families hold WITH default_validation_class=CounterColumnType your rows of data. Each AND key_validation_class=UTF8Type AND comparator=UTF8Type; row within each column family will be equal to the CREATE COLUMN FAMILY ByHour time period you are WITH default_validation_class=CounterColumnType dealing with. So an AND key_validation_class=UTF8Type AND comparator=UTF8Type; “event” occurring at 10/20/2011 11:22:41 will CREATE COLUMN FAMILY ByMinute WITH default_validation_class=CounterColumnType become 4 rows AND key_validation_class=UTF8Type AND comparator=UTF8Type; BySecond = 20111020112141 ByMinute= 201110201122 CREATE COLUMN FAMILY BySecond ByHour= 2011102011 WITH default_validation_class=CounterColumnType ByDay=20111020 AND key_validation_class=UTF8Type AND comparator=UTF8Type; 4
  • 5. Why multiple column families? http://www.datastax.com/docs/1.0/configuration/storage_configuration 5
  • 6. Ok now how do we keep track of what? Lets setup a quick example data set first • The Animal Logger – fictitious logger of the world around us – animal – food – sound – home • YYYY/MM/DD HH:MM:SS GET /sample?animal=X&food=Y – animal=duck&sound=quack&home=pond – animal=cat&sound=meow&home=house – animal=cat&sound=meow&home=street – animal=pigeon&sound=coo&home=street 6
  • 7. Now what? Columns babe, columns make your aggregates work • Setup your code for columns you want aggregated – animal= – animal#sound= – animal#home= – animal#food= – animal#food#home= – animal#food#sound= – animal#sound#home= – food#sound= – home#food= – sound#animal= 7
  • 8. Inserting data Column aggregate concatenated with values 2011/10/29 11:22:43 GET /sample?animal=duck&home=pond&sound=quack • mutator.insertCounter(“20111029112243, “BySecond”, HFactory.createCounterColumn(“animal#sound#home=duck#quack#pond”), 1)) • mutator.insertCounter(“20111029112243, “BySecond”, HFactory.createCounterColumn(“animal#home=duck#pond”), 1)) • mutator.insertCounter(“20111029112243, “BySecond”, HFactory.createCounterColumn(“animal=duck”), 1)) • mutator.insertCounter(“201110291122, “ByMinute”, HFactory.createCounterColumn(“animal#sound#home=duck#quack#pond”), 1)) • mutator.insertCounter(“201110291122, “ByMinute”, HFactory.createCounterColumn(“animal#home=duck#pond”), 1)) • mutator.insertCounter(“201110291122, “ByMinute”, HFactory.createCounterColumn(“animal=duck”), 1)) • mutator.insertCounter(“2011102911, “ByHour”, HFactory.createCounterColumn(“animal#home=duck#pond”), 1)) • mutator.insertCounter(“2011102911, “ByHour”, HFactory.createCounterColumn(“animal#sound#home=duck#quack#pond”), 1)) • mutator.insertCounter(“2011102911, “ByHour”, HFactory.createCounterColumn(“animal=duck”), 1)) • mutator.insertCounter(“20111029, “ByDay”, HFactory.createCounterColumn(“animal#sound#home=duck#quack#pond”), 1)) • mutator.insertCounter(“20111029, “ByDay”, HFactory.createCounterColumn(“animal#home=duck#pond”), 1)) • mutator.insertCounter(“20111029, “ByDay”, HFactory.createCounterColumn(“animal=duck”), 1)) 8
  • 9. The implementation, its functional kind of like “its electric” but without the boogie woogieoogie def r(columnName: String): Unit = { aggregateKeys.foreach{tuple:(ColumnFamily, String) => { val(columnFamily,row) = tuple if (row !=null &&row.size> 0) rows add (columnFamily -> row has columnName inc) //increment the counter } } } def ccAnimal(c: (String) => Unit) = { c(aggregateColumnNames("Animal") + animal) } //rows we are going to write too aggregateKeys(KEYSPACE "ByDay") = day aggregateKeys(KEYSPACE "ByHour") = hour aggregateKeys(KEYSPACE "ByMinute") = minute aggregateColumnNames("Animal") = "animal=” ccAnimal(r) 9
  • 10. Retrieving Data MultigetSliceCounterQuery • setColumnFamily(“ByDay”) • setKeys("20111029") • setRange(”animal#sound=","animal#sound=~",false,1000) • We will get all animals and all of their sounds and counts for that day • setRange(”sound#animal=purr#",”sound#animal=purr#~",false ,1000) • We will get all animals that purr and their count • What is with the tilde? 10
  • 11. Sort for success Not magic, just Cassandra 11
  • 12. What it looks like in Cassandra valsample1: String = "10/12/2011 11:22:33 GET /sample?animal=duck&sound=quack&home=pond” valsample4: String = "10/12/2011 11:22:33 GET /sample?animal=cat&sound=purr&home=house” valsample5: String = "10/12/2011 11:22:33 GET /sample?animal=lion&sound=purr&home=zoo” valsample6: String = "10/12/2011 11:22:33 GET /sample?animal=dog&sound=woof&home=street" [default@FixtureTestApophis] get ByDay[20111012]; => (counter=animal#sound#home=cat#purr#house, value=70) => (counter=animal#sound#home=dog#woof#street, value=20) => (counter=animal#sound#home=duck#quack#pond, value=98) => (counter=animal#sound#home=lion#purr#zoo, value=70) => (counter=animal#sound=cat#purr, value=70) => (counter=animal#sound=dog#woof, value=20) => (counter=animal#sound=duck#quack, value=98) => (counter=animal#sound=lion#purr, value=70) => (counter=animal=cat, value=70) => (counter=animal=dog, value=20) => (counter=animal=duck, value=98) => (counter=animal=lion, value=70) => (counter=sound#animal=purr#cat, value=42) => (counter=sound#animal=purr#lion, value=42) => (counter=sound#animal=quack#duck, value=43) => (counter=sound#animal=woof#dog, value=20) (counter=total=, value=258) https://github.com/joestein/apophis 12
  • 13. A few more things about retrieving data • You need to start backwards from here. • If you want to-do things adhoc then map/reduce is better • Sometimes more rows is better allowing more nodes to-dowork – If you need to look at 100,000 metrics it is better to pull this out of 100 rows than out of 1 – Don’t be afraid to make CF and composite keys out of Time+ Aggregate data • 20111023#animal=duck • This could be the row that holds ALL of the animal duck information for that day, if you want to look at 100 animals at once with 1000 metrics for each per time period, this is the way to go 13
  • 14. Q&A Medialets The rich media adplatform for mobile. connect@medialets.com www.medialets.com/showcase 14