Event counting and statistics at Finn.no
Alexei Bakanov
teamcore@finn.no
Finn.no Cassandra usecases
● Event counting and statistics ← Today
● FINNboks messaging system
● IP-Geo
● Your Last searches
● JDBC-activity monitoring
● Scam-control activity log
Usecase: Event counting and statistics
An Event:
● Anything that has a timestamp on it
(not continuous)
Event example: Pageview of an ad
Event example: «Annonsegalleri» banner is shown
Event example: «Lagret søk» email is sent for an ad
Usecase: Event counting and statistics
(testdata)
Usecase: Event counting and statistics
“Stone age” system:
● Counter updates in web-app
● Storing in RDBMS
Old system:
● Raw event data in C* (ByteOrderedPartitioner)
● Hadoop jobs to roll up event data
● Aggregated data in C* (RandomPartitioner + SuperColumns)
Current system:
● CQL3 for raw event data
● Composite columns for aggregated data
Usecase: Event counting and statistics. «Stone age» system
«Stone age» architecture:
● Synchronous counter updates
● Updating counters inside a web-app
● Using Finn's main relational database as a
storage for counters
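The synchronous approach above can be sketched as follows. This is a minimal illustration only: the in-memory SQLite database, the `ad_counters` table and both function names are assumptions standing in for Finn's real relational database and web-app code, which the slides do not show.

```python
import sqlite3

# Hypothetical schema: one row per ad holding a single running total.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ad_counters (finnkode INTEGER PRIMARY KEY, pageviews INTEGER NOT NULL)"
)


def count_pageview(conn, finnkode):
    """Increment the total inside the request path and commit immediately."""
    with conn:  # commits on success, so every page view costs one DB commit
        updated = conn.execute(
            "UPDATE ad_counters SET pageviews = pageviews + 1 WHERE finnkode = ?",
            (finnkode,),
        ).rowcount
        if updated == 0:
            conn.execute(
                "INSERT INTO ad_counters (finnkode, pageviews) VALUES (?, 1)",
                (finnkode,),
            )


def total_pageviews(conn, finnkode):
    """Read the running total; no per-hour or per-day breakdown exists."""
    row = conn.execute(
        "SELECT pageviews FROM ad_counters WHERE finnkode = ?", (finnkode,)
    ).fetchone()
    return row[0] if row else 0


for _ in range(3):
    count_pageview(conn, 3706119)
print(total_pageviews(conn, 3706119))
```

Every page view becomes a write transaction on the shared database, and only a running total is kept.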
Usecase: Event counting and statistics. «Stone age» system
Web server → ++count → RDBMS
Web server → ++count → RDBMS
Web server → ++count → RDBMS
...
Usecase: Event counting and statistics. «Stone age» system
Pros:
● Real time numbers
Cons:
● High DB commit-log write times during peak hours. Overall Finn performance degradation.
● No interval-based statistics like daily counters, just totals
Usecase: Event counting and statistics. «Stone age» system
Very long time ago...
Usecase: Event counting and statistics. Old system
Old architecture:
● Asynchronous event logging via Scribe
● Saving event data in a raw unnormalized format
to C*
● Hadoop jobs to sum up event counters over time
periods
● Serving aggregated statistics from C*
Usecase: Event counting and statistics. Old architecture
Pros:
● Less load on main RDBMS
● Interval-based statistics
● Ability to re-aggregate data and get new insights
● Better Command-Query separation
Cons:
● Not real-time, although jobs run every minute
Usecase: Event counting and statistics.
«Oppdrag»'s interval-based statistics
(testdata)
Usecase: Event counting and statistics. Grouped by subCategory
Usecase: Event counting and statistics. Grouped by client types
Usecase: Event counting and statistics. Grouped by referrer domain
Usecase: Event counting and statistics. Repeated views by a person
Usecase: Event counting and statistics. Old system. Data flow
webserver1.finn.no: Tomcat → Scribe daemon (async event logging)
webserver2.finn.no: …..
webserver3.finn.no: …..
cassandra1.finn.no: Scribe daemon → Cassandra (ByteOrderedPartitioner): raw event data
Hadoop aggregation job → Cassandra (RandomPartitioner): aggregated data
Usecase: Event counting and statistics. Old system. Event logging
Event bean (Thrift IDL):
struct Event {
    /** Event domain. Typically AD, CV, Oppdrag */
    1: required string type;
    /** Event name. Typically PageView, EmailSent */
    2: optional string subCategory;
    /** Arbitrary key-value map with extra info like finnkode or userid */
    3: required map<string, string> values;
}
● Event bean → binary → base64 + timestamp → Scribe message
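The encoding chain above (bean → binary → base64 + timestamp → message) might look roughly like this. It is a sketch under stated assumptions: JSON stands in for the Thrift binary encoding, and the "<timestamp> <base64>" message layout and the names `encode_event` and `decode_event` are hypothetical, since the slides do not give the real wire format.

```python
import base64
import json
import time


def encode_event(event_type, values, sub_category=None, now=None):
    """Serialize an event and wrap it as one text line for a log transport."""
    event = {"type": event_type, "subCategory": sub_category, "values": values}
    # Assumption: JSON stands in for the Thrift binary encoding used in the talk.
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    timestamp = int(time.time() if now is None else now)
    # Assumption: "<timestamp> <base64>" layout; the real layout is not shown.
    return f"{timestamp} {base64.b64encode(payload).decode('ascii')}"


def decode_event(message):
    """Reverse encode_event: split off the timestamp and decode the payload."""
    timestamp, encoded = message.split(" ", 1)
    return int(timestamp), json.loads(base64.b64decode(encoded))


message = encode_event("AD", {"ad.id": "3706119"}, "PageView", now=1369164000)
print(decode_event(message))
```

Base64 keeps the binary payload safe inside a line-oriented log message, and the timestamp travels outside the payload so it can be read without decoding the bean.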
Usecase: Event counting and statistics. Old system. Raw event data
● Approx. 1500k events per second to log at peak times
● ByteOrderedPartitioner for storing raw data. TTL = 3 months
(picture from datastax.com)
Timestamp Eventbean
1369164000 0x1f0562bda6...
1369164001 0x364dd9a5a6...
1369164002 0x4d96508da6...
1369164003 0x64dec775a6...
Sequential rowkeys = Hadoop friendly
get_range_slices(1369164000, 1369164003) to
get data for Hadoop splits
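Because the row keys are sequential timestamps, a time window can be cut into contiguous key ranges, one per Hadoop split. A minimal sketch of that slicing follows; `key_ranges` and `split_seconds` are illustrative names, and the actual get_range_slices call against Cassandra is deliberately left out.

```python
def key_ranges(start_ts, end_ts, split_seconds):
    """Cut [start_ts, end_ts] into contiguous, inclusive row key ranges."""
    ranges = []
    lower = start_ts
    while lower <= end_ts:
        upper = min(lower + split_seconds - 1, end_ts)
        ranges.append((lower, upper))
        lower = upper + 1
    return ranges


# Each (lower, upper) pair would be the argument of one range scan.
print(key_ranges(1369164000, 1369164003, 2))
```

With an order-preserving partitioner each pair maps to one sequential scan, which is what makes this layout Hadoop friendly.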
Usecase: Event counting and statistics. Old system. Aggregation
Hadoop jobs:
● Sum up events for each finnkode grouped by subCategory.
    from collections import defaultdict

    # emit() and incrementCassandraCounter() are helpers not shown on the slide.
    def map(events):
        # Emit one (subCategory, 1) pair per event, keyed by the ad's finnkode.
        for event in events:
            finnkode = event.values.get("ad.id")
            subcategoryCount = (event.subCategory, 1)
            emit(finnkode, subcategoryCount)

    def reduce(finnkode, subcategoryCountList):
        # Sum the counts per subCategory for this finnkode.
        subcategoryTotals = defaultdict(int)
        for subcategory, count in subcategoryCountList:
            subcategoryTotals[subcategory] += count
        # Add each total to the hourly, daily and all-time counters.
        for subcategory, count in subcategoryTotals.items():
            incrementCassandraCounter(finnkode, subcategory, count, HOUR_somehour)
            incrementCassandraCounter(finnkode, subcategory, count, DAY_someday)
            incrementCassandraCounter(finnkode, subcategory, count, TOTAL)
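The job logic can be tried locally without Hadoop or Cassandra. The sketch below is a stand-in under assumptions: events are plain dicts shaped like the Thrift struct, and an in-memory dict replaces the Cassandra counters; `map_events`, `reduce_counts` and `apply_increments` are illustrative names, not the job's real code.

```python
from collections import defaultdict


def map_events(events):
    """Yield (finnkode, (subCategory, 1)) for every event."""
    for event in events:
        yield event["values"]["ad.id"], (event["subCategory"], 1)


def reduce_counts(pairs):
    """Group by finnkode and sum the counts per subCategory."""
    totals = defaultdict(lambda: defaultdict(int))
    for finnkode, (sub_category, count) in pairs:
        totals[finnkode][sub_category] += count
    return totals


def apply_increments(totals, hour, day):
    """Add the totals to an in-memory stand-in for the counter store."""
    counters = defaultdict(int)
    for finnkode, per_category in totals.items():
        for sub_category, count in per_category.items():
            for interval in (hour, day, "TOTAL"):
                counters[(finnkode, interval, sub_category)] += count
    return counters


events = [
    {"type": "AD", "subCategory": "PageView", "values": {"ad.id": "3706119"}},
    {"type": "AD", "subCategory": "PageView", "values": {"ad.id": "3706119"}},
    {"type": "AD", "subCategory": "Email", "values": {"ad.id": "3706119"}},
    {"type": "AD", "subCategory": "PageView", "values": {"ad.id": "3706052"}},
]
counters = apply_increments(
    reduce_counts(map_events(events)), "HOUR_2013_05_30_18", "DAY_2013_05_30"
)
print(counters[("3706119", "TOTAL", "PageView")])
```

Each reduced total is written three times, once per interval, which is what lets the serving side read hourly, daily and all-time numbers without further aggregation.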
Usecase: Event counting and statistics. Old system. Aggregated data
SuperColumns for interval-based counters:
           HOUR_2013_05_30_18    DAY_2013_05_30        TOTAL
finnkode   PageView   Email      PageView   Email      PageView   Email
3706119    10         2          20         4          50         5
3706052    23         3          102        4          234        10
Min. time resolution: QUARTER_HOUR
No TTL on counter columns, so rows got really wide and slow
http://www.makingitscale.com/2012/scaling-cassandra-counter-columns.html
Usecase: Event counting and statistics. Current system. Aggregated data
Migration from SuperColumns to composite columns

SuperColumns:
           HOUR_2013_05_30_18    DAY_2013_05_30        TOTAL
finnkode   PageView   Email      PageView   Email      PageView   Email
3706119    10         2          20         4          50         5
3706052    23         3          102        4          234        10

CompositeColumns:
finnkode   HOUR_2013_05_30_18:PageView   HOUR_2013_05_30_18:Email   DAY_2013_05_30:PageView   DAY_2013_05_30:Email   TOTAL:PageView   TOTAL:Email
3706119    10                            2                          20                        4                      50               5
3706052    23                            3                          102                       4                      234              10

+ clean-up jobs to remove old QUARTER_HOUR and HOUR columns
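The move from SuperColumns to composite columns amounts to flattening a two-level name into one "interval:event" column name, plus a periodic clean-up of fine-grained intervals. A minimal sketch with plain dicts follows; the function names and the "QUARTER_HOUR_" prefix spelling are assumptions, not the real migration or clean-up job.

```python
def to_composite(super_row):
    """Flatten {interval: {event: count}} into {"interval:event": count}."""
    return {
        f"{interval}:{event}": count
        for interval, events in super_row.items()
        for event, count in events.items()
    }


def drop_old_columns(row, prefixes, keep):
    """Remove columns whose interval has one of `prefixes`, unless listed in `keep`."""
    cleaned = {}
    for name, count in row.items():
        interval = name.split(":", 1)[0]
        if interval.startswith(prefixes) and interval not in keep:
            continue
        cleaned[name] = count
    return cleaned


super_row = {
    "HOUR_2013_05_30_18": {"PageView": 10, "Email": 2},
    "DAY_2013_05_30": {"PageView": 20, "Email": 4},
    "TOTAL": {"PageView": 50, "Email": 5},
}
composite_row = to_composite(super_row)
cleaned_row = drop_old_columns(composite_row, ("QUARTER_HOUR_", "HOUR_"), keep=set())
print(cleaned_row)
```

Since counter columns cannot expire on their own, an explicit clean-up like `drop_old_columns` is what keeps the rows from growing without bound.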
Usecase: Event counting and statistics. Current system. Aggregated data
Usecase: Event counting and statistics. Old system. Disadvantages
Old raw event data cluster disadvantages:
● ByteOrderedPartitioner: just one node at work at a time, taking all the load
● Skinny rows: 5 billion rows for 3 months of data on each node
● Extremely unstable instances failing with OOM errors
● Hadoop jobs failing or hanging at the init stage despite QUORUM consistency
= Outdated statistics across Finn services
Usecase: Event counting and statistics. Current system. Raw data
Current system for raw data:
● Same cluster as aggregated data, i.e.
RandomPartitioner + CQL3
Usecase: Event counting and statistics. Current system. Raw data
● CQL = SQL without JOINs, GROUP BYs and other unimportant stuff
● Abstraction over physical C* storage
● CQL Table transposes rows into Composite columns
CQL table:
Timebucket   Timestamp    Eventbean
1369164000   1369164000   0x1f0562bda6...
1369164000   1369164001   0x364dd9a5a6...
1369164000   1369164002   0x4d96508da6...
1369164000   1369164003   0x64dec775a6...

Underlying ColumnFamily:
Row key      1369164000:Eventbean   1369164001:Eventbean   1369164002:Eventbean   1369164003:Eventbean
1369164000   0x1f0562bda6...        0x364dd9a5a6...        0x4d96508da6...        0x64dec775a6...

Timebucket = Partition key (row key)
Timestamp = Clustering key
Partition + clustering = PRIMARY KEY
Other columns are static for every PK pair
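The transposition can be mimicked with plain dicts: all CQL rows sharing a partition key end up in one wide storage row, with the clustering key folded into the column name. This is an illustration of the idea only, not Cassandra's actual storage code, and `to_storage_rows` is a made-up name.

```python
from collections import defaultdict


def to_storage_rows(cql_rows):
    """Group CQL rows by partition key; each group becomes one wide row."""
    storage = defaultdict(dict)
    for timebucket, timestamp, eventbean in cql_rows:
        # Composite column name = clustering key value + CQL column name.
        storage[timebucket][(timestamp, "eventbean")] = eventbean
    return dict(storage)


# (timebucket, timestamp, eventbean); values are truncated as on the slide.
cql_rows = [
    (1369164000, 1369164000, "0x1f0562bda6..."),
    (1369164000, 1369164001, "0x364dd9a5a6..."),
    (1369164000, 1369164002, "0x4d96508da6..."),
    (1369164000, 1369164003, "0x64dec775a6..."),
]
storage_rows = to_storage_rows(cql_rows)
print(storage_rows)
```

Four CQL rows collapse into a single storage row, so choosing the partition key decides how wide each physical row gets.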
Usecase: Event counting and statistics. Current system. Raw data
CREATE TABLE events (
    realtb_sharded  text,       -- partition key
    type            text,       -- clustering key
    collected_ts    timeuuid,   -- clustering key
    key_values_json text,       -- static column
    real_ts         timestamp,  -- static column
    real_tb         bigint,     -- secondary index
    collected_tb    bigint,     -- secondary index
    PRIMARY KEY (realtb_sharded, type, collected_ts)
);
“real” timestamp – Event occurred at the client
“collected” timestamp – Event reached C*
Usecase: Event counting and statistics. Current system. Raw data
Hadoop data reading:
1. Get a list of InputSplits
   HDFS:
     A file block is replicated across several machines
     FileInputSplit: (“file, start, length”, IP addresses)
   C*:
     Rows with the same partition key are replicated across several machines
     EventsInputSplit: (partition key, IP addresses)
2. InputSplit → Map-tasks on IP addresses
3. Map-task reads data based on:
   HDFS: “file, start, length”
   C*: partition key
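The C* side of the three steps above can be sketched as data structures. Everything here is hypothetical: the `EventsInputSplit` fields, the replica lookup table and the IP addresses are invented for illustration, and the real class would implement Hadoop's InputSplit interface in Java.

```python
from collections import namedtuple

# A split pairs one partition key with the machines holding its replicas.
EventsInputSplit = namedtuple("EventsInputSplit", ["partition_key", "locations"])


def build_splits(partition_keys, replicas_for):
    """Step 1: one split per partition key, annotated with replica addresses."""
    return [EventsInputSplit(key, replicas_for(key)) for key in partition_keys]


def assign_tasks(splits):
    """Step 2: place each map task on the first replica of its split."""
    tasks = {}
    for split in splits:
        tasks.setdefault(split.locations[0], []).append(split.partition_key)
    return tasks


# Made-up replica placement; a real job would ask the cluster for it.
replica_table = {
    "17:58": ["10.0.0.1", "10.0.0.2"],
    "17:59": ["10.0.0.2", "10.0.0.3"],
    "18:03": ["10.0.0.1", "10.0.0.3"],
}
splits = build_splits(["17:58", "17:59", "18:03"], replica_table.__getitem__)
tasks = assign_tasks(splits)
print(tasks)
```

In step 3 each task would read only the partition keys assigned to it, from a node that already holds the data.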
Usecase: Event counting and statistics. Current system. Raw data
Process all data collected 18:00 – 19:00 on 30.05.2013:
1. Get InputSplits:
   For each minute in 18:00 – 19:00:
       SELECT realtb_sharded FROM events WHERE collected_tb = <minute>
       token(realtb_sharded) → IP addresses
2. Map-task, one query per partition key returned in step 1, for example:
   SELECT * FROM events WHERE realtb_sharded = '17:58'
   SELECT * FROM events WHERE realtb_sharded = '17:59'
   SELECT * FROM events WHERE realtb_sharded = '18:03'
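Step 1 issues one secondary-index lookup per collected minute. The loop can be sketched as below; the YYYYMMDDHHMM encoding of `collected_tb` is an assumption (the slides only say it is a bigint), and `minute_buckets` and `split_query` are illustrative names.

```python
from datetime import datetime, timedelta


def minute_buckets(start, end):
    """Return the start of every minute in [start, end)."""
    buckets = []
    current = start.replace(second=0, microsecond=0)
    while current < end:
        buckets.append(current)
        current += timedelta(minutes=1)
    return buckets


def split_query(bucket):
    """CQL that finds the partition keys for one collected time bucket."""
    # Assumption: collected_tb holds the minute as a YYYYMMDDHHMM number.
    value = int(bucket.strftime("%Y%m%d%H%M"))
    return f"SELECT realtb_sharded FROM events WHERE collected_tb = {value}"


buckets = minute_buckets(datetime(2013, 5, 30, 18, 0), datetime(2013, 5, 30, 19, 0))
queries = [split_query(bucket) for bucket in buckets]
print(len(queries))
print(queries[0])
```

Events are partitioned by their "real" time bucket but looked up by their "collected" one, which is why the partition keys found for the 18:00 hour can include values such as '17:58'.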
Usecase: Event counting and statistics. Current system. Raw data
Process data of type “AD” collected 18:00 – 19:00 on 30.05.2013:
1. Get InputSplits:
   For each minute in 18:00 – 19:00:
       SELECT realtb_sharded FROM events WHERE collected_tb = <minute> AND type = 'AD'
       token(realtb_sharded) → IP addresses
2. Map-task:
   SELECT * FROM events WHERE realtb_sharded = '17:58' AND type = 'AD'
       AND collected_ts > minTimeuuid('2013-05-30 18:00')
       AND collected_ts < maxTimeuuid('2013-05-30 19:00')
Usecase: Event counting and statistics. Current system. Raw data
Getting InputSplits:
Get partition keys for data of type AD collected during timebucket 29.05.2013 21:18 – 21:19
Hadoop Map-task:
Get rows for a Partition key from split limiting by type and collected timestamp
Usecase: Event counting and statistics. NextGen
Event Counting and Statistics NextGen:
Ad-hoc analytics:
– Apache Hive integration (we have Pig)
– Hive ODBC driver for Tableau integration
Aggregation jobs:
– A higher-level library like Cascading or Apache Crunch rather than raw M/R code
Hadoop 2
Usecase: Event counting and statistics
?

Presentation: Cassandra at Finn.io, May 30th 2013

AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호Amazon Web Services Korea
 
[WSO2Con EU 2018] Patterns for Building Streaming Apps
[WSO2Con EU 2018] Patterns for Building Streaming Apps[WSO2Con EU 2018] Patterns for Building Streaming Apps
[WSO2Con EU 2018] Patterns for Building Streaming AppsWSO2
 
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...randyguck
 
Real time analytics at any scale | PostgreSQL User Group NL | Marco Slot
Real time analytics at any scale | PostgreSQL User Group NL | Marco SlotReal time analytics at any scale | PostgreSQL User Group NL | Marco Slot
Real time analytics at any scale | PostgreSQL User Group NL | Marco SlotCitus Data
 
Monitoring as Software Validation
Monitoring as Software ValidationMonitoring as Software Validation
Monitoring as Software ValidationBioDec
 
Fast NoSQL from HDDs?
Fast NoSQL from HDDs? Fast NoSQL from HDDs?
Fast NoSQL from HDDs? ScyllaDB
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with CassandraDataStax Academy
 
Fabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkFabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkShashank Gautam
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in MotionRuhani Arora
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingGuozhang Wang
 
(MBL305) You Have Data from the Devices, Now What?: Getting the Value of the IoT
(MBL305) You Have Data from the Devices, Now What?: Getting the Value of the IoT(MBL305) You Have Data from the Devices, Now What?: Getting the Value of the IoT
(MBL305) You Have Data from the Devices, Now What?: Getting the Value of the IoTAmazon Web Services
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya MeetupApache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya MeetupRobert Metzger
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingDatabricks
 
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018Amazon Web Services
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...Amazon Web Services
 
Building Microservices with Scala, functional domain models and Spring Boot -...
Building Microservices with Scala, functional domain models and Spring Boot -...Building Microservices with Scala, functional domain models and Spring Boot -...
Building Microservices with Scala, functional domain models and Spring Boot -...JAXLondon2014
 
#JaxLondon: Building microservices with Scala, functional domain models and S...
#JaxLondon: Building microservices with Scala, functional domain models and S...#JaxLondon: Building microservices with Scala, functional domain models and S...
#JaxLondon: Building microservices with Scala, functional domain models and S...Chris Richardson
 

Similar to Cassandra at Finn.io — May 30th 2013 (20)

AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Cubes 1.0 Overview
Cubes 1.0 OverviewCubes 1.0 Overview
Cubes 1.0 Overview
 
[WSO2Con EU 2018] Patterns for Building Streaming Apps
[WSO2Con EU 2018] Patterns for Building Streaming Apps[WSO2Con EU 2018] Patterns for Building Streaming Apps
[WSO2Con EU 2018] Patterns for Building Streaming Apps
 
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
 
Real-Time Event Processing
Real-Time Event ProcessingReal-Time Event Processing
Real-Time Event Processing
 

Cassandra at Finn.io — May 30th 2013

  • 1. Event counting and statistics at Finn.no Alexei Bakanov teamcore@finn.no
  • 2. Finn.no Cassandra usecases ● Event counting and statistics ← Today ● FINNboks messaging system ● IP-Geo ● Your Last searches ● JDBC-activity monitoring ● Scam-control activity log
  • 3. Usecase: Event counting and statistics An Event: ● Anything that has a timestamp on it (not continuous)
  • 6. Event example: «Lagret søk» email is sent for an ad
  • 7. Usecase: Event counting and statistics
  • 8. Usecase: Event counting and statistics (testdata)
  • 9. Usecase: Event counting and statistics
    ● “Stone age” system: Counter updates in Web-app. Storing in RDBMS.
    ● Old system: Raw Event data in C* (ByteOrderedPartitioner). Hadoop jobs to rollup event data. Aggregated data in C* (RandomPartitioner + SuperColumns).
    ● Current system: CQL3 for raw Event Data. Composite columns for Aggregated data.
  • 10. Usecase: Event counting and statistics. «Stone age» system «Stone age» architecture: ● Synchronous counter updates ● Updating counters inside a web-app ● Using Finn's main relational database as a storage for counters
  • 11. Usecase: Event counting and statistics. «Stone age» system Web server, Web server, Web server . . . each sending ++count to the RDBMS
  • 12. Usecase: Event counting and statistics. «Stone age» system Pros: ● Real time numbers Cons: ● High DB commit-log write times during peak hours. Overall Finn performance degradation. ● No interval-based statistics like daily counters, just totals
  • 13. Usecase: Event counting and statistics. «Stone age» system Very long time ago...
  • 14. Usecase: Event counting and statistics. Old system Old architecture: ● Asynchronous event logging via Scribe ● Saving event data in a raw unnormalized format to C* ● Hadoop jobs to sum up event counters over time periods ● Serving aggregated statistics from C*
  • 15. Usecase: Event counting and statistics. Old architecture Pros: ● Less load on main RDBMS ● Interval-based statistics ● Ability to re-aggregate data and get new insights ● Better Command-Query separation Cons: ● Not real-time, although jobs run every minute
  • 16. Usecase: Event counting and statistics. «Oppdrag»'s interval-based statistics (testdata)
  • 17. Usecase: Event counting and statistics. Grouped by subCategory
  • 18. Usecase: Event counting and statistics. Grouped by client types
  • 19. Usecase: Event counting and statistics. Grouped by referrer domain
  • 20. Usecase: Event counting and statistics. Repeated views by a person
  • 21.
  • 22. Usecase: Event counting and statistics. Old system. Data flow
    ● webserver1.finn.no, webserver2.finn.no, webserver3.finn.no, …..: Tomcat + Scribe daemon (async event logging)
    ● cassandra1.finn.no: Scribe daemon + Cassandra (ByteOrderedPartitioner), raw event data
    ● Hadoop aggregation job
    ● Cassandra (RandomPartitioner), aggregated data
  • 23. Usecase: Event counting and statistics. Old system. Event logging
    Event bean (Thrift IDL):
      struct Event {
        /** Event domain. Typical AD, CV, Oppdrag */
        1: required string type;
        /** Event name. Typical PageView, EmailSent */
        2: optional string subCategory;
        /** Arbitrary key-value map with extra info like finnkode or userid */
        3: required map<string, string> values;
      }
    ● Event bean → binary → base64 + timestamp → Scribe message
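
The "Event bean → binary → base64 + timestamp → Scribe message" pipeline can be sketched roughly as follows. This is only an illustration: JSON stands in for Thrift binary serialization so the example needs no extra libraries, and the tab-separated message layout, `encode_event` and `decode_event` are hypothetical names and choices that the slides do not specify.

```python
import base64
import json
import time


def encode_event(event_type, sub_category, values, now=None):
    """Serialize an event and wrap it as 'timestamp<TAB>base64 payload'."""
    ts = int(time.time() if now is None else now)
    # Assumption: JSON replaces the Thrift binary encoding used in the talk.
    payload = json.dumps(
        {"type": event_type, "subCategory": sub_category, "values": values},
        sort_keys=True,
    ).encode("utf-8")
    # Base64 keeps the binary payload safe inside a line-oriented log message.
    return f"{ts}\t{base64.b64encode(payload).decode('ascii')}"


def decode_event(message):
    """Reverse encode_event: return (timestamp, event dict)."""
    ts, encoded = message.split("\t", 1)
    return int(ts), json.loads(base64.b64decode(encoded))
```

A consumer on the storage side would call `decode_event` on each received message and use the timestamp as the row key for the raw event data.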
  • 25. Usecase: Event counting and statistics. Old system. Raw event data
    ● Ca 1500k events per sec to log in peak times
    ● ByteOrderedPartitioner for storing raw data. TTL = 3 months (picture from datastax.com)
      Timestamp     Eventbean
      1369164000    0x1f0562bda6...
      1369164001    0x364dd9a5a6...
      1369164002    0x4d96508da6...
      1369164003    0x64dec775a6...
    Sequential rowkeys = Hadoop friendly
    get_range_slices(1369164000, 1369164003) to get data for Hadoop splits
  • 27. Usecase: Event counting and statistics. Old system. Aggregation
    Hadoop jobs:
    ● Sum up events for each finnkode grouped by subCategory.
      map(events):
        for event in events:
          finnkode = event.getValues().get("ad.id")
          subcategoryCount = (event.subCategory, 1)
          emit(finnkode, subcategoryCount)

      reduce(finnkode, subcategoryCountList):
        subcategoryTotals = {}
        for subcategory, count in subcategoryCountList:
          # start from 0 the first time a subcategory is seen
          subcategoryTotals[subcategory] = subcategoryTotals.get(subcategory, 0) + count
        for subcategory, count in subcategoryTotals.iteritems():
          incrementCassandraCounter(finnkode, subcategory, count, HOUR_somehour)
          incrementCassandraCounter(finnkode, subcategory, count, DAY_someday)
          incrementCassandraCounter(finnkode, subcategory, count, TOTAL)
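
The map and reduce steps above can be tried out locally with a small in-memory sketch. It is not the Hadoop job from the talk: events are plain dicts, an in-process `Counter` stands in for the Cassandra counter columns, and `map_events`, `reduce_counts` and `increment_counters` are hypothetical names.

```python
from collections import Counter, defaultdict


def map_events(events):
    """Emit (finnkode, (subcategory, 1)) for every event."""
    for event in events:
        finnkode = event["values"]["ad.id"]
        yield finnkode, (event["subCategory"], 1)


def reduce_counts(pairs):
    """Sum the emitted counts per finnkode and subcategory."""
    totals = defaultdict(Counter)
    for finnkode, (subcategory, count) in pairs:
        totals[finnkode][subcategory] += count
    return totals


def increment_counters(store, totals, hour, day):
    """Add the totals to the HOUR, DAY and TOTAL buckets.

    `store` is a Counter standing in for Cassandra counter columns.
    """
    for finnkode, per_subcategory in totals.items():
        for subcategory, count in per_subcategory.items():
            for bucket in (f"HOUR_{hour}", f"DAY_{day}", "TOTAL"):
                store[(finnkode, bucket, subcategory)] += count
```

Running the three functions in sequence over a batch of events yields one counter per (finnkode, interval, subcategory), which mirrors the layout of the aggregated data on the following slides.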
  • 29. Usecase: Event counting and statistics. Old system. Aggregated data
    SuperColumns for interval-based counters:
                  HOUR_2013_05_30_18    DAY_2013_05_30      TOTAL
      finnkode    PageView  Email       PageView  Email     PageView  Email
      3706119     10        2           20        4         50        5
      3706052     23        3           102       4         234       10
    ● Min. time resolution: QUARTER_HOUR
    ● No TTL on Counter columns, so rows got really wide and slow
    http://www.makingitscale.com/2012/scaling-cassandra-counter-columns.html
  • 30. Usecase: Event counting and statistics. Current system. Aggregated data
    Migration from SuperColumns to Composite columns
    SuperColumns:
                  HOUR_2013_05_30_18    DAY_2013_05_30      TOTAL
      finnkode    PageView  Email       PageView  Email     PageView  Email
      3706119     10        2           20        4         50        5
      3706052     23        3           102       4         234       10
    CompositeColumns:
      finnkode    HOUR_2013_05_30_18:PageView  HOUR_2013_05_30_18:Email  DAY_2013_05_30:PageView  DAY_2013_05_30:Email  TOTAL:PageView  TOTAL:Email
      3706119     10                           2                         20                       4                     50              5
      3706052     23                           3                         102                      4                     234             10
    + clean up jobs to remove old QUARTER_HOUR and HOUR columns
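
A clean-up job for the composite column layout could look roughly like the sketch below. The slides only say that old QUARTER_HOUR and HOUR columns are removed, so everything concrete here is an assumption: a row is modelled as a dict of `interval:subcategory` names, QUARTER_HOUR names are assumed to start with the same year, month, day and hour fields as HOUR names, and the 30-day retention and the names `column_name`, `is_expired` and `clean_row` are hypothetical.

```python
from datetime import datetime, timedelta


def column_name(interval, subcategory):
    """Build a composite-style column name such as 'DAY_2013_05_30:PageView'."""
    return f"{interval}:{subcategory}"


def is_expired(name, now, max_age):
    """True for HOUR/QUARTER_HOUR columns older than max_age.

    DAY and TOTAL columns are never expired.
    """
    interval = name.split(":", 1)[0]
    for prefix in ("QUARTER_HOUR_", "HOUR_"):
        if interval.startswith(prefix):
            fields = interval[len(prefix):].split("_")
            # Only year, month, day and hour are used for the age check.
            start = datetime.strptime("_".join(fields[:4]), "%Y_%m_%d_%H")
            return now - start > max_age
    return False


def clean_row(row, now, max_age=timedelta(days=30)):
    """Return a copy of the row without expired fine-grained columns."""
    return {
        name: value
        for name, value in row.items()
        if not is_expired(name, now, max_age)
    }
```

Keeping DAY and TOTAL columns while dropping the fine-grained ones bounds the row width, which is the problem the SuperColumn layout ran into.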
  • 31. Usecase: Event counting and statistics. Current system. Aggregated data
  • 32. Usecase: Event counting and statistics. Old system. Disadvantages
    Old Raw event data cluster disadvantages:
    ● ByteOrderedPartitioner: just one node at work at a time, taking all the load
    + ● Skinny rows: 5 billion rows for 3 months of data on each node
    = ● Extremely unstable instances failing with OOM errors
      ● Hadoop jobs fail or hang at the init stage despite QUORUM consistency
      ● Outdated statistics across Finn services
  • 33. Usecase: Event counting and statistics. Current system. Raw data Current system for raw data: ● Same cluster as aggregated data, i.e. RandomPartitioner + CQL3
  • 34. Usecase: Event counting and statistics. Current system. Raw data
    ● CQL = SQL without JOINs, GROUPBYs and other unimportant stuff
    ● Abstraction over physical C* storage
    ● CQL Table transposes rows into Composite columns
    CQL table:
      Timebucket    Timestamp     Eventbean
      1369164000    1369164000    0x1f0562bda6...
      1369164000    1369164001    0x364dd9a5a6...
      1369164000    1369164002    0x4d96508da6...
      1369164000    1369164003    0x64dec775a6...
    Underlying ColumnFamily:
      Timebucket    1369164000: Eventbean  1369164001: Eventbean  1369164002: Eventbean  1369164003: Eventbean
      1369164000    0x1f0562bda6...        0x364dd9a5a6...        0x4d96508da6...        0x64dec775a6...
    Timebucket = Partition key (row key), Timestamp = Clustering key
    Partition + clustering = PRIMARY KEY
    Other columns are static for every PK pair
  • 35. Usecase: Event counting and statistics. Current system. Raw data
      CREATE TABLE events (
        realtb_sharded text,       -- Partition key
        type text,                 -- Clustering key
        collected_ts timeuuid,     -- Clustering key
        PRIMARY KEY(realtb_sharded, type, collected_ts),
        key_values_json text,      -- Static column
        real_ts timestamp,         -- Static column
        real_tb bigint,            -- Secondary Index
        collected_tb bigint        -- Secondary Index
      );
    “real” timestamp – Event occurred at the client
    “collected” timestamp – Event reached C*
  • 36. Usecase: Event counting and statistics. Current system. Raw data
    Hadoop data reading:
    1. Get a list of InputSplits
       HDFS: A file block is replicated across several machines. FileInputSplit: (“file, start, length”, IP-addresses)
       C*: Rows with same Partition key are replicated across several machines. EventsInputSplit: (Partition key, IP-addresses)
    2. InputSplit → Map-tasks on IP-addresses
    3. Map-task reads data based on:
       HDFS: “file, start, length”
       C*: Partition key
  • 37. Usecase: Event counting and statistics. Current system. Raw data
    Process all data collected 18:00 – 19:00 30.05.2013:
    1. Get InputSplits:
       For minute in (18:00-19:00).getMinutes:
         SELECT realtb_sharded FROM events WHERE collected_tb = minute
         token(realtb_sharded) → IP-addresses
    2. Map-task:
       SELECT * FROM events WHERE realtb_sharded='17:58'
       SELECT * FROM events WHERE realtb_sharded='17:59'
       SELECT * FROM events WHERE realtb_sharded='18:03'
    Table definition: the same CREATE TABLE events (...) as on slide 35.
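
The two-step read (one index lookup per collected minute, then one query per partition key) can be sketched as query generation in Python. This is only an illustration under assumptions: `collected_tb` is taken to hold the epoch second at which the minute starts, the `%s` placeholders follow the common Python driver style, nothing is sent to a cluster, and `minute_buckets`, `split_queries` and `map_task_queries` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def minute_buckets(start, end):
    """Yield the start of every minute in [start, end)."""
    current = start.replace(second=0, microsecond=0)
    while current < end:
        yield current
        current += timedelta(minutes=1)


def split_queries(start, end):
    """Step 1: one partition-key lookup per collected minute."""
    query = "SELECT realtb_sharded FROM events WHERE collected_tb = %s"
    # Assumption: the time bucket is stored as epoch seconds of the minute start.
    return [(query, (int(m.timestamp()),)) for m in minute_buckets(start, end)]


def map_task_queries(partition_keys):
    """Step 2: one full-partition read per distinct partition key."""
    query = "SELECT * FROM events WHERE realtb_sharded = %s"
    return [(query, (key,)) for key in sorted(set(partition_keys))]
```

For the hour on the slide, `split_queries` produces 60 lookups. The partition keys they return, such as '17:58' or '18:03', are deduplicated before being turned into map-task reads, since events that occurred in the same minute may have been collected in different minutes.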
  • 38. Usecase: Event counting and statistics. Current system. Raw data
    Process data of type “AD” collected 18:00 – 19:00 30.05.2013:
    1. Get InputSplits:
       For minute in (18:00-19:00).getMinutes:
         SELECT realtb_sharded FROM events WHERE collected_tb = minute AND type = 'AD'
         token(realtb_sharded) → IP-addresses
    2. Map-task:
       SELECT * FROM events WHERE realtb_sharded = '17:58' AND type = 'AD' AND collected_ts > minTimeuuid('18:00') AND collected_ts < maxTimeuuid('19:00')
    Table definition: the same CREATE TABLE events (...) as on slide 35.
  • 39. Usecase: Event counting and statistics. Current system. Raw data Getting InputSplits: Get Partition keys for data of type AD collected during timebucket 29.05.2013 21:18 – 21:19 Hadoop Map-task: Get rows for a Partition key from split limiting by type and collected timestamp
  • 40. Usecase: Event counting and statistics. NextGen Event Counting and Statistics
    NextGen:
    ● Ad-hoc analytics:
      – Apache Hive integration (we have Pig)
      – Hive ODBC driver for Tableau integration
    ● Aggregation jobs:
      – A higher-level library such as Cascading or Apache Crunch instead of raw M/R code
    ● Hadoop 2
  • 41. Usecase: Event counting and statistics ?