Cassandra implementation for collecting
data and presenting data
Robert Chen
robertchen117@gmail.com
Agenda
• SQL vs NOSQL
• Why Cassandra
• Cassandra introduction
• Our architecture and design
• Configuration best practice
• How we write data
• How we read data
• Demo
A highly scalable, eventually consistent, distributed, structured
key-value store.
Cassandra™ is the highly scalable and high performance distributed data infrastructure. Offering distribution
of data across multiple data centers and incremental scalability with no single points of failure, Cassandra is
the logical choice when you need reliability without compromising performance. Cassandra is relied upon by
leading companies like Netflix, Twitter, Cisco, Rackspace, Ooyala, Openwave, and many more.
SQL vs NOSQL
• NOSQL
• Not only SQL, schema-free
• Big data
• NOSQL can service heavy read/write workloads
• Reads may not be consistent in real time
• SQL
• Supports complex join relationships
• Oracle RAC solution for big data? Too expensive
• Typical RDBMS implementations are tuned for small but frequent read/write transactions or for
large batch transactions with rare write access
• RDBMSs (they say) have shown poor performance on data-intensive applications, including:
• Indexing a large number of documents
• Serving pages on high-traffic websites
• Handling the volumes of social networking data
• Delivering streaming media
• Consistent on every read
Why Cassandra
• To solve the storage bottleneck on our central NetApp filer
• Chose Cassandra over HBase
• No Single point of failure
• Fast development
• Big data and dynamically changing environment
• Good fit for a horizontally scaled production environment
• Low total cost of ownership
• No special hardware needed, just some x86 boxes
Cassandra Design
• High availability (a wily hare has three burrows)
• Eventual consistency
• Trades strong consistency for high availability
• Lets you choose strong consistency or varying degrees of more relaxed consistency
• Incremental scalability (linearly scalable), horizontal!
• Nodes added to a Cassandra cluster (all done online) increase the throughput of your database in a predictable, linear fashion for both read and write operations
• Optimistic replication
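The choice between strong and relaxed consistency on this slide is often summarized by a simple rule of thumb: with N replicas, a read that contacts R replicas and a write that contacts W replicas are guaranteed to overlap on at least one replica when R + W > N. A minimal sketch of that arithmetic follows; the function names are illustrative and are not part of any Cassandra API.

```python
def quorum(n_replicas: int) -> int:
    """Size of a majority quorum for n_replicas."""
    return n_replicas // 2 + 1


def reads_see_latest_write(n_replicas: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    for count in (read_replicas, write_replicas):
        if not 1 <= count <= n_replicas:
            raise ValueError("replica counts must be between 1 and n_replicas")
    return read_replicas + write_replicas > n_replicas
```

With three replicas, quorum reads plus quorum writes (2 + 2) overlap, while reading and writing a single replica each (1 + 1) does not, which is the relaxed, eventually consistent end of the range.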
Cassandra Design II
• All nodes are identical: decentralized/symmetric
• No master or SPOF
• Adding nodes is simple
• Distributed, read/write anywhere design
• Massively scalable peer-to-peer architecture
• Based on the best of Amazon Dynamo and Google BigTable
• Minimal administration
• Multi-datacenter replication
• No caching layer required
Cassandra Design III
• Very fast writes
• Fault tolerant, guaranteed data safety
• Automatic provisioning of new nodes
• Big data
• Transparent fault detection and recovery
• Cassandra uses gossip protocols to detect machine failure and recover when a machine is brought back into the cluster, all without your application noticing.
Write op
Write op (continued)
• Writes go to the log and the memory table
• Periodically the memory table is merged with the disk table
[Diagram: a Cassandra node. An update goes to the log (disk) and the memtable (RAM); the memtable is written to an SSTable file on disk later.]
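The write path on this slide can be sketched as a toy, single-node model. This is only an illustration under simplifying assumptions: the class and its names are hypothetical, the flush is triggered by a size limit rather than a timer, and the log is simply cleared after a flush.

```python
class ToyNode:
    """Toy write path: append-only log + memtable + immutable SSTables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stands in for the log on disk
        self.memtable = {}     # in-memory table (RAM)
        self.sstables = []     # flushed, immutable tables (disk), oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the log
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. later, move data to disk

    def flush(self):
        # Write the memtable out as a sorted, immutable table and start fresh.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed entries no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest flushed table first
            if key in table:
                return table[key]
        return None
```

Because a write only appends to the log and updates memory, it never has to seek through existing data files, which is one reason writes are fast in this design.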
Read
[Diagram: the client sends a query to the Cassandra cluster. The closest replica (Replica A) returns the result; Replicas B and C receive a digest query and return digest responses. The result goes back to the client, and a read repair runs if the digests differ.]
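The read flow in this diagram can be sketched with plain dictionaries standing in for replicas. This is a simplified illustration, not Cassandra's implementation: the function names are hypothetical, each stored value is assumed to be a `(timestamp, data)` pair, and the repair happens inline rather than in the background.

```python
import hashlib


def digest(value):
    """Hash of a stored value, standing in for a digest response."""
    return hashlib.sha256(repr(value).encode()).hexdigest()


def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (timestamp, data).

    replicas[0] plays the role of the closest replica.
    """
    result = replicas[0].get(key)  # full data from the closest replica
    # The other replicas only return a digest of what they hold.
    digests = [digest(replica.get(key)) for replica in replicas[1:]]
    if any(d != digest(result) for d in digests):
        # Digests differ: compare full values, keep the newest, repair everyone.
        candidates = [replica[key] for replica in replicas if key in replica]
        result = max(candidates, key=lambda value: value[0])
        for replica in replicas:
            replica[key] = result
    return result
```

Sending digests instead of full values keeps the common case cheap: the full data crosses the network once, and the extra work only happens when the replicas disagree.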
Configuration best practice
• Put the data files on high-performance RAID volumes
• Start with Sun JDK 1.6+
• Configure with Java Native libs
• The clocks on each node must be synchronized to maintain precision
across the cluster on inserts.
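The clock requirement matters because conflicting versions of a column are resolved by timestamp, with the newest timestamp winning. A minimal sketch of what goes wrong with a slow clock follows; the `resolve` helper and the 5-second skew are illustrative assumptions, and tie-breaking is simplified.

```python
def resolve(column_a, column_b):
    """Each column is (name, value, timestamp); the higher timestamp wins."""
    return column_a if column_a[2] >= column_b[2] else column_b


# The first write happens on a node with a correct clock.
first_write = ("loadAvg1", 0.04, 1316966449)
# The second write happens one second later, but on a node whose clock
# is 5 seconds behind, so it carries an older timestamp.
second_write = ("loadAvg1", 0.46, 1316966450 - 5)

winner = resolve(first_write, second_write)
```

Here `winner` is the first write, so the more recent value 0.46 is silently discarded. Keeping the nodes synchronized (for example with NTP) avoids this.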
Data collection architecture
• Web UI (Highcharts / jQuery)
• ActiveMQ (message bus)
1. Collect data and send it to ActiveMQ
2. Consume the data and save it to Cassandra
3. Filter the data and show it on the plots
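The three steps of this architecture can be sketched end to end in memory. Everything here is a stand-in and an assumption for illustration: `queue.Queue` replaces the ActiveMQ bus, a dictionary replaces Cassandra, and the function and field names are hypothetical.

```python
import queue

bus = queue.Queue()  # stands in for the ActiveMQ message bus
store = {}           # stands in for Cassandra: row key -> {timestamp: value}


def collect(host, metric, timestamp, value):
    """Step 1: a collector publishes one data point to the bus."""
    bus.put({"host": host, "metric": metric, "ts": timestamp, "value": value})


def consume_all():
    """Step 2: a consumer drains the bus and saves each point."""
    while not bus.empty():
        msg = bus.get()
        row = store.setdefault(f'{msg["host"]}:{msg["metric"]}', {})
        row[msg["ts"]] = msg["value"]


def points_for_plot(host, metric, start, end):
    """Step 3: filter the stored points down to the plotted time window."""
    row = store.get(f"{host}:{metric}", {})
    return sorted((ts, v) for ts, v in row.items() if start <= ts <= end)
```

Putting a message bus between the collectors and the database means the collectors do not need to know about Cassandra, and consumers can be added or restarted independently.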
Data structure
• keyspace: settings (e.g., partitioner)
• column family: settings (e.g., comparator, type [Std])
• column: name, value, clock
Our Data Model
CoreMetrics (keyspace)
  LoadAvg1 (column family)
    host1_131696 (row)
      Column: 6449, value: 0.04
      Column: 5546, value: 0.02
    host2_131811 (row)
      Column: 8227, value: 0.46
      Column: 9792, value: 1.30
Our Data Model
CoreMetrics (keyspace)
  Primary (column family)
    host1:loadAvg1 (row)
      Column: 1316966449, value: 0.04
      Column: 1316965546, value: 0.02
    host2:loadAvg1 (row)
      Column: 1318118227, value: 0.46
      Column: 1318119792, value: 1.30
Our Meta Data Model
CoreMetrics (keyspace)
  PrimaryMeta (column family)
    host1.com (row)
      Column: loadAvg15:Total, value: 1
      Column: loadAvg15:Total, value: 1
    host2 (row)
      Column: loadAvg15:Total, value: 1
      Column: loadAvg15:Total, value: 1
Our HBase Data Model
Primary (column family)
  host1:loadAvg1:1 (row: host:metric:instance)
    Column: c:1316966449, value: 0.04
    Column: c:1316965546, value: 0.02
  host2:loadAvg1:1 (row: host:metric:instance)
    Column: 1318118227, value: 0.46
    Column: 1318119792, value: 1.30
Our Data Model (II)
• Keyspace: CoreMetrics (the database name), one per application
• Column families: one per metric
• loadAvg1
• loadAvg5
• etc. (about 80 server metrics)
• Rows and columns: inspired by the design of HBase and OpenTSDB, we design our rows and columns in a similar way: the timestamp is split between the row key and the column key, which greatly improves read performance
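The timestamp split can be sketched as follows. The bucket size of 10,000 seconds is inferred from the example keys in the data model slides (timestamp 1316966449 appears as row `host1_131696` with column `6449`); it and the helper names are assumptions for illustration, not the original loader code.

```python
BUCKET = 10_000  # seconds per row, inferred from the example keys


def split_key(host, timestamp):
    """Split a Unix timestamp into (row key, column name)."""
    return f"{host}_{timestamp // BUCKET}", timestamp % BUCKET


def join_key(row_key, column):
    """Recover (host, timestamp) from a row key and column name."""
    host, bucket = row_key.rsplit("_", 1)
    return host, int(bucket) * BUCKET + column


def row_keys_for_range(host, stime, etime):
    """All row keys that can hold points between stime and etime."""
    return [f"{host}_{b}" for b in range(stime // BUCKET, etime // BUCKET + 1)]
```

A time-range query then touches only the few rows returned by `row_keys_for_range`, and each row stays bounded in width, which is the likely source of the read speedup.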
How we write to Cassandra
Multiple data loaders connect to port 9160 on the Cassandra nodes and insert data like this:
$CLIENT = new Cassandra::CassandraClient($PROTOCOL);
$CLIENT->set_keyspace($keyspace);
$CLIENT->insert($rowkey, $column_parent, $column, $consistency_level);
How we read data from Cassandra
We use pycassa to multiget the rows, and aggregate the data when too many data points are returned.
get_coremetrics(metric_name, host, stime, etime, samples = 1000):
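The aggregation step can be sketched in plain Python. The slide does not say which aggregation is used, so averaging consecutive chunks is an assumption here, and `downsample` is a hypothetical helper; the points are assumed to have already been fetched (for example via a multiget) and sorted by timestamp.

```python
def downsample(points, samples=1000):
    """Reduce a list of (timestamp, value) pairs to at most `samples` points.

    Consecutive chunks are averaged; each output point keeps the timestamp
    of the first point in its chunk.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    if len(points) <= samples:
        return list(points)
    chunk = -(-len(points) // samples)  # ceiling division
    reduced = []
    for start in range(0, len(points), chunk):
        group = points[start:start + chunk]
        average = sum(value for _, value in group) / len(group)
        reduced.append((group[0][0], average))
    return reduced
```

Capping the number of points keeps the response small and the plot responsive regardless of how long a time range is requested.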
Demo: data model view
Demo: graphing the data
Cassandra monitoring
1. Nagios plugin for Cassandra
2. JMX
Thoughts and future
1. Migrate more applications to Cassandra
2. Livestat data (Bids/Listings…)
3. Help other teams with data collection and graphing?
Reference URLs
• Thrift (12 language bindings!)
• http://wiki.apache.org/cassandra/ThriftInterface
• http://thrift.apache.org/download/
• Pycassa
• http://pycassa.github.com/pycassa/tutorial.html