Did you know there are some things you should be doing when writing your data-driven application using Apache Cassandra? Let’s talk about that. There are features in almost every driver that will keep your application online and running fast. You should know what they are. Even if you know these features, I’m going to tell you why they work and why they’re a good idea. This will not be language specific; I will be using multiple languages and drivers. This talk should appeal to programmers in general.
Things You Should Be Doing When Using Cassandra Drivers
1. Things You Should Be Doing When Using Cassandra Drivers
Rebecca Mills
Junior Evangelist at DataStax
@rebccamills
2. What do I do?
2
Confidential
• Try to create awareness for open source Cassandra
• Develop content
• Identify problems newcomers might be encountering
• Develop strategies and material to make that first use easier
3. Of course all this extends to drivers!
• Learning and playing with the drivers as much as I can
• Develop “Getting Started” tutorials for drivers in various programming languages
• Making it my mission to bring the details to light
4. So How Can We Communicate with Cassandra in “X” Language?
5. We have what you need!
• DataStax provides drivers for Java, Python, and C#
• Fresh out of the oven: Ruby, Node.js, and C++
• Also loads of open source drivers to choose from
• Check out the Planet Cassandra Client Drivers section
7. 1. One Cluster instance per cluster
• Configure different important aspects of the way connections and queries will be handled.
• Contact points
• Retry Policies
• Load Balancing Policies
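from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Contact points, compression, and load balancing are all set on the Cluster.
cluster = Cluster(
    ['10.1.1.3', '10.1.1.4', '10.1.1.5'],
    compression=True,
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='US_EAST')))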
8. 2. One Session per keyspace
• Query execution, connection pooling
• Long-lived object
• Not to be used in a request/response short-lived fashion
• Share the same cluster and session instances across your application
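One way to share a single long-lived session is sketched below. It is a minimal illustration, not a driver feature: `SessionHolder` and `fake_connect` are hypothetical names, and the factory stands in for whatever call creates a real session (for example `cluster.connect(...)` in the Python driver), so the sketch runs without a cluster.

```python
import threading


class SessionHolder:
    """Lazily creates one session and hands the same object to every caller."""

    def __init__(self, connect):
        # `connect` is a hypothetical zero-argument factory; with a real
        # driver it would wrap the call that opens the session.
        self._connect = connect
        self._session = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: concurrent request handlers never
        # trigger more than one connect() call.
        if self._session is None:
            with self._lock:
                if self._session is None:
                    self._session = self._connect()
        return self._session


# Stand-in factory so the sketch runs without a cluster.
connect_calls = []


def fake_connect():
    connect_calls.append(1)
    return object()


holder = SessionHolder(fake_connect)
first = holder.get()
second = holder.get()
print(first is second, len(connect_calls))  # True 1
```

Every request handler would call `holder.get()` instead of opening its own connection, so connection pools are built once per application rather than once per request.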
10. 3. Use Prepared Statements
• If you execute a statement more than once
• Has multiple benefits
• Prepare once, bind and execute multiple times
• We’ll talk more about this soon!
14. Why use Prepared Statements?
• More performant than using strings
• Will be parsed only once on the server
• We expect you to use them with repeated queries in production
• Avoid CQL injection
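The injection point in the last bullet can be illustrated with plain string handling. The sketch below never talks to Cassandra; `build_unsafe` and `build_bound` are hypothetical helpers, and the `users` columns are assumed from the example table used later in the talk. It also assumes the server treats `--` as the start of a comment.

```python
def build_unsafe(new_last_name, username):
    # Anti-pattern: user input is pasted straight into the statement text.
    return ("UPDATE users SET lastName = '%s' WHERE username = '%s'"
            % (new_last_name, username))


def build_bound(new_last_name, username):
    # The statement text is fixed; values travel separately as bound values.
    return ("UPDATE users SET lastName = ? WHERE username = ?",
            [new_last_name, username])


# A value that closes the string literal and supplies its own WHERE clause.
hostile = "X' WHERE username = 'pmcfadin' --"

unsafe_text = build_unsafe(hostile, "rmills")
safe_text, safe_values = build_bound(hostile, "rmills")

print(unsafe_text)
print(safe_text, safe_values)
```

In the unsafe version the text handed to the server now targets a different row than the caller intended. In the bound version the statement text never changes, and the hostile input is only ever treated as a value.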
15. Prepared Statements
Consider a query sent as a plain string:
session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Jones', 35, 'Austin', 'bob@example.com', 'Bob')
""")
16. Prepared Statements
session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Smith', 24, 'Tampa', 'ken@example.com', 'Bob')
""")
session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Power', 45, 'New York', 'kate@example.com', 'Kate')
""")
session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Renolds', 33, 'Miami', 'carl@example.com', 'Carl')
""")
17. Prepared Statements
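Now the same, as a prepared statement:
# Prepare once; the server parses the statement a single time.
prepared_stmt = session.prepare(
    "INSERT INTO users (lastname, age, city, email, firstname) "
    "VALUES (?, ?, ?, ?, ?)")
# Bind the values for this execution, then execute the bound statement.
bound_stmt = prepared_stmt.bind(['Jones', 35, 'Austin', 'bob@example.com', 'Bob'])
rows = session.execute(bound_stmt)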
19. Prepared Statements
[Diagram: two client-to-Cassandra exchanges]
INSERT with strings: the client sends the entire query string (large amount of data, parse cost)
INSERT with PreparedStatements: the client sends the query ID and bound values (smaller amount of data, no parsing)
23. Prepared Statements
Preparing a statement inside a for loop is an anti-pattern:
for (int i = 0; i < 10; i++) {
    // Anti-pattern: the same statement is re-prepared on every iteration.
    PreparedStatement ps = session.prepare("UPDATE user SET disabled = 1 WHERE id = ?");
    session.execute(ps.bind(i));
}
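The fix for the loop above is to prepare once and only bind and execute inside the loop. The sketch below counts calls to show the difference. `CountingSession` is a hypothetical stand-in that mimics only the shape of a driver session (`prepare` and `execute`); it is not the driver's implementation and it does not contact a cluster.

```python
class CountingSession:
    """Hypothetical stand-in for a driver session; it only counts calls."""

    def __init__(self):
        self.prepare_calls = 0
        self.execute_calls = 0

    def prepare(self, cql):
        self.prepare_calls += 1
        return cql  # a real driver would return a prepared statement object

    def execute(self, statement, values):
        self.execute_calls += 1


UPDATE_CQL = "UPDATE user SET disabled = 1 WHERE id = ?"


def disable_users_slow(session, ids):
    for user_id in ids:
        stmt = session.prepare(UPDATE_CQL)  # anti-pattern: prepared every time
        session.execute(stmt, [user_id])


def disable_users_fast(session, ids):
    stmt = session.prepare(UPDATE_CQL)  # prepared once, outside the loop
    for user_id in ids:
        session.execute(stmt, [user_id])


slow, fast = CountingSession(), CountingSession()
disable_users_slow(slow, range(10))
disable_users_fast(fast, range(10))
print(slow.prepare_calls, fast.prepare_calls)  # 10 1
```

Both versions execute ten updates, but only the second keeps the prepare step out of the hot path.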
24. Load Balancing
• A load balancing policy determines which node receives an insert or query.
• A client can read from or write to any node, but that can sometimes be inefficient.
• If a node receives a read or write for data owned by another node, it will coordinate that request for the client.
• We can use a load balancing policy to control that action.
25. Load Balancing deep dive
Using this example:
Cluster cluster = Cluster.builder()
    .addContactPoint("10.0.0.1")
    .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
    .withLoadBalancingPolicy(
        new TokenAwarePolicy(
            new DCAwareRoundRobinPolicy()))
    .build();
26. Example data model
CREATE TABLE users (
    username text PRIMARY KEY,
    firstName text,
    lastName text
);

INSERT INTO users (username, firstName, lastName)
VALUES ('rmills', 'Rebecca', 'Mills');

INSERT INTO users (username, firstName, lastName)
VALUES ('pmcfadin', 'Patrick', 'McFadin');
37. Without Token Aware
Using this modified example:
Cluster cluster = Cluster.builder()
    .addContactPoint("10.0.0.1")
    .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
    .withLoadBalancingPolicy(
        new DCAwareRoundRobinPolicy())
    .build();
38. Request for data
[Diagram: a client and a four-node ring in DC1]
10.0.0.1 owns tokens 00-25
10.0.0.2 owns tokens 26-50
10.0.0.3 owns tokens 51-75
10.0.0.4 owns tokens 76-100
SELECT firstName
FROM users
WHERE userName = 'pmcfadin';
pmcfadin Murmur3 Hash Token = 77
42. Load Balancing
• Default before Java driver 2.0.2: RoundRobinPolicy
• Now: TokenAwarePolicy, which adds token awareness to a child policy
• Acts as a filter, wraps around another policy
• Used to reduce network hops, as replicas for the requested key are tried first
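The hop savings can be checked on the toy ring from the "Request for data" slide. This is a simplified simulation, not driver code: it assumes a single owner per token range (no replication), uses the token value 77 given on the slide for 'pmcfadin', and the helper names are hypothetical.

```python
import itertools

# (first token, last token, owner), taken from the slide's toy ring.
RING = [
    (0, 25, "10.0.0.1"),
    (26, 50, "10.0.0.2"),
    (51, 75, "10.0.0.3"),
    (76, 100, "10.0.0.4"),
]
NODES = [owner for _, _, owner in RING]


def owner_of(token):
    """Return the node whose range contains the token."""
    for low, high, owner in RING:
        if low <= token <= high:
            return owner
    raise ValueError("token outside the toy ring: %r" % token)


def needs_extra_hop(coordinator, token):
    # A coordinator that does not own the data must forward the request.
    return coordinator != owner_of(token)


PMCFADIN_TOKEN = 77  # value given on the slide

# Plain round robin: the coordinator rotates regardless of the key.
rotation = itertools.cycle(NODES)
round_robin_hops = sum(
    needs_extra_hop(next(rotation), PMCFADIN_TOKEN) for _ in range(8))

# Token aware: the request goes straight to the owner of the key.
token_aware_hops = sum(
    needs_extra_hop(owner_of(PMCFADIN_TOKEN), PMCFADIN_TOKEN) for _ in range(8))

print(owner_of(PMCFADIN_TOKEN), round_robin_hops, token_aware_hops)
# 10.0.0.4 6 0
```

Out of eight identical reads, round robin lands on the owning node only twice, so six requests pay an extra network hop; the token-aware choice pays none.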
43. Load Balancing - Whitelist
• Ensures only the hosts from a provided list are used
• Wraps a child policy
• Used to limit the effects of automatic peer discovery
• Execute queries only on a given list of hosts
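The "wraps a child policy" idea can be sketched as a filter over the child's query plan. The classes below are hypothetical toy models written for illustration; they are not the drivers' policy classes and they ignore host state such as up/down.

```python
class RoundRobin:
    """Toy child policy: rotates the starting host on every query plan."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._next = 0

    def query_plan(self):
        start = self._next
        self._next = (self._next + 1) % len(self.hosts)
        return self.hosts[start:] + self.hosts[:start]


class Whitelist:
    """Toy wrapper: keeps only the allowed hosts from the child's plan."""

    def __init__(self, child, allowed):
        self.child = child
        self.allowed = set(allowed)

    def query_plan(self):
        return [h for h in self.child.query_plan() if h in self.allowed]


# Hosts found through peer discovery versus the hosts we want to use.
discovered = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
policy = Whitelist(RoundRobin(discovered), ["10.0.0.1", "10.0.0.2"])

first_plan = policy.query_plan()
second_plan = policy.query_plan()
print(first_plan, second_plan)
```

The child still decides the ordering, while the wrapper guarantees that hosts outside the list never appear in a plan, which is how the effect of automatic peer discovery is limited.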
44. Asynchronous Statements
• Native binary protocol supports request pipelining
• A single connection can be used for several simultaneous and independent request/response exchanges
45. Asynchronous Statements
• Don’t have to wait for a query to complete and return rows directly, non-blocking IO
• Method almost immediately returns a future object
46. Asynchronous Statements
from cassandra import ReadTimeout

query = "SELECT * FROM users WHERE lastname=%s"
future = session.execute_async(query, [lastname])

# ... do some other work

try:
    # result() blocks until the response arrives or an error is raised.
    rows = future.result()
    user = rows[0]
    print(user.name, user.age)
except ReadTimeout:
    log.exception("Query timed out:")
47. Asynchronous Statements
# build a list of futures
futures = []
query = "SELECT * FROM users WHERE user_id=%s"
for user_id in ids_to_fetch:
    futures.append(session.execute_async(query, [user_id]))

# wait for them to complete and use the results
for future in futures:
    rows = future.result()
    print(rows[0].name, rows[0].age)
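The submit-everything-then-collect pattern above can be tried without a cluster using the standard library. This is only an analogy: it uses a thread pool rather than request pipelining on one connection, and `fetch_user` and the `USERS` dictionary are hypothetical stand-ins for a query and its table.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table" keyed by user id.
USERS = {1: ("Rebecca", "Mills"), 2: ("Patrick", "McFadin")}


def fetch_user(user_id):
    time.sleep(0.05)  # pretend this is a network round trip
    return USERS[user_id]


ids_to_fetch = [1, 2]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit every request first; each call returns a future immediately.
    futures = [pool.submit(fetch_user, user_id) for user_id in ids_to_fetch]

    # ... other work could happen here while the requests are in flight

    # Collect the results in submission order; result() blocks if needed.
    results = [future.result() for future in futures]

print(results)  # [('Rebecca', 'Mills'), ('Patrick', 'McFadin')]
```

Because all requests are in flight before the first `result()` call, the total wait is close to one round trip rather than one round trip per request.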
48. Where can I download the drivers?
49. Planet Cassandra
• A great place for Apache Cassandra resources!
• Blog posts, webinars, tutorials, and much, much more!
• Also a great place for your driver needs