With the new Python driver for Cassandra it is easy to build integrations and apps that use Cassandra seamlessly as a back end. This session will explore what it takes to build the app and the features available with the new Python driver.
Webinar | Building Apps with the Cassandra Python Driver
1. Building Apps with the Cassandra Python Driver
Eddie Satterly
CTO Big Data & Analytics at CSC
2. Where is the Driver
https://github.com/datastax/python-driver
3. Key Features
The driver is a connection handler that sits between your app and the
Cassandra system and exposes a low-level API. The key features that
really helped simplify the Python code from the earlier version of the
app are:
Connection Pooling & Node Discovery – This lets you
connect to the whole set of nodes while providing only the seed nodes
in your list. With my old driver you had to provide the list of all
nodes and make the Python code decide how to connect.
For example, you give it the nodes 192.168.1.1 and 192.168.1.2, and the
driver makes a connection and automatically discovers all other
nodes in the cluster.
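A minimal sketch of the seed-node connection described above, assuming the cassandra-driver package is installed and a cluster is reachable at the two example addresses. `Cluster`, `connect()` and `cluster.metadata.all_hosts()` are driver calls; the helper names `open_cluster` and `discovered_addresses` are hypothetical and only there for illustration.

```python
SEED_NODES = ["192.168.1.1", "192.168.1.2"]  # example addresses from the slide


def open_cluster(seed_nodes):
    """Connect using only the seed nodes; the driver discovers the rest."""
    # Imported lazily so this file still loads where the driver is absent.
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points=seed_nodes)
    session = cluster.connect()  # opens the control connection and the pools
    return cluster, session


def discovered_addresses(cluster):
    """Return the addresses of every node the driver currently knows about."""
    return sorted(str(host.address) for host in cluster.metadata.all_hosts())
```

Calling `open_cluster(SEED_NODES)` and then `discovered_addresses(cluster)` should list every node in the cluster, not just the two seeds.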
4. Key Features Cont.
Cluster Attributes – There are several cluster object attributes you can
set. Some of the key ones are the default keyspace, set via the method
cluster.connect('mykeyspace'); the CQL version, for clusters that run in
mixed mode because data models were built at different times; and
metrics_enabled, which controls metrics collection.
ssl_options – This attribute is called out separately because of its high
value in environments where client-to-node communication needs
to be encrypted and that feature is turned on cluster side. While this is not
turned on by default in my app, it is needed by many of the customers that
are using it.
Load balancing – This is a great added feature that really helps to avoid
the hotspot nodes seen with the older driver approach, as now you set the
policy in an attribute (round robin is the default) and the driver controls
connections. In early tests with the old driver, even though the code was
supposed to pick a pseudo-random node, affinity seemed to happen and
create hotspot nodes for queries.
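The cluster attributes above can be sketched roughly as follows. This assumes the cassandra-driver package and uses the `Cluster` keyword arguments `load_balancing_policy`, `ssl_options` and `metrics_enabled`; newer driver releases may prefer other ways of passing some of these settings, so check the docs for your version. The helper names and the certificate path parameter are hypothetical.

```python
import ssl


def ssl_options_for(ca_cert_path):
    """Build the ssl_options dict, or None when encryption is not wanted."""
    if not ca_cert_path:
        return None
    # Keys follow the standard library's SSL socket options.
    return {"ca_certs": ca_cert_path, "cert_reqs": ssl.CERT_REQUIRED}


def build_cluster(seed_nodes, ca_cert_path=None):
    """Create a Cluster with an explicit policy and optional SSL."""
    # Imported lazily so this file still loads where the driver is absent.
    from cassandra.cluster import Cluster
    from cassandra.policies import RoundRobinPolicy

    return Cluster(
        contact_points=seed_nodes,
        load_balancing_policy=RoundRobinPolicy(),
        ssl_options=ssl_options_for(ca_cert_path),
        metrics_enabled=False,  # True turns on metrics collection
    )
```

A session with a default keyspace would then come from `build_cluster(["192.168.1.1", "192.168.1.2"]).connect("mykeyspace")`.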
5. Key Features Cont.
default_timeout – Setting a timeout is key so that the app can detect failures and
respond without leaving the client hanging.
row_factory – This lets you determine what format to return the results in. This is
super valuable to make sure your app has the data returned in the optimal way for
analysis and manipulation. There were over 50 lines of code in my old Python scripts
to handle one-offs that are now gone since this feature exists. The options are
tuple_factory, named_tuple_factory, dict_factory and ordered_dict_factory, all in
cassandra.query.
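As a rough sketch of these two session settings: `default_timeout` and `row_factory` are attributes on the driver's session object, and a row factory is a callable that takes the column names and the raw rows. The `plain_dict_factory` function below is a hypothetical stand-in that mirrors the built-in `dict_factory` from `cassandra.query`, so the example runs without a cluster; `configure_session` is also a hypothetical helper.

```python
def plain_dict_factory(colnames, rows):
    """Hypothetical custom row factory: one dict per row, keyed by column."""
    return [dict(zip(colnames, row)) for row in rows]


def configure_session(session, timeout_seconds=10.0,
                      row_factory=plain_dict_factory):
    """Apply a request timeout and a result format to a driver session."""
    session.default_timeout = timeout_seconds  # seconds to wait per request
    session.row_factory = row_factory          # how result rows are shaped
    return session
```

With a real session you could pass `row_factory=dict_factory` after `from cassandra.query import dict_factory` instead of the stand-in.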
execute_async() – This is one of the best features in the new driver and
makes the processing time for requests much faster from the client PoV.
There is a method you can call to block for the results if needed,
but in most cases doing other work while waiting on results
speeds up the response times by many milliseconds.
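A small sketch of the asynchronous pattern, assuming a connected driver session. `execute_async()` returns a future right away, and calling `result()` on that future is the blocking call mentioned above. The helper name `fetch_concurrently` is hypothetical.

```python
def fetch_concurrently(session, queries):
    """Send every query without blocking, then gather results in order."""
    futures = [session.execute_async(query) for query in queries]
    # Other work could run here while the cluster processes the requests.
    return [list(future.result()) for future in futures]  # result() blocks
```

All requests are in flight before the first `result()` call, so the total wait is closer to the slowest query than to the sum of all of them.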
6. Take a Look at Docs
There are many other features I did not call out so take a look
at:
http://datastax.github.io/python-driver/index.html
http://datastax.github.io/python-driver/api/index.html
For high-throughput operations like remote lookups, I
highly suggest using the multiprocessing module instead of
multithreading, but make sure you understand the
implications for object passing.
7. How I Use It
Take a look at my GitHub in a couple of weeks; the new version of the
app will be there using this driver once all the final testing is done.
The current version there uses the old driver and approach, so
look for v2.0.
https://github.com/esatterly/splunk-cassandra
Build your own playgrounds and figure out the right options and configuration
settings to return data and do analysis and manipulation on it. I will be putting two
other apps out in the next few months for other non-Splunk use cases as well, so
stay tuned.