This talk is an introduction to a powerful combination in the big data space: Apache Spark and Cassandra. Spark is a cluster-computing framework that allows users to perform calculations against resilient in-memory datasets using a functional programming interface. Cassandra is a linearly scalable, fault tolerant, decentralized datastore. These two technologies are complicated, but integrate well and provide such a level of utility that whole companies have formed around them.
In this talk we’ll learn how Spark and Cassandra can be leveraged within your Groovy Application: Spark normally asks for a Scala environment. We’ll talk about Spark and Cassandra from a high level and walk through code examples. We’ll discuss the pitfalls of working with these technologies - like modeling your data appropriately to ensure even distribution in Cassandra and general packaging woes with Spark - and ways to avoid them. Finally, we’ll explore how we at ThirdChannel are using these technologies.