Element Fleet has the largest benchmark database in our industry and we needed a robust and linearly scalable platform to turn this data into actionable insights for our customers. The platform needed to support advanced analytics, streaming data sets, and traditional business intelligence use cases.
In this presentation, we will discuss how we built a single, unified platform for both Advanced Analytics and traditional Business Intelligence using Cassandra on DSE. With Cassandra as our foundation, we are able to plug in the appropriate technology to meet varied use cases. The platform we’ve built supports real-time streaming (Spark Streaming/Kafka), batch and streaming analytics (PySpark, Spark Streaming), and traditional BI/data warehousing (C*/FiloDB). In this talk, we are going to explore the entire tech stack and the challenges we faced trying to support the above use cases. We will specifically discuss how we ingest and analyze IoT data (vehicle telematics) in real time and in batch, combine data from multiple data sources into a single data model, and support standardized and ad hoc reporting requirements.
About the Speaker
Jim Peregord, Vice President - Analytics, Business Intelligence, Data Management, Element Corp.
24. SPIN
[Diagram: ETL (Talend) loads three stores on Cassandra/Spark: ODS (reload), ADS (incremental), and FiloDB (incremental). ODS and ADS are C* tables. Connection labels shown: JDBC, Spark, Thrift.]
ODS is truncated and reloaded daily.
ADS is a complete replica of the source system, loaded with an incremental ETL strategy.
ODS tables are used to load the FiloDB tables incrementally using Spark jobs.
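The difference between the two load modes noted above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Talend or Spark jobs used on the platform: the function names, the `id` and `updated_at` columns, and the sample rows are all hypothetical, and a high-watermark filter is just one common way to do an incremental load.

```python
from datetime import datetime


def full_reload(target: dict, source_rows: list) -> None:
    """Truncate/load: empty the target, then copy every source row."""
    target.clear()
    for row in source_rows:
        target[row["id"]] = row


def incremental_load(target: dict, source_rows: list, watermark: datetime) -> datetime:
    """Upsert only rows changed after the watermark; return the new watermark.

    Assumes each row carries a hypothetical `updated_at` timestamp.
    Rows deleted at the source are not detected by this approach.
    """
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            target[row["id"]] = row  # insert or overwrite by key
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


# Made-up sample rows standing in for a source system table.
source = [
    {"id": 1, "odometer": 1200, "updated_at": datetime(2016, 9, 1)},
    {"id": 2, "odometer": 560, "updated_at": datetime(2016, 9, 2)},
]

ods, ads = {}, {}
full_reload(ods, source)                             # ODS-style daily reload
mark = incremental_load(ads, source, datetime.min)   # first ADS-style load
```

On later runs, passing the returned watermark back into `incremental_load` means only rows changed since the previous run are touched, whereas `full_reload` always rewrites the whole table.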
Reporting tools: SSRS, Power BI
Example: ETL Incremental Load Strategy