The document discusses Apache Spark and its integration with MongoDB. It provides an overview of Spark's architecture and capabilities, including Spark SQL, streaming, and machine learning libraries. It then covers use cases and benefits of using Spark with MongoDB, including real-time analytics, fraud detection, and time series analysis. The document demonstrates how the Stratio Spark-MongoDB connector allows querying and analyzing MongoDB data using Spark SQL and DataFrames.
15. Spark Stack
• Spark SQL: seamless integration with SQL using the DataFrame API. Also supports Hive SQL.
• Spark Streaming: fast feed data processing API. Designed for fault tolerance; bridges streaming with batch processing.
• MLlib: Spark's bag of tricks for machine learning algorithms.
• GraphX: Spark's graph library.
• Apache Spark: the core engine underneath the libraries above.
43. MongoDB Hadoop Connector
Positive                                   | Not So Good
Battle tested                              | Not the fastest thing
Integrated with existing Hadoop components | Not dedicated to Spark
Supports Hive and Pig                      | Dependent on HDFS
http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/
48. Stratio Spark-MongoDB
// Keep rows whose CloseTime is 60*4 = 240 seconds after OpenTime.
val dfFiveMinForMonth = sqlContext.sql(
  """
  SELECT m.Symbol, m.OpenTime AS Timestamp, m.Open, m.High, m.Low, m.Close
  FROM
  ...
  FROM minbars)
  AS m
  WHERE unix_timestamp(m.CloseTime, 'yyyy-MM-dd HH:mm')
      - unix_timestamp(m.OpenTime, 'yyyy-MM-dd HH:mm') = 60*4
  """
)
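The WHERE clause in the query above parses both time columns with the pattern 'yyyy-MM-dd HH:mm' and compares their difference, in seconds, with 60*4. The sketch below reproduces that check in plain Scala, without Spark, to make the arithmetic easy to try out. The object name BarSpan, its two helper functions, and the sample timestamps are hypothetical and not part of the original deck. It also assumes UTC, whereas unix_timestamp would normally use the session or JVM time zone.

```scala
import java.time.LocalDateTime
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter

// Hypothetical helper, for illustration only.
object BarSpan {
  // Same pattern string that the query passes to unix_timestamp.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm")

  // Parse a timestamp string into epoch seconds.
  // Assumption: UTC is used here to keep the sketch deterministic.
  def toEpochSeconds(ts: String): Long =
    LocalDateTime.parse(ts, fmt).toEpochSecond(ZoneOffset.UTC)

  // True when closeTime is exactly 60*4 seconds after openTime,
  // which is the condition the WHERE clause tests.
  def spansFourMinutes(openTime: String, closeTime: String): Boolean =
    toEpochSeconds(closeTime) - toEpochSeconds(openTime) == 60 * 4
}
```

For example, BarSpan.spansFourMinutes("2015-11-05 09:30", "2015-11-05 09:34") holds, while a close time of "2015-11-05 09:35" does not, because that difference is 300 seconds.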
55. What to expect
• We are working on a dedicated Spark connector for MongoDB
• The Stratio connector is great, but:
–Some operations are actually faster if performed using the Aggregation Framework
• Better integration with the upcoming 3.2 async Java driver
–Especially for Spark Streaming support
56. MongoDB Days 2015
05 November, 2015 London
https://www.mongodb.com/events/mongodb-days-uk