Does exploring data excite you? Do you use Python or R as your language of choice for data analysis? Does your job title include the term Data Analyst? If you answered yes to any of those questions, then Exploring Your MongoDB Data with Pirates and Snakes is the session for you! MongoDB Developer Advocate Ken Alger will show the suggested method for using dataframe structures in R and Python with your MongoDB data. He'll show best-practice code in both languages for moving your array-based MongoDB data into the popular, fast, flexible, and expressive dataframes used for data analysis in these prominent programming languages.
5. Document Model Features
Naturally maps objects to code using JSON.
Represents data of any structure. Our data model is very flexible.
Strongly typed for ease of processing. We support over twenty binary-encoded JSON (BSON) data types.
6. Document Model for Analytics
Flexibility helps with feature engineering by allowing for experimentation and the iterative picking of features.
For deep learning, the flexibility allows for faster iteration.
Pre-filtering of data with the aggregation framework.
14. Data Frame
A data frame is a list of vectors, factors, and/or matrices all having the same length (number of rows in the case of matrices).
Used for storing data tables.
15. Data Frame Data
Ship           Length (ft)  Year Completed  Displacement  Sail Area (sq. ft)
Batavia        186          1628            1200          33000
Cutty Sark     280          1869            2100          32000
Götheborg      190          1738            NaN           21140
HMS Endeavour  97.75        1764            NaN           29889
Kruzenshtern   375          1926            3064          NaN
HMS Victory    227.5        1765            3500          58556
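As a rough illustration of the table above as a data frame, here is a minimal pandas sketch. The column names (`length_ft`, `year_completed`, and so on) are assumptions made for this example and do not come from the slides; the values are copied from the table, with `NaN` marking the missing entries.

```python
import numpy as np
import pandas as pd

# Values copied from the slide's table; column names are illustrative assumptions.
ships = pd.DataFrame(
    {
        "length_ft": [186, 280, 190, 97.75, 375, 227.5],
        "year_completed": [1628, 1869, 1738, 1764, 1926, 1765],
        "displacement": [1200, 2100, np.nan, np.nan, 3064, 3500],
        "sail_area_sq_ft": [33000, 32000, 21140, 29889, np.nan, 58556],
    },
    index=[
        "Batavia",
        "Cutty Sark",
        "Götheborg",
        "HMS Endeavour",
        "Kruzenshtern",
        "HMS Victory",
    ],
)

# Every column has the same length: one entry per ship.
print(ships.shape)

# Count the missing (NaN) values in each column.
print(ships.isna().sum())
```

Each column is a same-length vector, which is the property the data frame definition relies on.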
18. R Data Frame
The R data frame is more or less built into the language.
More functional in style than Python.
More statistical support in general.
19. Python Data Frame
More object-oriented.
Relies on packages (pandas, numpy, scikit-learn).
As a language it's great for additional tasks alongside analytics.
37. Aggregation Framework
• Pre-filter and/or pre-aggregate data on the server before moving it across the network.
• Reduces the amount of data in the data frame.
• Improves performance.
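The pre-filtering idea in the bullets above might look like the following sketch. The field names (`name`, `year_completed`, `displacement`) and the helper `fetch_ships_completed_since` are hypothetical; the slides do not show the collection's schema. The function accepts any object with an `aggregate(pipeline)` method, which is how a PyMongo collection is called.

```python
from typing import Any, Dict, List


def fetch_ships_completed_since(collection: Any, year: int) -> List[Dict[str, Any]]:
    """Run a pre-filtering pipeline on the server and return the reduced documents.

    Field names are assumptions for illustration only.
    """
    pipeline = [
        # Keep only ships completed in or after the given year.
        {"$match": {"year_completed": {"$gte": year}}},
        # Send back only the fields the analysis needs.
        {"$project": {"_id": 0, "name": 1, "year_completed": 1, "displacement": 1}},
    ]
    return list(collection.aggregate(pipeline))
```

With PyMongo, `collection` would typically come from something like `MongoClient(uri)["fleet"]["ships"]`, where the database and collection names are placeholders. The returned list of documents can then be passed to `pandas.DataFrame(...)`, so only the filtered, trimmed data crosses the network.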
38. Sample Data
Ship           Country  Year Completed  Displacement  Individual Sail Areas (sq. ft)
Batavia        NLD      1628            1200          [292, 2012, 990, 550, 403, 642, 1056, ...]
Cutty Sark     GBR      1869            2100          [2408, 866, 155, 2041, 518, 1675, …]
Götheborg      SWE      1738            NaN           [315, 614, 314, 2451, 2096, 2477, …]
HMS Endeavour  GBR      1764            NaN           [1060, 2089, 1101, 420, 2320, 2245]
Kruzenshtern   DEU      1926            3064          [1476, 1352, 2383, 1100, 1807, 448, 2415]
HMS Victory    GBR      1765            3500          [1310, 2445, 1327, 1668, 2098, 2179, …]
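Array fields like the sail areas above do not fit a flat data frame column directly. The sketch below shows two common ways to handle them in pandas, using only the two ships whose arrays appear in full in the sample data. The field names (`name`, `country`, `sail_areas`) are assumptions for illustration.

```python
import pandas as pd

# Only the two ships whose sail-area arrays are shown in full in the sample data.
docs = [
    {"name": "HMS Endeavour", "country": "GBR",
     "sail_areas": [1060, 2089, 1101, 420, 2320, 2245]},
    {"name": "Kruzenshtern", "country": "DEU",
     "sail_areas": [1476, 1352, 2383, 1100, 1807, 448, 2415]},
]

df = pd.DataFrame(docs).set_index("name")

# Option 1: collapse each array into a single total per ship.
totals = df["sail_areas"].apply(sum)

# Option 2: reshape to one row per sail ("long" format), then aggregate.
long_df = df.explode("sail_areas")
long_df["sail_areas"] = long_df["sail_areas"].astype(int)
totals_long = long_df.groupby(level="name")["sail_areas"].sum()

print(totals)
```

Both options give 9235 for HMS Endeavour, the same figure the Results slide lists for that ship, and 10981 for Kruzenshtern.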
43. Results
                 0
La Amistad       19335
Batavia          31246
Götheborg        18464
HMS Endeavour    9235
Golden Hind      11749
Grand Turk       8405
Kalmar Nyckel    3710
Lady Nelson      24938
Pallada          20785
Shtandart        9363
HMS Sultana      35061
HMS Surprise     26272
HMS Trincomalee  21744
HMS Victory      14740
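One way per-ship totals like these could be produced on the server, rather than in the data frame, is a `$project` stage that applies `$sum` to the array field. This is only a sketch under assumed field names (`name`, `sail_areas`); the slides do not show the pipeline actually used, and `total_sail_areas` is a hypothetical helper.

```python
from typing import Any, Dict

# Hypothetical field names; the slides do not show the collection's schema.
TOTAL_SAIL_AREA_PIPELINE = [
    {
        "$project": {
            "_id": 0,
            "name": 1,
            # In $project, $sum over an array field adds up its elements.
            "total_sail_area": {"$sum": "$sail_areas"},
        }
    },
]


def total_sail_areas(collection: Any) -> Dict[str, int]:
    """Return {ship name: total sail area}, with the summing done by the server."""
    return {
        doc["name"]: doc["total_sail_area"]
        for doc in collection.aggregate(TOTAL_SAIL_AREA_PIPELINE)
    }
```

Only one number per ship crosses the network this way, instead of every individual sail area.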
44. Other Sessions
Today
1:00pm Real-time Clinical Decision Support System – Prem Timisina & Arash Kia
2:00pm Analytics with MongoDB – Stuart Shiell & Mark Clancy
2:00pm A Complete Methodology to Data Modeling for MongoDB – Daniel Coupal
3:15pm Unleash the Power of the MongoDB Aggregation Framework – Abhishek Bagga
Tomorrow
9:00am Best Practices for Working with IoT and Time-series Data – Robert Walters
3:00pm MongoDB in Data Science – Vigen Sahakyan
45. Takeaways
MongoDB's flexible data model is very powerful for data analytics.
Some analytic tools require a more structured approach.
The schema design you choose when modeling your data can have a huge impact on analytics.
Use MongoDB's Aggregation Framework to improve performance.