MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long-term archival data in cost-effective storage such as Amazon S3, Google Cloud Storage, and Azure Blob Storage. However, many of them lack robust systems or tools for putting large amounts of data to use in decision making. Atlas Data Lake lets organizations analyze their long-term data to learn more about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
5. #MDBLocal
Why are we building this?
“IDC predicts that by 2025 worldwide data will reach 175 Zettabytes and 49% of it will reside in the public cloud.”
6. #MDBLocal
Atlas Data Lake Technical Deep Dive
1. Design Goals and Requirements
2. Creating an Atlas Data Lake
3. Atlas Data Lake Architecture
4. Future improvements
9. #MDBLocal
MongoDB Wire Protocol Support
Requirements
1) Look and act like MongoDB
Solution
• Implemented a TCP server in Go
• Used mongo-go-driver's wireprotocol package
• Used mongo-go-driver's bson package
• Read only
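To make the "look and act like MongoDB" requirement concrete, here is a minimal sketch of the first thing a wire-protocol server has to do: decode the standard 16-byte message header (four little-endian int32 fields). It uses only the Go standard library rather than the mongo-go-driver packages named above, and the names MsgHeader and parseHeader are illustrative assumptions, not the project's actual code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerLen is the size of the standard MongoDB wire protocol message header.
const headerLen = 16

// opMsg is the opcode of OP_MSG, the message type used by modern drivers.
const opMsg = 2013

// MsgHeader mirrors the four int32 fields of the wire protocol header.
// The type name is hypothetical; only the field layout comes from the protocol.
type MsgHeader struct {
	MessageLength int32 // total message size in bytes, header included
	RequestID     int32 // identifier chosen by the sender
	ResponseTo    int32 // RequestID of the message being answered, if any
	OpCode        int32 // message type, e.g. 2013 for OP_MSG
}

// parseHeader decodes the first 16 bytes of b as a little-endian header.
func parseHeader(b []byte) (MsgHeader, error) {
	var h MsgHeader
	if len(b) < headerLen {
		return h, errors.New("wire: header shorter than 16 bytes")
	}
	h.MessageLength = int32(binary.LittleEndian.Uint32(b[0:4]))
	h.RequestID = int32(binary.LittleEndian.Uint32(b[4:8]))
	h.ResponseTo = int32(binary.LittleEndian.Uint32(b[8:12]))
	h.OpCode = int32(binary.LittleEndian.Uint32(b[12:16]))
	if h.MessageLength < headerLen {
		return h, errors.New("wire: message length smaller than header")
	}
	return h, nil
}

func main() {
	// Build a header-only message by hand and decode it again.
	raw := make([]byte, headerLen)
	binary.LittleEndian.PutUint32(raw[0:4], headerLen)
	binary.LittleEndian.PutUint32(raw[4:8], 7)
	binary.LittleEndian.PutUint32(raw[8:12], 0)
	binary.LittleEndian.PutUint32(raw[12:16], opMsg)

	h, err := parseHeader(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("length=%d requestID=%d opCode=%d\n", h.MessageLength, h.RequestID, h.OpCode)
}
```

A real server would read this header from each TCP connection, then read the remaining MessageLength minus 16 bytes and decode the BSON body.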
10. #MDBLocal
MongoDB Security Model
Requirements
2) Access customer’s data securely.
Solution
• Users configured in MongoDB Atlas
• Same authentication and authorization
• Configure buckets
11. #MDBLocal
Scalable Processing
Requirements
3) Handle long-running queries over vast amounts of data while using resources efficiently
Solution
• Read-only commands
• Use server’s aggregation engine
• Distributed MQL processing
• Intelligent file targeting
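The slide does not say how file targeting is implemented, so the following is only a sketch of the general idea: when object keys encode a field value, a query predicate on that field can rule out whole files before any of them is read. The key layout (prefix/year/month/file), the function name targetFiles, and the year-range predicate are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// targetFiles keeps only the keys whose year segment lies in [minYear, maxYear].
// Assumed (hypothetical) key layout: <prefix>/<year>/<month>/<file>.
// Keys that do not follow the layout are skipped rather than scanned.
func targetFiles(keys []string, minYear, maxYear int) []string {
	var selected []string
	for _, key := range keys {
		parts := strings.Split(key, "/")
		if len(parts) < 4 {
			continue // not in the assumed layout
		}
		year, err := strconv.Atoi(parts[1])
		if err != nil {
			continue // year segment is not numeric
		}
		if year >= minYear && year <= maxYear {
			selected = append(selected, key)
		}
	}
	return selected
}

func main() {
	keys := []string{
		"sales/2017/01/part-0.json",
		"sales/2018/05/part-0.json",
		"sales/2019/06/part-0.json",
		"misc/readme.txt",
	}
	// A predicate such as {year: {$gte: 2018, $lte: 2019}} only needs two files.
	for _, key := range targetFiles(keys, 2018, 2019) {
		fmt.Println(key)
	}
}
```

Pruning at the key level means the amount of data read scales with the files a query can actually match, not with the size of the bucket.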
16. #MDBLocal
You control your data layout
Stores
Databases
Collections
DataSources
[Diagram: one Database with two Collections, whose three DataSources map onto two Stores]
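The layout above can be sketched as a small configuration model: stores name the buckets, databases contain collections, and each collection lists data sources that point at a store. The JSON field names (name, bucket, region, storeName, path) and the functions parseConfig and validate are assumptions for illustration; the exact Atlas Data Lake configuration format is not shown in these slides.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Store describes one bucket. Field names are illustrative assumptions.
type Store struct {
	Name   string `json:"name"`
	Bucket string `json:"bucket"`
	Region string `json:"region"`
}

// DataSource maps part of a store (a path or glob) into a collection.
type DataSource struct {
	StoreName string `json:"storeName"`
	Path      string `json:"path"`
}

// Collection is a virtual collection backed by one or more data sources.
type Collection struct {
	Name        string       `json:"name"`
	DataSources []DataSource `json:"dataSources"`
}

// Database groups collections under one name.
type Database struct {
	Name        string       `json:"name"`
	Collections []Collection `json:"collections"`
}

// Config is the whole layout: stores plus the databases mapped onto them.
type Config struct {
	Stores    []Store    `json:"stores"`
	Databases []Database `json:"databases"`
}

// validate checks that every data source refers to a store that is defined.
func validate(c Config) error {
	known := make(map[string]bool)
	for _, s := range c.Stores {
		known[s.Name] = true
	}
	for _, db := range c.Databases {
		for _, coll := range db.Collections {
			for _, ds := range coll.DataSources {
				if !known[ds.StoreName] {
					return fmt.Errorf("%s.%s: unknown store %q", db.Name, coll.Name, ds.StoreName)
				}
			}
		}
	}
	return nil
}

// parseConfig decodes a JSON layout and validates its store references.
func parseConfig(data []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return c, err
	}
	return c, validate(c)
}

func main() {
	raw := []byte(`{
	  "stores": [{"name": "archive", "bucket": "my-bucket", "region": "us-east-1"}],
	  "databases": [{"name": "sales", "collections": [
	    {"name": "orders", "dataSources": [{"storeName": "archive", "path": "/orders/*"}]}
	  ]}]
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%d store(s), %d database(s)\n", len(cfg.Stores), len(cfg.Databases))
}
```

Keeping stores separate from data sources lets several collections draw from the same bucket without repeating its details.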
17. #MDBLocal
Data Lake Configuration
1. Configure a new Data Lake in Atlas
2. Connect to your Data Lake
3. Configure your databases and collections
4. Query your Data Lake
20. #MDBLocal
Querying via MongoDB Atlas
• Atlas users require readWriteAnyDatabase or readAnyDatabase roles.
• Use MongoDB drivers/clients including the mongo shell and MongoDB Compass
• Write queries in MongoDB Query Language (MQL)
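As an illustration of what an MQL query against a Data Lake collection might look like, the sketch below builds a two-stage aggregation pipeline ($match, then $group with $sum) as plain Go maps and prints it as JSON. It deliberately avoids any driver so that it runs with the standard library alone; the field names year, region, and amount and the function buildPipeline are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildPipeline returns an aggregation pipeline that keeps documents from
// minYear onward and totals a hypothetical "amount" field per "region".
func buildPipeline(minYear int) []map[string]interface{} {
	return []map[string]interface{}{
		{"$match": map[string]interface{}{
			"year": map[string]interface{}{"$gte": minYear},
		}},
		{"$group": map[string]interface{}{
			"_id":   "$region",
			"total": map[string]interface{}{"$sum": "$amount"},
		}},
	}
}

func main() {
	out, err := json.MarshalIndent(buildPipeline(2018), "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The printed JSON is what would be passed to aggregate() in the
	// mongo shell or a driver.
	fmt.Println(string(out))
}
```

Because the service speaks the MongoDB wire protocol, the same pipeline could be sent through any existing driver or client, with a user holding one of the roles listed above.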
30. #MDBLocal
Atlas Data Lake is the best way to:
Access long-term data in multiple formats
Query long-term data using MQL
Analyse long-term data on demand