5. NoSQL and Big Data
Artificial Intelligence
Two different topics, "joined" at the end
6. NoSQL and Big Data
Artificial Intelligence
NoSQL for AI
Two different topics, "joined" at the end
7. NoSQL and Big Data
What we'll see:
● How relational ("standard") databases work
● What was missing, enter "Big Data"
● The different evolutions of Big Data
8. Artificial Intelligence
What we'll see:
● What is Artificial Intelligence (AI) and Machine Learning (ML)
● What can be done with AI and ML
● How AI and ML work (visually, no math)
13. RDBMS features
● Uses as little disk space as possible (disk was expensive in the 70s)
● Simple and very popular query language (SQL)
● Data consistency (enforced with normalization)
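The normalization point above can be made concrete with a small sketch. This is a hypothetical example (table and column names invented for illustration) using Python's built-in sqlite3: each fact is stored exactly once, and a JOIN reassembles the data on read.

```python
import sqlite3

# Normalized schema: customer names live only in "customers";
# "orders" references them by ID instead of duplicating the name.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 9.99), (11, 1, 25.00);
""")

# A JOIN reassembles the data; renaming 'Ada' would touch one row only.
rows = db.execute("""
    SELECT c.name, o.amount FROM orders o
    JOIN customers c ON c.id = o.customer_id
""").fetchall()
print(rows)  # [('Ada', 9.99), ('Ada', 25.0)]
```

This is exactly the trade-off the deck returns to later: no duplication and easy consistency, at the cost of joins on every read.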
14. RDBMS restrictions
● All data on a single machine; doesn't scale
● Fixed schema from the start, difficult to change (e.g., add columns)
● Reads and writes are expensive
● Difficulties with fast I/O
● Difficulties with big datasets, even more so with analytics
● Any failure is catastrophic
18. Big Data: Batch Processing
MapReduce
● Distributed
● Parallel
● Fault tolerant
● Batch data processing
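The MapReduce model above can be sketched as a single-process toy in Python (word count, the classic example); a real framework runs the map and reduce phases in parallel across many machines and restarts failed tasks.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit (key, value) pairs for one input record.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values for one key into the final result.
    return key, sum(values)

lines = ["big data big", "data tools"]
pairs = chain.from_iterable(map_phase(l) for l in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'tools': 1}
```

Because map and reduce only see one record or one key at a time, the framework is free to distribute them and retry them, which is where the fault tolerance comes from.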
19. Big Data: Batch Processing
Released in 2005
Open Source
MapReduce implementation
HDFS (Hadoop Distributed File System)
20. Big Data: Distributed Batch Processing
● Distributed
● Batch analytics (processing all the data together)
● Aggregations (avg, max, min)
● Takes a long time, but finishes at some point
● Fault tolerant
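The "aggregations" bullet above hides the key trick of distributed batch analytics: each node computes a partial result over its own partition, and the partials are then combined, so no node ever needs the whole dataset. A minimal sketch (partition layout and function names invented for illustration):

```python
from functools import reduce

# Data split across "nodes" (here, plain lists standing in for partitions).
partitions = [[3.0, 9.0, 1.0], [4.0, 7.0], [5.0]]

def partial(part):
    # Each node computes (sum, count, max, min) over its partition.
    return (sum(part), len(part), max(part), min(part))

def combine(a, b):
    # Partials combine pairwise; avg is derived from sum/count at the end.
    return (a[0] + b[0], a[1] + b[1], max(a[2], b[2]), min(a[3], b[3]))

total, count, mx, mn = reduce(combine, (partial(p) for p in partitions))
print(total / count, mx, mn)  # average, max, min of the full dataset
```

Note that average must be carried as (sum, count): averaging the per-node averages would weight small partitions incorrectly.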
22. Big Data: Distributed, in-memory, batch processing
In-memory first
Distributed batch processing
MapReduce (as with Hadoop)
...and other algorithms and tools
23. Big Data: Distributed, in-memory, batch processing
Same features as Hadoop:
● Distributed
● Parallel
● Fault tolerant
● Batch data processing
● MapReduce
Plus:
● In-memory first, faster than Hadoop
● Other algorithms and tools apart from MapReduce
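Why "in-memory first" is faster can be shown with a toy sketch: a Hadoop-style job rereads its input from disk on every pass, while an in-memory engine loads the data once and iterates over RAM. (Illustration only; real engines are also distributed and fault tolerant, and the file path here is a throwaway temp file.)

```python
import os
import tempfile

# Write a small input file standing in for a dataset on disk.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("\n".join(str(i) for i in range(100_000)))

def disk_pass():
    with open(path) as f:          # every pass goes back to disk
        return sum(int(line) for line in f)

with open(path) as f:              # load once, keep in memory
    cached = [int(line) for line in f]

def memory_pass():
    return sum(cached)             # every pass iterates over RAM

assert disk_pass() == memory_pass()  # same answer, different data path
```

For iterative workloads (machine learning being the usual example) the job makes many passes over the same data, so skipping the disk round-trip on every pass is where the speedup comes from.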
26. Big Data for applications: NoSQL key-value stores
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Read / write individual records
● Requires knowing the ID of each record
● No complex queries
● Denormalization: duplicate data for speed
● Data consistency is harder
● Precomputed aggregations (avg, max, min)
● Reads and writes are cheap
● High volume, velocity, variety
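The access pattern above can be sketched with a plain dict standing in for the store (function and key names invented for illustration): records are read and written by ID only, and aggregates are precomputed at write time instead of queried later.

```python
store = {}
stats = {"count": 0, "total": 0.0}   # precomputed aggregation

def put(record_id, record):
    # Maintain the aggregate on every write, so reads never scan.
    stats["count"] += 1
    stats["total"] += record["amount"]
    store[record_id] = record        # denormalized: whole record per key

def get(record_id):
    return store[record_id]          # O(1) lookup; you must know the ID

put("order:10", {"customer": "Ada", "amount": 9.99})
put("order:11", {"customer": "Ada", "amount": 25.0})
print(get("order:10")["amount"], stats["total"] / stats["count"])
```

Note the trade-offs from the slide in miniature: the customer name is duplicated in every order (denormalization), there is no way to ask "all orders by Ada" without knowing the IDs, and keeping `stats` consistent with `store` is now the application's job.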
29. Big Data for applications: NoSQL key-value memory stores
Same characteristics as key-value stores:
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Read / write individual records
● Requires knowing the ID of each record
● No complex queries
● Denormalization: duplicate data for speed
● Data consistency is harder
● Precomputed aggregations (avg, max, min)
● Reads and writes are cheap
● High volume, velocity, variety
Plus:
● Memory-first storage, faster
32. Big Data for applications: NoSQL Document Stores
Key-value stores' characteristics:
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Read / write individual records
● Precomputed aggregations (avg, max, min)
● Reads and writes are cheap
● High volume, velocity, variety
Plus:
● Complex structures (JSON "documents")
● Arbitrary indexes on non-key fields
● Complex queries, by non-key fields
● Some denormalization, much less duplication
● Data consistency is easier than with key-value stores
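What a secondary index on a non-key field buys can be sketched in a few lines (document shapes and names invented for illustration): the index maps a field value back to document IDs, so queries no longer need to know the key.

```python
from collections import defaultdict

documents = {}
city_index = defaultdict(set)        # non-key field "city" -> document IDs

def insert(doc_id, doc):
    documents[doc_id] = doc
    city_index[doc["city"]].add(doc_id)   # maintained on every write

def find_by_city(city):
    # Query by a non-key field: index lookup, then fetch by ID.
    return [documents[i] for i in sorted(city_index[city])]

insert("u1", {"name": "Ada", "city": "London", "orders": [9.99, 25.0]})
insert("u2", {"name": "Bob", "city": "Paris"})
print([d["name"] for d in find_by_city("London")])  # ['Ada']
```

The nested `orders` list also shows the "complex structures" bullet: related data can live inside one JSON document instead of being duplicated across many flat records, which is why consistency is easier here than in a pure key-value store.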
35. Big Data for applications: Search Engines
Search Engines' characteristics:
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Store complex text documents
● Specialized text-processing indexes for search
● Data is copied from the main data store to the search engine
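The "specialized text-processing index" above is, at its core, an inverted index: a map from each normalized term to the documents containing it. A minimal sketch (the tokenizer here is just lowercase-and-split; real engines also stem, rank, and shard the index):

```python
from collections import defaultdict

docs = {
    1: "NoSQL stores scale horizontally",
    2: "Search engines index text documents",
    3: "Text search uses an inverted index",
}

# Build the inverted index: term -> set of document IDs.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

def search(query):
    # AND-query: only documents containing every query term match.
    sets = [inverted[t] for t in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

print(search("text index"))  # [2, 3]
```

Because the index is built at write time (when data is copied in from the main store), query time is a few set lookups rather than a scan over every document.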
38. Big Data for applications: data synchronization
NoSQL document stores' characteristics:
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Read / write individual records
● Precomputed aggregations (avg, max, min)
● Reads and writes are cheap
● High volume, velocity, variety
● Complex structures (JSON "documents")
● Arbitrary indexes on non-key fields
● Complex queries, by non-key fields
● Some denormalization, much less duplication
● Data consistency is easier than with key-value stores
Plus:
● Server-to-server data synchronization
● Edge data synchronization (mobile, IoT)
● Offline-first complex applications
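A toy sketch of the offline-first idea: each replica keeps a (value, timestamp) pair per key, and merging keeps the newest write. This is last-write-wins, the simplest possible conflict policy; real sync engines use revision trees or CRDTs, and the replica names here are invented for illustration.

```python
def merge(local, remote):
    # Keep, for every key, whichever replica wrote it most recently.
    merged = dict(local)
    for key, (value, ts) in remote.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

phone = {"doc:1": ("draft v2", 5)}                        # edited offline
server = {"doc:1": ("draft v1", 3), "doc:2": ("notes", 4)}
synced = merge(server, phone)
print(synced["doc:1"][0])  # 'draft v2' -- the newer offline edit wins
```

Merging works per document here, which is why document stores suit edge synchronization: each JSON document is an independent unit of conflict resolution.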
42. NoSQL restrictions
● Extra work when:
○ Duplication is needed
○ Data updates are required
● New way of thinking and designing systems
○ Some extra learning for RDBMS people
● Strict multi-record transactions require more work
○ But in many cases a single record (document) can store the transaction data
○ E.g., bank transactions
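The single-record-transaction pattern from the last slide can be sketched like this (names and shapes invented for illustration): instead of a strict multi-row transaction touching two account rows, one document fully describes the transfer, and balances are derived by replaying the documents.

```python
transfers = []   # each document fully describes one bank transaction

def transfer(txn_id, src, dst, amount):
    # One atomic document write captures the entire operation.
    transfers.append({"_id": txn_id, "from": src, "to": dst,
                      "amount": amount})

def balance(account, opening=0.0):
    # Derive the balance by replaying every transfer touching the account.
    delta = sum(t["amount"] * (1 if t["to"] == account else -1)
                for t in transfers if account in (t["from"], t["to"]))
    return opening + delta

transfer("t1", "ada", "bob", 30.0)
transfer("t2", "bob", "ada", 10.0)
print(balance("ada", opening=100.0))  # 80.0
```

Since the whole transfer lives in one record, a single-document write is enough to record it atomically; the price, as the slide says, is that anything genuinely spanning multiple records still needs extra application-level work.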