5. NoSQL and Big Data
Artificial Intelligence
Two different topics, "joined" at the end
6. NoSQL and Big Data
Artificial Intelligence
NoSQL for AI
Two different topics, "joined" at the end
7. NoSQL and Big Data
What we'll see:
● How relational ("standard") databases work
● What was missing, enter "Big Data"
● The different evolutions of Big Data
8. Artificial Intelligence
What we'll see:
● What is Artificial Intelligence (AI) and Machine Learning (ML)
● What can be done with AI and ML
● How AI and ML work (visually, no math)
13. RDBMS features
● Uses as little disk space as possible (disk was expensive in the 70s)
● Simple and very popular query language (SQL)
● Data consistency (enforced with normalization)
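The normalization point above can be made concrete with a small sketch. This is a hypothetical example (table and column names invented for illustration) using Python's built-in sqlite3: each fact is stored exactly once, and a JOIN reassembles the data on read.

```python
import sqlite3

# Normalized schema: customer names live only in "customers";
# "orders" references them by ID instead of duplicating the name.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 9.99), (11, 1, 25.00);
""")

# A JOIN reassembles the data; renaming 'Ada' would touch one row only.
rows = db.execute("""
    SELECT c.name, o.amount FROM orders o
    JOIN customers c ON c.id = o.customer_id
""").fetchall()
print(rows)  # [('Ada', 9.99), ('Ada', 25.0)]
```

This is exactly the trade-off the deck returns to later: no duplication and easy consistency, at the cost of joins on every read.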
14. RDBMS restrictions
● All data on a single machine; doesn't scale
● Fixed schema from the start, difficult to change (e.g., add columns)
● Reads and writes are expensive
● Difficulties with fast I/O
● Difficulties with big datasets, even more so with analytics
● Any failure is catastrophic
18. Big Data: Batch Processing
MapReduce
● Distributed
● Parallel
● Fault tolerant
● Batch data processing
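The MapReduce model above can be sketched as a single-process toy in Python (word count, the classic example); a real framework runs the map and reduce phases in parallel across many machines and restarts failed tasks.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit (key, value) pairs for one input record.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values for one key into the final result.
    return key, sum(values)

lines = ["big data big", "data tools"]
pairs = chain.from_iterable(map_phase(l) for l in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'tools': 1}
```

Because map and reduce only see one record or one key at a time, the framework is free to distribute them and retry them, which is where the fault tolerance comes from.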
19. Big Data: Batch Processing
Released in 2005
Open Source
MapReduce implementation
HDFS (Hadoop Distributed File System)
20. Big Data: Distributed Batch Processing
● Distributed
● Batch analytics (processing all the data together)
● Aggregations (avg, max, min)
● Takes a long time, but finishes at some point
● Fault tolerant
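The "aggregations" bullet above hides the key trick of distributed batch analytics: each node computes a partial result over its own partition, and the partials are then combined, so no node ever needs the whole dataset. A minimal sketch (partition layout and function names invented for illustration):

```python
from functools import reduce

# Data split across "nodes" (here, plain lists standing in for partitions).
partitions = [[3.0, 9.0, 1.0], [4.0, 7.0], [5.0]]

def partial(part):
    # Each node computes (sum, count, max, min) over its partition.
    return (sum(part), len(part), max(part), min(part))

def combine(a, b):
    # Partials combine pairwise; avg is derived from sum/count at the end.
    return (a[0] + b[0], a[1] + b[1], max(a[2], b[2]), min(a[3], b[3]))

total, count, mx, mn = reduce(combine, (partial(p) for p in partitions))
print(total / count, mx, mn)  # average, max, min of the full dataset
```

Note that average must be carried as (sum, count): averaging the per-node averages would weight small partitions incorrectly.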
22. Big Data: Distributed, in-memory, batch processing
In-memory first
Distributed batch processing
MapReduce (as with Hadoop)
...and other algorithms and tools
23. Big Data: Distributed, in-memory, batch processing
Same features as Hadoop:
● Distributed
● Parallel
● Fault tolerant
● Batch data processing
● MapReduce
Plus:
● In-memory first, faster than Hadoop
● Other algorithms and tools apart from MapReduce
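Why "in-memory first" is faster can be shown with a toy sketch: a Hadoop-style job rereads its input from disk on every pass, while an in-memory engine loads the data once and iterates over RAM. (Illustration only; real engines are also distributed and fault tolerant, and the file path here is a throwaway temp file.)

```python
import os
import tempfile

# Write a small input file standing in for a dataset on disk.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("\n".join(str(i) for i in range(100_000)))

def disk_pass():
    with open(path) as f:          # every pass goes back to disk
        return sum(int(line) for line in f)

with open(path) as f:              # load once, keep in memory
    cached = [int(line) for line in f]

def memory_pass():
    return sum(cached)             # every pass iterates over RAM

assert disk_pass() == memory_pass()  # same answer, different data path
```

For iterative workloads (machine learning being the usual example) the job makes many passes over the same data, so skipping the disk round-trip on every pass is where the speedup comes from.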
26. Big Data for applications: NoSQL key-value stores
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Read / write individual records
● Requires knowing the ID of each record
● No complex queries
● Denormalization: duplicate data for speed
● Data consistency is harder
● Precomputed aggregations (avg, max, min)
● Reads and writes are cheap
● High volume, velocity, variety
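The access pattern above can be sketched with a plain dict standing in for the store (function and key names invented for illustration): records are read and written by ID only, and aggregates are precomputed at write time instead of queried later.

```python
store = {}
stats = {"count": 0, "total": 0.0}   # precomputed aggregation

def put(record_id, record):
    # Maintain the aggregate on every write, so reads never scan.
    stats["count"] += 1
    stats["total"] += record["amount"]
    store[record_id] = record        # denormalized: whole record per key

def get(record_id):
    return store[record_id]          # O(1) lookup; you must know the ID

put("order:10", {"customer": "Ada", "amount": 9.99})
put("order:11", {"customer": "Ada", "amount": 25.0})
print(get("order:10")["amount"], stats["total"] / stats["count"])
```

Note the trade-offs from the slide in miniature: the customer name is duplicated in every order (denormalization), there is no way to ask "all orders by Ada" without knowing the IDs, and keeping `stats` consistent with `store` is now the application's job.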
29. Big Data for applications: NoSQL key-value memory stores
Same characteristics as key-value stores:
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Read / write individual records
● Requires knowing the ID of each record
● No complex queries
● Denormalization: duplicate data for speed
● Data consistency is harder
● Precomputed aggregations (avg, max, min)
● Reads and writes are cheap
● High volume, velocity, variety
Plus:
● Memory-first storage, faster
32. Big Data for applications: NoSQL Document Stores
Key-value stores' characteristics:
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Read / write individual records
● Precomputed aggregations (avg, max, min)
● Reads and writes are cheap
● High volume, velocity, variety
Plus:
● Complex structures (JSON "documents")
● Arbitrary indexes on non-key fields
● Complex queries, by non-key fields
● Some denormalization, much less duplication
● Data consistency is easier than with key-value stores
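What a secondary index on a non-key field buys can be sketched in a few lines (document shapes and names invented for illustration): the index maps a field value back to document IDs, so queries no longer need to know the key.

```python
from collections import defaultdict

documents = {}
city_index = defaultdict(set)        # non-key field "city" -> document IDs

def insert(doc_id, doc):
    documents[doc_id] = doc
    city_index[doc["city"]].add(doc_id)   # maintained on every write

def find_by_city(city):
    # Query by a non-key field: index lookup, then fetch by ID.
    return [documents[i] for i in sorted(city_index[city])]

insert("u1", {"name": "Ada", "city": "London", "orders": [9.99, 25.0]})
insert("u2", {"name": "Bob", "city": "Paris"})
print([d["name"] for d in find_by_city("London")])  # ['Ada']
```

The nested `orders` list also shows the "complex structures" bullet: related data can live inside one JSON document instead of being duplicated across many flat records, which is why consistency is easier here than in a pure key-value store.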
35. Big Data for applications: Search Engines
Search Engines' characteristics:
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Store complex text documents
● Specialized text-processing indexes for search
● Data is copied from the main data store to the search engine
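The "specialized text-processing index" above is, at its core, an inverted index: a map from each normalized term to the documents containing it. A minimal sketch (the tokenizer here is just lowercase-and-split; real engines also stem, rank, and shard the index):

```python
from collections import defaultdict

docs = {
    1: "NoSQL stores scale horizontally",
    2: "Search engines index text documents",
    3: "Text search uses an inverted index",
}

# Build the inverted index: term -> set of document IDs.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

def search(query):
    # AND-query: only documents containing every query term match.
    sets = [inverted[t] for t in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

print(search("text index"))  # [2, 3]
```

Because the index is built at write time (when data is copied in from the main store), query time is a few set lookups rather than a scan over every document.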
38. Big Data for applications: data synchronization
NoSQL document stores' characteristics:
● Distributed, parallel, fault tolerant, etc.
● Minimal latency
● Read / write individual records
● Precomputed aggregations (avg, max, min)
● Reads and writes are cheap
● High volume, velocity, variety
● Complex structures (JSON "documents")
● Arbitrary indexes on non-key fields
● Complex queries, by non-key fields
● Some denormalization, much less duplication
● Data consistency is easier than with key-value stores
Plus:
● Server-to-server data synchronization
● Edge data synchronization (mobile, IoT)
● Offline-first complex applications
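A toy sketch of the offline-first idea: each replica keeps a (value, timestamp) pair per key, and merging keeps the newest write. This is last-write-wins, the simplest possible conflict policy; real sync engines use revision trees or CRDTs, and the replica names here are invented for illustration.

```python
def merge(local, remote):
    # Keep, for every key, whichever replica wrote it most recently.
    merged = dict(local)
    for key, (value, ts) in remote.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

phone = {"doc:1": ("draft v2", 5)}                        # edited offline
server = {"doc:1": ("draft v1", 3), "doc:2": ("notes", 4)}
synced = merge(server, phone)
print(synced["doc:1"][0])  # 'draft v2' -- the newer offline edit wins
```

Merging works per document here, which is why document stores suit edge synchronization: each JSON document is an independent unit of conflict resolution.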
42. NoSQL restrictions
● Extra work when:
○ Duplication is needed
○ Data updates are required
● New way of thinking and designing systems
○ Some extra learning for RDBMS people
● Strict multi-record transactions require more work
○ But in many cases a single record (document) can store the transaction data
○ E.g., bank transactions
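The single-record-transaction pattern from the last slide can be sketched like this (names and shapes invented for illustration): instead of a strict multi-row transaction touching two account rows, one document fully describes the transfer, and balances are derived by replaying the documents.

```python
transfers = []   # each document fully describes one bank transaction

def transfer(txn_id, src, dst, amount):
    # One atomic document write captures the entire operation.
    transfers.append({"_id": txn_id, "from": src, "to": dst,
                      "amount": amount})

def balance(account, opening=0.0):
    # Derive the balance by replaying every transfer touching the account.
    delta = sum(t["amount"] * (1 if t["to"] == account else -1)
                for t in transfers if account in (t["from"], t["to"]))
    return opening + delta

transfer("t1", "ada", "bob", 30.0)
transfer("t2", "bob", "ada", 10.0)
print(balance("ada", opening=100.0))  # 80.0
```

Since the whole transfer lives in one record, a single-document write is enough to record it atomically; the price, as the slide says, is that anything genuinely spanning multiple records still needs extra application-level work.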