4. v1.1
Drowning in Big Data (Buzz)
[Figure: Data Streams, Data Warehouse, and Data Lake, linked by arrows labeled “flood” and “flow”]
not an ad, just a pun!!!
5. Just say No!
● NoSQL as in NoACID
○ No consistency
○ Eventual consistency
○ M.C. Srivas: “Eventual in-consistency”
● NoSQL as in No SQL!
○ KV stores: DHT, B-Tree
○ Flat CSV, TSV + JSON
○ MapReduce and NoMapReduce/Spark
not an ad, just a pun!!!
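The “eventual” in eventual consistency can be made concrete with a toy last-writer-wins (LWW) register. This is a minimal sketch, assuming every write carries a timestamp and replicas periodically exchange state. The `Replica` class and its `merge` method are hypothetical and are not the API of any real KV store. Replicas disagree until they merge and then converge, but the older concurrent write is silently dropped, which is arguably one flavor of “eventual in-consistency”.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica of a key-value store using last-writer-wins merging."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, replica_name, value)

    def put(self, key, value, ts):
        # Writes are accepted locally, with no coordination with other replicas.
        self.data[key] = (ts, self.name, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]

    def merge(self, other):
        # Anti-entropy step: per key, keep the entry with the larger
        # (timestamp, replica_name) pair; the replica name breaks timestamp ties.
        for key, entry in other.data.items():
            mine = self.data.get(key)
            if mine is None or entry[:2] > mine[:2]:
                self.data[key] = entry


a, b = Replica("a"), Replica("b")
a.put("user:1", "alice", ts=1)
b.put("user:1", "alicia", ts=2)
print(a.get("user:1"), b.get("user:1"))  # alice alicia  (replicas disagree)
a.merge(b)
b.merge(a)
print(a.get("user:1"), b.get("user:1"))  # alicia alicia (converged; "alice" is lost)
```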
6. “Full Circle”
● Is SQL good, or just familiar?
○ tons of tooling
○ SQL-on-Hadoop
● Transactions
○ consistency across X with acceptable latency
○ Snapshot Isolation
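Snapshot Isolation can be sketched on top of multi-version storage: each transaction reads the versions committed before it started, buffers its writes, and at commit time aborts if another transaction committed a write to the same key after its snapshot was taken (first-committer-wins). This is a single-threaded toy under those assumptions. `MVCCStore`, `Txn`, and `WriteConflict` are hypothetical names, and a real engine would also need latching, durability, and garbage collection of old versions.

```python
import itertools


class WriteConflict(Exception):
    """Raised when a write-write conflict forces a transaction to abort."""


class MVCCStore:
    """Hypothetical toy multi-version store: key -> [(commit_ts, value), ...] in ascending order."""

    def __init__(self):
        self.versions = {}
        self.clock = itertools.count(1)  # logical timestamps shared by begin and commit

    def begin(self):
        return Txn(self, next(self.clock))


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered until commit

    def read(self, key):
        if key in self.writes:  # a transaction sees its own writes
            return self.writes[key]
        # Otherwise, the newest version committed before this snapshot began.
        for commit_ts, value in reversed(self.store.versions.get(key, [])):
            if commit_ts < self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins: abort if any key we wrote was committed
        # by someone else after our snapshot was taken.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise WriteConflict(key)
        commit_ts = next(self.store.clock)
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts


store = MVCCStore()
init = store.begin()
init.write("balance", 100)
init.commit()

t1, t2 = store.begin(), store.begin()
t1.write("balance", t1.read("balance") - 30)
t2.write("balance", t2.read("balance") - 50)
t1.commit()
try:
    t2.commit()
except WriteConflict as e:
    print("t2 aborted on", e)  # t2 aborted on balance (lost update prevented)
```

Only write-write conflicts are checked, so this scheme, like Snapshot Isolation in general, still permits write skew: two transactions can read overlapping data, write disjoint keys, and both commit.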
7. DB is the new “Hello World”
We don’t use an RDBMS because of $$$ or …, but we still need:
● Cache Management
● ACID
● Indexes
● Oracle’s Table Clusters ~ CoHadoop
...
Lots of opportunities for DB students!!!
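Oracle’s table clusters and CoHadoop share one idea: store related rows from different tables together, so a join on the shared key needs no data movement. A minimal sketch in that spirit, assuming hash placement on a single cluster key; `partition_for`, `colocate`, and `local_join` are hypothetical helpers, not the API of either system.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Stable hash (crc32) so placement does not depend on Python's per-process hash seed.
    return zlib.crc32(str(key).encode()) % num_partitions


def colocate(tables, key_fn, num_partitions):
    """Hypothetical placement: rows that share a cluster key land in the same partition."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for table_name, rows in tables.items():
        for row in rows:
            p = partition_for(key_fn(row), num_partitions)
            partitions[p][table_name].append(row)
    return partitions


def local_join(partition, left, right, key_fn):
    # Hash join inside one partition; no shuffle is needed because matching keys are co-located.
    index = defaultdict(list)
    for row in partition.get(left, []):
        index[key_fn(row)].append(row)
    return [(l, r) for r in partition.get(right, []) for l in index[key_fn(r)]]


users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bo"}]
tweets = [
    {"user_id": 1, "text": "hi"},
    {"user_id": 2, "text": "yo"},
    {"user_id": 1, "text": "again"},
]
by_user = lambda r: r["user_id"]
parts = colocate({"users": users, "tweets": tweets}, by_user, num_partitions=4)
joined = [pair for part in parts for pair in local_join(part, "users", "tweets", by_user)]
print(len(joined))  # 3
```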
8. No “One Size Fits All”
RDBMS is a behemoth with great features!
But we want flexible reusable components:
● Intermediate / Physical plan
● DSL: Scala, SQL, Geo, etc.
● Storage Engines and Transactions
Web-scale friendly licenses / free OSS
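One way to picture these separable components is a physical plan built from small iterator-style operators over a pluggable storage interface, so that different front ends (SQL, a Scala DSL, a geo DSL) could all compile into the same operators. A minimal sketch, assuming rows are Python dicts; `StorageEngine`, `InMemoryEngine`, `scan`, `filter_rows`, and `project` are hypothetical names, not taken from any particular system.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, object]


class StorageEngine(Protocol):
    # Hypothetical pluggable interface: any engine that can scan a table's rows.
    def scan(self, table: str) -> Iterable[Row]: ...


class InMemoryEngine:
    """Hypothetical engine that keeps tables as lists of dicts."""

    def __init__(self, tables: Dict[str, List[Row]]):
        self.tables = tables

    def scan(self, table: str) -> Iterable[Row]:
        return iter(self.tables[table])


# Physical operators: each consumes and yields a stream of rows, so they compose freely.
def scan(engine: StorageEngine, table: str) -> Iterator[Row]:
    yield from engine.scan(table)


def filter_rows(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    return (r for r in rows if pred(r))


def project(rows: Iterable[Row], cols: List[str]) -> Iterator[Row]:
    return ({c: r[c] for c in cols} for r in rows)


engine = InMemoryEngine({"tweets": [
    {"id": 1, "lang": "en", "text": "hello"},
    {"id": 2, "lang": "fr", "text": "bonjour"},
]})
# Roughly what "SELECT id FROM tweets WHERE lang = 'en'" might compile into.
plan = project(filter_rows(scan(engine, "tweets"), lambda r: r["lang"] == "en"), ["id"])
print(list(plan))  # [{'id': 1}]
```

Because each operator only consumes and yields rows, replacing `InMemoryEngine` with any other object that has a `scan` method leaves the plan unchanged.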
9. Thank you!
More @Twitter @VLDB2014
● Pankaj Gupta et al.: “Real-Time Twitter Recommendation: Online Motif Detection in Large Dynamic Graphs”, Industrial: Analytics, Wed
● Oscar Boykin et al.: “Summingbird: A Framework for Integrating Batch and Online MapReduce Computations”, Industrial: Big Data 1, Thu