The Shape of Data to Come
it isn’t what we thought it was
©MapR Technologies - Confidential

Do you remember the future?

Some things turned out as expected

Guys wearing fedoras

What about “Big Data”?

Harvard University will have 200 × 10⁶ volumes by 2040
Fremont Rider, 1944

“To cope … only short papers should be published. … not more than 2500 characters counting ‘space,’ punctuation marks, etc.”
Gray and Ruston in IEEE Transactions on Electronic Computers, 1964

Remember the guy in the fedora?

He’s tweeting about this right now

So what is the big data monorail and what is the cool hat?

Data curation
Rigid schemas
Engineered structure

Data as-you-find-it
Flexible schemas
Late binding
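The slides name flexible schemas and late binding without showing what they mean in practice. One common reading is schema-on-read: records are stored as found, and each query decides at read time which fields it needs. The Python sketch below assumes JSON-lines input; the sample records, the field names, and the helpers `get_path` and `select` are all hypothetical and only illustrate the idea.

```python
import json
from typing import Any, Iterable, Iterator

# Hypothetical raw records "as you find them": fields vary from record to
# record and no schema is declared before the data is stored.
RAW = [
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": 5, "geo": {"country": "US"}}',
    '{"user": "c", "geo": {"country": "FR"}, "device": "phone"}',
]


def get_path(record: dict[str, Any], path: str) -> Any:
    """Resolve a dotted field path such as 'geo.country'; None if any step is missing."""
    node: Any = record
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def select(lines: Iterable[str], *paths: str) -> Iterator[tuple[Any, ...]]:
    """Parse each line at query time and bind only the requested fields (late binding)."""
    for line in lines:
        record = json.loads(line)
        yield tuple(get_path(record, p) for p in paths)


if __name__ == "__main__":
    for row in select(RAW, "user", "geo.country"):
        print(row)
```

A record that lacks a requested field simply yields `None` instead of failing a load-time schema check, so a new field such as `device` can appear without migrating anything.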

Why is it different?
How does it work?

The Conventional Answer
• More data is being produced more quickly
• Data sizes are bigger than even a very large computer can hold
• The cost to create and store data continues to decrease

Analytics Scaling Laws
• Analytics scaling is all about the 80-20 rule
  – Big gains for little initial effort
  – Rapidly diminishing returns
• The key to net value is how costs scale
  – Old school: exponential scaling
  – Big data: linear scaling, low constant
• Cost/performance has changed radically
  – IF you can use many commodity boxes

Which bytes first?

[Figure: Value versus Scale; Value axis from 0 to 1, Scale axis from 0 to 2,000]

Net value optimum has a sharp peak well before maximum effort

But scaling laws are changing both slope and shape

More than just a little

They are changing a LOT!

Initially, linear cost scaling actually makes things worse
Then a tipping point is reached and things change radically …
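The value-versus-scale slides show curves but give no formulas. As a rough illustration, the Python sketch below assumes a saturating value curve, value(x) = 1 − exp(−x/300), an exponential "old school" cost, and a linear "big data" cost k·x. All three functional forms and every constant are hypothetical choices made to mimic the shapes described on the slides, not the author's model. The code brute-forces the scale that maximizes net value (value minus cost) over the 0 to 2,000 range used in the charts.

```python
import math
from typing import Callable

# Hypothetical model: these functional forms and constants are illustrative
# assumptions chosen to mimic the slides' curves, not the author's numbers.


def value(x: float) -> float:
    """Saturating value: most of it arrives early, then returns diminish (80-20 style)."""
    return 1.0 - math.exp(-x / 300.0)


def exponential_cost(x: float) -> float:
    """'Old school' cost that grows exponentially with scale."""
    return 0.01 * (math.exp(x / 200.0) - 1.0)


def linear_cost(x: float, k: float) -> float:
    """'Big data' cost that grows linearly with scale at slope k."""
    return k * x


def best_scale(cost: Callable[[float], float], max_scale: int = 2000) -> tuple[int, float]:
    """Brute-force the integer scale in [0, max_scale] that maximizes value minus cost."""
    return max(
        ((x, value(x) - cost(x)) for x in range(max_scale + 1)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    x_old, net_old = best_scale(exponential_cost)
    print(f"exponential cost: optimum at scale {x_old}, net value {net_old:.3f}")
    for k in (1e-3, 2e-4, 1e-4):
        x_new, net_new = best_scale(lambda x: linear_cost(x, k))
        print(f"linear cost, k={k:g}: optimum at scale {x_new}, net value {net_new:.3f}")
```

With these made-up constants, the exponential-cost optimum sits near scale 500 and net value falls off steeply beyond it; a steep linear cost (k = 1e-3) does worse than the old regime, while shallower slopes (k = 2e-4, 1e-4) overtake it with optimum scales well above 500. A brute-force grid keeps the comparison transparent rather than fast.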

Evolution of Data Storage
[Figure: Scalability versus Functionality/Compatibility; Linux and POSIX marked]
Over decades of progress, Unix-based systems have set the standard for compatibility and functionality.

Evolution of Data Storage
[Figure: Scalability versus Functionality/Compatibility; Linux, POSIX, and Hadoop marked]
Hadoop achieves much higher scalability by trading away essentially all of this compatibility.

Evolution of Data Storage
[Figure: Scalability versus Functionality/Compatibility; Linux, POSIX, and Hadoop marked]
MapR enhances Apache Hadoop by restoring the compatibility while increasing scalability and performance.

Introducing MapR
MapR offers the technology-leading distribution for Hadoop.

Industry Leaders Choose MapR in the Cloud
• Google chose MapR to provide Hadoop on Google Compute Engine
• Amazon EMR is the largest Hadoop provider in revenue and number of clusters

MapR Supports a Broad Set of Use Cases
[Customer logos not recoverable; two are labeled Leading Retailer and Leading Bank]
• Recommendation engines
• Fraud detection and prevention
• Customer behavior analysis
• Brand monitoring
• Customer targeting
• Viewer behavioral analytics
• Intrusion detection and prevention
• Forensic analysis
• Family tree connections
• Patient care monitoring
• Log analysis
• HBase
• Clickstream analysis
• Quality profiling/field failure analysis
• Channel analytics
• Customer revenue analytics
• ETL offload
• Advertising exchange analysis and optimization
• Social media analysis
• Global threat analytics
• Virus analysis
• Customer sentiment
• Network analytics
• Monitoring and measuring the behavior of online shoppers

MapR
The guys with the cool hats

MapR’s Innovations

Seamless integration with existing applications
• 100% POSIX compliant
• Industry-standard APIs: NFS, ODBC, LDAP, REST
• More third-party solutions
• Proprietary connectors unnecessary
• Language neutral

MapR’s Innovations

MapR: Lights-Out Data Center Ready
Reliable Compute
• Automated stateful failover
• Automated re-replication
• Self-healing from hardware and software failures
• Load balancing
• Rolling upgrades
• No lost jobs or data
• Five nines (99.999%) of uptime
Dependable Storage
• End-to-end checksums
• Strong consistency
• Business continuity with snapshots and mirrors
• Recover to a point in time with snapshots
• Mirror across sites for disaster recovery
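The slide lists end-to-end checksums without describing the mechanism, and the deck does not say which algorithm MapR uses. The general idea can be sketched in Python using CRC-32 from the standard `zlib` module as a stand-in: the writer attaches a checksum to each block, and the reader recomputes it and refuses data that does not match. `Block`, `write_block`, and `read_block` are hypothetical names, not MapR APIs.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A hypothetical stored block: the payload plus the checksum computed by the writer."""
    data: bytes
    crc: int


def write_block(data: bytes) -> Block:
    """Compute the checksum on the writing side, before the data travels or is stored."""
    return Block(data=data, crc=zlib.crc32(data))


def read_block(block: Block) -> bytes:
    """Recompute the checksum on the reading side and reject corrupted data."""
    if zlib.crc32(block.data) != block.crc:
        raise ValueError("checksum mismatch: block was corrupted in transit or at rest")
    return block.data


if __name__ == "__main__":
    good = write_block(b"sensor reading 42")
    print(read_block(good))
    # Simulate corruption of one byte while keeping the original checksum.
    damaged = Block(data=b"sensor reading 43", crc=good.crc)
    try:
        read_block(damaged)
    except ValueError as err:
        print(err)
```

"End-to-end" here means the check spans the whole path from writer to reader, so corruption introduced on the network, in memory, or on disk is caught at a single point, the read.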
MapR’s Innovations

Why MapR Is Faster
• Lockless Storage Service™: eliminates storage contention
• Direct block device I/O: provides throughput at device speed
• Hadoop Direct Shuffle: exploits the MapR-FS architecture to deliver performance
• Client-side compression: reduces network overhead using automatic compression
• C vs. Java: eliminates sporadic Java garbage collection overhead (the system is written in C)

Security
• MapR is pushing the envelope on Hadoop security
• Integrates with Linux security (PAM)
  – Works with any user directory: Active Directory, LDAP, NIS, …
• Strong wire-level authentication and encryption
  – Kerberos and non-Kerberos options
• Fine-grained access control
  – Full POSIX permissions on files and directories
  – ACLs on tables, column families, columns, cells
  – ACLs on MapReduce jobs and queues
  – Administration ACLs on cluster and volumes

Bullet-proof NoSQL with Zero Administration
Performance, reliability, and easy administration (benefit: features)
• High performance: over 1 million ops/sec with a 10-node cluster
• Continuous low latency: no I/O storms, no compactions
• 24x7 applications: instant recovery, online schema modification, snapshots, mirroring
• Zero administration: no processes to manage, automated splits, self-tuning
• High scalability: 1 trillion tables
• Low TCO: files and tables on one platform

MapR M7 vs. CDH – Mixed Load (50-50)

MapR
The guys with the cool solutions

MapR
The future of the future

Thank You

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

What is the past future tense of data?

  • 1. The Shape of Data to Come it isn’t what we thought it was ©MapR Technologies - Confidential 1
  • 2. Do you remember the future?
  • 4. Some things turned out as expected
  • 6. What about “Big Data”?
  • 7. Harvard University will have 200 × 10^6 volumes by 2040 (Fremont Rider, 1944)
  • 8. To cope … only short papers should be published. … not more than 2500 characters counting “space,” punctuation marks, etc. (Gray and Ruston in IEEE Transactions on Electronic Computers, 1964)
  • 9. Remember the guy in the Fedora?
  • 10. He’s tweeting about this right now
  • 11. So what is the big data monorail and what is the cool hat?
  • 12. Data curation, rigid schemas, engineered structure
  • 13. Data curation, rigid schemas, engineered structure
  • 14. Data as-you-find-it, flexible schemas, late binding
  • 15. Data as-you-find-it, flexible schemas, late binding
  • 20. Why is it different? How does it work?
  • 21. The Conventional Answer: More data is being produced more quickly. Data sizes are bigger than even a very large computer can hold. Cost to create and store continues to decrease.
  • 22. Analytics Scaling Laws: Analytics scaling is all about the 80-20 rule (big gains for little initial effort, rapidly diminishing returns). The key to net value is how costs scale (old school: exponential scaling; big data: linear scaling, low constant). Cost/performance has changed radically, IF you can use many commodity boxes.
  • 26. [Chart: value (0 to 1) vs. scale (0 to 2,000)] Net value optimum has a sharp peak well before maximum effort
  • 27. But scaling laws are changing both slope and shape
  • 28. [Chart: value vs. scale] More than just a little
  • 29. [Chart: value vs. scale] They are changing a LOT!
  • 34. [Chart: value vs. scale] A tipping point is reached and things change radically … Initially, linear cost scaling actually makes things worse
  • 35. Evolution of Data Storage Scalability: Over decades of progress, Unix-based systems have set the standard for compatibility and functionality. [Chart labels: Linux, POSIX; axes: functionality, compatibility]
  • 36. Evolution of Data Storage Scalability: Hadoop achieves much higher scalability by trading away essentially all of this compatibility. [Chart labels: Hadoop, Linux, POSIX; axes: functionality, compatibility]
  • 37. Evolution of Data Storage Scalability: MapR enhances Apache Hadoop by restoring the compatibility while increasing scalability and performance. [Chart labels: Hadoop, Linux, POSIX; axes: functionality, compatibility]
  • 38. Introducing MapR: MapR offers the technology-leading distribution for Hadoop
  • 39. The Industry Leaders Choose MapR in the Cloud: Google chose MapR to provide Hadoop on Google Compute Engine. Amazon EMR is the largest Hadoop provider in revenue and number of clusters.
  • 40. MapR Supports a Broad Set of Use Cases (customers include a leading retailer and a leading bank): recommendation engine; fraud detection and prevention; customer behavior analysis; brand monitoring; customer targeting; viewer behavioral analytics; intrusion detection and prevention; forensic analysis; family tree connections; patient care monitoring; log analysis; HBase; clickstream analysis; quality profiling/field failure analysis; channel analytics; customer revenue analytics; ETL offload; advertising exchange analysis and optimization; social media analysis; global threat analytics; virus analysis; customer sentiment; network analytics; monitoring and measuring the behavior of online shoppers
  • 41. MapR: The guys with the cool hats
  • 43. Seamless integration with existing applications: 100% POSIX compliant; industry-standard APIs (NFS, ODBC, LDAP, REST); more third-party solutions; proprietary connectors unnecessary; language neutral
  • 45. MapR: Lights-Out Data Center Ready. Reliable compute: automated stateful failover; automated re-replication; self-healing from HW and SW failures; load balancing; rolling upgrades; no lost jobs or data; five 9s (99.999%) of uptime. Dependable storage: end-to-end checksums; strong consistency; business continuity with snapshots and mirrors; recover to a point in time with snapshots; mirror across sites for disaster recovery.
  • 47. Why MapR Is Faster: Lockless Storage Service™ eliminates storage contention. Direct Block Device IO provides throughput at device speed. Hadoop Direct Shuffle exploits the MapR-FS architecture to deliver performance. Client-Side Compression reduces network overhead using automatic compression. C vs. Java: the system is written in C, which eliminates sporadic Java garbage-collection overhead.
  • 48. Security: MapR is pushing the envelope on Hadoop security. Integrates with Linux security (PAM); strong wire-level authentication and encryption; works with any user directory: Active Directory, LDAP, NIS, …; Kerberos and non-Kerberos options. Fine-grained access control: full POSIX permissions on files and directories; ACLs on tables, column families, columns, and cells; ACLs on MapReduce jobs and queues; administration ACLs on cluster and volumes.
  • 49. Bullet-proof NoSQL with Zero Administration (benefits and features, grouped under performance, reliability, and easy administration): High performance: over 1 million ops/sec with a 10-node cluster. Continuous low latency: no I/O storms, no compactions. 24x7 applications: instant recovery, online schema modification, snapshots, mirroring. Zero administration: no processes to manage, automated splits, self-tuning. High scalability: 1 trillion tables. Low TCO: files and tables on one platform.
  • 50. MapR M7 vs. CDH – Mixed Load (50-50)
  • 51. MapR M7 vs. CDH – Mixed Load (50-50)
  • 52. MapR: The guys with the cool solutions
  • 53. MapR: The future of the future
  • 54. Thank You

Editor's Notes

  1. The different kinds of scaling laws have different shape and I think that shape is the key.
  2. The value of analytics always increases with more data, but the rate of increase drops dramatically after an initial quick increase.
  3. In classical analytics, the cost of doing analytics increases sharply.
  4. The result is a net value that has a sharp optimum in the area where value is increasing rapidly and cost is not yet increasing so rapidly.
  5. New techniques such as Hadoop result in linear scaling of cost. This is a change in shape and it causes a qualitative change in the way that costs trade off against value to give net value. As technology improves, the slope of this cost line is also changing rapidly over time.
  6. This next sequence shows how the net value changes with different slope linear cost models.
  7. Notice how the best net value has jumped up significantly.
  8. And as the line approaches horizontal, the highest net value occurs at dramatically larger data scale.
  9. MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technology advantages of MapR’s distribution. Amazon, through its Elastic MapReduce service (EMR), hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold and supported as a service by Amazon to its customers. Google, the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop, has also selected MapR to make our distribution available on Google Compute Engine. Hadoop in the cloud makes a great deal of sense: the elastic resource allocation that cloud computing is premised on works well for cluster-based data processing infrastructure used on varying analyses and data sets of indeterminate size. MapR has unique features such as mirroring between sites and multi-tenancy support that further enhance cloud deployments.
  10. MapR is used today across industries. We have 10 of the Fortune 100 using MapR in production. We also have leading Web 2.0 properties, such as leading digital advertising platforms, using MapR. These customers are using MapR in production for a variety of use cases. Examples include one of the largest credit card issuers in the world, which has standardized on MapR for fraud and consumer targeting applications. Other examples include a major health care group, national cyber security, and one of the largest retailers in the world. These are all provided by MapR’s complete distribution for Apache Hadoop.
  11. MapR enables integration by providing industry-standard interfaces. More 3rd party solutions work with MapR than any other distribution, and proprietary connectors are not needed. NFS: all file-based applications can read and write data (examples: Linux utilities, file browsers, Informatica UltraMessaging). ODBC 3.52: all BI applications can leverage Hive (examples: Excel, Crystal Reports, Tableau, MicroStrategy). Linux PAM: any authentication provider can be used (examples: LDAP, Kerberos, 3rd party).
  12. With MapR, Hadoop is lights-out data center ready. MapR provides five 9s (99.999%) of availability, including support for rolling upgrades, self-healing, and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. There is end-to-end checksumming, so data corruption is automatically detected and corrected with MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations: every two weeks an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
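The net-value argument in notes 1 through 8 (net value = value − cost, with value showing diminishing returns and cost scaling either exponentially or linearly) can be sketched numerically. This is a minimal illustration, not the author's model: the curve shapes and every constant below (`k`, `c0`, `growth`, the linear slopes) are assumptions chosen only to reproduce the qualitative shapes described on the slides, and the function names are hypothetical.

```python
import functools
import math


# ASSUMPTION: value saturates as 1 - exp(-scale / k). This mimics the
# "big gains for little initial effort, rapidly diminishing returns" curve;
# k = 200 is an illustrative constant, not taken from the slides.
def value(scale: float, k: float = 200.0) -> float:
    return 1.0 - math.exp(-scale / k)


# ASSUMPTION: "old school" cost grows exponentially with scale.
# c0 and growth are illustrative constants.
def exponential_cost(scale: float, c0: float = 0.01, growth: float = 150.0) -> float:
    return c0 * (math.exp(scale / growth) - 1.0)


# ASSUMPTION: "big data" cost grows linearly with a small slope.
def linear_cost(scale: float, slope: float) -> float:
    return slope * scale


def best_net_value(cost, max_scale: int = 2000) -> tuple[int, float]:
    """Return (scale, net value) maximizing value(scale) - cost(scale)
    over the integer scales 0..max_scale."""
    return max(
        ((s, value(s) - cost(s)) for s in range(max_scale + 1)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    scale, net = best_net_value(exponential_cost)
    print(f"exponential cost: best scale={scale}, net value={net:.3f}")
    # Progressively flatter linear cost lines, as in notes 6-8.
    for slope in (1e-3, 1e-4, 1e-5):
        scale, net = best_net_value(functools.partial(linear_cost, slope=slope))
        print(f"linear cost, slope={slope:g}: best scale={scale}, net value={net:.3f}")
```

Under these assumed curves, the steepest linear cost line gives a lower best net value than the exponential model (the "linear cost scaling actually makes things worse" case on slide 34), while flatter slopes raise the best net value and push the optimum to much larger scale. Different constants will move the numbers, but the change in shape is what drives the effect.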