Welcome to Today’s
DBTA Roundtable Discussion
Moderator
Stephen Faig
Manager
Unisphere Research and DBTA
Real-Time Analytics with Hadoop
Speakers
Dale Kim
Director of Industry Solutions
MapR
Paige Roberts
Hadoop & Analytics Evangelist
Actian
© 2015 MapR Technologies
Examples of Real-Time
Images licensed under https://creativecommons.org/licenses/by/2.0/
Time image courtesy of Daniel Oldfield: https://www.flickr.com/photos/democlez/4424898002/
Air bag image courtesy of Mike Babcock: https://www.flickr.com/photos/mikebabcock/3098836311/
Tied to clock time
Guaranteed response time
For real-time analytics, let’s use: “no built-in delays”
So what is real-time analytics with Hadoop?
Requirements for Real-Time Analytics with Hadoop
REAL-TIME DATA
REAL-TIME APPLICATIONS
REAL-TIME QUERIES
Real-Time Data
Definition: Provide immediate access to live Hadoop data
for analysis
Requirements:
• Analysis uses live real-time data, not batch-copied data
• Business can identify insights immediately (often through
an automated process)
• Critical for use cases such as ad targeting, personalization, network
security analysis.
• System avoids complexity of separate stream processing
or messaging system for recent data
Real-Time Data in Hadoop
For real-time:
• Log files should be written directly
into the cluster or synced across
remote data centers
• Operational applications should
run in the same cluster, or in a
separate cluster with real-time
table replication
• Immediate action should be taken
• E.g., difference between fraud
detection and fraud prevention
• Difference between on-demand ad
bid versus missing opportunity
Existing challenges:
• Log files must be batch uploaded
periodically (e.g., every 30
minutes)
• Due to HDFS limitations (not R/W,
file-close semantics, no direct NFS)
• Operational applications run on a
separate cluster/stack
• Data must be batch uploaded
• With batch uploads, the window to
respond is missed
• Fraud, cyber attacks, matches,
anomalies, etc.
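To make the cost of batch uploads concrete, the following sketch estimates how long an event waits before it is visible to analysis when logs are uploaded on a fixed schedule. The 30-minute interval comes from the example above; the function names and the assumption that events arrive evenly across the interval are illustrative only.

```python
def batch_visibility_delay(event_minute: float, interval_minutes: float = 30.0) -> float:
    """Minutes an event waits until the next scheduled batch upload.

    Assumes uploads happen at multiples of interval_minutes and ignores
    the time the upload itself takes (both are simplifying assumptions).
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    elapsed = event_minute % interval_minutes
    # An event that lands exactly on an upload boundary is picked up immediately.
    return 0.0 if elapsed == 0 else interval_minutes - elapsed


def average_delay(interval_minutes: float = 30.0, samples: int = 3000) -> float:
    """Average wait for events spread evenly across one upload interval."""
    step = interval_minutes / samples
    delays = [batch_visibility_delay(i * step, interval_minutes) for i in range(samples)]
    return sum(delays) / samples
```

With a 30-minute interval the average wait is roughly 15 minutes and the worst case approaches 30, which is the window in which fraud prevention turns into after-the-fact fraud detection.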
Real-Time Applications
Definition: Run operational applications in the cluster
Requirements:
• Address use cases beyond batch and interactive
analysis
• E.g., end-to-end real-time marketing and security applications directly
on Hadoop
• Eliminate separate Hadoop and NoSQL
clusters/technology stacks for apps
Real-Time Applications in Hadoop
For real-time:
• Minimize impact of disruptive
“housekeeping” tasks to enable
consistent, real-time operations
• E.g., Compactions, Java garbage
collection, “region splits”
• Process live, operational data in
Hadoop to avoid delays in batch
copies
Existing challenges:
• Other in-Hadoop databases suffer
disruptions, inhibiting real-time
• E.g., Compactions can significantly
slow down the system
• Garbage collection leads to
unpredictable system delays
• Region splits are required to spread
load, but impact responsiveness
and performance
• Other in-Hadoop databases require
separate clusters
Real-Time Querying
Definition: Query any data as soon as it lands in the
cluster (self-service)
Requirements:
• Analysts can explore data immediately, no waiting
days/weeks for data prep by IT
• IT is not burdened with repeated schema management
and ETL requests
Real-Time Querying in Hadoop
For real-time:
• Minimize time to get started on
data exploration
• Leverage query engines that can
query data in place
– Eliminate IT dependencies for
schema preparation
Existing challenges:
• New data that lands in the cluster
necessarily requires IT-built
schemas
• Data exploration and analysis is
contingent on IT backlog
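The idea of querying data in place, without an IT-built schema, can be illustrated with a small sketch. This is not Drill's API; it is plain Python over newline-delimited JSON, and the record fields and function name are made up for illustration. The point is that field names are discovered while reading, so records with new fields can be queried as soon as they land.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def query_in_place(lines: Iterable[str], columns: List[str],
                   where: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Select columns from JSON records whose fields match `where`.

    No schema is declared up front: a missing field is treated as None.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if all(record.get(key) == value for key, value in where.items()):
            yield {col: record.get(col) for col in columns}


# Hypothetical clickstream records; the second one carries a field the
# first does not, as happens when an upstream application evolves.
raw = [
    '{"user": "a1", "page": "/home"}',
    '{"user": "b2", "page": "/offers", "campaign": "spring"}',
]
rows = list(query_in_place(raw, ["user", "campaign"], {"page": "/offers"}))
```

A real query engine adds distributed execution, file-format readers, and full SQL on top of this; the sketch only shows why no upfront schema step is needed.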
So How Are These Implemented?
Case Study: Global Financial Services Firm
Analytics + Operational Applications on one platform
[Diagram: MapR Distribution for Hadoop hosting real-time operational applications and analytics. Labels: online transactions, fraud detection, personalized offers, clickstream analysis, fraud investigation tool, fraud model, recommendations table. Users: fraud investigator, interactive marketer]
Faster/Secure NFS Access
[Diagram: client node(s) connected to redundant NFS Gateways for high availability]
MapR data access options:
1. HDFS API – apps written for Hadoop
2. Standard read/write NFS (POSIX) – existing
file system-based apps, no code changes
3. MapR POSIX Client – advanced read/write
NFS requirements, including:
   • Compression
   • Parallelism
   • Authentication
   • Encryption
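Because option 2 exposes the cluster as a standard read/write file system, an application can append log records with ordinary file I/O. The sketch below assumes the cluster is already NFS-mounted at a local path; the mount path mentioned in the comment and the function name are hypothetical, and a temporary directory stands in for the mount so the example runs anywhere.

```python
import os
import tempfile
from datetime import datetime, timezone


def append_log_line(log_dir: str, message: str, filename: str = "app.log") -> str:
    """Append one timestamped line to a log file using plain file I/O."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, filename)
    stamp = datetime.now(timezone.utc).isoformat()
    # Append mode is the point: the file is extended in place rather than
    # being closed and batch-uploaded later.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{stamp} {message}\n")
    return path


# In a real deployment log_dir would be a directory under the NFS mount
# (for example something like /mapr/<cluster>/logs, a hypothetical path).
demo_dir = tempfile.mkdtemp()
log_path = append_log_line(demo_dir, "card=1234 action=withdraw")
append_log_line(demo_dir, "card=1234 action=balance")
```

No Hadoop-specific client code appears in the application, which is what "no code changes" means for existing file system-based apps.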
[Diagram: access paths from client applications into the MapR cluster. Labels: Hadoop applications (e.g. “hadoop fs -put”), HDFS API (hadoop-core-*.jar), file-based apps/utils (e.g. cp, emacs), native applications, NFS client (included in OS), NFS Gateway, MapR POSIX Client]
MapR-DB and “Other NoSQL” Throughput on YCSB
Throughput performance in operations/second/node (higher is better)

YCSB Benchmark                     MapR-DB 4.X   Other NoSQL    MapR-DB Increase
Load (10, 100)*                    27,097        14,753         1.8x
Read (75, 150)                     4,402         1,902          2.3x
50% read / 50% update (75, 100)    8,684         2,012          4.3x
95% read / 5% update (75, 100)     3,776         1,127          3.4x
Scan (32, 32)                      478           Client hangs   N/A

*Numbers in parentheses represent threads per client used in test runs for MapR-DB and other NoSQL, respectively
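The “MapR-DB Increase” column is the ratio of the two throughput figures in each row. A short check, using only the numbers reported above (the variable and function names are illustrative; the Scan row is left out because the other client hung and no ratio exists):

```python
# Throughput in operations/second/node, copied from the benchmark figures above.
throughput = {
    "Load": (27097, 14753),
    "Read": (4402, 1902),
    "50% read / 50% update": (8684, 2012),
    "95% read / 5% update": (3776, 1127),
}


def increase(mapr_db: int, other: int) -> float:
    """Ratio of MapR-DB throughput to the other NoSQL store's throughput."""
    return mapr_db / other


ratios = {name: increase(*pair) for name, pair in throughput.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```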
YCSB Mixed (50% Read / 50% Put) - Compare Read Latency
[Chart: read latency for MapR-DB versus HBase on other Hadoop distribution; lower is better]
MapR-DB Table Replication
Multi-master (a.k.a. active/active)
replication
Active Read/Write
End Users
• Faster data access – minimize network
latency on global data with local clusters
• Reduced risk of data loss – real-time,
bi-directional replication for synchronized
data across active clusters
• Application failover – upon any cluster
failure, applications continue via
redirection to another cluster
MapR-DB Real-Time Analytics
Active clusters close to the end users,
with real-time analytics at central cluster
[Diagram: active read/write MapR-DB clusters (London, New York, Singapore) and a central MapR-DB/Hadoop cluster running Hadoop analytics]
Operational and analytical workloads
combined in a single deployment
[Diagram: end users served by an operationally efficient, consolidated MapR cluster running database operations and Hadoop analytics]
One SQL Interface for All Data Formats
Unstructured data will
account for more than 80%
of the data collected by
organizations
ANSI SQL queries on rapidly evolving schemas
[Chart: total data stored, structured versus unstructured data, 1990 to 2020]
[Diagram: SQL options for analytics across IT-driven BI, self-service BI, and self-service data exploration. Labels: existing SQL engines, Apache Drill]
Agility by Reducing Distance to Data
Short analytic life cycles with no upfront schema creation and management
FROM: Traditional Approach
Hadoop Data, then Data Preparation (Schema Design, Transformation, Data Movement), then Users
New Business Questions
Total Time to Value: Weeks to Months
TO: New Approach
Hadoop Data, then Users
New Business Questions
Total Time to Value: Minutes
Drill enables the
“As It Happens” business
with instant SQL analytics
on complex data
Summary
Batch Bottlenecks
1. Data streaming – real-time,
but…
2. Further analysis is limited by
batch loads into HDFS
3. Most databases must run in
separate cluster, leading to
batch copies
4. Append-only HDFS leads to
heavy I/O for database
defragmentation
(“compactions”)
5. Data exploration requires IT
intervention
Removing the Batch Limitations
1. Data streaming – real-time
as before, and now….
2. Further analysis is allowed
with real-time loading
3. MapR-DB runs in Hadoop
4. With full read/write file
system, defragmentation
delays are eliminated
5. Data exploration performed
in a self-service manner
Real-Time
Data
Real-Time
Applications
Real-Time
Querying
And Don’t Forget…
• Real-time analytics doesn’t help you if the other key pieces
aren’t in place
• Include security
– Interoperability with any authentication mechanism
– Fine-grained access controls
– Auditing capabilities beyond simple log files
• Also include enterprise-grade reliability
– An automated high availability configuration
– Incremental mirroring/replication for disaster recovery
– Consistent snapshots
• Talk to us about what else you should consider
Q&A
@mapr maprtech
dalekim@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies
Confidential © 2014 Actian Corporation
Real-Time Analytics
Paige Roberts
April 2015
Hadoop & Analytics Evangelist
Actian Hadoop & Analytics Center of Excellence
Agenda
About Actian
Advantages of Data-Driven Business
What Do I Mean By Real-Time?
Real-Time Challenge: ATM Fraud
How Actian Does It
A Few Words About Actian
$140M Revenues + Profitable
10,000+ Customers
Global Presence: 8 worldwide offices, 7x24 multinational support model
“Fast becoming a big data powerhouse to challenge the market.” Forrester
“Actian is now very powerfully positioned in the big data and analytics markets.” Bloor
Companies Using Big Data Strategically Outperform
… At the Expense of Those That Don’t
[Chart: revenue and EBITDA growth of companies using big data versus other companies, for grocers, online retailers, big box retailers, casinos, credit cards, and insurance]
Note: Percentage, 10 year CAGR. McKinsey Report on Big Data.
• Predictive
• Real-time
• All Data
• New Insights
• Accuracy
What Does Real-Time Mean to Us?
Human comfortable interactivity
Streaming data processing
Sub-second response
Real-Time Analytics – Many Applications
Solar Power Company
New customer targeting
Smart meter data
Sportswear Company
Brand loyalty
Wearable data
Bank
ATM Fraud
Router data
Large US Bank Needs Help
• Multi-billion dollar American
bank / financial holding
company
• Provides deposit, credit,
trust, and investment
services to a broad range of
clients
• Operates nearly 1,500 retail
branches and more than
2,000 ATMs
Number of times faster than Impala
Fraud Kept This Bank’s Execs Up at Night
What is the Worst Gotcha About ATM Fraud?
In the majority of cases, banks are required
to reimburse customer losses.
https://www.tycois.com/insights-and-opinions/articles/atm-skimming-costs-banks
In spite of that, 67% of U.S. adults would
switch to another institution after
experiencing ATM fraud or a data breach.
http://www.harrisinteractive.com/NewsRoom/HarrisPolls/tabid/447/ctl/ReadCustom%20Default/mid/1508/ArticleId/1515/Default.aspx
This is What You Call a Delayed Reaction
Time to Call in the Elephant
Actian Vortex: The Elephant’s Best Friend
SOURCE DATA: Databases / Marts, Warehouses, Cloud / SaaS Applications, Structured & Unstructured Data, Enterprise Applications
DATA PLATFORM: Actian Vortex
• Actian Management Console
• Elastic Data Preparation: DataFlow
• SQL Analytics: Vector in Hadoop
• Graph Analytics: SPARQLverse
• Machine Learning & Predictive Analytics: DataFlow
• Library of Analytic Blueprints
ANALYTIC APPS: Financial Services, Health Care, Other Verticals
APPLICATION DEV: Application Development and Tools (SQL, Java, C/C++, Python)
INFRASTRUCTURE: Deployment Options (@Customer)
powered by KNIME
Actian Vortex:
High Performance Analytics at Scale in Hadoop
Powered by KNIME
Stopping Fraud in Real Time
https://www.youtube.com/watch?v=u1QoHCpOUOU
Actian Vector in Hadoop: Built for Speed
Vector Processing
Single
Instruction
Multiple
Data
2nd Gen Column Store
Limit I/O
Efficient real time updates
Smarter Compression
Maximize throughput
Vectorized decompression
Exploiting Chip Cache
Process data on chip – not in RAM
Multi-core Parallelism
Maximize system resource
utilization…
Storage Indexes
Quickly identify candidate data
blocks
Minimize IO
[Figure: time/cycles to process versus data processed for disk, RAM, and chip. Values shown: 10GB, 2-3GB, 40-400MB; 2-20, 150-250, millions]
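The storage index idea, quickly identifying candidate data blocks to minimize I/O, can be sketched with per-block minimum and maximum values. This is a generic min/max index written for illustration; the slides do not describe how Actian Vector implements its storage indexes, and the class names, function names, and sample amounts are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A block of column values plus the min/max recorded when it was written."""
    values: List[int]
    low: int
    high: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's range."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_greater_than(blocks: List[Block], threshold: int) -> Tuple[List[int], int]:
    """Return matching values and how many blocks actually had to be read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        if block.high <= threshold:
            continue  # no value in this block can match, so skip the read
        blocks_read += 1
        matches.extend(v for v in block.values if v > threshold)
    return matches, blocks_read


# Hypothetical withdrawal amounts, roughly ordered as they were loaded.
amounts = [20, 40, 60, 80, 100, 120, 200, 300, 400, 900, 950, 1000]
blocks = build_blocks(amounts, block_size=4)
large, blocks_read = scan_greater_than(blocks, threshold=500)
```

Here only one of the three blocks is read to answer the query; the benefit depends on how well values cluster within blocks.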
How Fast?
What to Look For in SQL in Hadoop
Open: No vendor lock-in
• Collaborative architecture
• Open access to Actian data storage formats
• Support for other formats
• Hadoop distribution and ecosystem application support
Fast: Results you need when you need them
• Fastest data prep and ingestion
• Fastest analytic engines
• Unbridled processing power on data nodes in a Hadoop cluster
Enterprise-Grade: Proven technology advantages
• Full SQL support
• Extreme scalability
• Full security
• High Availability & Disaster Recovery
Free Actian Vortex Express Edition
www.actian.com
facebook.com/actiancorp
@actiancorp
Thank You
Download Actian Vortex Express
Free Forever
http://bigdata.actian.com/sql-in-hadoop
Question and Answer Session
(please submit questions)
Q & A
Dale Kim
Director of Industry Solutions
MapR
Paige Roberts
Hadoop & Analytics Evangelist
Actian
Please use the same URL you used to view today’s live event
for the archive event, plus we will be sending you a follow-up
email with that URL once the archive is posted!
Thank you for participating in
today’s roundtable web event
Just by attending this event the winner of the
$100 AmEx Gift Card is…….

Contenu connexe

Tendances

The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinFlink Forward
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningDataWorks Summit
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsDataWorks Summit
 
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTestistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTurkish Testing Board
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTestistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTurkish Testing Board
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not laterDataWorks Summit
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastImpetus Technologies
 
Data Gloveboxes: A Philosophy of Data Science Data Security
Data Gloveboxes: A Philosophy of Data Science Data SecurityData Gloveboxes: A Philosophy of Data Science Data Security
Data Gloveboxes: A Philosophy of Data Science Data SecurityDataWorks Summit
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNDataWorks Summit/Hadoop Summit
 

Tendances (20)

Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
What's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and BeyondWhat's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and Beyond
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
 
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTestistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTestistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus Webcast
 
Data Gloveboxes: A Philosophy of Data Science Data Security
Data Gloveboxes: A Philosophy of Data Science Data SecurityData Gloveboxes: A Philosophy of Data Science Data Security
Data Gloveboxes: A Philosophy of Data Science Data Security
 
In Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging serviceIn Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging service
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARN
 
Enterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on HadoopEnterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on Hadoop
 

En vedette

Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionAdnan Masood
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhDAdnan Masood
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachAdnan Masood
 
Visualising the tabular model for power view upload
Visualising the tabular model for power view uploadVisualising the tabular model for power view upload
Visualising the tabular model for power view uploadJen Stirrup
 
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationDigital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationJen Stirrup
 
Cloud Computing Architecture Primer
Cloud Computing Architecture PrimerCloud Computing Architecture Primer
Cloud Computing Architecture PrimerIlham Ahmed
 
System Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureSystem Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureAdnan Masood
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
How Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform EducationHow Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform EducationHortonworks
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureKhalid Salama
 
Hive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHortonworks
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHortonworks
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPHortonworks
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 

En vedette (20)

Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief Introduction
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality Approach
 
Cloud computing by Bhavesh
Cloud computing by BhaveshCloud computing by Bhavesh
Cloud computing by Bhavesh
 
Visualising the tabular model for power view upload
Visualising the tabular model for power view uploadVisualising the tabular model for power view upload
Visualising the tabular model for power view upload
 
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationDigital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
 
Cloud Computing Architecture Primer
Cloud Computing Architecture PrimerCloud Computing Architecture Primer
Cloud Computing Architecture Primer
 
System Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureSystem Quality Attributes for Software Architecture
System Quality Attributes for Software Architecture
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
How Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform EducationHow Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform Education
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Hive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHive - 1455: Cloud Storage
Hive - 1455: Cloud Storage
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 

Real-Time Analytics with Hadoop

  • 1. Welcome to Today’s DBTA Roundtable Discussion
  • 4. Speakers
      • Dale Kim, Director of Industry Solutions, MapR
      • Paige Roberts, Hadoop & Analytics Evangelist, Actian
  • 5. © 2015 MapR Technologies
  • 6. Examples of Real-Time
      • Tied to clock time
      • Guaranteed response time
      For real-time analytics, let’s use: “no built-in delays”
      So what is real-time analytics with Hadoop?
      Images licensed under https://creativecommons.org/licenses/by/2.0/
      Time image courtesy of Daniel Oldfield: https://www.flickr.com/photos/democlez/4424898002/
      Air bag image courtesy of Mike Babcock: https://www.flickr.com/photos/mikebabcock/3098836311/
  • 7. Requirements for Real-Time Analytics with Hadoop
      • Real-time data
      • Real-time applications
      • Real-time queries
  • 8. Real-Time Data
      Definition: Provide immediate access to live Hadoop data for analysis
      Requirements:
      • Analysis uses live real-time data, not batch-copied data
      • Business can identify insights immediately (often through an automated process)
      • Critical for use cases such as ad targeting, personalization, and network security analysis
      • System avoids complexity of separate stream processing or messaging system for recent data
  • 9. Real-Time Data in Hadoop
      For real-time:
      • Log files should be written directly into the cluster or synced across remote data centers
      • Operational applications should run in the same cluster, or in a separate cluster with real-time table replication
      • Immediate action should be taken
        - E.g., difference between fraud detection and fraud prevention
        - Difference between on-demand ad bid versus missing opportunity
      Existing challenges:
      • Log files must be batch uploaded periodically (e.g., every 30 minutes)
        - Due to HDFS limitations (not read/write, file-close semantics, no direct NFS)
      • Operational applications run on a separate cluster/stack
        - Data must be batch uploaded
      • With batch uploads, the window to respond is missed
        - Fraud, cyber attacks, matches, anomalies, etc.
  • 10. Real-Time Applications
      Definition: Run operational applications in the cluster
      Requirements:
      • Address use cases beyond batch and interactive analysis
        - E.g., end-to-end real-time marketing and security applications directly on Hadoop
      • Eliminate separate Hadoop and NoSQL clusters/technology stacks for apps
  • 11. Real-Time Applications in Hadoop
      For real-time:
      • Minimize the impact of disruptive “housekeeping” tasks to enable consistent, real-time operations
        - E.g., compactions, Java garbage collection, “region splits”
      • Process live, operational data in Hadoop to avoid delays in batch copies
      Existing challenges:
      • Other in-Hadoop databases suffer disruptions, inhibiting real-time
        - E.g., compactions can significantly slow down the system
        - Garbage collection leads to unpredictable system delays
        - Region splits are required to spread load, but impact responsiveness and performance
      • Other in-Hadoop databases require separate clusters
  • 12. Real-Time Querying
      Definition: Query any data as soon as it lands in the cluster (self-service)
      Requirements:
      • Analysts can explore data immediately, no waiting days/weeks for data prep by IT
      • IT is not burdened with repeated schema management and ETL requests
  • 13. Real-Time Querying in Hadoop
      For real-time:
      • Minimize time to get started on data exploration
      • Leverage query engines that can query data in place
        - Eliminate IT dependencies for schema preparation
      Existing challenges:
      • New data that lands in the cluster necessarily requires IT-built schemas
      • Data exploration and analysis is contingent on IT backlog
  • 14. So How Are These Implemented?
  • 15. Case Study: Global Financial Services Firm
      Analytics + Operational Applications on one platform
      [Diagram: MapR Distribution for Hadoop hosting real-time operational applications and analytics. Labels: online transactions, fraud detection, fraud model, fraud investigation tool, fraud investigator; clickstream analysis, recommendations table, personalized offers, interactive marketer]
  • 16. Real-Time Data, Real-Time Applications, Real-Time Queries
  • 17. Faster/Secure NFS Access
      Redundant gateways for high availability
      MapR data access options:
      1. HDFS API: apps written for Hadoop
      2. Standard read/write NFS (POSIX): existing file system-based apps, no code changes
      3. MapR POSIX Client: advanced read/write NFS requirements, includes:
         1. Compression
         2. Parallelism
         3. Authentication
         4. Encryption
      [Diagram labels: client node(s); Hadoop applications (e.g., “hadoop fs -put”); HDFS API (hadoop-core-*.jar); file-based apps/utils (e.g., cp, emacs); NFS client (included in OS); NFS Gateway; native applications; MapR POSIX Client; MapR cluster]
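As a rough illustration of option 2 above (standard read/write NFS, no code changes), the sketch below appends log lines with ordinary file I/O. Nothing in it is MapR-specific: the function name `append_log_line` and the file name `app.log` are hypothetical, and a temporary directory stands in for the NFS mount point so the snippet runs anywhere. On a real deployment you would point `mount_root` at wherever the cluster is mounted.

```python
import os
import tempfile
from datetime import datetime, timezone


def append_log_line(log_path: str, message: str) -> None:
    """Append one timestamped line using ordinary POSIX-style file I/O."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"{stamp} {message}\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the write to the file system


# Assumption: a temp dir stands in for the NFS-mounted cluster path.
mount_root = tempfile.mkdtemp()
log_file = os.path.join(mount_root, "app.log")

append_log_line(log_file, "txn=1001 status=approved")
append_log_line(log_file, "txn=1002 status=declined")
```

The point of the sketch is only that the application code is unchanged from what it would be on a local disk; whether each line is visible to analytics immediately depends on the file system behind the mount.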
  • 18. MapR-DB and “Other NoSQL” Throughput on YCSB
      Throughput performance in operations/second/node (higher is better)

      YCSB Benchmark                     MapR-DB 4.X   Other NoSQL    MapR-DB Increase
      Load (10, 100)*                    27,097        14,753         1.8x
      Read (75, 150)                     4,402         1,902          2.3x
      50% read / 50% update (75, 100)    8,684         2,012          4.3x
      95% read / 5% update (75, 100)     3,776         1,127          3.4x
      Scan (32, 32)                      478           Client hangs   N/A

      *Numbers in parentheses represent threads per client used in test runs for MapR-DB and other NoSQL, respectively
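The “MapR-DB Increase” column appears to be the ratio of the two throughput figures, rounded to one decimal place. A small check of that reading, using only the numbers on the slide (the dictionary keys and the helper name `increase` are illustrative):

```python
# Throughput in operations/second/node, copied from the slide:
# (MapR-DB 4.X, other NoSQL). The scan row is omitted because the
# other system's client hung, so no ratio can be computed.
throughput = {
    "load": (27097, 14753),
    "read": (4402, 1902),
    "50/50 read/update": (8684, 2012),
    "95/5 read/update": (3776, 1127),
}


def increase(mapr_db: int, other: int) -> float:
    """Ratio of MapR-DB throughput to the other system, to one decimal."""
    return round(mapr_db / other, 1)


ratios = {name: increase(*pair) for name, pair in throughput.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```

The computed ratios match the 1.8x, 2.3x, 4.3x, and 3.4x shown on the slide. Note that the thread counts per client differ between the two systems, so the ratios compare each system at its own test setting rather than at identical load.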
  • 19. Real-Time Data, Real-Time Applications, Real-Time Queries
  • 20. YCSB Mixed (50% Read / 50% Put) - Compare Read Latency
      [Chart: read latency for MapR-DB versus HBase on other Hadoop distribution; lower is better]
  • 21. MapR-DB Table Replication
      Multi-master (aka active/active) replication
      • Faster data access: minimize network latency on global data with local clusters
      • Reduced risk of data loss: real-time, bi-directional replication for synchronized data across active clusters
      • Application failover: upon any cluster failure, applications continue via redirection to another cluster
      [Diagram labels: active read/write; end users]
  • 22. MapR-DB Real-Time Analytics
      • Active clusters close to the end users, with real-time analytics at central cluster
        [Diagram labels: active read/write; MapR-DB cluster (London); MapR-DB cluster (New York); MapR-DB cluster (Singapore); MapR-DB/Hadoop cluster; Hadoop analytics]
      • Operational and analytical workloads combined in a single deployment
        [Diagram labels: operationally efficient, consolidated MapR cluster; database operations; Hadoop analytics; end users]
  • 23. Real-Time Data, Real-Time Applications, Real-Time Queries
  • 24. One SQL Interface for All Data Formats
      • Unstructured data will account for more than 80% of the data collected by organizations
      • ANSI SQL queries on rapidly evolving schemas
      [Chart: total data stored, 1990 to 2020, structured data versus unstructured data]
      [Chart: SQL options for analytics. Labels: Existing SQL Engines, Apache Drill; IT-Driven BI, Self-Service BI, Self-Service Data Exploration]
  • 25. Agility by Reducing Distance to Data
      Short analytic life cycles with no upfront schema creation and management
      FROM: Traditional Approach
        Hadoop Data, Data Preparation (Schema Design, Transformation, Data Movement), Users, New Business Questions
        Total Time to Value: Weeks to Months
      TO: New Approach
        Hadoop Data, Users, New Business Questions
        Total Time to Value: Minutes
      Drill enables the “As It Happens” business with instant SQL analytics on complex data
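To make “no upfront schema creation” concrete, here is a minimal sketch of submitting a query against a raw JSON file through Apache Drill's REST endpoint. It is not taken from the slides: the file path, column names, helper name `build_drill_request`, and the `localhost:8047` address are all assumptions for illustration, and the snippet only builds the HTTP request so it runs without a Drill server.

```python
import json
import urllib.request


def build_drill_request(sql: str, base_url: str = "http://localhost:8047"):
    """Build a POST request for Drill's /query.json REST endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical file and fields: the query names a file path directly,
# with no table definition or schema created beforehand.
sql = (
    "SELECT t.user_id, COUNT(*) AS clicks "
    "FROM dfs.`/data/clickstream/2015-04.json` t "
    "GROUP BY t.user_id"
)
request = build_drill_request(sql)

# With a Drill server running, the request could be sent like this:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```

The design point is that the `FROM` clause addresses the file where it already sits in the cluster, which is what removes the schema design, transformation, and data movement steps from the traditional path.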
  • 26. Summary
  • 27. Batch Bottlenecks: 1. Data streaming – real-time, but… 2. Further analysis is limited by batch loads into HDFS 3. Most databases must run in a separate cluster, leading to batch copies 4. Append-only HDFS leads to heavy I/O for database defragmentation (“compactions”) 5. Data exploration requires IT intervention
  • 28. Removing the Batch Limitations: 1. Data streaming – real-time as before, and now… 2. Further analysis is allowed with real-time loading 3. MapR-DB runs in Hadoop 4. With a full read/write file system, defragmentation delays are eliminated 5. Data exploration is performed in a self-service manner (Real-Time Data, Real-Time Applications, Real-Time Querying)
  • 29. And Don’t Forget… • Real-time analytics doesn’t help you if the other key pieces aren’t in place • Include security – Interoperability with any authentication mechanism – Fine-grained access controls – Auditing capabilities beyond simple log files • Also include enterprise-grade reliability – An automated high availability configuration – Incremental mirroring/replication for disaster recovery – Consistent snapshots • Talk to us about what else you should consider
  • 30. Q&A. Engage with us! @mapr · maprtech · dalekim@mapr.com · MapR · maprtech · mapr-technologies
  • 31. Confidential © 2014 Actian Corporation. Real-Time Analytics. Paige Roberts, Hadoop & Analytics Evangelist, Actian Hadoop & Analytics Center of Excellence. April 2015.
  • 32. Agenda: About Actian; Advantages of Data-Driven Business; What Do I Mean By Real-Time?; Real-Time Challenge: ATM Fraud; How Actian Does It
  • 33. A Few Words About Actian. $140M revenues and profitable; 10,000+ customers; global presence: 8 worldwide offices, 7x24 multinational support model. “Fast becoming a big data powerhouse to challenge the market.” (Forrester) “Actian is now very powerfully positioned in the big data and analytics markets.” (Bloor)
  • 34. Companies Using Big Data Strategically Outperform … At the Expense of Those That Don’t. • Predictive • Real-time • All Data • New Insights • Accuracy. [Charts: Revenue and EBITDA, Big Data vs. Other Companies, across Grocers, Online Retailers, Big Box Retailers, Casinos, Credit Cards, and Insurance. Values as extracted, series order not recoverable: Revenue 8 9 5 5 -1 6 9 14 11 9 24 12; EBITDA 5 -1 1 2 -15 3 14 9 12 10 22 11.] Note: Percentage, 10-year CAGR. Source: McKinsey Report on Big Data.
  • 35. What Does Real-Time Mean to Us? Human comfortable interactivity; streaming data processing; sub-second response
  • 36. Real-Time Analytics – Many Applications. Solar power company: new customer targeting (smart meter data). Sportswear company: brand loyalty (wearable data). Bank: ATM fraud (router data).
  • 37. Large US Bank Needs Help • Multi-billion dollar American bank / financial holding company • Provides deposit, credit, trust, and investment services to a broad range of clients • Operates nearly 1,500 retail branches and more than 2,000 ATMs
  • 38. Fraud Kept This Bank’s Execs Up at Night
  • 39. What is the Worst Gotcha About ATM Fraud? In the majority of cases, banks are required to reimburse customer losses. https://www.tycois.com/insights-and-opinions/articles/atm-skimming-costs-banks In spite of that, 67% of U.S. adults would switch to another institution after experiencing ATM fraud or a data breach. http://www.harrisinteractive.com/NewsRoom/HarrisPolls/tabid/447/ctl/ReadCustom%20Default/mid/1508/ArticleId/1515/Default.aspx
  • 40. This is What You Call a Delayed Reaction
  • 41. Time to Call in the Elephant
  • 42. Actian Vortex: The Elephant’s Best Friend. [Architecture diagram.] SOURCE DATA: Databases / Marts; Warehouses; Cloud / SaaS Applications; Structured & Unstructured Data; Enterprise Applications. DATA PLATFORM (Actian Vortex): Elastic Data Preparation (DataFlow); SQL Analytics (Vector in Hadoop); Graph Analytics (SPARQLverse); Machine Learning & Predictive Analytics (DataFlow); Library of Analytic Blueprints; Actian Management Console. ANALYTIC APPS: Financial Services; Health Care; Other Verticals. APPLICATION DEV: Application Development and Tools (SQL, Java, C/C++, Python; powered by KNIME). INFRASTRUCTURE: Deployment Options @Customer.
  • 43. Actian Vortex: High Performance Analytics at Scale in Hadoop. Powered by KNIME
  • 44. Stopping Fraud in Real Time https://www.youtube.com/watch?v=u1QoHCpOUOU
  • 45. Actian Vector in Hadoop: Built for Speed • Vector Processing: Single Instruction Multiple Data • 2nd Gen Column Store: limit I/O; efficient real-time updates • Smarter Compression: maximize throughput; vectorized decompression • Exploiting Chip Cache: process data on chip, not in RAM • Multi-core Parallelism: maximize system resource utilization • Storage Indexes: quickly identify candidate data blocks; minimize I/O. [Chart: time/cycles to process vs. data processed for DISK, RAM, and CHIP; labels as extracted: 10GB, 2-3GB, 40-400MB; 2-20, 150-250, Millions.]
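The "Storage Indexes" bullet above (quickly identify candidate data blocks, minimize I/O) is commonly implemented by keeping min/max summaries per block. The following is a minimal sketch of that general technique, not Actian's implementation; the `Block` structure, block size, and sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical column block with a min/max summary (the "index").
    values: list
    lo: int
    hi: int


def build_blocks(column, block_size):
    """Split a column into fixed-size blocks and record min/max per block."""
    blocks = []
    for i in range(0, len(column), block_size):
        chunk = column[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_greater_than(blocks, threshold):
    """Return values > threshold, reading only blocks that could match."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block.hi <= threshold:
            continue  # summary proves no value here matches; skip the I/O
        blocks_read += 1
        matches.extend(v for v in block.values if v > threshold)
    return matches, blocks_read


amounts = [5, 8, 3, 9, 12, 15, 11, 14, 40, 95, 60, 72]
blocks = build_blocks(amounts, block_size=4)
matches, blocks_read = scan_greater_than(blocks, threshold=50)
```

In this toy run only one of the three blocks has a maximum above the threshold, so the other two are skipped without being read. The savings depend on how well the data is clustered: values scattered evenly across blocks would defeat the summaries.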
  • 46. How Fast?
  • 47. How Fast?
  • 48. What to Look For in SQL in Hadoop. Open (no vendor lock-in): • Collaborative architecture • Open access to Actian data storage formats • Support for other formats • Hadoop distribution and ecosystem application support. Fast (results you need when you need them): • Fastest data prep and ingestion • Fastest analytic engines • Unbridled processing power on data nodes in a Hadoop cluster. Enterprise-Grade (proven technology advantages): • Full SQL support • Extreme scalability • Full security • High Availability & Disaster Recovery
  • 49. Free Actian Vortex Express Edition
  • 50. Thank You. Download Actian Vortex Express, Free Forever: http://bigdata.actian.com/sql-in-hadoop · www.actian.com · facebook.com/actiancorp · @actiancorp
  • 51. Question and Answer Session (please submit questions)
  • 52. Q & A. Dale Kim, Director of Industry Solutions, MapR. Paige Roberts, Hadoop & Analytics Evangelist, Actian.
  • 53. Please use the same URL you used to view today’s live event to view the archived event. We will also send you a follow-up email with that URL once the archive is posted!
  • 54. Thank you for participating in today’s roundtable web event. Just for attending this event, the winner of the $100 AmEx Gift Card is…