Supercharging Data Performance for
Real-Time Data Analysis
Information, the fuel of business, is trapped
in analysis platforms built on 70-year-old
architectures.
Data volume and velocity challenge traditional
computing methods
Traditional Approach:
• Commodity x86 based servers
• Cluster with open source software
• Scale for volume
• Scale for parallelism / performance
Challenges:
• High level languages can be inefficient
• Data intensive workloads drive in-memory solutions
• DRAM footprints at commodity prices are small
• Scaling out increases cost and complexity
Ryft delivers huge benefits in a small package.
Highest performance per watt and lowest total cost of ownership (TCO) of
any product on the market.
48 TB in 1U
• Data storage is abstracted as a set of Linux mount points
• Support native encryption/decryption with no loss in performance (AES 256 Encryption)
Simple API
• C library abstracts internal FPGA constructs to simplify programmability, allowing a programmer to invoke operations as simple function calls, returning simple results
• Command line
• Web Interface
Linux Front End
• Linux (Ubuntu 14.04 LTS) front end - Standard build, Non restricted OS, apt-get
• API calls FPGA fabric backend
• Linux services/protocols can be used
• ssh/scp/rsync/sftp
• Standard monitoring agents
• Web services
• Security configuration
x86 Architecture vs. Systolic Arrays
[Figure: two panels. x86: memory feeding a single processing element (PE) in one clock cycle. FPGA systolic array: memory feeding a chain of PEs in one clock cycle. Both panels are labeled 100 ns.]
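The contrast in this comparison can be illustrated with a small software model. The sketch below is a toy, not Ryft's design: it assumes a linear array in which each processing element (PE) holds one pattern character and hands its match bit to its neighbor on every clock cycle. The function name `systolic_search` and the `state` list are hypothetical names chosen for illustration.

```python
def systolic_search(pattern: str, text: str) -> list[int]:
    """Simulate a linear array of PEs that finds exact matches of `pattern`.

    One text character enters per simulated clock cycle. Every PE updates
    in the same cycle, using only its left neighbor's previous match bit.
    Returns the start offsets of all matches.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")

    m = len(pattern)
    state = [False] * m  # state[i]: PE i has matched pattern[: i + 1] so far
    hits = []

    for cycle, ch in enumerate(text):
        # The comprehension reads the previous cycle's `state`, which models
        # all PEs latching their new value on the same clock edge.
        state = [
            (state[i - 1] if i > 0 else True) and ch == pattern[i]
            for i in range(m)
        ]
        if state[-1]:  # the last PE reports a full match
            hits.append(cycle - m + 1)

    return hits


if __name__ == "__main__":
    print(systolic_search("aba", "ababa"))
```

In this model the number of simulated cycles equals the length of the text regardless of pattern length, because the per-character work is spread across the PEs. A sequential processor would instead perform up to `len(pattern)` comparisons per character.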
FPGA Benefits
x86
• General purpose computing
• Sequential in nature
• Non-deterministic performance
• Interrupts
• Memory allocation
• Problems are broken into a sequence of operations and processed serially
• Increasing number of instructions
• Increased overhead
• Increasing power/cooling required
• Software can break problems down and bring parallelism:
• Between processors/cores
• Between servers
• Output combined over interconnects
FPGA
• Not general purpose
• Purpose built algorithms
• Can be reprogrammed via firmware
• Parallel in nature
• Can execute many parallel operations in one clock cycle
• More output with less power and clock speed
• ~1000X fewer instructions to solve the same problem as x86
• 100% deterministic performance
• No memory fetching or management
• No interrupts
Multi-Dimensional Systolic Arrays
[Figure: multi-dimensional arrangements of processing elements (PEs).]
The Ryft ONE is powered by a breakthrough in
Real-time Data Analysis.
The only 1U platform capable of analyzing streaming, historical,
unstructured, and multi-structured data in real-time at 10 GB/second.
Ryft ONE avoids bottlenecks that strangle conventional systems
by combining these two innovations:
The Ryft Analytics Cortex™
Ryft ONE leverages a massively parallel bitwise
computing architecture to deliver unprecedented
performance from the smallest possible form factor.
The Ryft Algorithm Primitives™ Library
Each Ryft ONE comes with a subscription to this
growing collection of pre-built algorithm components,
and an open API to leverage them.
“We see Spark Streaming scales nearly linearly to 100 nodes, and can
process up to 6 GB/s at sub-second latency on 100 nodes for Grep, 2.3
GB/s for the other, more CPU-intensive jobs”
UC Berkeley, Streaming Computation at Scale
http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf
Ryft transforms datacenter economics.
The Ryft ONE
• Search = 10 GB/s
• Term Frequency = 2.5 GB/s
Costly & Complex Clusters
• Search = 6 GB/s
• Term Frequency = 2.3 GB/s
Wikipedia Examples
• English XML Dump is offered by Wikipedia
• Total Corpus is 44GB
• Copying the data takes 44 seconds
• Fuzzy search would take 4.4 seconds
• Term Frequency would take 17.6 seconds
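The timings above follow from dividing the corpus size by a sustained throughput. The sketch below reproduces that arithmetic. The search and term frequency rates are the figures quoted elsewhere in this deck. The 1 GB/s copy rate is an inference from "44 GB in 44 seconds", not a stated specification. The names `CORPUS_GB`, `RATES_GB_PER_S`, and `seconds_to_process` are hypothetical.

```python
CORPUS_GB = 44.0  # size of the English Wikipedia XML dump cited in the deck

# Throughput in GB/s. The copy rate is inferred, not stated in the deck.
RATES_GB_PER_S = {
    "copy": 1.0,
    "fuzzy search": 10.0,
    "term frequency": 2.5,
}


def seconds_to_process(size_gb: float, rate_gb_per_s: float) -> float:
    """Time to stream `size_gb` of data at a constant `rate_gb_per_s`."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_s


if __name__ == "__main__":
    for operation, rate in RATES_GB_PER_S.items():
        seconds = seconds_to_process(CORPUS_GB, rate)
        print(f"{operation}: {seconds:.1f} s")
```

This assumes the quoted rates are sustained over the whole corpus, with no startup or result-handling cost.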
Data Exploration Use Case
• RDF: understanding of native formats
• Powerful no-index search
• Flexible query format with wildcarding
• Identify relationships between disparate data
Data Triage for Hadoop/Spark Use Case
[Diagram labels: Raw Data, HDFS, M/R, noSQL, Hive, Text Index, Application. Hours? Days?]
Data Triage for Hadoop/Spark Use Case
[Diagram labels: Ingest @ 1-4 GB/s, Search / Minimize @ 10 GB/s, HDFS. Seconds!]
• Social media signal/noise
• Fuzzy searching at line rate
@badguy1
@badguy2
@badguy01
@badboy01
Search: “badguy??”
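The deck does not say how the fuzzy match is defined, so the following is only one plausible reading of the example above. It assumes "?" is a single-character wildcard and that a handle matches when its edit (Levenshtein) distance to the pattern is at most 2. The function names and the threshold are hypothetical, and this software sketch says nothing about how the hardware performs the search.

```python
def wildcard_edit_distance(pattern: str, text: str, wildcard: str = "?") -> int:
    """Levenshtein distance where `wildcard` in the pattern matches any character."""
    # prev[j] is the distance between the pattern prefix seen so far and text[:j].
    prev = list(range(len(text) + 1))
    for i, p in enumerate(pattern, start=1):
        curr = [i]
        for j, t in enumerate(text, start=1):
            substitution = 0 if (p == wildcard or p == t) else 1
            curr.append(min(
                prev[j] + 1,                # drop a pattern character
                curr[j - 1] + 1,            # skip a text character
                prev[j - 1] + substitution  # match or substitute
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(pattern: str, handles: list[str], max_distance: int = 2) -> list[str]:
    """Return the handles within `max_distance` edits of `pattern`."""
    return [
        handle for handle in handles
        if wildcard_edit_distance(pattern, handle.lstrip("@")) <= max_distance
    ]


if __name__ == "__main__":
    handles = ["@badguy1", "@badguy2", "@badguy01", "@badboy01", "@goodguy1"]
    print(fuzzy_match("badguy??", handles))
```

Under these assumptions all four handles from the slide match, while an unrelated handle such as the made-up "@goodguy1" does not.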
Organizations who want real-time insights into all their data
Large data sets (changing, structured & unstructured, Text, Binary, Imaging)
High Velocity Data
• Logging
• Ad Data
• Twitter
Forensics & Legal Discovery
• Host based forensics
• E-discovery
Scientific Data
• Genomics
• Sensor Data
Financial
• Compliance
• Fraud Detection
Cyber Security
• PCAP
• Full packet capture
• Binary Analysis
Imagery Analysis
• Change Analysis
• High Performance Rendering
Revisiting Performance Results
Ryft ONE closes the industry’s data analytics performance gap
by combining the following into a single architecture:
• Parallel FPGA architectures to accelerate performance
• Dedicated storage/access/RAM
• Elimination of data security performance bottlenecks
• Elimination of operating system and high level language overhead
• Minimizing the need to move data
Use Case | Single Ryft ONE Throughput | Spark Cluster to Match Performance
Search | ~10 GB/sec | > 100 nodes¹
Fuzzy Search | ~10 GB/sec | 100-200 nodes²
Term Frequency | ~2.5 GB/sec | 100 nodes¹
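The node counts in this comparison can be sanity-checked against the Spark Streaming figures quoted earlier in the deck (6 GB/s for grep and 2.3 GB/s for more CPU-intensive jobs, both on 100 nodes). The sketch below assumes throughput keeps scaling linearly with node count. That is an extrapolation, since the quoted source reports near-linear scaling only up to 100 nodes. The function name `nodes_to_match` is hypothetical.

```python
import math


def nodes_to_match(target_gb_per_s: float,
                   cluster_gb_per_s: float,
                   cluster_nodes: int = 100) -> int:
    """Estimate the nodes needed to reach `target_gb_per_s`.

    Assumes each node contributes cluster_gb_per_s / cluster_nodes and that
    scaling stays linear (an assumption, not a measured result).
    """
    if cluster_gb_per_s <= 0 or cluster_nodes <= 0:
        raise ValueError("cluster throughput and node count must be positive")
    return math.ceil(target_gb_per_s * cluster_nodes / cluster_gb_per_s)


if __name__ == "__main__":
    print("search:", nodes_to_match(10.0, 6.0))
    print("term frequency:", nodes_to_match(2.5, 2.3))
```

Under that assumption the estimate is about 167 nodes for search and about 109 nodes for term frequency, which is in line with the ranges listed in the table.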
Accelerate business insights with the only platform purpose-built
to simultaneously analyze any type of data—historical and
streaming, unstructured and multi-structured—
100X faster with 70% lower TCO.
The Ryft ONE: More data. Less center. Faster insights.
info@ryft.com
Questions

MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 

Supercharging Data Performance for Real-Time Data Analysis

  • 1. Supercharging Data Performance for Real-Time Data Analysis
  • 2. Information, the fuel of business, is trapped in analysis platforms built on 70-year-old architectures.
  • 3. Data volume and velocity challenge traditional computing methods
      Traditional Approach:
        • Commodity x86-based servers
        • Cluster with open source software
        • Scale for volume
        • Scale for parallelism / performance
      Challenges:
        • High-level languages can be inefficient
        • Data-intensive workloads drive in-memory solutions
        • DRAM footprints at commodity prices are small
        • Scaling out increases cost and complexity
  • 4. Ryft delivers huge benefits in a small package. Highest performance per watt and lowest total cost of ownership (TCO) of any product on the market.
      48 TB in 1U
        • Data storage is abstracted as a set of Linux mount points
        • Supports native encryption/decryption with no loss in performance (AES 256 encryption)
      Simple API
        • C library abstracts internal FPGA constructs to simplify programmability, allowing a programmer to invoke operations as simple function calls, returning simple results
        • Command line
        • Web interface
      Linux Front End
        • Linux (Ubuntu 14.04 LTS) front end: standard build, non-restricted OS, apt-get
        • API calls FPGA fabric backend
        • Linux services/protocols can be used
            • ssh/scp/rsync/sftp
            • Standard monitoring agents
            • Web services
            • Security configuration
  • 5. x86 Architecture vs. Systolic Arrays
      [Figure: memory feeding a single processing element (PE) on x86, versus memory feeding a chain of PEs in an FPGA systolic array. Labels on both: One Clock Cycle, 100 ns]
  • 6. FPGA Benefits
      x86
        • General purpose computing
        • Sequential in nature
        • Non-deterministic performance
        • Interrupts
        • Memory allocation
        • Problems are broken into a sequence of operations and processed serially
        • Increasing number of instructions
        • Increased overhead
        • Increasing power/cooling required
        • Software can break problems down and bring parallelism:
            • Between processors/cores
            • Between servers
        • Output combined over interconnects
      FPGA
        • Not general purpose
        • Purpose-built algorithms
        • Can be reprogrammed via firmware
        • Parallel in nature
        • Can execute many parallel operations in one clock cycle
        • More output with less power and clock speed
        • ~1000X fewer instructions to solve the same problem as x86
        • 100% deterministic performance
        • No memory fetching or management
        • No interrupts
  • 7. Multi-Dimensional Systolic Arrays
      [Figure: grid of interconnected processing elements (PE)]
  • 8. The Ryft ONE is powered by a breakthrough in real-time data analysis: the only 1U platform capable of analyzing streaming, historical, unstructured, and multi-structured data in real time at 10 GB/second. Ryft ONE avoids bottlenecks that strangle conventional systems by combining these two innovations:
      • The Ryft Analytics Cortex™: Ryft ONE leverages a massively parallel bitwise computing architecture to deliver unprecedented performance from the smallest possible form factor.
      • The Ryft Algorithm Primitives™ Library: Each Ryft ONE comes with a subscription to this growing collection of pre-built algorithm components, and an open API to leverage them.
  • 9. “We see Spark Streaming scales nearly linearly to 100 nodes, and can process up to 6 GB/s at sub-second latency on 100 nodes for Grep, 2.3 GB/s for the other, more CPU-intensive jobs” (UC Berkeley, Streaming Computation at Scale) http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf
  • 10. Ryft transforms datacenter economics.
      The Ryft ONE: Search = 10 GB/s; Term Frequency = 2.5 GB/s
      Costly & Complex Clusters: Search = 6 GB/s; Term Frequency = 2.3 GB/s
  • 11. Wikipedia Examples
      • English XML dump is offered by Wikipedia
      • Total corpus is 44 GB
      • Copying the data takes 44 seconds
      • Fuzzy search would take 4.4 seconds
      • Term Frequency would take 17.6 seconds
  • 13. Data Exploration Use Case
      • RDF: understanding of native formats
      • Powerful no-index search
      • Flexible query format with wildcarding
      • Identify relationships between disparate data
  • 14. Data Triage for Hadoop/Spark Use Case
      [Diagram labels: Raw Data, HDFS, M/R, noSQL, Hive, Text Index, Application. Elapsed time: Hours? Days?]
  • 15. Data Triage for Hadoop/Spark Use Case
      [Diagram labels: Ingest @ 1-4 GB/s, Search / Minimize @ 10 GB/s, HDFS. Elapsed time: Seconds!]
        • Social media signal/noise
        • Fuzzy searching at line rate
      Example search: “badguy??” over the handles @badguy1, @badguy2, @badguy01, @badboy01
  • 16. Organizations who want real-time insights into all their data
      Large data sets (changing, structured & unstructured, text, binary, imaging)
      High Velocity Data
        • Logging
        • Ad Data
        • Twitter
      Forensics & Legal Discovery
        • Host based forensics
        • E-discovery
      Scientific Data
        • Genomics
        • Sensor Data
      Financial
        • Compliance
        • Fraud Detection
      Cyber Security
        • PCAP
        • Full packet capture
        • Binary Analysis
      Imagery Analysis
        • Change Analysis
        • High Performance Rendering
  • 17. Revisiting Performance Results
      Ryft ONE closes the industry’s data analytics performance gap by combining the following into a single architecture:
        • Parallel FPGA architectures to accelerate performance
        • Dedicated storage/access/RAM
        • Elimination of data security performance bottlenecks
        • Elimination of operating system and high-level language overhead
        • Minimizing the need to move data
      Use Case          Single Ryft ONE Throughput    Spark Cluster to Match Performance
      Search            ~10 GB/sec                    > 100 nodes [1]
      Fuzzy Search      ~10 GB/sec                    100-200 nodes [2]
      Term Frequency    ~2.5 GB/sec                   100 nodes [1]
  • 18. Accelerate business insights with the only platform purpose-built to simultaneously analyze any type of data—historical and streaming, unstructured and multi-structured— 100X faster with 70% lower TCO. The Ryft ONE: More data. Less center. Faster insights.
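
The Wikipedia timings on slide 11 follow from dividing the corpus size by a sustained throughput. Below is a minimal sketch of that arithmetic, assuming the 10 GB/s search and 2.5 GB/s term-frequency rates quoted on slide 10. The 1 GB/s copy rate is an assumption inferred from the 44-second figure, not a number stated in the deck, and the function and variable names are illustrative only.

```python
def scan_seconds(corpus_gb: float, throughput_gb_per_s: float) -> float:
    """Time to stream a corpus once at a sustained throughput."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return corpus_gb / throughput_gb_per_s


CORPUS_GB = 44.0  # English Wikipedia XML dump size quoted on slide 11

RATES_GB_PER_S = {
    "copy": 1.0,            # assumption: implied by "copying the data takes 44 seconds"
    "fuzzy search": 10.0,   # slide 10: Search = 10 GB/s
    "term frequency": 2.5,  # slide 10: Term Frequency = 2.5 GB/s
}

# One full pass over the corpus per operation.
timings = {op: scan_seconds(CORPUS_GB, rate) for op, rate in RATES_GB_PER_S.items()}

for op, secs in timings.items():
    print(f"{op}: {secs:.1f} s")
```

The same function can be reused for other corpus sizes, as long as the operation is a single streaming pass and the quoted rate is actually sustained.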

Editor's notes

  1. Legacy proprietary platforms are too slow and costly
      • No real-time performance; limited data formats
      • Priced out of the range of all but the largest enterprises
     Hadoop/Spark running on clusters are slow, complex, and brittle
      • Significant technology, performance, and knowledge gaps remain
      • Slow and complex setup and maintenance; x86 architecture is not sustainable
      • Demand for knowledgeable developers far exceeds supply
     Need purpose-built solutions that are open, high speed, and sustainable
      • Top ISV/OEMs working to unlock power of new architectures
      • Enterprises developing homegrown servers
      • Hyper-growth emerging markets for applying HPC resources to data analysis
  2. x86 servers are used universally across many problem areas:
      • Data analysis
      • Search
      • Simulation
      • Machine learning
      • Genome sequencing
      • Graph processing
     Scale-out x86 clusters have advantages but also many drawbacks:
      • Increased node count to meet DRAM footprint requirements
      • Increased node count for CPU core requirements
      • Inefficient high-level languages
      • Overhead of distributing data and combining results
      • Datacenter sprawl
      • Complex deployments
      • Increased operational cost
     A New Approach is Needed
     Highly distributed memory architectures turn complex analytics problems into I/O problems, because they must frequently move data between physically distributed memory, disk storage, processors, and networked nodes. The rising class of complex analytic workloads demands strong communications and near-real-time turnaround. Trying to partition (slice) these problems into smaller pieces that can run independently is like trying to cut a human into dozens of chunks and expecting each chunk to go on living.
     Commodity hardware clusters using Hadoop/Spark are designed for compute-intensive workloads, not data analytics
      • Without purpose-built solutions for Big Data analytics challenges, IT has been forced to piece together a solution and scale out to larger and larger commodity hardware clusters that are strangled by I/O performance bottlenecks
      • MapReduce/Hadoop tools were originally designed to run relatively simple, non-real-time tasks on highly distributed architectures such as clusters and clouds; these workloads frequently make the slow journey out to disk and back
      • Spark operates on similar principles but more efficiently: it saves up multiple tasks before going out to disk
  3. JSON: JavaScript Object Notation; ODBC: Open Database Connectivity; OData: Open Data Protocol
  4. Footprint Comparison
  5. Years in the making, the Ryft ONE combines two proven innovations in hardware and software to optimize compute, storage and I/O performance:
      • Fast actionable business insights: analyze historical and streaming data at an unprecedented 10 gigabytes per second or faster
      • Traditional clustered systems address Big Data analytics challenges by re-engineering old technologies to try to make them faster
      • Ryft’s revolutionary innovations in hardware and software dramatically reduce Mean Time to Decisions
  6. High Velocity Data: These are use cases where the data arrives so rapidly that the indexing approaches don’t work well without expensive scaling and licensing.
      • Logging (enterprise level syslog or flume)
      • Ad data (Admeld)
      • Click stream (web logs)
      • Twitter firehose
     Scientific Data: These are use cases where the data doesn’t format well for tokenizers and indexers.
      • Genomics (sequencing / bowtie and like algorithms)
      • Other sensor data
     Financial: These check multiple data sources to determine the legitimacy of an action. The turnaround time determines if it is a forensic finding or circumvents the incident.
      • Compliance
      • Fraud Detection
     Forensics and Legal Discovery: These users get data in a large package that can take vast amounts of time to index, and sometimes indexing isn’t possible because unfamiliar formats cannot be parsed and have their text extracted. Our brute force comparison methods sidestep many of these issues and allow analysts to find key pieces of data in seconds vs. days.
      • Host based forensics on disk images
      • E-discovery
          • E-mail
          • Databases
          • Documents
          • Messaging servers
          • Copier hard drive images
     Cyber Security
      • PCAP
      • Full packet capture (includes payload analysis)
      • Binary analysis (malware/virus)
      • Configuration file diff checking
     Imagery Analysis
      • Change analysis
          • Military airborne sensors
          • Security cameras
          • Aircraft radar
          • Astronomy
      • High Performance Rendering
  7. The node/cluster configurations are noted in the footnotes in the slide, and also in the notes below. They were taken directly from published literature, which is why they differ across search/fuzzy/TF vs. Sort. Sort was a more recent publication which used higher-end hardware.
     Each node in the Spark cluster for the search, fuzzy search and term frequency operations was an m1.xlarge EC2 instance with 4 cores, 15GB RAM and 1.68TB storage, as taken from an academic publication by UCB: http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf, and also from Amazon EC2 configuration information: http://aws.amazon.com/ec2/previous-generation/
     Spark cluster configuration for the sort operation was taken from a more recent publication (https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html) where they called out extremely high-end EC2 instances (the highest cost of any EC2 instance, at $6.82/hour), where each i2.8xlarge node consists of 32 cores, 244GB RAM, and 6.4TB of storage. That’s an amazing and costly amount of resources!
     The performance of any sort algorithm is highly dependent on the size of the sort key and the size of its accompanying data record. Ryft ONE’s worst case is on the order of 1GB/sec, and a typical real-world case can be upwards of 10GB/sec. The equivalent number of Spark nodes for Sort is estimated at approximately 65 nodes. This estimate stems from an analysis of the latest Spark sort benchmark performance numbers as published in https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html coupled with estimated Spark performance degradation (at approximately 50%) when moving from the non-real-world sort benchmarks employed to a more realistic real-world sort.
     Even if the assumptions and estimates are off (say even by a factor of 2), the fact that a single 1U Ryft ONE can achieve the sort performance of a large cluster of nodes where each node is 32 cores, 244GB RAM and 6.4TB is simply amazing.
  8. Massively valuable businesses and applications will be built off of rapidly increasing volume, velocity and variety of data. Today, most enterprise big data initiatives struggle to make it out of prototype stage, because current tools like Hadoop and Spark are complex to build and maintain, limited in capabilities, and built upon server clusters using von Neumann architectures designed 70 years ago. Today’s x86 architectures, which rely on these legacy designs, are not built for high performance data analysis and cannot answer the questions companies need to ask. Businesses need a new category of high performance, open, and low-maintenance platform that supports the volume, velocity and variety of big data, at a price tag that makes high performance computing capabilities attainable by all businesses. Massively valuable businesses and applications can be built on the Ryft platform to enable companies to do things never before possible while transforming data center economics.
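
The Spark node counts for search and term frequency can be sanity-checked from the Spark Streaming figures quoted on slide 9: 6 GB/s for Grep and 2.3 GB/s for the more CPU-intensive jobs, each on 100 nodes. What follows is a rough sketch, not the method used in the deck. It assumes throughput keeps scaling linearly with node count beyond the 100 nodes measured (the quote only claims near-linear scaling up to 100 nodes) and that Grep is a fair stand-in for search. The function and variable names are illustrative.

```python
import math


def nodes_to_match(target_gb_per_s: float,
                   cluster_gb_per_s: float,
                   cluster_nodes: int) -> int:
    """Estimate nodes needed to reach a target throughput.

    Assumes perfectly linear scaling from the measured cluster,
    which is an optimistic simplification.
    """
    per_node_gb_per_s = cluster_gb_per_s / cluster_nodes
    return math.ceil(target_gb_per_s / per_node_gb_per_s)


# Ryft ONE rates from slide 10; Spark rates from the slide 9 quote.
search_nodes = nodes_to_match(10.0, cluster_gb_per_s=6.0, cluster_nodes=100)
term_freq_nodes = nodes_to_match(2.5, cluster_gb_per_s=2.3, cluster_nodes=100)

print(f"search: ~{search_nodes} nodes")
print(f"term frequency: ~{term_freq_nodes} nodes")
```

Under these assumptions the estimates come out at 167 nodes for search and 109 for term frequency, roughly in line with the "> 100 nodes" and "100 nodes" figures given for those use cases on slide 17.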