www.twosigma.com
Responsive and Scalable Real-time
Data Analytics
September 13, 2018
Cecilia Ye
Presented to SHPE 11/2/2017
Disclaimer
This document is being distributed for informational and educational purposes only and is not an offer to sell or the solicitation of an offer to buy
any securities or other instruments. The information contained herein is not intended to provide, and should not be relied upon for, investment
advice. The views expressed herein are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two
Sigma”). Such views reflect the assumptions of the author(s) of the document and are subject to change without notice. The document may
employ data derived from third-party sources. No representation is made by Two Sigma as to the accuracy of such information and the use of
such information in no way implies an endorsement of the source of such information or its validity.
The copyrights and/or trademarks in some of the images, logos or other material used herein may be owned by entities other than Two Sigma. If
so, such copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for identification and
comment as fair use under international copyright and/or trademark laws. Use of such image, copyright or trademark does not imply any
association with such organization (or endorsement of such organization) by Two Sigma, nor vice versa.
About Me
Engineer at Two Sigma
Lead a team that builds analytics engines and data dashboard
platforms that provide real-time monitoring
Agenda
What is streaming analytics?
Reactive principles: Framework for building real-time analytics
Case Study: Real-time data analytics engine
Data at Rest vs. Data in Motion
Data at Rest: Analytics done after the data-creating events have occurred
Data in Motion: Analytics happens in real time as events take place
Batch Oriented vs. Stream Oriented
Batch Oriented: Data captured in data warehouses & processed some time later in a scheduled batch job
Stream Oriented: Continuous computation & extract information as soon as data arrives
Real-time analytics is valuable to use cases in many fields…
Monitor financial markets and trading systems
Detect fraudulent credit card activity as it happens
Identify anomalies in telemetry collected from home automation systems
Key Considerations
Fast
Respond instantly (or near instantly) to new information
Scalable
Able to handle varying incoming workloads
Resilient
Able to handle various failure conditions gracefully
Responsive
Respond to users in a timely fashion
Reactive principles: Framework for building real-time analytics
Reactive
"Readily responsive to a stimulus" (Merriam-Webster)
Key Considerations Revisited
Fast
Respond instantly (or near instantly) to new information
Scalable
Able to handle varying incoming workloads
Resilient
Able to handle various failure conditions gracefully
Responsive
Respond to users in a timely fashion
React To Events
React To Load
React To Failures
React To Users
Actor Model
A model of concurrent computation
Provides an abstraction for supporting reactive principles
Actor
Primitive of concurrent computation
Can hold and modify its own private state, but no shared mutable state
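The idea of an actor that owns private state can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the Akka API used later in the talk; the class name CounterActor, the tell method, and the message strings are all hypothetical. Several threads send messages, but only the actor's own thread ever touches the counter, so no lock is needed around it.

```python
import queue
import threading


class CounterActor:
    """Toy actor: private state, a mailbox, and one thread that processes messages."""

    def __init__(self):
        self._mailbox = queue.Queue()  # incoming messages wait here
        self._count = 0                # private state, touched only by the actor thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Enqueue a message and return immediately."""
        self._mailbox.put(message)

    def join(self):
        self._thread.join()

    def _run(self):
        # Messages are handled one at a time, in arrival order.
        while True:
            message = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif isinstance(message, tuple) and message[0] == "get":
                # Reply by putting the value on a queue supplied by the sender.
                message[1].put(self._count)


def send_increments(target, n):
    for _ in range(n):
        target.tell("increment")


actor = CounterActor()
senders = [threading.Thread(target=send_increments, args=(actor, 250)) for _ in range(4)]
for s in senders:
    s.start()
for s in senders:
    s.join()

reply = queue.Queue()
actor.tell(("get", reply))
total = reply.get(timeout=5)
actor.tell("stop")
actor.join()
print(total)
```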
How do Actors communicate?
A real-life analogy: Send to a friend …
The communication is asynchronous
Use messages to communicate
[Diagram: Actor A sends message M to Actor B]
Decouples the sending and receiving of messages
Actor B may or may not have to respond to Actor A
Non-blocking response
Reactive Key Traits
Data-flow Focused: Data flows respond automatically to propagating changes
Event-based: Availability of new information drives the logic forward
Non-blocking: Emphasizes asynchronous techniques & non-blocking execution
Case Study: Real-time data analytics engine
Design Considerations
Real time & Throughput Guarantees
Minimize latency between new information and output of results, even under high loads
Correctness Guarantees
Streaming analysis must be accurate and consistent with results as if processed in batch
Complex Transformations
Customizable, business-specific analytics functions & handle different data formats
Handle out-of-order or late data
Keep track of late arriving data and manage the ordering correctly
Reliability
Resilient to failures, including problems of upstream data source
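One common way to manage ordering for late-arriving data is a reorder buffer with a bounded lateness window. The slides do not say how the engine does this, so the sketch below is a generic technique, not the author's implementation; ReorderBuffer, allowed_lateness, and the sample events are hypothetical. Events are held in a min-heap and released once the highest timestamp seen so far has moved past them by the allowed lateness; anything that arrives after its slot was released is set aside.

```python
import heapq


class ReorderBuffer:
    """Re-emit events in timestamp order, tolerating a bounded amount of lateness."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self._heap = []        # min-heap of (timestamp, value)
        self._max_seen = None  # highest timestamp observed so far
        self.too_late = []     # events that arrived after their slot was released

    def _watermark(self):
        return self._max_seen - self.allowed_lateness

    def push(self, timestamp, value):
        """Accept one event; return the events that are now safe to emit, in order."""
        if self._max_seen is not None and timestamp < self._watermark():
            self.too_late.append((timestamp, value))
            return []
        heapq.heappush(self._heap, (timestamp, value))
        if self._max_seen is None or timestamp > self._max_seen:
            self._max_seen = timestamp
        ready = []
        while self._heap and self._heap[0][0] <= self._watermark():
            ready.append(heapq.heappop(self._heap))
        return ready

    def flush(self):
        """Emit everything still buffered, for example at end of stream."""
        return [heapq.heappop(self._heap) for _ in range(len(self._heap))]


# Arrival order differs from timestamp order; (1, "z") arrives far too late.
arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (4, "d"), (1, "z"), (5, "e")]
buf = ReorderBuffer(allowed_lateness=2)
emitted = []
for ts, value in arrivals:
    emitted.extend(buf.push(ts, value))
emitted.extend(buf.flush())
print(emitted, buf.too_late)
```

A larger allowed_lateness tolerates more disorder at the cost of added latency, which is the tension between the throughput and ordering considerations listed above.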
Implementation
• Uses Akka, a toolkit that supports building actor systems on the JVM
• Clean separation between “plumbing and wiring” and data transformation logic
• Allows us to focus more on the functionality and analytics & less on the low-level wiring of asynchronous programming
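The separation between plumbing and transformation logic can be illustrated without Akka. In this standard-library Python sketch, the Stage class, its methods, and the two trade functions are hypothetical names, not part of the engine described in the talk. Stage holds all of the queue and thread handling, and the analytics are plain functions that know nothing about concurrency.

```python
import queue
import threading

_STOP = object()  # sentinel that shuts a stage down and propagates downstream


class Stage:
    """Generic plumbing: a mailbox plus a worker thread around a plain function."""

    def __init__(self, transform, downstream=None):
        self._transform = transform    # message -> result, or None to drop the message
        self._downstream = downstream  # next Stage, or None for the last stage
        self.results = []              # only filled by the last stage
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(_STOP)

    def join(self):
        self._thread.join()
        if self._downstream is not None:
            self._downstream.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                if self._downstream is not None:
                    self._downstream.stop()
                return
            result = self._transform(message)
            if result is None:
                continue
            if self._downstream is not None:
                self._downstream.tell(result)
            else:
                self.results.append(result)


# Transformation logic: ordinary functions with no threading or queue code.
def only_large_trades(trade):
    return trade if trade["qty"] >= 100 else None


def notional(trade):
    return {"symbol": trade["symbol"], "notional": trade["qty"] * trade["price"]}


sink = Stage(notional)
source = Stage(only_large_trades, downstream=sink)
for trade in [
    {"symbol": "AAA", "qty": 50, "price": 10.0},
    {"symbol": "BBB", "qty": 200, "price": 2.5},
    {"symbol": "CCC", "qty": 100, "price": 4.0},
]:
    source.tell(trade)
source.stop()
source.join()
print(sink.results)
```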
Example Data Flow
Real-time Data
Sources: Trade Data Publisher Actor A, Market Data Publisher Actor A, Trade Data Publisher Actor B
Transformations & Analysis: Join Function Actor, Aggregation Function Actor, Bespoke Analysis Actor, Filter Actor
Sinks: In-Memory Cache Actor, MMapped Cache Actor, DB Writer Actor
Sources
Data can come from many sources
Could be unbounded flows of data
Sources, Transformations & Analysis
New information flows through the system as messages between actors
Continuously calculates statistics and metrics on-the-fly from live streams of data
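Computing statistics on the fly means updating a result per event instead of re-scanning stored data. A minimal sketch of one well-known way to do this is Welford's online algorithm for mean and variance. The slides do not name the algorithms the engine uses, so this is an illustrative choice, and RunningStats and the sample prices are hypothetical. The final comparison checks the streaming result against a batch computation over the same values.

```python
import statistics


class RunningStats:
    """Incrementally maintained mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the statistics in constant time."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance, or None with fewer than two observations."""
        return self._m2 / (self.count - 1) if self.count > 1 else None


prices = [101.2, 100.8, 101.5, 102.1, 101.9]
stats = RunningStats()
for price in prices:
    stats.update(price)  # one message in, statistics updated immediately

# The streaming result agrees with a batch computation over the same data.
batch_mean = statistics.mean(prices)
batch_variance = statistics.variance(prices)
print(abs(stats.mean - batch_mean) < 1e-9, abs(stats.variance - batch_variance) < 1e-9)
```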
Transformations & Analysis
Analysis decomposed into multiple discrete steps, each represented by an actor
Composable Workflows: Chain together a composition of functions to form a data analysis pipeline
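Chaining steps into a pipeline can be sketched as function composition. In the engine each step is an actor; here plain Python generator functions stand in for actors, which is a simplifying assumption, and compose, filter_symbol, add_notional, and running_total are hypothetical names.

```python
from functools import reduce


def compose(*steps):
    """Chain steps left to right; each step maps an iterable of events to an iterable."""
    return lambda events: reduce(lambda stream, step: step(stream), steps, events)


def filter_symbol(symbol):
    """Build a step that keeps only events for one symbol."""
    def step(events):
        return (e for e in events if e["symbol"] == symbol)
    return step


def add_notional(events):
    """Step that enriches each event with qty * price."""
    for e in events:
        yield {**e, "notional": e["qty"] * e["price"]}


def running_total(field):
    """Build a step that emits the cumulative sum of one field."""
    def step(events):
        total = 0.0
        for e in events:
            total += e[field]
            yield total
    return step


pipeline = compose(filter_symbol("AAA"), add_notional, running_total("notional"))

trades = [
    {"symbol": "AAA", "qty": 10, "price": 5.0},
    {"symbol": "BBB", "qty": 99, "price": 1.0},
    {"symbol": "AAA", "qty": 4, "price": 2.5},
]
totals = list(pipeline(trades))
print(totals)
```

Because every step has the same shape, steps can be reordered or swapped without touching the others.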
Transformations & Analysis
A vocabulary of reusable functional transformations offers solutions to most analytics problems
Allows custom logic, encapsulated in an actor construct, to solve problems that are more business-specific
Sinks
The results can have many destinations (In-Memory Cache Actor, MMapped Cache Actor, DB Writer Actor, …)
Dashboard & Visualization
Data Storage
Analytics Engine Capabilities and Performance
Hardware and configurations: One VM with 15 vCPUs, 96 GB Memory, Linux Debian Wheezy OS
Metric: Sizes and units
Typical load: 4k-20k events per second
Peak capability: 150k events per second
Number of Actors: 7,000+
Typical time between data arrival and processing: Milliseconds under typical load; seconds under high load
Thank you

Contenu connexe

Tendances

FrugalML: Using ML APIs More Accurately and Cheaply
FrugalML: Using ML APIs More Accurately and CheaplyFrugalML: Using ML APIs More Accurately and Cheaply
FrugalML: Using ML APIs More Accurately and CheaplyDatabricks
 
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...Cambridge Semantics
 
Accelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesAccelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesCambridge Semantics
 
Smart Data Webinar: Transforming Industries with Artificial Intelligence (AI)...
Smart Data Webinar: Transforming Industries with Artificial Intelligence (AI)...Smart Data Webinar: Transforming Industries with Artificial Intelligence (AI)...
Smart Data Webinar: Transforming Industries with Artificial Intelligence (AI)...DATAVERSITY
 
Data Discoverability at SpotHero
Data Discoverability at SpotHeroData Discoverability at SpotHero
Data Discoverability at SpotHeroMaggie Hays
 
Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Cambridge Semantics
 
Fraud prevention is better with TigerGraph inside
Fraud prevention is better with  TigerGraph insideFraud prevention is better with  TigerGraph inside
Fraud prevention is better with TigerGraph insideTigerGraph
 
Satyam open analytics nyc
Satyam open analytics nycSatyam open analytics nyc
Satyam open analytics nycOpen Analytics
 
Sustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsSustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsCambridge Semantics
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo UnstructuredCambridge Semantics
 
IBM Industry Models and Data Lake
IBM Industry Models and Data Lake IBM Industry Models and Data Lake
IBM Industry Models and Data Lake Pat O'Sullivan
 
SJSU Business School: Guest Lecture - Big Data in Business (Sept 28, 2015)
SJSU Business School: Guest Lecture - Big Data in Business (Sept 28, 2015) SJSU Business School: Guest Lecture - Big Data in Business (Sept 28, 2015)
SJSU Business School: Guest Lecture - Big Data in Business (Sept 28, 2015) saravana krishnamurthy
 
Modern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceModern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceCambridge Semantics
 
Finance and Audit Predictive Analytics
Finance and Audit Predictive AnalyticsFinance and Audit Predictive Analytics
Finance and Audit Predictive AnalyticsBob Samuels
 
How to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using SemanticsHow to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using SemanticsCambridge Semantics
 
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!TigerGraph
 
Accelerate Digital Transformation with an Enterprise Big Data Fabric
Accelerate Digital Transformation with an Enterprise Big Data FabricAccelerate Digital Transformation with an Enterprise Big Data Fabric
Accelerate Digital Transformation with an Enterprise Big Data FabricCambridge Semantics
 
Foundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information ArchitectureFoundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information ArchitectureInside Analysis
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AITigerGraph
 

Tendances (20)

FrugalML: Using ML APIs More Accurately and Cheaply
FrugalML: Using ML APIs More Accurately and CheaplyFrugalML: Using ML APIs More Accurately and Cheaply
FrugalML: Using ML APIs More Accurately and Cheaply
 
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
 
Accelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesAccelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success Stories
 
The Year of the Graph
The Year of the GraphThe Year of the Graph
The Year of the Graph
 
Smart Data Webinar: Transforming Industries with Artificial Intelligence (AI)...
Smart Data Webinar: Transforming Industries with Artificial Intelligence (AI)...Smart Data Webinar: Transforming Industries with Artificial Intelligence (AI)...
Smart Data Webinar: Transforming Industries with Artificial Intelligence (AI)...
 
Data Discoverability at SpotHero
Data Discoverability at SpotHeroData Discoverability at SpotHero
Data Discoverability at SpotHero
 
Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?
 
Fraud prevention is better with TigerGraph inside
Fraud prevention is better with  TigerGraph insideFraud prevention is better with  TigerGraph inside
Fraud prevention is better with TigerGraph inside
 
Satyam open analytics nyc
Satyam open analytics nycSatyam open analytics nyc
Satyam open analytics nyc
 
Sustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsSustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive Analytics
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
 
IBM Industry Models and Data Lake
IBM Industry Models and Data Lake IBM Industry Models and Data Lake
IBM Industry Models and Data Lake
 
SJSU Business School: Guest Lecture - Big Data in Business (Sept 28, 2015)
SJSU Business School: Guest Lecture - Big Data in Business (Sept 28, 2015) SJSU Business School: Guest Lecture - Big Data in Business (Sept 28, 2015)
SJSU Business School: Guest Lecture - Big Data in Business (Sept 28, 2015)
 
Modern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceModern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in Insurance
 
Finance and Audit Predictive Analytics
Finance and Audit Predictive AnalyticsFinance and Audit Predictive Analytics
Finance and Audit Predictive Analytics
 
How to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using SemanticsHow to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using Semantics
 
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
 
Accelerate Digital Transformation with an Enterprise Big Data Fabric
Accelerate Digital Transformation with an Enterprise Big Data FabricAccelerate Digital Transformation with an Enterprise Big Data Fabric
Accelerate Digital Transformation with an Enterprise Big Data Fabric
 
Foundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information ArchitectureFoundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information Architecture
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AI
 

Similaire à Responsive and Scalable Real-time Data Analytics for SHPE 2017 - Cecilia Ye

Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessInside Analysis
 
Disrupting Risk Management through Emerging Technologies
Disrupting Risk Management through Emerging TechnologiesDisrupting Risk Management through Emerging Technologies
Disrupting Risk Management through Emerging TechnologiesDatabricks
 
Architecting for Real-Time Big Data Analytics
Architecting for Real-Time Big Data AnalyticsArchitecting for Real-Time Big Data Analytics
Architecting for Real-Time Big Data AnalyticsRob Winters
 
driving_business_value_from_real_time_streaming_analytics
driving_business_value_from_real_time_streaming_analyticsdriving_business_value_from_real_time_streaming_analytics
driving_business_value_from_real_time_streaming_analyticsJane Roberts
 
Uses of Data Lakes: Data Analytics Week SF
Uses of Data Lakes: Data Analytics Week SFUses of Data Lakes: Data Analytics Week SF
Uses of Data Lakes: Data Analytics Week SFAmazon Web Services
 
Event Stream Processing SAP
Event Stream Processing SAPEvent Stream Processing SAP
Event Stream Processing SAPGaurav Ahluwalia
 
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...Amazon Web Services
 
Incentius - Portfolio of Capabilities
Incentius - Portfolio of CapabilitiesIncentius - Portfolio of Capabilities
Incentius - Portfolio of CapabilitiesSujeet Pillai
 
Elastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and actionElastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and actionElasticsearch
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureMongoDB
 
Transforming Financial Services with Event Streaming Data
Transforming Financial Services with Event Streaming DataTransforming Financial Services with Event Streaming Data
Transforming Financial Services with Event Streaming Dataconfluent
 
Chapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docxChapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docxcravennichole326
 
Chapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docxChapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docxketurahhazelhurst
 
Chapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docxChapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docxbartholomeocoombs
 
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...Amazon Web Services
 

Similaire à Responsive and Scalable Real-time Data Analytics for SHPE 2017 - Cecilia Ye (20)

Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven Business
 
Disrupting Risk Management through Emerging Technologies
Disrupting Risk Management through Emerging TechnologiesDisrupting Risk Management through Emerging Technologies
Disrupting Risk Management through Emerging Technologies
 
Architecting for Real-Time Big Data Analytics
Architecting for Real-Time Big Data AnalyticsArchitecting for Real-Time Big Data Analytics
Architecting for Real-Time Big Data Analytics
 
driving_business_value_from_real_time_streaming_analytics
driving_business_value_from_real_time_streaming_analyticsdriving_business_value_from_real_time_streaming_analytics
driving_business_value_from_real_time_streaming_analytics
 
Machine Data Analytics
Machine Data AnalyticsMachine Data Analytics
Machine Data Analytics
 
Uses of Data Lakes: Data Analytics Week SF
Uses of Data Lakes: Data Analytics Week SFUses of Data Lakes: Data Analytics Week SF
Uses of Data Lakes: Data Analytics Week SF
 
Customer Uses of Data Lakes
Customer Uses of Data LakesCustomer Uses of Data Lakes
Customer Uses of Data Lakes
 
Uses of Data Lakes
Uses of Data LakesUses of Data Lakes
Uses of Data Lakes
 
Event Stream Processing SAP
Event Stream Processing SAPEvent Stream Processing SAP
Event Stream Processing SAP
 
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
 
Incentius - Portfolio of Capabilities
Incentius - Portfolio of CapabilitiesIncentius - Portfolio of Capabilities
Incentius - Portfolio of Capabilities
 
Data Lakes in the Wild
Data Lakes in the WildData Lakes in the Wild
Data Lakes in the Wild
 
Elastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and actionElastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and action
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
 
Transforming Financial Services with Event Streaming Data
Transforming Financial Services with Event Streaming DataTransforming Financial Services with Event Streaming Data
Transforming Financial Services with Event Streaming Data
 
Big Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud DetectionBig Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud Detection
 
Chapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docxChapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docx
 
Chapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docxChapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docx
 
Chapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docxChapter 11Data Visualization and Geographic Information System.docx
Chapter 11Data Visualization and Geographic Information System.docx
 
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
 

Plus de Two Sigma

The State of Open Data on School Bullying
The State of Open Data on School BullyingThe State of Open Data on School Bullying
The State of Open Data on School BullyingTwo Sigma
 
Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018Two Sigma
 
BeakerX - Tiezheng Li
BeakerX - Tiezheng LiBeakerX - Tiezheng Li
BeakerX - Tiezheng LiTwo Sigma
 
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel HudsonBringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel HudsonTwo Sigma
 
Waiter: An Open-Source Distributed Auto-Scaler
Waiter: An Open-Source Distributed Auto-ScalerWaiter: An Open-Source Distributed Auto-Scaler
Waiter: An Open-Source Distributed Auto-ScalerTwo Sigma
 
The Language of Compression - Leif Walsh
The Language of Compression - Leif WalshThe Language of Compression - Leif Walsh
The Language of Compression - Leif WalshTwo Sigma
 
Identifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane AdamsIdentifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane AdamsTwo Sigma
 
Algorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + PracticeAlgorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + PracticeTwo Sigma
 
HUOHUA: A Distributed Time Series Analysis Framework For Spark
HUOHUA: A Distributed Time Series Analysis Framework For SparkHUOHUA: A Distributed Time Series Analysis Framework For Spark
HUOHUA: A Distributed Time Series Analysis Framework For SparkTwo Sigma
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowTwo Sigma
 
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...Two Sigma
 
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...Two Sigma
 
Graph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesGraph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesTwo Sigma
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeTwo Sigma
 
Credit-Implied Volatility
Credit-Implied VolatilityCredit-Implied Volatility
Credit-Implied VolatilityTwo Sigma
 
Principles of REST API Design
Principles of REST API DesignPrinciples of REST API Design
Principles of REST API DesignTwo Sigma
 

Plus de Two Sigma (16)

The State of Open Data on School Bullying
The State of Open Data on School BullyingThe State of Open Data on School Bullying
The State of Open Data on School Bullying
 
Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018
 
BeakerX - Tiezheng Li
BeakerX - Tiezheng LiBeakerX - Tiezheng Li
BeakerX - Tiezheng Li
 
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson



Responsive and Scalable Real-time Data Analytics for SHPE 2017 - Cecilia Ye

  • 1. www.twosigma.com. Responsive and Scalable Real-time Data Analytics. Cecilia Ye. Presented to SHPE 11/2/2017. September 13, 2018.
  • 2. Disclaimer: This document is being distributed for informational and educational purposes only and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. The information contained herein is not intended to provide, and should not be relied upon for, investment advice. The views expressed herein are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). Such views reflect the assumptions of the author(s) of the document and are subject to change without notice. The document may employ data derived from third-party sources. No representation is made by Two Sigma as to the accuracy of such information and the use of such information in no way implies an endorsement of the source of such information or its validity. The copyrights and/or trademarks in some of the images, logos or other material used herein may be owned by entities other than Two Sigma. If so, such copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for identification and comment as fair use under international copyright and/or trademark laws. Use of such image, copyright or trademark does not imply any association with such organization (or endorsement of such organization) by Two Sigma, nor vice versa.
  • 3. About Me: Engineer at Two Sigma. Lead a team that builds analytics engines and data dashboard platforms that provide real-time monitoring.
  • 4.
  • 5. Agenda: (1) What is streaming analytics? (2) Reactive principles: framework for building real-time analytics. (3) Case study: real-time data analytics engine.
  • 6. Data at Rest vs. Data in Motion. Data at rest: analytics is done after the data-creating events have occurred. Data in motion: analytics happens in real time as events take place.
  • 7. Batch Oriented vs. Stream Oriented. Batch oriented: data is captured in data warehouses and processed some time later in a scheduled batch job. Stream oriented: computation is continuous, and information is extracted as soon as data arrives.
  • 8. Real-time analytics is valuable to use cases in many fields: monitoring financial markets and trading systems; detecting fraudulent credit card activity as it happens; identifying anomalies in telemetry collected from home automation systems.
  • 9. Key Considerations. Fast: respond instantly (or near instantly) to new information. Scalable: able to handle varying incoming workloads. Resilient: able to handle various failure conditions gracefully. Responsive: respond to users in a timely fashion.
  • 10. Agenda: (1) What is streaming analytics? (2) Reactive principles: framework for building real-time analytics. (3) Case study: real-time data analytics engine.
  • 11. Reactive: "readily responsive to a stimulus" (Merriam-Webster).
  • 12. Key Considerations Revisited. Fast: respond instantly (or near instantly) to new information. Scalable: able to handle varying incoming workloads. Resilient: able to handle various failure conditions gracefully. Responsive: respond to users in a timely fashion.
  • 13. Key Considerations. Fast: respond instantly (or near instantly) to new information. Scalable: able to handle varying incoming workloads. Resilient: able to handle various failure conditions gracefully. Responsive: respond to users in a timely fashion. Reactive principles: react to events.
  • 14. Key Considerations. Fast: respond instantly (or near instantly) to new information. Scalable: able to handle varying incoming workloads. Resilient: able to handle various failure conditions gracefully. Responsive: respond to users in a timely fashion. Reactive principles: react to events; react to load.
  • 15. Key Considerations. Fast: respond instantly (or near instantly) to new information. Scalable: able to handle varying incoming workloads. Resilient: able to handle various failure conditions gracefully. Responsive: respond to users in a timely fashion. Reactive principles: react to events; react to load; react to failures.
  • 16. Key Considerations. Fast: respond instantly (or near instantly) to new information. Scalable: able to handle varying incoming workloads. Resilient: able to handle various failure conditions gracefully. Responsive: respond to users in a timely fashion. Reactive principles: react to events; react to load; react to failures; react to users.
  • 17. Actor Model: a model of concurrent computation that provides an abstraction for supporting reactive principles.
  • 18. Actor: the primitive of concurrent computation. An actor can hold and modify its own private state, but there is no shared mutable state.
  • 19. How do actors communicate? A real-life analogy: Send to a friend …
  • 20. How do actors communicate? A real-life analogy: the communication is asynchronous.
  • 21. Actors use messages to communicate (diagram: Actor A sends message M to Actor B). Messaging decouples the sending and receiving of messages. Actor B may or may not have to respond to Actor A. Responses are non-blocking.
  • 22. Reactive Key Traits. Data-flow focused: data flows respond automatically to propagating changes. Event-based: availability of new information drives the logic forward. Non-blocking: emphasizes asynchronous techniques and non-blocking execution.
  • 23. Agenda: (1) What is streaming analytics? (2) Reactive principles: framework for building real-time analytics. (3) Case study: real-time data analytics engine.
  • 24. Design Considerations. Real-time and throughput guarantees: minimize latency between new information and output of results, even under high loads.
  • 25. Design Considerations. Real-time and throughput guarantees: minimize latency between new information and output of results, even under high loads. Correctness guarantees: streaming analysis must be accurate and consistent with results as if processed in batch.
  • 26. Design Considerations. Real-time and throughput guarantees: minimize latency between new information and output of results, even under high loads. Correctness guarantees: streaming analysis must be accurate and consistent with results as if processed in batch. Complex transformations: customizable analytics functions; handle different data formats.
  • 27. Design Considerations. Real-time and throughput guarantees: minimize latency between new information and output of results, even under high loads. Correctness guarantees: streaming analysis must be accurate and consistent with results as if processed in batch. Complex transformations: customizable analytics functions; handle different data formats. Handle out-of-order or late data: keep track of late-arriving data and manage the ordering correctly.
  • 28. Design Considerations. Real-time and throughput guarantees: minimize latency between new information and output of results, even under high loads. Correctness guarantees: streaming analysis must be accurate and consistent with results as if processed in batch. Complex transformations: business-specific analytics functions; handle different data formats. Handle out-of-order or late data: keep track of late-arriving data and manage the ordering correctly. Reliability: resilient to failures, including problems of the upstream data source.
  • 29. Implementation. Uses Akka, a toolkit that supports building actor systems on the JVM. Clean separation between "plumbing and wiring" and data transformation logic. Allows us to focus more on the functionality and analytics and less on the low-level wiring of asynchronous programming.
  • 30. Example Data Flow (real-time data). Sources: Trade Data Publisher Actor A, Market Data Publisher Actor A, Trade Data Publisher Actor B. Transformations & Analysis: Join Function Actor, Aggregation Function Actor, Bespoke Analysis Actor, Filter Actor. Sinks: In-Memory Cache Actor, MMapped Cache Actor, DB Writer Actor.
  • 31. Sources (real-time data): Trade Data Publisher Actor A, Market Data Publisher Actor A, Trade Data Publisher Actor B. Data can come from many sources and could be unbounded flows of data.
  • 32. Sources (real-time data): Trade Data Publisher Actor A, Market Data Publisher Actor A, Trade Data Publisher Actor B. Transformations & Analysis: Join Function Actor, Aggregation Function Actor, Bespoke Analysis Actor, Filter Actor. New information flows through the system as messages between actors. The system continuously calculates statistics and metrics on the fly from live streams of data.
  • 33. Transformations & Analysis: Join Function Actor, Aggregation Function Actor, Bespoke Analysis Actor, Filter Actor. Analysis is decomposed into multiple discrete steps, each represented by an actor. Composable workflows: chain together a composition of functions to form a data analysis pipeline.
  • 34. Transformations & Analysis: Join Function Actor, Aggregation Function Actor, Bespoke Analysis Actor, Filter Actor. A vocabulary of reusable functional transformations offers solutions to most analytics problems. Custom logic can be encapsulated in an actor construct to solve problems that are more business-specific.
  • 35. Sources (real-time data): Trade Data Publisher Actor A, Market Data Publisher Actor A, Trade Data Publisher Actor B. Transformations & Analysis: Join Function Actor, Aggregation Function Actor, Bespoke Analysis Actor, Filter Actor. Sinks: In-Memory Cache Actor, MMapped Cache Actor, DB Writer Actor, … The results can have many destinations: dashboards and visualization, data storage.
  • 36. Analytics Engine Capabilities and Performance. Hardware and configuration: one VM with 15 vCPUs, 96 GB memory, Linux Debian Wheezy OS.
      Metric | Sizes and units
      Typical load | 4k-20k events per second
      Peak capability | 150k events per second
      Number of actors | 7,000+
      Typical time between data arrival and processing | Milliseconds under typical load; seconds under high load
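
The actor ideas in slides 18-21 (private state, asynchronous messages) and the source, transformation, and sink flow in slides 30-35 can be illustrated with a small sketch. This is not Akka and not the engine described in the talk: it is a plain-Python stand-in built on standard-library threads and queues. The names `Actor`, `FilterActor`, `AggregationActor`, `CacheSinkActor`, and `run_pipeline`, the `(symbol, quantity)` message shape, and the sample trades are all hypothetical and chosen only for illustration.

```python
import queue
import threading

_STOP = object()  # sentinel message that tells an actor to shut down


class Actor:
    """Toy actor: private state, a mailbox, and one thread draining it."""

    def __init__(self, downstream=None):
        self._mailbox = queue.Queue()
        self._downstream = downstream
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        # Asynchronous send: enqueue and return without waiting for a reply.
        self._mailbox.put(message)

    def emit(self, message):
        # Forward a result to the next actor in the pipeline, if any.
        if self._downstream is not None:
            self._downstream.tell(message)

    def receive(self, message):
        raise NotImplementedError

    def join(self):
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                # Pass the shutdown signal along so downstream actors
                # finish their queued work and then stop as well.
                self.emit(_STOP)
                return
            self.receive(message)


class FilterActor(Actor):
    """Drops trades whose quantity is below a threshold."""

    def __init__(self, min_quantity, downstream):
        self._min_quantity = min_quantity  # private state, set before the thread starts
        super().__init__(downstream)

    def receive(self, message):
        _symbol, quantity = message
        if quantity >= self._min_quantity:
            self.emit(message)


class AggregationActor(Actor):
    """Keeps a running total per symbol and emits it on every update."""

    def __init__(self, downstream):
        self._totals = {}
        super().__init__(downstream)

    def receive(self, message):
        symbol, quantity = message
        self._totals[symbol] = self._totals.get(symbol, 0) + quantity
        self.emit((symbol, self._totals[symbol]))


class CacheSinkActor(Actor):
    """In-memory sink that remembers the latest total per symbol."""

    def __init__(self):
        self.latest = {}
        super().__init__()

    def receive(self, message):
        symbol, total = message
        self.latest[symbol] = total


def run_pipeline(trades, min_quantity=100):
    # Wire the pipeline back to front: filter -> aggregation -> sink.
    sink = CacheSinkActor()
    aggregation = AggregationActor(downstream=sink)
    source_filter = FilterActor(min_quantity, downstream=aggregation)
    for trade in trades:
        source_filter.tell(trade)
    source_filter.tell(_STOP)
    sink.join()  # wait until every queued message has been processed
    return dict(sink.latest)


if __name__ == "__main__":
    sample = [("AAPL", 200), ("MSFT", 50), ("AAPL", 300), ("MSFT", 150)]
    print(run_pipeline(sample))
```

Each stage touches only its own state and learns about new data solely through its mailbox, which is the decoupling the slides describe. A production actor toolkit would add supervision, routing, and distribution, none of which this sketch attempts.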

Editor's notes

  1. In case you haven't heard of us, Two Sigma is a New York City-based tech company set on redefining the investment management domain by harnessing the power of technology, data, and math to systematically derive insights from data. It was founded by a statistician and a computer scientist in 2001 with the goal of applying leading-edge technology to the data-rich world of finance. Although we do a lot of things, at our core we are a company that harnesses data (Two Sigma prides itself on having a huge array of data sources) and the power of cutting-edge technology, turns it into models of how the world works, and uses those models to make more rational decisions in the field of investments. The ability to make sense of large amounts of data from disparate sources in real time is valuable to us. At Two Sigma, we have many critical use cases that require continuous real-time computation of statistics and metrics from high volumes of streaming data from disparate sources.