Orion is a petabyte-scale AI platform developed by the Big Data and Insights (BDAI) team at Oath to generate actionable insights from large datasets through scalable machine learning. The platform can process over 60 billion records per day from a variety of data sources and uses techniques such as anomaly detection and predictive algorithms to provide insights that improve efficiency, reduce costs, and enhance customer experiences. Orion offers a centralized architecture and a suite of APIs for building custom solutions in advertising, marketing, IoT, and other markets at enterprise scale.
3. 3
Big Data & Insights
The BDAI team focuses on developing advanced data products that generate actionable insights on datasets through scalable machine learning.
These insights can lead to:
• Improved efficiencies
• Cost mitigation
• Enhanced customer experiences based on preferences
5. 5
IoT case study: “Needle in haystack problem”
Problem
Improve operational efficiency for M2M deployments
Solution
IoT Analytics platform anomaly detection services
• Identified devices that might need attention
• Univariate and multivariate anomaly detection algorithms to discover unusual scenarios which cannot be found using rule-based approaches
Expected results
Operational efficiency through improved availability and cost saving through reduction of truck rolls
Network data + Device types + Anomaly detection algorithm = Anomalous devices
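The slide does not name the specific algorithms used. As a minimal sketch of the univariate case only, the example below flags devices whose reading sits far from the fleet median, using a robust z-score based on the median absolute deviation (MAD). The device ids, the metric, and the 3.5 threshold are illustrative assumptions, not values from the platform.

```python
from statistics import median


def mad_anomalies(readings, threshold=3.5):
    """Return sorted ids of devices whose robust z-score exceeds the threshold.

    readings: dict mapping a device id to one numeric metric (hypothetical).
    """
    values = list(readings.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # No spread in the data, so nothing can be scored.
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    # when the data is roughly normal.
    return sorted(
        device
        for device, value in readings.items()
        if 0.6745 * abs(value - med) / mad > threshold
    )


# Hypothetical daily dropped-connection counts per device.
readings = {
    "dev-a": 12, "dev-b": 14, "dev-c": 11,
    "dev-d": 13, "dev-e": 95, "dev-f": 12,
}
anomalous = mad_anomalies(readings)
print(anomalous)
```

A fixed rule such as "more than 50 drops" would need retuning per device type, while a score relative to the fleet adapts to whatever the typical value is. A multivariate detector would score several metrics jointly and is not shown here.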
6. 6
Need for horizontal Big Data AI Platform
Data sources (your data and third-party data)
• ThingSpace
• IoT sensor
• Fixed network
• Wireless network
• Oath datasets
• Video
• Third party
• Public
Big Data & AI platform with privacy controls
• Extraction, Transformation, Loading
• Large scale Machine Learning: Predictive algorithms, BI & reporting, Optimization, Recommendation engine, Anomaly detection, Domain specific rules engine
• Artificial Intelligence: Deep learning, Bots Technology, Natural Language, Automated Reasoning
• Visualization
• APIs
Vertical market
• Advertising
• Marketing
• Consumer lifestyle segmentation services
• Customer service
• IoT services
• Video analytics
• Cybersecurity
8. 8
Big Data and AI platform software architecture
Single platform: Comprehensive data analytics platform
Improve efficiency: Architected and optimized for enterprise needs at scale
Sophisticated predictive analytics: Supports complex analytics use cases
Architecture diagram components
• Access: BDA client (HTTPS), Data analysis (IP filtering), API access (HTTPS), Portal access (SSO)
• Ingestion: File send, File streaming, Stream/file pull, Data ingestion services, PII data hashing, Batch ETL
• Streaming: Kafka streaming infrastructure, Spark stream processing
• Storage and compute: Hadoop/YARN (Solr | HDFS | Spark | HBase), Raw data, Compound data
• Serving: APIs, OLAP, BDA web-services, Portal, Reporting tool, Users
• Operations: Monitoring, Log indexing/alerting
• Security: Firewalls, Secure VLANs, Privacy controls
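The diagram lists "PII data hashing" as a step in ingestion but gives no detail. One common way to do this is to replace PII fields with a keyed hash (HMAC-SHA256) before records are stored, so records can still be joined on the hashed value. The sketch below assumes that approach; the field names and the key are hypothetical and the platform's actual scheme is not described in the slides.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as PII.
PII_FIELDS = {"subscriber_id", "email"}


def hash_pii(record, key):
    """Return a copy of record with PII fields replaced by HMAC-SHA256 digests."""
    hashed = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            # Keyed hashing: the same input and key always give the same
            # digest, so hashed values remain usable as join keys.
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            hashed[field] = digest.hexdigest()
        else:
            hashed[field] = value
    return hashed


# Demo key only; a real key would come from a secrets store.
key = b"demo-key-not-for-production"
record = {"subscriber_id": "555-0100", "email": "a@example.com", "bytes_used": 1024}
hashed_record = hash_pii(record, key)
print(hashed_record)
```

A keyed hash is used here rather than a plain SHA-256 because low-entropy identifiers such as phone numbers can be recovered from unkeyed hashes by enumeration.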
9. 9
Architecture continued
Architecture diagram components
• Sources: Streaming events from applications/devices, BDA client (File send, File streaming)
• Ingestion: Data ingestion services, Streaming infrastructure, Privacy controls
• Processing: Real-time streaming/Batch ETL (Lambda architecture), Spark stream processing, ETL, Data transformation, OLAP data transformation
• Storage and compute: Hadoop/YARN, RawData, ComputedData (Batch, Real-time)
• Machine learning: MLlib/TensorFlow/MXNet
• Serving: BDA Web services, API access, Data analysis, Privacy controls
• Access: UI Portals/Dashboards, Portal access, BDA client (Filepull/Streampull)
• Operations: Monitoring, Log indexing/Alerting
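The slide names a Lambda architecture with batch and real-time paths. As a minimal sketch of that pattern in general, a batch view is recomputed from stored history, a speed layer is updated per incoming event, and queries merge the two. The event shape and function names below are illustrative assumptions and do not reflect the platform's Spark or Kafka code.

```python
from collections import Counter


def build_batch_view(events):
    """Batch layer: recompute per-device event counts from stored history."""
    return Counter(event["device"] for event in events)


class SpeedLayer:
    """Speed layer: incremental counts for events not yet in the batch view."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["device"]] += 1


def query(batch_view, speed_layer, device):
    """Serving layer: merge the batch result with the real-time delta."""
    return batch_view[device] + speed_layer.counts[device]


# Hypothetical history already processed by the batch job.
history = [{"device": "dev-a"}, {"device": "dev-a"}, {"device": "dev-b"}]
batch_view = build_batch_view(history)

# Hypothetical events that arrived after the last batch run.
speed_layer = SpeedLayer()
for event in [{"device": "dev-a"}, {"device": "dev-c"}]:
    speed_layer.update(event)

total_for_a = query(batch_view, speed_layer, "dev-a")
print(total_for_a)
```

When the next batch run absorbs the recent events, the speed layer's counts for that period would be discarded so that events are not counted twice.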
10. 10
Trapezium - Application Management Framework
Features and Benefits
• Common framework for flow control to break down each business problem into smaller independent transactions.
• Built on top of Spark and written in Scala.
• Configuration-based source changes and transaction management.
• Multiple model output comparison at different time windows.
• Reduction in development time with reuse and standardization.
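The idea of breaking a problem into small independent transactions that are wired together by configuration can be sketched as follows. This is not the Trapezium API (Trapezium is written in Scala on Spark); every function name and the config layout here are hypothetical and only illustrate the pattern.

```python
def parse(ctx):
    """Transaction: turn the raw comma-separated input into integers."""
    ctx["rows"] = [int(x) for x in ctx["raw"].split(",")]
    return ctx


def filter_positive(ctx):
    """Transaction: keep only positive values."""
    ctx["rows"] = [r for r in ctx["rows"] if r > 0]
    return ctx


def total(ctx):
    """Transaction: aggregate the remaining rows."""
    ctx["total"] = sum(ctx["rows"])
    return ctx


# Registry of reusable transactions, looked up by name from configuration.
TRANSACTIONS = {"parse": parse, "filter_positive": filter_positive, "total": total}


def run_workflow(config, ctx):
    """Run the transactions named in config, in order, over a shared context."""
    for name in config["transactions"]:
        ctx = TRANSACTIONS[name](ctx)
    return ctx


# Changing the workflow means editing this config, not the code.
config = {"transactions": ["parse", "filter_positive", "total"]}
result = run_workflow(config, {"raw": "4,-2,7"})
print(result["total"])
```

Dropping `filter_positive` from the config changes the result without touching any transaction, which is the kind of reuse the slide attributes to a common framework.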
15. 15
Improve efficiency in building AI-based data products
[Chart. Axis labels: Efficiency, Time. Series: Centralized Big Data and Artificial Intelligence platform; Alternatives (requires trial and error testing)]
16. 16
Orion capabilities
Features
• Security & Privacy controls
• Enterprise specific data and algorithm pipelines with batch and streaming services
• Powerful suite of APIs to build custom solutions
• Rich geo-spatial, temporal and comparative visualizations
• Enterprise grade AI and ML at scale for advertising, marketing, IoT and other markets
• Multi-tenancy and high availability
• Trapezium https://github.com/Verizon/trapezium
60B records processed per day by a single platform
20 PB, 150 TB, 16000 cores: easily scalable
1M records streamed per second
>65M subscribers x 32M variables each x 30+ days of traffic = ~5 sec to receive results
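To put the daily volume in per-second terms, the conversion below is a back-of-the-envelope check. It assumes the 60B figure is a daily total and the 1M figure is a sustained streaming rate; the slide does not state how the two figures relate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures taken from the slide.
records_per_day = 60_000_000_000
stream_rate_per_second = 1_000_000

# Average rate implied by the daily total.
average_per_second = records_per_day / SECONDS_PER_DAY

# Daily volume if the streaming rate were sustained for a full day.
daily_at_stream_rate = stream_rate_per_second * SECONDS_PER_DAY

print(f"{average_per_second:,.0f} records/s on average")
print(f"{daily_at_stream_rate:,} records/day at the streaming rate")
```

Under these assumptions, the daily total corresponds to an average of roughly 0.69M records per second, which is below the quoted streaming rate.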