There is an overload of stream data that has led to interest in Big Data, while mostly resulting in a signal-to-noise problem. There is not enough attention in the world, nor enough analyst time to keep up with this deluge of data. Most Big Data tools available today are not up to the task. A radical new form of information retrieval is called for. In this webinar, we will show how we envision the future of automated insight discovery. We will show a very fast interactive analytics engine that allows for slicing and dicing data in many ways. We then go a step further to systematically walk through all these analytics - brute force style - to generate what we call "trends."
SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends
1. Finding the Signal in the Noise
June 15, 2015
Webinar Presentation
Nova Spivack, CEO
sales@bottlenose.com
2. What is Bottlenose For?
Bottlenose discovers the threats and
opportunities that impact your business
Bottlenose does this using patented stream
intelligence technology
3. Key Stream Intelligence Use-Cases
Threats
• Risk detection
• Crisis mitigation
• Competitive threats
• Reputational threats
• Cyber threat detection
Opportunities
• Audience and customer insights
• Innovation and research
• New business and market opportunities
• Competitive intelligence
• Product and marketing intelligence
4. Vision
Stream Intelligence
Our mission is to build the leading business intelligence company for stream data
Stream data is the fastest growing segment of data. It includes all types of live or historical, unstructured or
structured, time-stamped data, such as: email and messaging data, social media, mobile data, news, IT log
data, CRM data, support data, sales data, Web and app analytics data, financial data, sensor and device data.
We have built the first unified platform and application for automating the discovery of actionable
intelligence across any stream data source. We call this stream intelligence.
5. Real-Time Discovery Against Streaming Data is Required
Problem: Massive growth of unstructured data cannot be managed effectively with existing tech infrastructure
“... the future belongs to raw unstructured or semi-structured data from both internal and external sources - increasingly delivered in (near) real-time. This data has great value yet most organizations do not have the tech infrastructure to handle all this data.” - IDC
6. Analysts Are Drowning in Streams
● There are never going to be enough data scientists or analysts to cope with the rise of unstructured stream data in the enterprise
● Analysts need automated stream intelligence tools to help them deal with the volume, velocity and variety of stream data
7. Solution: Bottlenose Automates Stream Intelligence
• Bottlenose provides the most advanced
automated stream intelligence that
automatically finds patterns such as trends,
anomalies, threats, opportunities and
correlations in stream data
• Bottlenose is extremely easy to use and easy to
derive value from right away without extensive
engineering and IT involvement or long
professional solutions
• The platform combines both internal enterprise
data and external data from social, broadcast,
web and other areas.
We are In The Stream Intelligence Sweet Spot
The Bottlenose solution is a new generation of tools that automates the production
of actionable intelligence from stream data
[Figure: diagram labeled Variety, Velocity and Volume; other labels include ELK Stack]
13. Bottlenose Platform
Generate Actionable Intelligence from ANY Stream Data
• Social & traditional media (social networks, blogs, forums, newswires)
• 98% of all live TV & Radio broadcasts
• Enterprise data (sales, financials, Web analytics, IT systems, email, internal databases, etc.)
• Web data, commercial data sources, financial market data sources, public data sources
• Machine and sensor data (Internet-of-things, machine data, weather data, etc.)
14. Stream Intelligence Pipeline
• Applications
• Rules & Agents: Alerts/Actions Based on Business Interests
• Storage & API: Long-term storage, real-time access, search & APIs
• Trend Detection: Extrapolation, Correlation/Clustering
• Data Mining & Analytics: ~30 Entity Types and ~150 Metrics
• Ingestion & Enrichment: Push/Pull of Unstructured/Structured Data
[Diagram labels: Stream Data, Data in Motion, Entities & Metrics, New Patterns, Alerts & Actions]
16. Continuous High-Volume Stream Analytics
• 3 billion live + historical messages analyzed every hour
• 72 billion records analyzed per day + predictive analytics on 7.2 billion
• 67,000,000 new messages ingested every day
• Trend detection at a rate of 1 million events per second
• 30 entity types recognized * 150 metrics per entity * tens of millions of entities =
~50 to 100 billion time series monitored and analyzed continuously
• Growing to 200 Terabytes of data stored & analyzed continuously in 2015
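The figures above can be sanity-checked with a few lines of arithmetic. This is only a sketch: the slide does not say whether "tens of millions of entities" is a total or a per-type count, so the per-type reading below is an assumption, and the two sample entity counts are chosen purely to show which range reproduces the quoted 50 to 100 billion.

```python
# Back-of-the-envelope check of the figures quoted on this slide.
MESSAGES_PER_HOUR = 3_000_000_000   # "3 billion ... messages analyzed every hour"
ENTITY_TYPES = 30                   # "30 entity types recognized"
METRICS_PER_ENTITY = 150            # "150 metrics per entity"

# 3 billion per hour over 24 hours gives the 72 billion per day headline.
records_per_day = MESSAGES_PER_HOUR * 24


def monitored_series(entities_per_type: int) -> int:
    """Time series count, ASSUMING the entity count applies per entity type."""
    return ENTITY_TYPES * METRICS_PER_ENTITY * entities_per_type


# Hypothetical entity counts; the slide only says "tens of millions".
low = monitored_series(11_000_000)
high = monitored_series(22_000_000)

print(f"records per day: {records_per_day:,}")
print(f"time series:     {low:,} to {high:,}")
```

Under that reading, roughly 11 to 22 million entities per type lands in the quoted range.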
1000s of High-Level Detected Trends Per Hour
• Automated data science layer applies machine learning, statistics, predictive
analytics to correlate, cluster, predict and analyze emergent trends
We See the Near Future Before Anyone Else
• 80% of the time, our system detects breaking news and emerging threats,
opportunities and keywords tens to hundreds of minutes ahead of the media,
Twitter, ad networks, etc. Similar advantages apply against non-text data sources
Key Metrics
Bottlenose analyzes 72 billion data records every day
18. Customer Facing Products
● Analytics, intelligence, and
discovery engine
○ Nerve Center
○ Full-stack offering
● Streaming data services to
applications
○ Bottlenose API (Platform)
19. Analytics Engine
‣ Advanced filtering & aggregations using simple OLAP interface
‣ “Interactive Analytics” thanks to sub-second query response time
‣ Add new data sources using central mapping system
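To make "filtering & aggregations" in the OLAP sense concrete, here is a minimal in-memory sketch of slicing (filtering on one dimension) and dicing (grouping by several dimensions). It is not the Bottlenose engine or its API: the record fields, the sample values, and the `aggregate` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical event records; field names and values are illustrative only.
RECORDS = [
    {"source": "twitter", "entity": "brand_a", "sentiment": "positive", "mentions": 3},
    {"source": "twitter", "entity": "brand_a", "sentiment": "negative", "mentions": 1},
    {"source": "news", "entity": "brand_a", "sentiment": "positive", "mentions": 2},
    {"source": "news", "entity": "brand_b", "sentiment": "positive", "mentions": 5},
    {"source": "twitter", "entity": "brand_b", "sentiment": "negative", "mentions": 4},
]


def aggregate(records, group_by, metric, where=None):
    """Sum `metric` per combination of `group_by` dimensions.

    `where` is an optional predicate applied to each record first (the slice).
    """
    totals = defaultdict(int)
    for record in records:
        if where is not None and not where(record):
            continue
        key = tuple(record[dim] for dim in group_by)
        totals[key] += record[metric]
    return dict(totals)


# Dice: total mentions per source.
by_source = aggregate(RECORDS, ["source"], "mentions")

# Slice to Twitter only, then dice by entity and sentiment.
twitter_detail = aggregate(
    RECORDS,
    ["entity", "sentiment"],
    "mentions",
    where=lambda r: r["source"] == "twitter",
)

print(by_source)
print(twitter_detail)
```

A production engine would precompute or index these aggregates to reach sub-second response times; the sketch only shows the query shape.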
20. Need for a Radical New Form of Information Retrieval: Semantic meaning, bottom-up from raw data
A sophisticated semantic approach is required to make sense of the raw data. The structure of the data can be derived from the entities/dimensions the system has a pattern for. Machine learning techniques can begin to make inferences and match to known profiles as data flows in.
One of the most powerful capabilities applies when different data sources need to be compared. A system like ours automatically normalizes them. For example, when the data has different time granularity, we automatically align different time periods in order to find overlaps.
Of course the semantic engine can also be adjusted with a vertical industry's unique facts, relationships, and jargon.
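The time-alignment idea on this slide (bringing sources with different time granularity onto common periods to find overlaps) can be sketched as follows. The hourly bucket size, the sample data, and both helper names are assumptions made for illustration; the slide does not describe how the actual system normalizes time.

```python
from collections import defaultdict
from datetime import datetime


def to_hourly(points):
    """Sum (timestamp, value) points into hour-aligned buckets."""
    buckets = defaultdict(float)
    for ts, value in points:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour] += value
    return dict(buckets)


def overlapping_buckets(a, b):
    """Pair up the values of two bucketed series where both have data."""
    return {t: (a[t], b[t]) for t in sorted(a.keys() & b.keys())}


# Hypothetical minute-level source (e.g. message counts).
minute_series = [
    (datetime(2015, 6, 15, 9, 5), 2),
    (datetime(2015, 6, 15, 9, 40), 3),
    (datetime(2015, 6, 15, 10, 15), 1),
]
# Hypothetical source that is already reported hourly.
hourly_series = [
    (datetime(2015, 6, 15, 10, 0), 7),
    (datetime(2015, 6, 15, 11, 0), 4),
]

aligned = overlapping_buckets(to_hourly(minute_series), to_hourly(hourly_series))
print(aligned)
```

Only the 10:00 bucket has data in both sources, so it is the only period returned for comparison.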
21. Application - Nerve Center
Our application provides a powerful suite of tools to find business insights in
streaming data:
● Monitor: Real-time monitoring with powerful live visualizations.
● Analyze: Fast interactive analytics to dig deep into the data.
● Discover: Automated insight discovery. Get notified when new patterns
are detected.
● Customize: Reports and live dashboards can be created for any vertical
by mixing and matching insights & visualizations across any combination of
data streams.
26. Data Points
‣ Typical topic stream like “Beyonce” (Pepsi)
‣ 4M new events (data records) per month
‣ ~8M unique entities tracked per month
‣ ~8M unique entities x 150 metrics x many time buckets = A lot of data points
‣ And this is just 1 stream. We have thousands of these running at all times...
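To put a rough number on "a lot of data points", the multiplication on this slide can be worked through. The slide leaves the number of time buckets unspecified, so the hourly buckets over a 30-day month used below are purely an assumption, and the result is an upper bound because most entities will not appear in every bucket.

```python
ENTITIES_PER_MONTH = 8_000_000   # "~8M unique entities" from the slide
METRICS_PER_ENTITY = 150         # from the slide
HOURLY_BUCKETS = 24 * 30         # ASSUMPTION: hourly buckets over a 30-day month

# One time series per (entity, metric) pair.
metric_series = ENTITIES_PER_MONTH * METRICS_PER_ENTITY

# Upper bound: every series has a value in every bucket.
max_data_points = metric_series * HOURLY_BUCKETS

print(f"series per stream:      {metric_series:,}")
print(f"max data points/month:  {max_data_points:,}")
```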
32. Detection Engine
• Anticipate: Python servers for trend detection & extrapolation
• Detector: Workers that continuously aggregate entities and fetch corresponding metrics
• Context Gathering: Finding additional meta-data around detections
• Clustering: Rolling clustering of trends based on overlapping meta-data and a variety of distance functions
[Diagram flow labels: Analytics Requests, Entities & Time Series, Time Series Extrapolation, Related Entities, Find Related Trends, Related Trends, New and updated trends]
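The clustering step described on this slide (rolling clustering of trends by overlapping meta-data and a distance function) could look something like the sketch below. It is not the Bottlenose implementation: the Jaccard distance, the 0.6 threshold, the `RollingClusterer` class, and the sample tags are all assumptions standing in for the "variety of distance functions" the slide mentions.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 minus the share of tags two sets have in common (0 = identical)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


class RollingClusterer:
    """Assigns each incoming trend to the nearest existing cluster, or opens a new one."""

    def __init__(self, max_distance: float = 0.6):
        self.max_distance = max_distance
        self.clusters = []  # each cluster: {"tags": set, "members": list}

    def add(self, trend_id: str, tags: set) -> int:
        best_index, best_distance = None, None
        for index, cluster in enumerate(self.clusters):
            distance = jaccard_distance(tags, cluster["tags"])
            if distance <= self.max_distance and (
                best_distance is None or distance < best_distance
            ):
                best_index, best_distance = index, distance
        if best_index is None:
            # Nothing close enough: start a new cluster.
            self.clusters.append({"tags": set(tags), "members": [trend_id]})
            return len(self.clusters) - 1
        # Merge the trend's meta-data into the matched cluster.
        self.clusters[best_index]["tags"] |= tags
        self.clusters[best_index]["members"].append(trend_id)
        return best_index


clusterer = RollingClusterer(max_distance=0.6)
first = clusterer.add("t1", {"brand_a", "superbowl", "ad"})
second = clusterer.add("t2", {"brand_a", "superbowl", "halftime"})
third = clusterer.add("t3", {"weather", "storm"})
print(first, second, third)
```

Because trends are folded in one at a time, the clusters can be updated continuously as new detections arrive rather than recomputed in batch.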
33. Anticipate
‣ Python library, using SciPy
‣ Algorithms for detection & extrapolation in time series data
‣ Includes tooling for debugging, training and simulating
‣ ~500 detections/CPU-core/second
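As an illustration of what "detection & extrapolation in time series data" can mean, here is a minimal sketch using a z-score spike test and a least-squares linear fit. These are generic textbook techniques chosen for the example, not the algorithms in the Anticipate library, and the function names, window size, and threshold are hypothetical. Only the standard library is used to keep the sketch self-contained; `scipy.stats.linregress` could replace the manual fit.

```python
from statistics import mean, pstdev


def detect_spike(series, window=5, threshold=3.0):
    """True if the latest value is more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    if len(series) < window + 1:
        return False
    history = series[-(window + 1):-1]
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return series[-1] != mu
    return abs(series[-1] - mu) / sigma > threshold


def extrapolate_linear(series, steps=1):
    """Fit y = intercept + slope * x by least squares and project `steps` points ahead."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = mean(series)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n - 1 + k) for k in range(1, steps + 1)]


print(detect_spike([10, 11, 10, 9, 10, 30]))   # sudden jump in the last bucket
print(detect_spike([10, 11, 10, 9, 10, 11]))   # ordinary fluctuation
print(extrapolate_linear([1, 2, 3, 4], steps=2))
```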