John Repko -- Pikasoft LLC
Using Big Data to your Advantage
It’s not just about toy elephants anymore…
March 19, 2013
John Repko – john.repko@pikasoft.com
Source: http://blog.questionpro.com/2012/12/24/market-research-trends-2013-big-data/
Big Data Is Not Just About “Big” Data … It’s About FAST Data!
(http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)
Source: http://www.startribune.com/sports/164830346.html
Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg
So How Did We Get to Big Data Anyway?
There Are Big Data Breakthroughs Everywhere…
I’ve Heard About Big Data Successes…
• “Watson” Wins on Jeopardy
– Beat the best Jeopardy players ever
• Google Wins the Search Market
– Massively parallel web searches with results back in a tenth of a second
• Progressive’s Instant “Overnight” Rate Quotes
– Progressive creates an insurance quote for every car and truck in the US – every night
How Can I Determine If These Big Data Wins Apply to My Business?
• Where do I put the data?
• How do I load the system?
• How do I find the value in the data?
• How do I present it?
• How long is this going to take?
• How much is this going to cost?
You Need A Proven Approach to Finding the Value in Your Data
Source: http://www.beingjavaguys.com/2013/01/what-is-big-data-introduction-and.html
The Key is to Recognize That There IS a Pattern to Big Data Wins
• Foresight
– We are presented with a pattern. What has the outcome been when we’ve seen similar patterns in the past?
• Hindsight
– We are presented with an outcome. What pattern of events anticipated the outcome in the past?
The Variety Of Big Data Wins In The Press Fall Into Just Two Solution Patterns
We Don’t Need Dozens Of Solution Approaches For Big Data – Just Two
Big Data Wins – Not “10 Problems” But Only 2
1. Modeling True Risk
• What past patterns led to success or default?
2. Customer Churn Analysis
• What do customer churn patterns predict about our products and markets?
3. Recommendation Engine
• We have search terms – what have the results been from similar searches in the past?
4. Ad Targeting
• We have profile information – what offers have led to sales for similar profiles in the past?
5. PoS Transaction Analysis
• We have your purchase history – what deals might we offer in the future?
Summary – 10 Common Hadoop-able Problems*
Legend: Foresight | Hindsight
In This Light, Let’s Take A Look At The “10 Hadoop-able Problems”
* http://info.cloudera.com/TenCommonHadoopableProblemsWhitePaper.html
Big Data Wins – Not “10 Problems” But Only 2 (continued)
6. Analyzing Data Logs to Forecast Events
• We have your logs – what patterns of events have anticipated failures before?
7. Threat Analysis
• We have a specific event – what results have we seen from similar threats in the past?
8. Trade Surveillance
• Does this parcel raise any alarms, based on our history of past parcel-tracking?
9. Search Quality
• We have a set of search terms – what have similar searches succeeded in finding in the past?
10. Data “Sandbox”
• We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now?
These Two Solution Types Apply Generally To The Hadoop-able Problems
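The two solution patterns can be sketched on a toy event log. This is a minimal illustration only: the event names, the "FAILURE" marker, and the three-event window are assumptions made for the example, not part of the original deck.

```python
from collections import Counter


def events_before_failures(log, failure="FAILURE", window=3):
    """Hindsight: count event types seen in the `window` entries before each failure."""
    precursors = Counter()
    for i, event in enumerate(log):
        if event == failure:
            preceding = log[max(0, i - window):i]
            precursors.update(e for e in set(preceding) if e != failure)
    return precursors


def failure_rate_after(log, pattern, failure="FAILURE", window=3):
    """Foresight: fraction of past `pattern` events followed by a failure within `window` entries."""
    hits = [i for i, e in enumerate(log) if e == pattern]
    if not hits:
        return 0.0
    followed = sum(failure in log[i + 1:i + 1 + window] for i in hits)
    return followed / len(hits)


# Hypothetical event log, oldest entry first.
event_log = [
    "login", "disk_warning", "retry", "FAILURE",
    "login", "search", "logout",
    "disk_warning", "retry", "retry", "FAILURE",
]

hindsight = events_before_failures(event_log)
print(sorted(hindsight.items()))
print(failure_rate_after(event_log, "disk_warning"))
print(failure_rate_after(event_log, "login"))
```

Hindsight starts from the outcome and looks backwards for precursors; foresight starts from an observed pattern and asks how often the outcome followed it in the past. On real data both would run over far larger logs, but the two questions stay the same.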
Data Warehouse Advanced Analytics Is Expensive and Generally Restricted To Structured Data
• According to Gartner, Enterprise Data will grow 650% by 2014. 85% of these data will be “unstructured data”, with a CAGR of 62% per year, far larger than transactional data
• Growth is taking place in areas not well served by RDBMSs and DWs
Source: http://www.vertica.com/writable/knowledge_articles/file/bi_vertica.pdf
Source: http://thecloudtutorial.com/hadoop-tutorial.html
Structured: Managed by RDBMS & DW
Unstructured: Growth Areas Not Managed well by RDBMS or DW
The Tremendous Growth Of Data Is In Unstructured Data That Is Best Managed Outside The RDBMS
Structured: Managed by RDBMS or DW
Unstructured: Not Managed by RDBMS or DW
The New Areas Of Non-RDBMS Managed Data Are Rich In Business Value And Are Ripe For Analysis
Structured: Managed by RDBMS
Unstructured: Not Managed by RDBMS or DW
Big Data Stores Are Increasingly Architected With Open-Source Tools
• Data Integration: Tools which extract, transform, and load data between Relational and Non-Relational datasets.
• NoSQL Data Store: Datasets structured as columnar, key-value, or document-based in order to overcome limitations in traditional relational modeling for ‘Big’ datasets.
• MapReduce Languages: Higher-level wrapper languages which simplify MapReduce development efforts.
• MapReduce Engine / Cloud MapReduce: Processes (‘Map’ and ‘Reduce’ functions) which analyze very large datasets across distributed systems.
You Have Data. Here’s What You Need to Unlock It
Needs
• Load the data in a system equipped with the tools to analyze it
– Via a standard interface, or
– Programmatically
• Determine valid relationships in the data
• Analyze the data for these common patterns
• Tune the analytics
• Visualize the results
• Pursue the patterns that emerge
Requirements
• The system has to live where the data lives (otherwise transmission costs become prohibitive)
• REST or SOAP are the most common interfaces
• Bloom Filters can provide fast, approximate set-membership tests in large data sets
• ORM (Object-Relational Mapping) simplifies data access
• Hadoop provides parallelized analysis for unstructured data
• Starfish provides automatic analytics tuning for Hadoop
• Structured data can be analyzed via statistical analysis (for numbers) or free-text search (for text)
• Solution patterns can be applied automatically once the data is sandboxed
• Visualization can help to grasp the key patterns and results
The Right Platform Can Meet All Of These Requirements
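As a rough illustration of the Bloom filter requirement, here is a minimal sketch in Python. The class name, the bit-array size, the number of hashes, and the customer IDs are all assumptions chosen for the example; a production system would size the filter from the expected item count and the acceptable false-positive rate.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, a small chance of false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every bit for this item is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical customer IDs already seen in one data set.
seen_customers = BloomFilter()
for customer_id in ("c-1001", "c-1002", "c-1003"):
    seen_customers.add(customer_id)

print("c-1002" in seen_customers)  # True: added items are always found
```

The appeal for large data sets is that membership can be tested against a compact bit array instead of the full key set. A "maybe present" answer still needs confirming against the real data, while a "not present" answer is definitive.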
Additional Tools: With a Platform for Big Data, We Can Expand Our Analysis with Rich Analytics Tools
Key Big Data Analytics Solution Patterns
1. Predictive Modeling
2. Data Visualization
3. Cluster Partitioning
4. Outlier Analysis
5. A/B Testing
6. Markov Chains
These Patterns Provide a Straightforward Way to Find Big Data Wins – Here’s How
Source: http://www.cognizant.com/InsightsCognizantiarticles/Cognizanti_Sow'sEar_Analytics.pdf
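To make one of these solution patterns concrete, here is a minimal sketch of outlier analysis using the modified z-score (median and median absolute deviation). The function name, the 3.5 threshold, and the sample order counts are assumptions for illustration; the deck does not prescribe a particular method.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [v for v in values
            if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical daily order counts with one unusual day.
daily_orders = [102, 98, 101, 97, 103, 99, 100, 250]
print(mad_outliers(daily_orders))  # [250]
```

A median-based score is used here rather than mean and standard deviation because a single extreme value distorts the mean and can hide itself.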
Big Data And Classic Analysis Patterns Are Creating A New Class Of Enterprise Applications
Data Sources | Data Processing | Data Presentation
Examples: Public Data Sets on AWS, Google Chart Tools
These Offerings Emerged In The Consumer Domain And Enterprise Users Are Coming To Have Similar Expectations
But New Applications Will Remain Just Curiosities, “One-Offs” Unless The Underlying Patterns Are Drawn Out
• There’s Nothing New Here: Hadoop is Turing-complete, as are most general-purpose processing and analytics packages
• To provide richer insights, tools like Hadoop need more advanced processing patterns:
– Basic Patterns: Filtering | Parsing | Counting/Summing | Collating | Sorting | Distributed Tasks | Chained Jobs
– Advanced Patterns: Distinct | Group By | Secondary Sorts | Joins | Distributed Sorting
– Leading-Edge Work: Classification | Clustering | Regression | Dimension Reduction | Evolutionary Code
To See More Advanced Patterns and Richer Presentation, The Basic Patterns Must First Become Routine
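Several of the basic patterns (parsing, filtering, counting/summing, sorting) can be sketched as a single map, shuffle, and reduce pass. The sketch below runs in plain Python on one machine and does not use the Hadoop API; the log format and function names are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key's values, in sorted key order."""
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def status_mapper(line):
    # Parsing + filtering: keep only well-formed lines with a 5xx status.
    parts = line.split()
    if len(parts) == 3 and parts[2].startswith("5"):
        yield parts[1], 1


def sum_reducer(key, values):
    # Counting/summing: total the 1s emitted for each key.
    return sum(values)


# Hypothetical web log lines: time, path, HTTP status.
log_lines = [
    "10:00 /checkout 500",
    "10:01 /search 200",
    "10:02 /checkout 503",
    "10:03 /search 500",
    "malformed",
]
error_counts = reduce_phase(shuffle(map_phase(log_lines, status_mapper)), sum_reducer)
print(error_counts)  # {'/checkout': 2, '/search': 1}
```

In a real cluster the map and reduce functions run in parallel across many nodes and the shuffle moves data between them. The mapper and reducer keep the same shape, which is why these basic patterns can become routine building blocks.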
Software Will Capture the Value of Intellectual Property
2012 Internet Company Valuations as % of Revenue
• Pure services companies generally yield a company valuation of 0.5 to 1.0x Annual Revenue
• Recurring revenue businesses (hosting, support) typically generate 2.5 – 4.0x Revenue
• Product businesses derive their multiples from growth, product margin, network effects, customer lock-in, and ecosystem effects – with a good product, valuations of > 5x Revenue are possible
Source: http://abovethecrowd.com/wp-content/uploads/2011/05/pr_mults.png
Capturing Trends – Where Is the IT Industry Headed?
IT Product Breakthroughs Happen When Technology Advances Invalidate “Old” Product Assumptions. Here Are The Principal Areas Where Old Assumptions Will Be Obsoleted.
• 5 major trends
– Big Data: Big Data Just Beginning to Explode
– Cloud: Cloud Computing Market Size – Facts and Trends
– In-Memory: The Coming In-Memory Database Tipping Point
– Handheld: Five Emerging Trends in Analytics
– Real-time: Using Analytics to Create a Sense-and-Respond Organization
Capturing Trends – Why Bother? Who Cares?
• Big Data:
– According to Michael Stonebraker and Jeremy Kepner the future of Hadoop is doomed
– According to Mike Miller of Cloudant the days are numbered for Hadoop as we know it
• Cloud:
– Even PCI and HIPAA data is evolving into cloud-hosted models
• In-Memory:
– Spinning disk is "the new tape" (overflow, recovery)
• Handheld:
– Mobile Internet devices will outnumber humans this year, Cisco predicts
• Real-time:
– Future of computing technology belongs to handheld devices
“You can’t just ask customers what they want and then try to give that to them. By the time you get it built,
they’ll want something new. It took us three years to build the NeXT computer. If we’d given customers
what they said they wanted, we’d have built a computer they’d have been happy with a year after we spoke
to them — not something they’d want now.”
~ Steve Jobs
The Cloud Provides a Platform For Do It Yourself Analytics
• Why the cloud matters
– Analytics cannot be “do it yourself” until everyone has access to a platform suitable for holding and processing Big Data.
– Only the cloud has the scale, speed, and availability to process Big Data universally
• What it gives us that is unique and differentiating
– Big Data projects today are 1) expensive, 2) long lead-time, and 3) run on masses of local hardware. With inevitable commoditization this has to change.
– The trend is to “do it yourself” analytics – if we build the ability to deliver do-it-yourself analytics, applications will appear that were inconceivable before the environment was created
• What we need to make happen
– Robustness – at least 3-nines of availability and zero data loss
– Security – starting with things like 5 Ways Amazon Web Services Protects Cloud Data
– Privacy – where it begins: Complying to the Higher Standard
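To put the robustness target in concrete terms, the short calculation below converts "N nines" of availability into a yearly downtime budget. It is a back-of-the-envelope sketch that assumes a 365-day year and counts all downtime equally; the function name is illustrative. Three nines (99.9%) works out to 8.76 hours of downtime per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(nines):
    """Yearly downtime budget for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # 3 nines -> 0.001
    return HOURS_PER_YEAR * unavailability


for n in (2, 3, 4):
    print(f"{n} nines: {allowed_downtime_hours(n):.2f} hours/year")
```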
Handhelds Make Analytics Available Everywhere
• Why handheld client delivery matters
– There are now more smartphones than client PCs
– More than 25% of users use smartphones for their primary web access
– The future of internet computing is mobile
• What it gives us that is unique and differentiating
– Hadoop is dreadfully mismatched with handheld access (batch, no standard client or reporting interface)
– Coming in-memory databases (HANA, Vertica, VoltDB) will provide a much better fit with handheld
• What we need to make happen
– Make handheld our primary target UI (design for thumbs, not mice … and more)
– Target do-it-yourself analytics use cases
Real-time Makes Previously Unthinkable Apps Possible
• Why real-time matters
– Users increasingly expect real-time analytics
– The first wave of real-time analytics tools is becoming available
• What it gives us that is unique and differentiating
– "Self-service" analytics
– Intuitive and unconstrained data exploration
– Instant visualization of complex datasets
– Viable plays for a variety of asset types
• Credit card debt, Student loan debt, Properties, Insurance, etc.
• What we need to make happen
– If Hadoop – we must evolve to interactive batch execution (or overnight batch, like Progressive Insurance)
– If In-memory DB – need to select and groom a handheld interface and design for sub-100ms response times
Beyond Big Data – The Emerging Big Data Tech Platform
RDBMS In-Memory RDBMS
On-Premise Distributed Cloud
Structured Data DWs Big Data Universal Data
Batch Hadoop Batch Always
Hindsight Foresight
Lumpenprogramming Today Tomorrow
Report Specialists Data Scientists Everyone
Reports
Data Warehouses
Big Data
DIY Analytics
For what?
By whom?
What?
With what?
Stored where?
Processed where?
How?
When?
Here’s Where Our World Is Headed
What Happened? Why Did That Happen? What’s Next?
The Future: Here’s What The Evolution Looks Like
Trend | Development Initiatives | Who’s Doing It
• Big Data
– Initiative: APIs. No one is likely to reach a market with Big Data analytics fronted by their own UI. Success will come from API links to
– API levels: Level 1: REST Access API; Level 2: Plug-in API; Level 3: Runtime environment
– Who’s Doing It: Open territory! Infochimps has Level 1, Amazon (Elastic MapReduce) has levels 2 and 3. Who else will play???
• Cloud
– Initiative: All of the Cloud players are investigating DB-rich offerings
– Initiative: VoltDB options with AWS High IO option
– Initiative: “38% of all companies are planning a BI SaaS project before the end of 2013.”
– Who’s Doing It: Everybody: Amazon, Rackspace, Heroku ... Accenture
• In-Memory
– Initiative: Move demo to DAHANA architecture (not hand-coded)
– Initiative: Select non-HANA in-memory DB (probably VoltDB) as secondary platform
– Initiative: Hadoop evolves from a processing platform to an ETL gateway from unstructured to structured data
– Who’s Doing It: SAP / HANA; HP / Vertica; other NewSQL players
• Handheld
– Initiative: Evolving UIs with HTML5 + jQuery Mobile
– Initiative: Reporting platforms increasingly offer mobile interfaces
– Initiative: Review Big Data interfaces to iPad and Android devices
– Who’s Doing It: Two principal camps -- Apple iOS and Android
• Real-Time
– Initiative: Investigate CDN options for Big Data deployment
– Initiative: Confirm DB performance on buffer pool, locking, latching, recovery
– Initiative: Design for sub-100ms delivery
– Who’s Doing It: Just getting started...
Today’s Tools: The Killer Apps
Today’s Killer Apps: Recommendation Engine For Enhanced Retail Marketing
• Vision:
– Target Audience: Product Executives
– Anticipated Benefit: Keep up with market leader Amazon, build up-sell and cross-sell revenue
– Delivered Benefit: Better market segmentation, enhanced revenue through “customers who bought xxx also bought...” recommendations.
– Alternatives: CRM recommendations do not draw on a deep sense of customer intent
– Why It Kills: Provable revenue growth through A-B testing
• How to Implement It:
– Proof of Concept: Small cloud-based recognition engine, based on readily-available (customer profile, purchase history) data stores
– Initial Rollout: Still cloud-based, but with broader streams (e.g. search histories) and dynamic updates
– Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into retailing process and systems
– Full Rollout: Could be cloud or in-house, but moving to richer streams and real-time (i.e. in-memory) data access
– Maintenance: Tools updates, streams updates, transition to real-time data access
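A proof of concept for “customers who bought xxx also bought...” can start from simple co-occurrence counts over purchase history. The sketch below is one minimal way to do that, not the deck's implementation; the function names and the sample baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    also_bought = defaultdict(Counter)
    for basket in baskets:
        for item, other in permutations(set(basket), 2):
            also_bought[item][other] += 1
    return also_bought


def recommend(also_bought, item, top_n=2):
    """Items most often bought together with `item`, most frequent first."""
    ranked = sorted(also_bought[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical purchase history: one list of items per order.
purchase_history = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "camera_bag", "sd_card"],
    ["laptop", "mouse"],
    ["camera", "tripod"],
]
model = build_cooccurrence(purchase_history)
print(recommend(model, "camera"))  # ['sd_card', 'tripod']
```

Raw counts favor popular items, so a fuller rollout would likely normalize by item frequency and add the broader streams (such as search histories) named in the implementation steps.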
Today’s Tools: The Killer Apps
Today’s Killer Apps: Analysis and Prediction Engine
• Vision:
– Target Audience: High end retailers with profitable service contracts (e.g. computers, cameras, sound systems)
– Anticipated Benefit: Increase penetration rate of service contracts by pre-calculating terms in advance of sale or service renewal
– Delivered Benefit: Reward customers with historically low service costs, and increase penetration of profitable service deals by pre-calculation of ideal rates
– Alternatives: Consumers generally know one-size-fits-all service contracts are overpriced. If you can’t fit the terms to the customer then you can’t complete the service contract
– Why It Kills: Big data approach pre-calculates appropriate terms for all customers in advance of a sales or renewal transaction
• How to Implement It:
– Proof of Concept: Small cloud-based run with limited data sets to confirm data adoption approaches and identify most profitable segments in that sub-population
– Initial Rollout: Still cloud-based, but with larger data sets and dynamic updates
– Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into promotions and target marketing
– Full Rollout: Could be cloud or in-house, but moving to larger data stores, real-time (i.e. in-memory) data access and notifications across the full customer set
– Maintenance: Tools updates, stores updates, transition to real-time data access and notifications
Today’s Tools: The Killer Apps
Today’s Killer Apps: Log Analysis Engine
• Vision:
– Target Audience: Utilities executives
– Anticipated Benefit: Sell an energy or utilities package that better fits customer interests and reduces customer costs while increasing energy/utility margins
– Delivered Benefit: Customer gets a package that better fits their specific interests (e.g. “green”) and exec sells higher-margin offerings
– Alternatives: One size plan fits all does not capture customer interests or deliver high-margin offerings well
– Why It Kills: More customized packages better fit customer needs while reducing capital expenses and increasing margins for the utility
• How to Implement It:
– Proof of Concept: Small cloud-based run with limited data sets to capture basic patterns and confirm data adoption approaches
– Initial Rollout: Still cloud-based, but with larger data stores and dynamic updates
– Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into production logs with reporting
– Full Rollout: Could be cloud or in-house, but moving to larger data stores, real-time (i.e. in-memory) data access and notifications
– Maintenance: Tools updates, stores updates, transition to real-time data access and notifications
Summary
This Is Only The Beginning. With A Standard Platform We’ll See Richer Big Data Discoveries Become Routine
The Solution Tools (Slide 13) Become Straightforward if We Run Them on a Standard Architecture
“One man’s noise is another man’s data.”
~ Bill Stensrud - InstantEncore
Contacts
• John Repko: john.repko@pikasoft.com - (720) 624-6025
https://pikasoft.s3.amazonaws.com/Using_Big_Data_To_Your_Advantage.ppt

Contenu connexe

Tendances

Knowledge Graphs for a Connected World - AI, Deep & Machine Learning Meetup
Knowledge Graphs for a Connected World - AI, Deep & Machine Learning MeetupKnowledge Graphs for a Connected World - AI, Deep & Machine Learning Meetup
Knowledge Graphs for a Connected World - AI, Deep & Machine Learning MeetupBenjamin Nussbaum
 
How to use your data science team: Becoming a data-driven organization
How to use your data science team: Becoming a data-driven organizationHow to use your data science team: Becoming a data-driven organization
How to use your data science team: Becoming a data-driven organizationYael Garten
 
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Jennifer Walker
 
GGV Capital: Venture Investing and the Cloud (2012)
GGV Capital: Venture Investing and the Cloud (2012)GGV Capital: Venture Investing and the Cloud (2012)
GGV Capital: Venture Investing and the Cloud (2012)GGV Capital
 
The Value of Pervasive Analytics
The Value of Pervasive AnalyticsThe Value of Pervasive Analytics
The Value of Pervasive AnalyticsCloudera, Inc.
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data scienceVipul Kalamkar
 
Loras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium KeynoteLoras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium KeynoteRich Clayton
 
Pervasive Analytics Gets Real
Pervasive Analytics Gets RealPervasive Analytics Gets Real
Pervasive Analytics Gets RealCloudera, Inc.
 
HPE IDOL Technical Overview - july 2016
HPE IDOL Technical Overview - july 2016HPE IDOL Technical Overview - july 2016
HPE IDOL Technical Overview - july 2016Andrey Karpov
 
Big Data Decision-Making
Big Data Decision-MakingBig Data Decision-Making
Big Data Decision-MakingTeradata Aster
 
Module 6 The Future of Big and Smart Data- Online
Module 6 The Future of Big and Smart Data- Online Module 6 The Future of Big and Smart Data- Online
Module 6 The Future of Big and Smart Data- Online caniceconsulting
 
Impact of big data on analytics
Impact of big data on analyticsImpact of big data on analytics
Impact of big data on analyticsCapgemini
 
Big Data: The Force That’s Good for Consumers and Society
Big Data: The Force That’s Good for Consumers and SocietyBig Data: The Force That’s Good for Consumers and Society
Big Data: The Force That’s Good for Consumers and SocietyExperian_US
 
The Human Side of Data By Colin Strong
The Human Side of Data By Colin StrongThe Human Side of Data By Colin Strong
The Human Side of Data By Colin StrongMarTech Conference
 
Cutting Edge Predictive Analytics with Eric Siegel
Cutting Edge Predictive Analytics with Eric Siegel   Cutting Edge Predictive Analytics with Eric Siegel
Cutting Edge Predictive Analytics with Eric Siegel Databricks
 
A Journey into bringing (Artificial) Intelligence to the Enterprise
A Journey into bringing (Artificial) Intelligence to the EnterpriseA Journey into bringing (Artificial) Intelligence to the Enterprise
A Journey into bringing (Artificial) Intelligence to the EnterprisePatrick Deglon
 
Emcien overview v6 01282013
Emcien overview v6 01282013Emcien overview v6 01282013
Emcien overview v6 01282013WCJones6348
 

Tendances (20)

Knowledge Graphs for a Connected World - AI, Deep & Machine Learning Meetup
Knowledge Graphs for a Connected World - AI, Deep & Machine Learning MeetupKnowledge Graphs for a Connected World - AI, Deep & Machine Learning Meetup
Knowledge Graphs for a Connected World - AI, Deep & Machine Learning Meetup
 
Big data basics
Big data basicsBig data basics
Big data basics
 
How to use your data science team: Becoming a data-driven organization
How to use your data science team: Becoming a data-driven organizationHow to use your data science team: Becoming a data-driven organization
How to use your data science team: Becoming a data-driven organization
 
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
 
GGV Capital: Venture Investing and the Cloud (2012)
GGV Capital: Venture Investing and the Cloud (2012)GGV Capital: Venture Investing and the Cloud (2012)
GGV Capital: Venture Investing and the Cloud (2012)
 
The Value of Pervasive Analytics
The Value of Pervasive AnalyticsThe Value of Pervasive Analytics
The Value of Pervasive Analytics
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data science
 
Loras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium KeynoteLoras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium Keynote
 
Pervasive Analytics Gets Real
Pervasive Analytics Gets RealPervasive Analytics Gets Real
Pervasive Analytics Gets Real
 
HPE IDOL Technical Overview - july 2016
HPE IDOL Technical Overview - july 2016HPE IDOL Technical Overview - july 2016
HPE IDOL Technical Overview - july 2016
 
Big Data Decision-Making
Big Data Decision-MakingBig Data Decision-Making
Big Data Decision-Making
 
Module 6 The Future of Big and Smart Data- Online
Module 6 The Future of Big and Smart Data- Online Module 6 The Future of Big and Smart Data- Online
Module 6 The Future of Big and Smart Data- Online
 
Impact of big data on analytics
Impact of big data on analyticsImpact of big data on analytics
Impact of big data on analytics
 
Big Data: The Force That’s Good for Consumers and Society
Big Data: The Force That’s Good for Consumers and SocietyBig Data: The Force That’s Good for Consumers and Society
Big Data: The Force That’s Good for Consumers and Society
 
The 25 Predictions About The Future Of Big Data
The 25 Predictions About The Future Of Big DataThe 25 Predictions About The Future Of Big Data
The 25 Predictions About The Future Of Big Data
 
The Human Side of Data By Colin Strong
The Human Side of Data By Colin StrongThe Human Side of Data By Colin Strong
The Human Side of Data By Colin Strong
 
Big Data: How does it fit in your data strategy?
Big Data: How does it fit in your data strategy?Big Data: How does it fit in your data strategy?
Big Data: How does it fit in your data strategy?
 
Cutting Edge Predictive Analytics with Eric Siegel
Cutting Edge Predictive Analytics with Eric Siegel   Cutting Edge Predictive Analytics with Eric Siegel
Cutting Edge Predictive Analytics with Eric Siegel
 
A Journey into bringing (Artificial) Intelligence to the Enterprise
A Journey into bringing (Artificial) Intelligence to the EnterpriseA Journey into bringing (Artificial) Intelligence to the Enterprise
A Journey into bringing (Artificial) Intelligence to the Enterprise
 
Emcien overview v6 01282013
Emcien overview v6 01282013Emcien overview v6 01282013
Emcien overview v6 01282013
 

Similaire à Using big data_to_your_advantage

Ruby, rails, no sql and big data
Ruby, rails, no sql and big dataRuby, rails, no sql and big data
Ruby, rails, no sql and big dataJohn Repko
 
TOP Business Intelligence Predictions for 2015
TOP Business Intelligence Predictions for 2015TOP Business Intelligence Predictions for 2015
TOP Business Intelligence Predictions for 2015Panorama Software
 
How Startups can leverage big data?
How Startups can leverage big data?How Startups can leverage big data?
How Startups can leverage big data?Rackspace
 
Top Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwareTop Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwarePanorama Software
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersRuhollah Farchtchi
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017SingleStore
 
Big Data Meetup by Chad Richeson
Big Data Meetup by Chad RichesonBig Data Meetup by Chad Richeson
Big Data Meetup by Chad RichesonSocietyConsulting
 
How to make your data scientists happy
How to make your data scientists happy How to make your data scientists happy
How to make your data scientists happy Hussain Sultan
 
What is the future of data strategy?
What is the future of data strategy?What is the future of data strategy?
What is the future of data strategy?Denodo
 
Enable Advanced Analytics with Hadoop and an Enterprise Data Hub
Enable Advanced Analytics with Hadoop and an Enterprise Data HubEnable Advanced Analytics with Hadoop and an Enterprise Data Hub
Enable Advanced Analytics with Hadoop and an Enterprise Data HubCloudera, Inc.
 
final oracle presentation
final oracle presentationfinal oracle presentation
final oracle presentationPriyesh Patel
 
Why Everything You Know About bigdata Is A Lie
Why Everything You Know About bigdata Is A LieWhy Everything You Know About bigdata Is A Lie
Why Everything You Know About bigdata Is A LieSunil Ranka
 
Webinar: BI in the Sky - The New Rules of Cloud Analytics
Webinar: BI in the Sky - The New Rules of Cloud AnalyticsWebinar: BI in the Sky - The New Rules of Cloud Analytics
Webinar: BI in the Sky - The New Rules of Cloud AnalyticsSnapLogic
 
What_BigData_means_to_your_organization
What_BigData_means_to_your_organizationWhat_BigData_means_to_your_organization
What_BigData_means_to_your_organizationAttila Barta
 
Oh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataOh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataPrakalp Agarwal
 
big data analytics pgpmx2015
big data analytics pgpmx2015big data analytics pgpmx2015
big data analytics pgpmx2015Sanmeet Dhokay
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalIIIT Allahabad
 

Similaire à Using big data_to_your_advantage (20)

Ruby, rails, no sql and big data
Ruby, rails, no sql and big dataRuby, rails, no sql and big data
Ruby, rails, no sql and big data
 
TOP Business Intelligence Predictions for 2015
TOP Business Intelligence Predictions for 2015TOP Business Intelligence Predictions for 2015
TOP Business Intelligence Predictions for 2015
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
How Startups can leverage big data?
How Startups can leverage big data?How Startups can leverage big data?
How Startups can leverage big data?
 
Top Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwareTop Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama Software
 
Big data analytics presented at meetup big data for decision makers



Using Big Data to Your Advantage

  • 1. Using Big Data to Your Advantage: It’s not just about toy elephants anymore…
      John Repko, Pikasoft LLC (john.repko@pikasoft.com), March 19, 2013
      Source: http://blog.questionpro.com/2012/12/24/market-research-trends-2013-big-data/
  • 2. Big Data Is Not Just About “Big” Data … It’s About FAST Data! (http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)
      So How Did We Get to Big Data Anyway?
      Source: http://www.startribune.com/sports/164830346.html
      Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg
  • 3. There Are Big Data Breakthroughs Everywhere…
      I’ve Heard About Big Data Successes…
      • “Watson” Wins on Jeopardy: Beat the best Jeopardy players ever
      • Google Wins the Search Market: Massively parallel web searches with results back in a tenth of a second
      • Progressive’s Instant “Overnight” Rate Quotes: Progressive creates an insurance quote for every car and truck in the US, every night
  • 4. How Can I Determine If These Big Data Wins Apply to My Business?
      • Where do I put the data?
      • How do I load the system?
      • How do I find the value in the data?
      • How do I present it?
      • How long is this going to take?
      • How much is this going to cost?
      You Need A Proven Approach to Finding the Value in Your Data
      Source: http://www.beingjavaguys.com/2013/01/what-is-big-data-introduction-and.html
  • 5. The Key Is to Recognize That There IS a Pattern to Big Data Wins
      The Variety Of Big Data Wins In The Press Falls Into Just Two Solution Patterns
      • Foresight
        – We are presented with a pattern. What has the outcome been when we’ve seen similar patterns in the past?
      • Hindsight
        – We are presented with an outcome. What pattern of events anticipated the outcome in the past?
      We Don’t Need Dozens Of Solution Approaches For Big Data, Just Two
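The two solution patterns on slide 5 can be sketched as two lookups over the same history. This is a minimal illustration, not the deck's own method: the `history` records and the function names `foresight` and `hindsight` are hypothetical, chosen only to mirror the slide's wording.

```python
from collections import Counter

# Hypothetical history of (observed pattern, eventual outcome) pairs.
history = [
    ("late_payment", "default"),
    ("late_payment", "repaid"),
    ("late_payment", "default"),
    ("on_time", "repaid"),
    ("on_time", "repaid"),
]

def foresight(pattern, history):
    """Given a pattern, tally the outcomes that followed it in the past."""
    return Counter(outcome for seen, outcome in history if seen == pattern)

def hindsight(outcome, history):
    """Given an outcome, tally the patterns that preceded it in the past."""
    return Counter(seen for seen, result in history if result == outcome)

outcomes_after_late_payment = foresight("late_payment", history)
patterns_before_repayment = hindsight("repaid", history)
```

Both functions scan the same data; only the direction of the question changes, which is the point the slide makes about needing just two approaches.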
  • 6. Big Data Wins – Not “10 Problems” But Only 2
      In This Light, Let’s Take A Look At The “10 Hadoop-able Problems”
      Summary – 10 Common Hadoop-able Problems* (legend: Foresight | Hindsight)
      1. Modeling True Risk
         • What past patterns led to success or default?
      2. Customer Churn Analysis
         • What do customer churn patterns predict about our products and markets?
      3. Recommendation Engine
         • We have search terms. What have the results been from similar searches in the past?
      4. Ad Targeting
         • We have profile information. What offers have led to sales for similar profiles in the past?
      5. PoS Transaction Analysis
         • We have your purchase history. What deals might we offer in the future?
      * http://info.cloudera.com/TenCommonHadoopableProblemsWhitePaper.html
  • 7. Big Data Wins – Not “10 Problems” But Only 2
      Summary – 10 Common Hadoop-able Problems* (legend: Foresight | Hindsight)
      6. Analyzing Data Logs to Forecast Events
         • We have your logs. What pattern of events has anticipated failures before?
      7. Threat Analysis
         • We have a specific event. What results have we seen from similar threats in the past?
      8. Trade Surveillance
         • Does this parcel raise any alarms, based on our history of past parcel-tracking?
      9. Search Quality
         • We have a set of search terms. What have similar searches succeeded in finding in the past?
      10. Data “Sandbox”
         • We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now?
      These Two Solution Types Apply Generally To The Hadoop-able Problems
  • 8. Data Warehouse Advanced Analytics Is Expensive and Generally Restricted To Structured Data
      • According to Gartner, enterprise data will grow 650% by 2014. 85% of this data will be “unstructured data”, growing at a CAGR of 62%, far faster than transactional data
      • Growth is taking place in areas not well served by RDBMSs and DWs
      Structured: Managed by RDBMS & DW
      Unstructured: Growth Areas Not Managed Well by RDBMS or DW
      Source: http://www.vertica.com/writable/knowledge_articles/file/bi_vertica.pdf
      Source: http://thecloudtutorial.com/hadoop-tutorial.html
  • 9. The Tremendous Growth Of Data Is In Unstructured Data That Is Best Managed Outside The RDBMS
      Structured: Managed by RDBMS or DW
      Unstructured: Not Managed by RDBMS or DW
  • 10. The New Areas Of Non-RDBMS Managed Data Are Rich In Business Value And Are Ripe For Analysis
      Structured: Managed by RDBMS
      Unstructured: Not Managed by RDBMS or DW
  • 11. Big Data Stores Are Increasingly Architected With Open-Source Tools
      • Data Integration: Tools which extract, transform, and load data between Relational and Non-Relational datasets.
      • NoSQL Data Store: Datasets structured as columnar, key-value, or document-based in order to overcome limitations in traditional relational modeling for ‘Big’ datasets.
      • Map Reduce Languages: Higher-level wrapper languages which simplify Map Reduce development efforts.
      • Map Reduce Engine (Cloud MapReduce): Processes (‘Map’ and ‘Reduce’ functions) which analyze very large datasets across distributed systems.
  • 12. You Have Data. Here’s What You Need to Unlock It
      Needs
      • Load the data in a system equipped with the tools to analyze it
        – Via a standard interface, or
        – Programmatically
      • Determine valid relationships in the data
      • Analyze the data for these common patterns
      • Tune the analytics
      • Visualize the results
      • Pursue the patterns that emerge
      Requirements
      • The system has to live where the data lives (otherwise transmission costs become prohibitive)
      • REST and SOAP are the most common interfaces
      • Bloom Filters can provide set-membership operations in large data sets
      • ORM (Object-Relational Mapping) simplifies data access
      • Hadoop provides parallelized analysis for unstructured data
      • Starfish provides automatic analytics tuning for Hadoop
      • Structured data can be analyzed via statistical analysis (for numbers) or free-text search (for text)
      • Solution patterns can be applied automatically once the data is sandboxed
      • Visualization can help to grasp the key patterns and results
      The Right Platform Can Meet All Of These Requirements
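To make the Bloom filter requirement on slide 12 concrete, here is a minimal sketch of the data structure. It is illustrative only: the class name, the default sizes, the salted SHA-256 hashing scheme, and the customer IDs are all assumptions, not anything specified in the deck.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting a SHA-256 digest with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical customer IDs, used only to exercise the filter.
seen_customers = BloomFilter()
for customer_id in ["c-1001", "c-1002", "c-1003"]:
    seen_customers.add(customer_id)

known = seen_customers.might_contain("c-1002")  # added items always match
```

The appeal for large data sets is that memory use is fixed by `num_bits` rather than by the number of items, at the cost of a tunable false-positive rate.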
  • 13. Additional Tools: With a Platform for Big Data, We Can Expand Our Analysis with Rich Analytics Tools
      Key Big Data Analytics Solution Patterns
      1. Predictive Modeling
      2. Data Visualization
      3. Cluster Partitioning
      4. Outlier Analysis
      5. AB Testing
      6. Markov Chains
      These Patterns Provide a Straightforward Way to Find Big Data Wins. Here’s How
      Source: http://www.cognizant.com/InsightsCognizantiarticles/Cognizanti_Sow'sEar_Analytics.pdf
  • 14. Big Data And Classic Analysis Patterns Are Creating A New Class Of Enterprise Applications
      Data Sources | Data Processing | Data Presentation
      Google Chart Tools | Public Data Sets on AWS
      These Offerings Emerged In The Consumer Domain, And Enterprise Users Are Coming To Have Similar Expectations
  • 15. But New Applications Will Remain Just Curiosities, “One-Offs”, Unless The Underlying Patterns Are Drawn Out
      • There’s Nothing New Here: Hadoop is Turing-complete, as are most general-purpose processing and analytics packages
      • To provide richer insights, tools like Hadoop need more advanced processing patterns:
        – Basic Patterns: Filtering | Parsing | Counting/Summing | Collating | Sorting | Distributed Tasks | Chained Jobs
        – Advanced Patterns: Distinct | Group By | Secondary Sorts | Joins | Distributed Sorting
        – Leading-Edge Work: Classification | Clustering | Regression | Dimension Reduction | Evolutionary Code
      To See More Advanced Patterns and Richer Presentation, The Basic Patterns Must First Become Routine
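Several of the basic patterns on slide 15 (parsing, filtering, counting/summing, sorting) fit in one small map, shuffle, reduce pipeline. The sketch below runs in a single Python process purely to show the shape of the computation; it does not use Hadoop's API, and the log lines and function names are hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Collapse each key's values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical log lines standing in for a large distributed input.
logs = [
    "ERROR disk full",
    "INFO job started",
    "ERROR disk full",
    "WARN slow response",
    "ERROR timeout",
]

def error_mapper(line):
    level, _, message = line.partition(" ")  # parsing
    if level == "ERROR":                     # filtering
        yield message, 1

def sum_reducer(key, values):
    return sum(values)                       # counting/summing

counts = reduce_phase(shuffle(map_phase(logs, error_mapper)), sum_reducer)
# Sorting: most frequent error first, ties broken alphabetically.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

In a real cluster the map and reduce functions stay this small; the framework supplies the distribution, shuffling, and fault tolerance around them.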
  • 16. Software Will Capture the Value of Intellectual Property
      2012 Internet Company Valuations as % of Revenue
      • Pure services companies generally yield a company valuation of 0.5 to 1.0x annual revenue
      • Recurring revenue businesses (hosting, support) typically generate 2.5 to 4.0x revenue
      • Product businesses derive their multiples from growth, product margin, network effects, customer lock-in, and ecosystem effects. With a good product, valuations of more than 5x revenue are possible
      http://abovethecrowd.com/wp-content/uploads/2011/05/pr_mults.png
  • 17. Capturing Trends – Where Is the IT Industry Headed?
      IT Product Breakthroughs Happen When Technology Advances Invalidate “Old” Product Assumptions. Here Are The Principal Areas Where Old Assumptions Will Be Obsoleted.
      • 5 major trends
        – Big Data: Big Data Just Beginning to Explode
        – Cloud: Cloud Computing Market Size – Facts and Trends
        – In-Memory: The Coming In-Memory Database Tipping Point
        – Handheld: Five Emerging Trends in Analytics
        – Real-time: Using Analytics to Create a Sense-and-Respond Organization
  • 18. Capturing Trends – Why Bother? Who Cares?
      • Big Data:
        – According to Michael Stonebraker and Jeremy Kepner, the future of Hadoop is doomed
        – According to Mike Miller of Cloudant, the days are numbered for Hadoop as we know it
      • Cloud:
        – Even PCI and HIPAA data is evolving into cloud-hosted models
      • In-Memory:
        – Spinning disk is "the new tape" (overflow, recovery)
      • Handheld:
        – Mobile Internet devices will outnumber humans this year, Cisco predicts
      • Real-time:
        – Future of computing technology belongs to handheld devices
      “You can’t just ask customers what they want and then try to give that to them. By the time you get it built, they’ll want something new. It took us three years to build the NeXT computer. If we’d given customers what they said they wanted, we’d have built a computer they’d have been happy with a year after we spoke to them, not something they’d want now.” ~ Steve Jobs
  • 19. The Cloud Provides a Platform For Do-It-Yourself Analytics
      • Why the cloud matters
        – Analytics cannot be “do it yourself” until everyone has access to a platform suitable for holding and processing Big Data.
        – Only the cloud has the scale, speed, and availability to process Big Data universally
      • What it gives us that is unique and differentiating
        – Big Data projects today are 1) expensive, 2) long lead-time, and 3) run on masses of local hardware. With inevitable commoditization this has to change.
        – The trend is toward “do it yourself” analytics. If we build the ability to offer do-it-yourself analytics, applications will appear that were inconceivable before the environment was created
      • What we need to make happen
        – Robustness: at least 3-nines of availability and zero data loss
        – Security: starting with things like 5 Ways Amazon Web Services Protects Cloud Data
        – Privacy: where it begins: Complying to the Higher Standard
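As a rough check on what the "3-nines" robustness target on slide 19 implies, the allowed downtime can be computed directly. The helper name and the 365-day year are assumptions made for this illustration.

```python
def allowed_downtime_hours(availability, period_hours=365 * 24):
    """Hours of downtime permitted per period at a given availability."""
    return (1.0 - availability) * period_hours

three_nines = allowed_downtime_hours(0.999)   # about 8.76 hours per year
four_nines = allowed_downtime_hours(0.9999)   # about 0.876 hours per year
```

So 99.9% availability still allows most of a working day of outage per year, which is why the slide frames it as a minimum ("at least").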
  • 20. Handhelds Make Analytics Available Everywhere
      • Why handheld client delivery matters
        – There are now more smartphones than client PCs
        – More than 25% of users use smartphones for their primary web access
        – The future of internet computing is mobile
      • What it gives us that is unique and differentiating
        – Hadoop is dreadfully mismatched with handheld access (batch, no standard client or reporting interface)
        – Coming in-memory databases (HANA, Vertica, VoltDB) will provide a much better mesh with handheld
      • What we need to make happen
        – Make handheld our primary target UI (design for thumbs, not mice … and more)
        – Target do-it-yourself analytics use cases
  • 21. Real-time Makes Previously Unthinkable Apps Possible
      • Why real-time matters
        – Users increasingly expect real-time analytics
        – The first wave of real-time analytics tools is becoming available
      • What it gives us that is unique and differentiating
        – "Self-service" analytics
        – Intuitive and unconstrained data exploration
        – Instant visualization of complex datasets
        – Viable plays for a variety of asset types
          • Credit card debt, student loan debt, properties, insurance, etc.
      • What we need to make happen
        – If Hadoop: we must evolve to interactive batch execution (or overnight batch, like Progressive Insurance)
        – If in-memory DB: we need to select and groom a handheld interface and design for sub-100ms response times
  • 22. Beyond Big Data – The Emerging Big Data Tech Platform
      Here’s Where Our World Is Headed (Today, Tomorrow)
      Questions: For what? | By whom? | What? | With what? | Stored where? | Processed where? | How? | When?
      Chart labels, in slide order:
      RDBMS | In-Memory RDBMS
      On-Premise | Distributed | Cloud
      Structured Data DWs | Big Data | Universal Data
      Batch | Hadoop Batch | Always
      Hindsight | Foresight | Lumpenprogramming
      Report Specialists | Data Scientists | Everyone
      Reports | Data Warehouses | Big Data | DIY Analytics
      What Happened? | Why Did That Happen? | What’s Next?
  • 23. The Future: Here’s What The Evolution Looks Like
      • Big Data
        – Development Initiatives: APIs. No one is likely to reach a market with Big Data analytics fronted by their own UI. Success will come from API links to:
          • Level 1: REST Access API
          • Level 2: Plug-in API
          • Level 3: Runtime environment
        – Who’s Doing It: Open territory! Infochimps has Level 1, Amazon (Elastic MapReduce) has Levels 2 and 3. Who else will play?
      • Cloud
        – Development Initiatives:
          • All of the Cloud players are investigating DB-rich offerings
          • VoltDB options with AWS High IO option
          • “38% of all companies are planning a BI SaaS project before the end of 2013.”
        – Who’s Doing It: Everybody: Amazon, Rackspace, Heroku ... Accenture
      • In-Memory
        – Development Initiatives:
          • Move demo to DAHANA architecture (not hand-coded)
          • Select non-HANA in-memory DB (probably VoltDB) as secondary platform
          • Hadoop evolves from a processing platform to an ETL gateway from unstructured to structured data
        – Who’s Doing It: SAP / HANA, HP / Vertica, other NewSQL players
      • Handheld
        – Development Initiatives:
          • Evolving UIs with HTML5 + jQuery Mobile
          • Reporting platforms increasingly offer mobile interfaces
          • Review Big Data interfaces to iPad and Android devices
        – Who’s Doing It: Two principal camps: Apple iOS and Android
      • Real-Time
        – Development Initiatives:
          • Investigate CDN options for Big Data deployment
          • Confirm DB performance on buffer pool, locking, latching, recovery
          • Design for sub-100ms delivery
        – Who’s Doing It: Just getting started...
  • 24. Today’s Tools: The Killer Apps
      Today’s Killer Apps: Recommendation Engine For Enhanced Retail Marketing
      • Vision:
        – Target Audience: Product Executives
        – Anticipated Benefit: Keep up with market leader Amazon, build up-sell and cross-sell revenue
        – Delivered Benefit: Better market segmentation, enhanced revenue through “customers who bought xxx also bought...” recommendations.
        – Alternatives: CRM recommendations do not draw on a deep sense of customer intent
        – Why It Kills: Provable revenue growth through A-B testing
      • How to Implement It:
        – Proof of Concept: Small cloud-based recognition engine, based on readily-available (customer profile, purchase history) data stores
        – Initial Rollout: Still cloud-based, but with broader streams (e.g. search histories) and dynamic updates
        – Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into retailing process and systems
        – Full Rollout: Could be cloud or in-house, but moving to richer streams and real-time (i.e. in-memory) data access
        – Maintenance: Tools updates, streams updates, transition to real-time data access
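The “customers who bought xxx also bought...” benefit on slide 24 can be illustrated with a simple co-purchase count over purchase histories. This is a proof-of-concept sketch under stated assumptions, not the deck's implementation: the `baskets` data and both function names are hypothetical, and a production engine would add normalization and many more signals.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase histories keyed by customer.
baskets = {
    "alice": {"camera", "tripod", "sd-card"},
    "bob": {"camera", "sd-card"},
    "carol": {"camera", "tripod"},
    "dave": {"laptop", "sd-card"},
}

def build_co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    co_counts = defaultdict(Counter)
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts

def also_bought(item, co_counts, top_n=2):
    """Items most often bought with `item`; ties are broken alphabetically."""
    ranked = sorted(co_counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]

co_counts = build_co_purchase_counts(baskets)
suggestions = also_bought("camera", co_counts)
```

Counting pairs per basket is an instance of the foresight pattern from slide 5: given a profile (the item in hand), it looks up what followed similar profiles in the past.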
  • 25. Today’s Tools: The Killer Apps
      Today’s Killer Apps: Analysis and Prediction Engine
      • Vision:
        – Target Audience: High-end retailers with profitable service contracts (e.g. computers, cameras, sound systems)
        – Anticipated Benefit: Increase penetration rate of service contracts by pre-calculating terms in advance of sale or service renewal
        – Delivered Benefit: Reward customers with historically low service costs, and increase penetration of profitable service deals by pre-calculation of ideal rates
        – Alternatives: Consumers generally know one-size-fits-all service contracts are overpriced. If you can’t fit the terms to the customer then you can’t complete the service contract
        – Why It Kills: Big data approach pre-calculates appropriate terms for all customers in advance of a sales or renewal transaction
      • How to Implement It:
        – Proof of Concept: Small cloud-based run with limited data sets to confirm data adoption approaches and identify most profitable segments in that sub-population
        – Initial Rollout: Still cloud-based, but with larger data sets and dynamic updates
        – Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into promotions and target marketing
        – Full Rollout: Could be cloud or in-house, but moving to larger data stores, real-time (i.e. in-memory) data access and notifications across the full customer set
        – Maintenance: Tools updates, stores updates, transition to real-time data access and notifications
  • 26. Today’s Tools: The Killer Apps
      Today’s Killer Apps: Log Analysis Engine
      • Vision:
        – Target Audience: Utilities executives
        – Anticipated Benefit: Sell an energy or utilities package that better fits customer interests and reduces customer costs while increasing energy/utility margins
        – Delivered Benefit: Customer gets a package that better fits their specific interests (e.g. “green”) and exec sells higher-margin offerings
        – Alternatives: A one-size-fits-all plan does not capture customer interests or deliver high-margin offerings well
        – Why It Kills: More customized packages better fit customer needs while reducing capital expenses and increasing margins for the utility
      • How to Implement It:
        – Proof of Concept: Small cloud-based run with limited data sets to capture basic patterns and confirm data adoption approaches
        – Initial Rollout: Still cloud-based, but with larger data stores and dynamic updates
        – Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into production logs with reporting
        – Full Rollout: Could be cloud or in-house, but moving to larger data stores, real-time (i.e. in-memory) data access and notifications
        – Maintenance: Tools updates, stores updates, transition to real-time data access and notifications
  • 27. Summary
      This Is Only The Beginning. With A Standard Platform We’ll See Richer Big Data Discoveries Become Routine
      The Solution Tools (Slide 13) Become Straightforward if We Run Them on a Standard Architecture
      “One man’s noise is another man’s data.” ~ Bill Stensrud, InstantEncore
  • 28. Contacts
      • John Repko: john.repko@pikasoft.com, (720) 624-6025
      https://pikasoft.s3.amazonaws.com/Using_Big_Data_To_Your_Advantage.ppt