random notes on big data
Chen Peng, Jianqiang Wang, Yang Huang
April 19, 2013
What is big data
● Volume: Gigabytes -> Terabytes -> Petabytes.
● Velocity: time sensitive, streaming, real-time.
Jet engine: 20TB/hr
GE: (minds + machines)
● Variety: structured/unstructured.
● Value: insights, analytical systems.
Challenges: collect, store, organize, analyze and share
External
> web sites (blogs/reviews)
> social media (Facebook, LinkedIn, Google+, Twitter)
> images and videos
> ...
Internal
> transactions
> server logs
> machines and sensors
> emails
> ...
Variety
Value Hierarchy
Raw Data
Normalized
Insight
Recommendation
Transact
Data is now a strategic asset
Technology stack & corresponding firms
Google Cloud Platform
● Google App Engine: scalable application development and execution environment
● Google Compute Engine: virtual machines; run arbitrary workloads at scale (e.g. Hadoop, scientific computing)
● Google Cloud Storage: storage; connecting glue between each step of the data pipeline
● Google BigQuery: data analysis; querying large datasets + third-party apps for visualization (e.g. Tableau)
Big data analytics
Analytics is the scientific process of transforming data into insights for making better decisions.
Data -> Insight -> Decision
Data ("resource"): IT logs, cloud, social media, sensors, experiments, etc.
Insight ("product"): statistical & operations research modeling
Decision ("goal"): judgement, constraints, intuition
Predictive analytics extracts information from data and uses it to predict future trends and behavior patterns.
regression models
discrete choice models
time series models
classification models (decision tree, random forest, support vector machine,
neural network, etc.)
clustering models (k-means, density based, graph based, etc.)
association analysis
...
Big data analytics
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics
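
As a rough illustration of the first model family listed above (regression models), here is a minimal sketch of ordinary least squares for a single predictor. The data (month index vs. units sold) and the function name `fit_line` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data: month index vs. units sold.
months = [1, 2, 3, 4, 5]
units = [12, 14, 16, 18, 20]

slope, intercept = fit_line(months, units)
# Use the fitted line to predict the next, unseen month.
forecast = slope * 6 + intercept
print(slope, intercept, forecast)
```

The same pattern (fit on historical data, then score new inputs) applies to the other model families in the list; only the fitting step changes.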
Always keep in mind...
> business objectives are the origin of every data mining solution
> data preparation is more than half of the data mining process
> all patterns are subject to change
> there will always be new knowledge
Always pause and ask yourself:
Does this work relate to the business question we try to answer?
Is the original business question still valid?
Use cases by industry
Healthcare: drug development; patient monitoring; electronic medical records
Utilities: smart grid optimization
Retail & marketing: customer loyalty and churn analysis; targeted product and services offerings; product sentiment analysis; marketing campaign optimization
Financial services: fraud detection & prevention; anti-money laundering
Telecom: customer churn mitigation; geospatial analytics; call data record (CDR) analysis
Industry applications of big data
analytics
Customer acquisition
Predict customers' buying habits in order to promote relevant products at multiple touch points.
http://www.youtube.com/watch?feature=player_embedded&v=3WspJ16Ubhw
Clinical decision support
Experts use predictive analytics in health care primarily to determine which patients are at risk of developing certain conditions, such as diabetes, asthma, heart disease, and other lifelong illnesses.
Cross-selling
Predictive analytics can help analyze customers' spending, usage, and other behavior, leading to efficient cross-selling, i.e. selling additional products to current customers (beer & diaper)
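
The beer & diaper example is usually framed as association analysis over shopping baskets. The sketch below computes the three standard measures for a rule "antecedent -> consequent": support, confidence, and lift. The baskets are made-up toy data and `rule_metrics` is a hypothetical helper name, not something taken from the slides.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    with_a = sum(1 for t in transactions if antecedent in t)
    with_c = sum(1 for t in transactions if consequent in t)
    with_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    if with_a == 0 or with_c == 0:
        raise ValueError("both items must appear in at least one transaction")
    support = with_both / n            # how often the pair occurs at all
    confidence = with_both / with_a    # P(consequent | antecedent)
    lift = confidence / (with_c / n)   # > 1 means bought together more than by chance
    return support, confidence, lift


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
print(support, confidence, lift)
```

A lift above 1 suggests the consequent is a reasonable cross-sell candidate for customers who buy the antecedent, though on real data the rule should also meet a minimum support threshold.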
Ads targeting
http://www.slideshare.net/dennyglee/yahoo-tao-case-study-excerpt
Fraud detection
A predictive model can help weed out the "bads" and reduce a business's
exposure to fraud.
Image and Speech Recognition
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/MIT_BigData_Sep2012.pdf
Operations
Jet Engine + Humans
http://www.youtube.com/watch?v=JHc4ZTTWKrQ
Industry applications of big data
analytics
Amazon warehouse operational efficiency: http://www.youtube.com/watch?v=Kafs9tZskuo
Beer and diaper
What are those startups doing?
Bloomreach
http://www.youtube.com/watch?feature=player_embedded&v=K12awAj4tW8
Datastax
http://www.nytimes.com/2013/02/25/business/media/for-house-of-cards-using-big-data-to-guarantee-its-popularity.html?pagewanted=all
Paraccel
http://www.paraccel.com/solutions/paraccel-solutions-big-data.php#.UXG207WG3Ct
Kaggle
http://www.kaggle.com/c/acm-sf-chapter-hackathon-big
VC funding for "Big Data"
Data from 71 start-ups. Funding is
counted starting from 2004.
VC Funding Activity
Data from 71 start-ups. Funding is
counted starting from 2004.
Interesting viewpoints
"Special (domain) knowledge becomes less relevant; organizations should focus on collecting people who know how to extract value and insights from data."
"In God we trust. All others must bring data."
"The usefulness of a variable in a model is inversely related to the time you spend creating it."
"Noise is convex but information is concave."
"Big data is sexy but small data is beautiful."
[Figure: noise and information plotted against data size]
Interesting viewpoints
"All models are wrong, but some are useful."
"Big data is like teenage sex: everyone talks about it,
nobody really knows how to do it; everyone thinks everyone
else is doing it, so they claim they are doing it."
"Statistics: The Art and Science of Learning from Data"
The danger of big data
Open discussion
Potential opportunities / challenges for entrepreneurs?
- visualization
- internet of things
- analytics as a service (a3s)
Standardization vs. customization
Human and data interaction
- data vs. intuition
Back-Up Slides
Data Science vs. OR
[Figure: risk management, strategic planning, predictive analytics, and optimization; axis labels: Risk, Measurable of Objective]
skill sets of data scientists
Big data types
● Web & social media: clickstream, web content, Amazon reviews, Facebook postings & 'likes'...
● M2M: smart meters, oil rig sensor readings, GPS signals...
● Transaction: retail store, healthcare claims, utility billing...
● Biometrics: fingerprint, face, voice, handwriting...
● Human-generated data: call logs, emails, surveys...
Web & social media
● Transaction: orders, revenue,...
● Conversion: click-through, convert to purchase,...
● Session: length, bounce rate
● Lifetime value: repeat, frequency,...
● Social interaction: intensity, influence,...
Shopping cart analysis
CTR prediction
Personalization
Retention/customer churn
A/B testing
Targeted ads
Lifetime value
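
To make the conversion metric and the A/B testing item above concrete, here is a minimal sketch of a two-proportion z-test comparing the conversion rates of two page variants. The visitor and conversion counts are hypothetical, the function name is illustrative, and the test assumes samples large enough for the normal approximation to hold.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function, then a two-sided p-value.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical experiment: 5,000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up counts (4% vs. 5% conversion) the difference would be significant at the conventional 5% level; with smaller samples the same relative lift often would not be.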
Interesting data visualization projects
wind map
http://hint.fm/wind/gallery/oct-30.js.html
Some analytical problems people deal with at Google ...
● search ranking
Processing Pipeline
[Diagram: log, sensor, web, ... -> Hadoop MapReduce -> Structured Data]
Note: Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It supports running applications on large clusters of commodity hardware. It originated from Google's MapReduce work and was further developed and promoted by Yahoo.
[Diagram, continued: SQL, HIVE, Dremel, ... -> Analytics]
Big Data
Cloud
Computing
http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/
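
The MapReduce stage of the processing pipeline above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases applied to a word count; it does not use the Hadoop API, and the log lines and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


# Hypothetical raw log lines standing in for unstructured input.
log_lines = ["error disk full", "warning disk slow", "error network down"]

mapped = [pair for line in log_lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

On a real cluster the map and reduce functions run in parallel on different machines and the shuffle moves data over the network; the structured output (here, a word-to-count table) is what SQL-style tools then query.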
How big is big?
When your data set becomes so large that you have to
start innovating around how to collect, store, organize,
analyze and share it ...
External
> web sites (blogs/reviews)
> social media (Facebook, LinkedIn, Google+, Twitter)
> images and videos
> ...
Internal
> transactions
> server logs
> machines and sensors
> emails
> ...
Data types: Web & social media, M2M, Transaction, Biometrics, Human-generated
Health care: sentiment analysis, patient monitoring, genetic testing, electronic medical records
Utilities: smart meters
Retail: loyalty programs, RFID tags, recommendation / market basket, face recognition
Telcos: customer churn, location-based
IT: machine log
Example of semantic graph
Call Data Record
What is Hadoop
