SlideShare une entreprise Scribd logo
1  sur  21
An Introduction to
BIG DATA
CUSO Seminar on Big Data
Prof. Dr. Philippe Cudré-Mauroux
http://exascale.info
May 22, 2014
Fribourg–Switzerland
1
2
On the Menu Today
• Big Data: Context
• Big Data: Buzzwords
– 3 Vs of Big Data
• Big Data Landscape
• Hadoop
• Big Data in Switzerland
Instant Quizz
• 3 Vs of Big Data?
• CAP?
• Hadoop?
• Spark?
3
Exascale Data Deluge
• Science
– Biology
– Astronomy
– Remote Sensing
• Web companies
– Ebay
– Yahoo
• Financial services,
retail companies
governments, etc.
© Wired 2009
➡ New data formats
➡ New machines
➡ Peta & exa-scale datasets
➡ Obsolescence of traditional
information infrastructures
The Web as the Main Driver
5
© Qmee
Big Data Central Theorem
Data+Technology  Actionable Insight  $$
6
Big Data Buzz
7
Between now and 2015, the firm expects big data to
create some 4.4 million IT jobs globally; of those,
1.9 million will be in the U.S. Applying an economic
multiplier to that estimate, Gartner expects each new
big-data-related IT job to create work for three more
people outside the tech industry, for a total of almost
6 million more U.S. jobs.
Growth in the Asia Pacific Big Data
market is expected to accelerate rapidly
in two to three years time, from a mere
US$258.5 million last year to in excess
of $1.76 billion in 2016, with highest
growth in the storage segment.
Big Data as a New Class of Asset
• The Age of Big Data (NYTimes Feb. 11, 2012)
http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-
world.html
“Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at
Google and now Facebook, are masters at harnessing the data of the Web
— online searches, posts and messages — with Internet advertising. At the
World Economic Forum last month in Davos, Switzerland, Big Data was a
marquee topic. A report by the forum, “Big Data, Big Impact,” declared data
a new class of economic asset, like currency or gold.”
8
9
The 3-Vs of Big Data
• Volume
– Amount of data
• Velocity
– speed of data in and out
• Variety
– range of data types and sources
• [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-
variety information assets that require new forms of processing to
enable enhanced decision making, insight discovery and process
optimization" 10
What can you do with the data
• Reporting
– Post Hoc
– Real time
• Monitoring (fine-grained)
• Exploration
• Finding Patterns
• Root Cause Analysis
• Closed-loop Control
• Model construction
• Prediction
• …
11
© Mike Franklin
10 ways big data changes everything
• Some concrete examples
– http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/
1. Can gigabytes predict the next Lady Gaga?
2. How big data can curb the world’s energy consumption
3. Big data is now your company’s virtual assistant
4. The future of Foursquare is data-fueled recommendations
5. How Twitter data-tracked cholera in Haiti
6. Revolutionizing Web publishing with big data
7. Can cell phone data cure society’s ills?
8. How data can help predict and create video hits
9. The new face of data visualization
10. One hospital’s embrace of big data
12
Typical Big Data Success Story
• Modeling users through Big Data
– Online ads sale / placement [e.g., Facebook]
– Personalized Coupons [e.g., Target]
– Product Placement [Walmart]
– Content Generation [e.g., NetFlix]
– Personalized learning [e.g., Duolingo]
– HR Recruiting [e.g., Gild]
13
More Data => Better Answers?
• Not that easy…
• More Rows: Algorithmic complexity kicks in
• More Columns: Exponentially more hypotheses
• Another formulation of the problem:
– Given an inferential goal and a fixed computational budget,
provide a guarantee that the quality of inference will increase
monotonically as data accrue (without bound)
• In other words:
=> Data should be a resource, not a load
14
© Mike Jordan
Big Data Infrastructures
15
A Concrete Example: Zynga
16
Leading the Pack of Wolves: Hadoop
• Google: Map/Reduce paper published 2004
• Open source variant: Hadoop
• Map-reduce = high-level programming model and
implementation for large-scale parallel data processing
• Right now most overhyped system in CS
17
What about Swiss Big Data?
• Competitive Research Groups
• Swiss Big Data User Group
• Swiss companies playing catch-up
– Productized Big Data systems at leading telcos & financial
companies
– Big Data is not a new technology: it's a fact;
• Deal with it  POCs in most banks, insurance companies, retailers
18
Tasty Bites of Big Data (1)
Thursday afternoon
• 13:30-15:00: Big Data Profiling
Felix Naumann (Hasso Plattner Institute)
• 15:15-16:45: Realtime Analytics
Christoph Koch (EPFL)
• 16:45-17:45: Current Trends and Challenges in Big Data
Benchmarking
Kais Sachs (SAP / Spec)
19
Tasty Bites of Big Data (2)
Friday
• 9:00 - 10:30: Structured Data in Web Search
Alon Halevy (Google)
• 10:45 - 12:15: Human Computation for Big Data
Gianluca Demartini (UNIFR)
• 13:30-15:00: Analysing and Querying Big Scientific Data
Thomas Heinis (EPFL)
• 15:00-16:30: The Evolution of Big Data Frameworks
Carlo Curino (Microsoft Research)
20
Social Event, Friday – Beer
Tasting!
Basse-Ville Fribourg / 15 CHF per Person
Everything You Always Wanted to Know
About Beer. * But Were Afraid to Ask!
18:00 @ Café du Belvédère, Grand-Rue
36
19:00 @ Fri-Mousse, Rue de la
Samaritaine 19
Limited Places, Inscription is mandatory at:
http://xr.si

Contenu connexe

Tendances

Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Anna Kuhn
 
Big Data - Insights & Challenges
Big Data - Insights & ChallengesBig Data - Insights & Challenges
Big Data - Insights & ChallengesRupen Momaya
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data PresentationMatthew Urdan
 
Big data - What is It?
Big data - What is It?Big data - What is It?
Big data - What is It?Nicole Aidney
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementationSandip Tipayle Patil
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Yaman Hajja, Ph.D.
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and OpportunitiesKenny Huang Ph.D.
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)Shahbaz Anjam
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataRichard Vidgen
 
Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges   Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges Experian_US
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBernard Marr
 
Big data and its applications
Big data and its applicationsBig data and its applications
Big data and its applicationsali easazadeh
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBernard Marr
 

Tendances (20)

Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?
 
Big Data - Insights & Challenges
Big Data - Insights & ChallengesBig Data - Insights & Challenges
Big Data - Insights & Challenges
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data Presentation
 
Big data - What is It?
Big data - What is It?Big data - What is It?
Big data - What is It?
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementation
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and Opportunities
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
Big data
Big dataBig data
Big data
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Big Data Presentation
Big  Data PresentationBig  Data Presentation
Big Data Presentation
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges   Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 
Big data and its applications
Big data and its applicationsBig data and its applications
Big data and its applications
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 
The promise and challenge of Big Data
The promise and challenge of Big DataThe promise and challenge of Big Data
The promise and challenge of Big Data
 

Similaire à An Introduction to Big Data

Big data and Internet
Big data and InternetBig data and Internet
Big data and InternetSanoj Kumar
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalIIIT Allahabad
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)eXascale Infolab
 
Big data seminor
Big data seminorBig data seminor
Big data seminorberasrujana
 
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria? Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria? INACAP
 
BigDataFinal.pptx
BigDataFinal.pptxBigDataFinal.pptx
BigDataFinal.pptxPentaTech
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptxkalai75
 
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsWhitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsHappiest Minds Technologies
 
QuickView #3 - Big Data
QuickView #3 - Big DataQuickView #3 - Big Data
QuickView #3 - Big DataSonovate
 

Similaire à An Introduction to Big Data (20)

Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
 
Big data and Internet
Big data and InternetBig data and Internet
Big data and Internet
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big data
Big dataBig data
Big data
 
Big data seminor
Big data seminorBig data seminor
Big data seminor
 
Fundamentals of Big Data
Fundamentals of Big DataFundamentals of Big Data
Fundamentals of Big Data
 
Big data basics
Big data basicsBig data basics
Big data basics
 
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria? Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
 
BigDataFinal.pptx
BigDataFinal.pptxBigDataFinal.pptx
BigDataFinal.pptx
 
Bigdata " new level"
Bigdata " new level"Bigdata " new level"
Bigdata " new level"
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big Data 2.0
Big Data 2.0Big Data 2.0
Big Data 2.0
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptx
 
Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
 
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsWhitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
 
QuickView #3 - Big Data
QuickView #3 - Big DataQuickView #3 - Big Data
QuickView #3 - Big Data
 

Plus de eXascale Infolab

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictioneXascale Infolab
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...eXascale Infolab
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex GraphseXascale Infolab
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapeXascale Infolab
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...eXascale Infolab
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...eXascale Infolab
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceanseXascale Infolab
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutioneXascale Infolab
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataeXascale Infolab
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data ManagementeXascale Infolab
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataeXascale Infolab
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataeXascale Infolab
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingeXascale Infolab
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...eXascale Infolab
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingeXascale Infolab
 

Plus de eXascale Infolab (20)

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex Graphs
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
 
Crowd scheduling www2016
Crowd scheduling www2016Crowd scheduling www2016
Crowd scheduling www2016
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task Crowdsourcing
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
 
Hasler2014
Hasler2014Hasler2014
Hasler2014
 

Dernier

Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制vexqp
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 

Dernier (20)

Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 

An Introduction to Big Data

  • 1. An Introduction to BIG DATA CUSO Seminar on Big Data Prof. Dr. Philippe Cudré-Mauroux http://exascale.info May 22, 2014 Fribourg–Switzerland 1
  • 2. 2 On the Menu Today • Big Data: Context • Big Data: Buzzwords – 3 Vs of Big Data • Big Data Landscape • Hadoop • Big Data in Switzerland
  • 3. Instant Quizz • 3 Vs of Big Data? • CAP? • Hadoop? • Spark? 3
  • 4. Exascale Data Deluge • Science – Biology – Astronomy – Remote Sensing • Web companies – Ebay – Yahoo • Financial services, retail companies governments, etc. © Wired 2009 ➡ New data formats ➡ New machines ➡ Peta & exa-scale datasets ➡ Obsolescence of traditional information infrastructures
  • 5. The Web as the Main Driver 5 © Qmee
  • 6. Big Data Central Theorem Data+Technology  Actionable Insight  $$ 6
  • 7. Big Data Buzz 7 Between now and 2015, the firm expects big data to create some 4.4 million IT jobs globally; of those, 1.9 million will be in the U.S. Applying an economic multiplier to that estimate, Gartner expects each new big-data-related IT job to create work for three more people outside the tech industry, for a total of almost 6 million more U.S. jobs. Growth in the Asia Pacific Big Data market is expected to accelerate rapidly in two to three years time, from a mere US$258.5 million last year to in excess of $1.76 billion in 2016, with highest growth in the storage segment.
  • 8. Big Data as a New Class of Asset • The Age of Big Data (NYTimes Feb. 11, 2012) http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the- world.html “Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.” 8
  • 9. 9
  • 10. The 3-Vs of Big Data • Volume – Amount of data • Velocity – speed of data in and out • Variety – range of data types and sources • [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high- variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization" 10
  • 11. What can you do with the data • Reporting – Post Hoc – Real time • Monitoring (fine-grained) • Exploration • Finding Patterns • Root Cause Analysis • Closed-loop Control • Model construction • Prediction • … 11 © Mike Franklin
  • 12. 10 ways big data changes everything • Some concrete examples – http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/ 1. Can gigabytes predict the next Lady Gaga? 2. How big data can curb the world’s energy consumption 3. Big data is now your company’s virtual assistant 4. The future of Foursquare is data-fueled recommendations 5. How Twitter data-tracked cholera in Haiti 6. Revolutionizing Web publishing with big data 7. Can cell phone data cure society’s ills? 8. How data can help predict and create video hits 9. The new face of data visualization 10. One hospital’s embrace of big data 12
  • 13. Typical Big Data Success Story • Modeling users through Big Data – Online ads sale / placement [e.g., Facebook] – Personalized Coupons [e.g., Target] – Product Placement [Walmart] – Content Generation [e.g., NetFlix] – Personalized learning [e.g., Duolingo] – HR Recruiting [e.g., Gild] 13
  • 14. More Data => Better Answers? • Not that easy… • More Rows: Algorithmic complexity kicks in • More Columns: Exponentially more hypotheses • Another formulation of the problem: – Given an inferential goal and a fixed computational budget, provide a guarantee that the quality of inference will increase monotonically as data accrue (without bound) • In other words: => Data should be a resource, not a load 14 © Mike Jordan
  • 17. Leading the Pack of Wolves: Hadoop • Google: Map/Reduce paper published 2004 • Open source variant: Hadoop • Map-reduce = high-level programming model and implementation for large-scale parallel data processing • Right now most overhyped system in CS 17
  • 18. What about Swiss Big Data? • Competitive Research Groups • Swiss Big Data User Group • Swiss companies playing catch-up – Productized Big Data systems at leading telcos & financial companies – Big Data is not a new technology: it's a fact; • Deal with it  POCs in most banks, insurance companies, retailers 18
  • 19. Tasty Bites of Big Data (1) Thursday afternoon • 13:30-15:00: Big Data Profiling Felix Naumann (Hasso Plattner Institute) • 15:15-16:45: Realtime Analytics Christoph Koch (EPFL) • 16:45-17:45: Current Trends and Challenges in Big Data Benchmarking Kais Sachs (SAP / Spec) 19
  • 20. Tasty Bites of Big Data (2) Friday • 9:00 - 10:30: Structured Data in Web Search Alon Halevy (Google) • 10:45 - 12:15: Human Computation for Big Data Gianluca Demartini (UNIFR) • 13:30-15:00: Analysing and Querying Big Scientific Data Thomas Heinis (EPFL) • 15:00-16:30: The Evolution of Big Data Frameworks Carlo Curino (Microsoft Research) 20
  • 21. Social Event, Friday – Beer Tasting! Basse-Ville Fribourg / 15 CHF per Person Everything You Always Wanted to Know About Beer. * But Were Afraid to Ask! 18:00 @ Café du Belvédère, Grand-Rue 36 19:00 @ Fri-Mousse, Rue de la Samaritaine 19 Limited Places, Inscription is mandatory at: http://xr.si