Big Data Analytics- Hand Out
Practice
Vijay Bhaskar Semwal
Senior System Engineer
Siemens Information System Research lab, Gurgaon
August 2012 (New Trainee Joining Course)
Energy Fossil Fuel, Instrumentation and Electronics
division; previously Health Care, Bangalore
Why Big Data?
• Big data is a popular term used to describe the
exponential growth and availability of data, both
structured and unstructured
Key enablers for the appearance and
growth of ‘Big-Data’ are:
+Increase in storage capabilities
+Increase in processing power
+Availability of data
What is the aim of the course
Focus is on “Systems” and applications for cloud-
based storage and processing of BIG DATA.
+Big Data - Definition
+Big Data - Analytics
+Big Data - Storage (HDFS)
+Big Data - Computing (Map/Reduce)
+Big Data - Database (HBase)
+Big Data - Graph DB (Titan)
+Big Data - Streaming (Storm)
Mantra
“Learning is not restricted to listening; it also means
actively asking relevant questions.”
“It is a crime not to ask questions, so ask any question. I
can expect any question, but you cannot expect a
garbage answer. My rule: do not keep any doubt in
mind.”
Rule of class
Introduction to Big Data
What are we going to understand
• What is Big Data?
• Why did we land up here?
• To whom does it matter?
• Where is the money?
• Are we ready to handle it?
• What are the concerns?
• Tools and Technologies
▫ Is Big Data <=> Hadoop?
Start UP
• What is the maximum file size you have dealt with so far?
▫ Movies/Files/Streaming video that you have used?
▫ What have you observed?
• What is the maximum download speed you get?
• Simple computation
▫ How much time does it take just to transfer that file?
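The simple computation above can be sketched in a few lines of Python. The file size (1 TB) and the link speed (100 Mbit/s) are illustrative assumptions, not figures from the slides, and protocol overhead is ignored.

```python
def transfer_time_seconds(size_bytes: float, speed_bits_per_second: float) -> float:
    """Time to move size_bytes over a link of the given speed, ignoring overhead."""
    return size_bytes * 8 / speed_bits_per_second


# Illustrative assumptions, not values from the slides.
ONE_TERABYTE = 10**12        # bytes (decimal units)
LINK_SPEED = 100 * 10**6     # 100 Mbit/s download speed

seconds = transfer_time_seconds(ONE_TERABYTE, LINK_SPEED)
hours = seconds / 3600
print(f"{seconds:.0f} s = {hours:.1f} hours")  # 80000 s = 22.2 hours
```

Even before any processing starts, moving the data alone takes most of a day under these assumptions, which is one reason big data systems try to move the computation to the data.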
What is big data?
• Every day, we create 2.5 quintillion bytes of data — so
much that 90% of the data in the world today has been
created in the last two years alone. This data comes
from everywhere: sensors used to gather climate
information, posts to social media sites, digital
pictures and videos, purchase transaction records, and
cell phone GPS signals to name a few.
This data is “big data.”
Big data spans three dimensions:
Volume, Velocity and Variety
• Volume: Enterprises are awash with ever-growing data of all types, easily amassing
terabytes—even petabytes—of information.
▫ Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
▫ Convert 350 billion annual meter readings to better predict power consumption
• Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching
fraud, big data must be used as it streams into your enterprise in order to maximize its
value.
▫ Scrutinize 5 million trade events created each day to identify potential fraud
▫ Analyze 500 million daily call detail records in real-time to predict customer churn faster
▫ The latest I have heard is that even a 10-nanosecond delay is too much.
• Variety: Big data is any type of data - structured and unstructured data such as text, sensor
data, audio, video, click streams, log files and more. New insights are found when analyzing
these data types together.
▫ Monitor hundreds of live video feeds from surveillance cameras to target points of interest
▫ Exploit the 80% data growth in images, video and documents to improve customer
satisfaction
• SAS considers two additional dimensions
when thinking about big data:
• Variability. In addition to the increasing velocities
and varieties of data, data flows can be highly
inconsistent with periodic peaks. Is something
trending in social media? Daily, seasonal and event-
triggered peak data loads can be challenging to
manage. Even more so with unstructured data
involved.
• Complexity. Today's data comes from multiple
sources. And it is still an undertaking to link, match,
cleanse and transform data across systems.
However, it is necessary to connect and correlate
relationships, hierarchies and multiple data linkages
or your data can quickly spiral out of control.
Finally…
‘Big Data’ is similar to ‘small data’, but bigger
… but because the data is bigger, it requires different
approaches:
techniques, tools, architecture
… with the aim of solving new problems
or old problems in a better way
The Social Layer in an Instrumented
Interconnected World
What does Big Data trigger?
BIG DATA is not just HADOOP
Introduction: Explosion in Quantity of
Data- Our Data-driven World
• Science
▫ Databases from astronomy, genomics, environmental data,
transportation data, …
• Humanities and Social Sciences
▫ Scanned books, historical documents, social interactions data, new
technology like GPS …
• Business & Commerce
▫ Corporate sales, stock market transactions, census, airline traffic, …
• Entertainment
▫ Internet images, Hollywood movies, MP3 files, …
• Medicine
▫ MRI & CT scans, patient records, …
• Fish and Oceans of Data
• What do we do with this amount of data?
• Ignore
Big Data Characteristics
Big Data Vectors (3Vs)
- High volume: the amount of data
- High velocity: the speed at which data is collected, acquired, generated or processed
- High variety: different data types such as audio, video and image data (mostly
unstructured data)
Cost Problem (example)
Cost of processing 1 petabyte of data
with 1000 nodes?
1 PB = 10^15 B = 1 million gigabytes = 1 thousand terabytes
- Each node needs 9 hours to process 500 GB at a rate of 15 MB/s
- 15 * 60 * 60 * 9 = 486,000 MB ≈ 500 GB
- One run on 1000 nodes: 1000 * 9 * $0.34 = $3,060 (at $0.34 per node-hour)
- 1 PB on a single node: 1,000,000 GB / 500 GB = 2000 runs * 9 h =
18,000 h / 24 = 750 days
- The cost for 1000 cloud nodes, each
processing 1 PB:
2000 * $3,060 = $6,120,000
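The arithmetic on this slide can be re-checked with a short Python sketch. It assumes that $0.34 is the hourly price of one cloud node and follows the slide in rounding 486 GB up to 500 GB per run. The price is kept in cents so the results stay exact.

```python
RATE_MB_PER_S = 15            # per-node processing rate from the slide
HOURS_PER_RUN = 9             # duration of one run on one node
PRICE_CENTS_PER_NODE_HOUR = 34  # assumption: $0.34 is the price of one node-hour
NODES = 1000
GB_PER_PB = 1_000_000         # decimal units: 1 PB = 10^15 B
GB_PER_RUN = 500              # the slide rounds 486 GB up to 500 GB

# Data one node handles in a 9-hour run.
mb_per_run = RATE_MB_PER_S * 60 * 60 * HOURS_PER_RUN

# Cost of one 9-hour run across all 1000 nodes, in dollars.
cost_per_run = NODES * HOURS_PER_RUN * PRICE_CENTS_PER_NODE_HOUR / 100

# Runs (and wall-clock time) a single node needs to get through 1 PB.
runs_per_pb = GB_PER_PB // GB_PER_RUN
days_per_pb = runs_per_pb * HOURS_PER_RUN / 24

# Cost when each of the 1000 nodes processes a full petabyte.
total_cost = runs_per_pb * cost_per_run

print(f"{mb_per_run:,} MB per run")        # 486,000 MB per run
print(f"${cost_per_run:,.0f} per run")     # $3,060 per run
print(f"{days_per_pb:.0f} days per PB")    # 750 days per PB
print(f"${total_cost:,.0f} in total")      # $6,120,000 in total
```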
Usage Example in Big Data
- Moneyball: The Art of Winning an Unfair Game
The Oakland Athletics baseball team and its general manager, Billy Beane
- The Oakland A's front office took advantage of more analytical gauges
of player performance to field a team that could compete
successfully against richer competitors in MLB
- Oakland had approximately $41 million in salary;
the New York Yankees had $125 million in payroll that same season.
Oakland was forced to find players undervalued by the market.
- Moneyball had a huge impact on other teams in MLB
And there is a Moneyball movie!
Usage Example of Big Data
US 2012 Election
- data mining for
individualized ad targeting
- Orca big-data app
- YouTube channel (23,700 subscribers
and 26 million page views)
- Ace of Spades HQ
- predictive modeling
- mybarackobama.com
- drive traffic to other campaign sites
Facebook page (33 million "likes")
YouTube channel (240,000 subscribers
and 246 million page views).
- a contest to dine with Sarah Jessica Parker
- Every single night, the team ran 66,000
computer simulations
- Reddit
- Amazon web services
Usage Example in Big Data
Data analysis predictions for the US 2012 election
Drew Linzer, June 2012:
332 for Obama,
206 for Romney
Nate Silver's FiveThirtyEight blog:
predicted that Obama had an 86% chance of winning
predicted all 50 states correctly
Sam Wang, the Princeton Election Consortium:
put the probability of Obama's re-election
at more than 98%
The media continued reporting the race as very
tight
Some Challenges in Big Data
• Big Data Integration is Multidisciplinary
Less than 10% of the Big Data world is genuinely relational
Meaningful data integration in the real, messy, schema-less
and complex Big Data world of databases and the semantic web
using multidisciplinary and multi-technology methods
• The Billion Triple Challenge
The web of data contains 31 billion RDF triples, of which 446 million
are RDF links: 13 billion government data, 6 billion
geographic data, 4.6 billion publication and media data, 3 billion
life science data
BTC 2011, Sindice 2011
• The Linked Open Data Ripper
Mapping, Ranking, Visualization, Key Matching, Snappiness
• Demonstrate the Value of Semantics: let data integration drive
DBMS technology
Large volumes of heterogeneous data, like linked data and RDF
Other Aspects of Big Data
1- Automating Research Changes the Definition of Knowledge
2- Claims to Objectivity and Accuracy are Misleading
3- Bigger Data are not always Better Data
4- Not all Data are Equivalent
5- Just because it is accessible doesn’t make it ethical
6- Limited access to Big Data creates new digital divides
Six Provocations for Big Data
Other Aspects of Big Data
• Five Big Questions about Big Data:
1- What happens in a world of radical transparency, with data widely available?
2- If you could test all your decisions, how would that change the way you
compete?
3- How would your business change if you used big data for widespread, real
time customization?
4- How can big data augment or even replace management?
5-Could you create a new business model based on data?
Platforms for Large-scale Data Analysis
• Parallel DBMS technologies
▫ Proposed in late eighties
▫ Matured over the last two decades
▫ Multi-billion dollar industry: Proprietary DBMS Engines
intended as Data Warehousing solutions for very large
enterprises
• Map Reduce
▫ pioneered by Google
▫ popularized by Yahoo! (Hadoop)
MapReduce
• Overview:
▫ Data-parallel programming model
▫ An associated parallel and distributed
implementation for commodity clusters
• Pioneered by Google
▫ Processes 20 PB of data per day
• Popularized by open-source Hadoop
▫ Used by Yahoo!, Facebook,
Amazon, and the list is growing …
Parallel DBMS technologies
• Popularly used for more than two decades
▫ Research projects: Gamma, Grace, …
▫ Commercial: multi-billion dollar industry, but accessible only to a privileged few
• Relational Data Model
• Indexing
• Familiar SQL interface
• Advanced query optimization
• Well understood and studied
MapReduce
Raw Input: <key, value>
MAP
<K1,V1> <K2,V2> <K3,V3>
REDUCE
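The MAP and REDUCE flow above can be illustrated with a minimal word-count sketch in plain Python. This is a single-process sketch, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are hypothetical, chosen only to mirror the stages in the diagram.

```python
from collections import defaultdict


def map_phase(key, value):
    """MAP: turn one raw <key, value> record into intermediate <word, 1> pairs."""
    for word in value.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between MAP and REDUCE."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: combine all values that share one key."""
    return key, sum(values)


def run_job(records):
    """Run MAP over every record, group by key, then REDUCE each group."""
    intermediate = [pair for k, v in records for pair in map_phase(k, v)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


# Hypothetical raw input: <line number, line text>
records = [(0, "big data is big"), (1, "data is everywhere")]
counts = run_job(records)
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a real cluster the MAP calls and the REDUCE calls run on different machines; only the two user-written functions stay the same.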
MapReduce Advantages
• Automatic Parallelization:
▫ Depending on the size of RAW INPUT DATA →
instantiate multiple MAP tasks
▫ Similarly, depending upon the number of intermediate
<key, value> partitions → instantiate multiple
REDUCE tasks
• Run-time:
▫ Data partitioning
▫ Task scheduling
▫ Handling machine failures
▫ Managing inter-machine communication
• Completely transparent to the
programmer/analyst/user
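How intermediate <key, value> pairs might be split into partitions, one per REDUCE task, can be sketched as follows. This is only an illustration of hash partitioning under stated assumptions: the helper names `partition` and `split_for_reducers` are hypothetical, and `zlib.crc32` is used here simply because it gives the same result on every run.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick the reduce task for a key; the same key always lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def split_for_reducers(pairs, num_reducers):
    """Distribute intermediate pairs into one bucket per reduce task."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


# Hypothetical intermediate output of the MAP tasks.
pairs = [("big", 1), ("data", 1), ("big", 1), ("hadoop", 1)]
buckets = split_for_reducers(pairs, num_reducers=3)

# Each bucket could be processed by an independent REDUCE task.
results = {}
for bucket in buckets:
    for key, values in bucket.items():
        results[key] = sum(values)

print(results)
```

Because every occurrence of a key goes to the same bucket, the reduce tasks never need to talk to each other, which is what lets the run-time schedule them in parallel without involving the programmer.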
Map Reduce vs Parallel DBMS
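Comparison by feature (Parallel DBMS vs MapReduce):
• Schema Support: Parallel DBMS ✓; MapReduce not out of the box
• Indexing: Parallel DBMS ✓; MapReduce not out of the box
• Programming Model: Parallel DBMS declarative (SQL); MapReduce imperative (C/C++, Java, …), with extensions through Pig and Hive
• Optimizations (Compression, Query Optimization): Parallel DBMS ✓; MapReduce not out of the box
• Flexibility: Parallel DBMS not out of the box; MapReduce ✓
• Fault Tolerance: Parallel DBMS coarse-grained techniques; MapReduce ✓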
Zettabyte Horizon
• The total amount of global data is expected to grow to 2.7 zettabytes
during 2012. This is 48% up from 2011.
Wrap Up
[Figure: data growth from 2012 to 2020, x50]
• As of 2009, the entire World Wide Web was estimated to
contain close to 500 exabytes. This is half a zettabyte.
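The two figures on this slide can be put on a common scale with a small Python sketch. It assumes decimal units (1 zettabyte = 1000 exabytes) and derives the implied 2011 volume from the stated 48% growth. That derived value is a back-calculation, not a number given in the slides.

```python
EB_PER_ZB = 1000          # decimal units: 1 ZB = 1000 EB

zb_2012 = 2.7             # expected global data volume during 2012
growth_from_2011 = 0.48   # "48% up from 2011"

# Implied 2011 volume: 2012 volume divided by (1 + growth).
zb_2011 = zb_2012 / (1 + growth_from_2011)

# Size of the World Wide Web in 2009, converted from exabytes.
web_2009_zb = 500 / EB_PER_ZB

print(f"2011: about {zb_2011:.2f} ZB")   # 2011: about 1.82 ZB
print(f"Web in 2009: {web_2009_zb} ZB")  # Web in 2009: 0.5 ZB
```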
References
1. B. Brown, M. Chui and J. Manyika, “Are you ready for the era of Big Data?” McKinsey Quarterly, Oct 2011, McKinsey Global Institute
2. C. Bizer, P. Boncz, M. L. Brodie and O. Erling, “The Meaningful Use of Big Data: Four Perspectives – Four Challenges” SIGMOD Record, Vol. 40, No. 4, December 2011
3. D. Boyd and K. Crawford, “Six Provocations for Big Data” A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, September 2011, Oxford Internet Institute
4. D. Agrawal, S. Das and A. El Abbadi, “Big Data and Cloud Computing: Current State and Future Opportunities” EDBT 2011, Uppsala, Sweden
5. D. Agrawal, S. Das and A. El Abbadi, “Big Data and Cloud Computing: New Wine or Just New Bottles?” VLDB 2010, Vol. 3, No. 2
6. F. J. Alexander, A. Hoisie and A. Szalay, “Big Data” IEEE Computing in Science and Engineering journal, 2011
7. O. Trelles, P. Prins, M. Snir and R. C. Jansen, “Big Data, but are we ready?” Nature Reviews, Feb 2011
8. K. Bakhshi, “Considerations for Big Data: Architecture and Approach” Aerospace Conference, 2012 IEEE
9. S. Lohr, “The Age of Big Data” The New York Times, February 2012
10. M. Nielsen, “A guide to the day of big data”, Nature, vol. 462, December 2009
Vijay Bhaskar Semwal
THINK

Big Data By Vijay Bhaskar Semwal

  • 1. Big Data Analytics- Hand Out Practice Vijay Bhaskar Semwal Senior System Engineer Siemens Information System Research lab, Gurgaon August,2012 ( New Trainee Joining Course) Energy Fossil Fuel ,Instrumentation and electronic division previous health care Bangalore
  • 2. Why Big Data? • Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured Key enablers for the appearance and growth of ‘Big-Data’ are: +Increase in storage capabilities +Increase in processing power +Availability of data
  • 3.
  • 4. What is the aim of the course Focus is on “Systems” and applications for cloud- based storage and processing of BIG DATA. +Big Data - Definition +Big Data - Analytics +Big Data - Storage (HDFS) +Big Data - Computing (Map/Reduce) +Big Data - Database (HBase) +Big Data – Graph DB (Titan) +Big Data - Streaming (Strom)
  • 5. Mantra “Learning is not just restricted to listening, it is actively asking relevant questions” “It is crime not to ask question, ask any question. I can expect any question but you can not expect garbage answer. My Rule do not keep any doubt in mind” Rule of class
  • 7. What are we going to understand • What is Big Data? • Why we landed up there? • To whom does it matter • Where is the money? • Are we ready to handle it? • What are the concerns? • Tools and Technologies ▫ Is Big Data <=> Hadoop
  • 8. Start UP • What is the maximum file size you have dealt so far? ▫ Movies/Files/Streaming video that you have used? ▫ What have you observed? • What is the maximum download speed you get? • Simple computation ▫ How much time to just transfer.
  • 9. What is big data? • Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is “big data.”
  • 10. Big data spans three dimensions: Volume, Velocity and Variety • Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information. ▫ Turn 12 terabytes of Tweets created each day into improved product sentiment analysis ▫ Convert 350 billion annual meter readings to better predict power consumption • Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. ▫ Scrutinize 5 million trade events created each day to identify potential fraud ▫ Analyze 500 million daily call detail records in real time to predict customer churn faster ▫ The latest I have heard is that even a 10-nanosecond delay is too much. • Variety: Big data is any type of data, structured and unstructured, such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together. ▫ Monitor hundreds of live video feeds from surveillance cameras to target points of interest ▫ Exploit the 80% data growth in images, video and documents to improve customer satisfaction
  • 11. • At SAS, we consider two additional dimensions when thinking about big data: • Variability. In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage, even more so with unstructured data involved. • Complexity. Today's data comes from multiple sources, and it is still an undertaking to link, match, cleanse and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages, or your data can quickly spiral out of control.
  • 12. Finally… ‘Big Data’ is similar to ‘small data’, but bigger. Because the data is bigger, it requires different approaches (techniques, tools, architecture) with the aim of solving new problems, or old problems in a better way
  • 13. The Social Layer in an Instrumented Interconnected World
  • 14. What does Big Data trigger?
  • 15. BIG DATA is not just HADOOP
  • 16. Introduction: Explosion in Quantity of Data: Our Data-driven World • Science ▫ Databases from astronomy, genomics, environmental data, transportation data, … • Humanities and Social Sciences ▫ Scanned books, historical documents, social interaction data, new technology like GPS … • Business & Commerce ▫ Corporate sales, stock market transactions, census, airline traffic, … • Entertainment ▫ Internet images, Hollywood movies, MP3 files, … • Medicine ▫ MRI & CT scans, patient records, …
  • 17. • Fish and Oceans of Data • What do we do with this amount of data? • Ignore
  • 18. Big Data Characteristics: Big Data Vectors (3Vs) - High volume: the amount of data - High velocity: the speed of collecting, acquiring, generating or processing data - High variety: different data types such as audio, video and image data (mostly unstructured data)
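  • 19. Cost Problem (example) What does it cost to process 1 petabyte of data on 1000 nodes? 1 PB = 10^15 B = 1 million gigabytes = 1 thousand terabytes - One node needs 9 hours to process about 500 GB at a rate of 15 MB/s: 15 * 60 * 60 * 9 = 486,000 MB ≈ 500 GB - One 9-hour run on 1000 nodes at $0.34 per node-hour: 1000 * 9 * $0.34 = $3,060 - A single node needs 1,000,000 GB / 500 GB = 2000 such runs for 1 PB: 2000 * 9 h = 18,000 h = 750 days - If each of the 1000 nodes processes 1 PB (1000 PB in total): 2000 * $3,060 = $6,120,000 - A single petabyte split across all 1000 nodes needs only 2 runs: 18 hours and 2 * $3,060 = $6,120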
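The arithmetic on slide 19 can be checked with a short Python sketch. The variable names are illustrative, and the $0.34 figure is assumed to be an hourly price per node, which is how the slide's multiplication uses it.

```python
import math

# Figures taken from the slide.
RATE_MB_PER_S = 15            # processing rate of one node
HOURS_PER_RUN = 9             # length of one run
NODES = 1000                  # cluster size
PRICE_PER_NODE_HOUR = 0.34    # assumed to be dollars per node per hour
GB_PER_PB = 1_000_000         # decimal units: 1 PB = 1,000,000 GB
CHUNK_GB = 500                # the slide rounds 486 GB up to 500 GB

# Data one node gets through in one run (in GB).
gb_per_node_run = RATE_MB_PER_S * 3600 * HOURS_PER_RUN / 1000

# Cost of one 9-hour run on the whole cluster.
cost_per_run = NODES * HOURS_PER_RUN * PRICE_PER_NODE_HOUR

# Runs a single node needs to work through 1 PB, and the time that takes.
runs_per_pb_single_node = GB_PER_PB // CHUNK_GB
single_node_days = runs_per_pb_single_node * HOURS_PER_RUN / 24

# Slide's last line: every node processes 1 PB, so 1000 PB in total.
cost_each_node_one_pb = runs_per_pb_single_node * cost_per_run

# Alternative reading: a single PB is split across all 1000 nodes.
runs_shared = math.ceil(GB_PER_PB / (NODES * CHUNK_GB))
cost_shared_one_pb = runs_shared * cost_per_run

print(f"per node per run: {gb_per_node_run:.0f} GB")
print(f"cost per run: ${cost_per_run:,.0f}")
print(f"single node, 1 PB: {single_node_days:.0f} days")
print(f"1000 nodes x 1 PB each: ${cost_each_node_one_pb:,.0f}")
print(f"1 PB shared by 1000 nodes: {runs_shared} runs, ${cost_shared_one_pb:,.0f}")
```

The two readings differ by a factor of 1000, so it matters whether the petabyte is per node or shared across the cluster.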
  • 20. Usage Example in Big Data - Moneyball: The Art of Winning an Unfair Game, about the Oakland Athletics baseball team and its general manager Billy Beane - The Oakland A's front office took advantage of more analytical gauges of player performance to field a team that could compete successfully against richer competitors in MLB - Oakland had approximately $41 million in salary, while the New York Yankees had $125 million in payroll that same season, so Oakland was forced to find players undervalued by the market - Moneyball had a huge impact on other teams in MLB - And there is a Moneyball movie!
  • 21. Usage Example of Big Data: US 2012 Election - Data mining for individualized ad targeting - Orca big-data app - YouTube channel (23,700 subscribers and 26 million page views) - Ace of Spades HQ - Predictive modeling - mybarackobama.com - Drive traffic to other campaign sites: Facebook page (33 million "likes"), YouTube channel (240,000 subscribers and 246 million page views) - A contest to dine with Sarah Jessica Parker - Every single night, the team ran 66,000 computer simulations - Reddit - Amazon Web Services
  • 22. Usage Example in Big Data: Data analysis predictions for the US 2012 Election - Drew Linzer, June 2012: 332 electoral votes for Obama, 206 for Romney - Nate Silver, FiveThirtyEight blog: predicted Obama had an 86% chance of winning and predicted all 50 states correctly - Sam Wang, Princeton Election Consortium: put the probability of Obama's re-election at more than 98% - Meanwhile, the media continued reporting the race as very tight
  • 23. Some Challenges in Big Data • Big Data Integration is Multidisciplinary ▫ Less than 10% of the Big Data world is genuinely relational ▫ Meaningful data integration in the real, messy, schema-less and complex Big Data world of databases and the semantic web, using multidisciplinary and multi-technology methods • The Billion Triple Challenge ▫ The Web of Data contains 31 billion RDF triples, of which 446 million are RDF links, including 13 billion government data, 6 billion geographic data, 4.6 billion publication and media data, and 3 billion life science data (BTC 2011, Sindice 2011) • The Linked Open Data Ripper ▫ Mapping, Ranking, Visualization, Key Matching, Snappiness • Demonstrate the Value of Semantics: let data integration drive DBMS technology ▫ Large volumes of heterogeneous data, like linked data and RDF
  • 24. Other Aspects of Big Data: Six Provocations for Big Data 1- Automating Research Changes the Definition of Knowledge 2- Claims to Objectivity and Accuracy are Misleading 3- Bigger Data are not always Better Data 4- Not all Data are Equivalent 5- Just because it is accessible doesn’t make it ethical 6- Limited access to Big Data creates new digital divides
  • 25. Other Aspects of Big Data • Five Big Questions about Big Data: 1- What happens in a world of radical transparency, with data widely available? 2- If you could test all your decisions, how would that change the way you compete? 3- How would your business change if you used big data for widespread, real-time customization? 4- How can big data augment or even replace management? 5- Could you create a new business model based on data?
  • 26. Platforms for Large-scale Data Analysis • Parallel DBMS technologies ▫ Proposed in the late eighties ▫ Matured over the last two decades ▫ Multi-billion dollar industry: proprietary DBMS engines intended as data warehousing solutions for very large enterprises • MapReduce ▫ Pioneered by Google ▫ Popularized by Yahoo! (Hadoop)
  • 27. MapReduce • Overview: ▫ Data-parallel programming model ▫ An associated parallel and distributed implementation for commodity clusters • Pioneered by Google ▫ Processes 20 PB of data per day • Popularized by open-source Hadoop ▫ Used by Yahoo!, Facebook, Amazon, and the list is growing … • Parallel DBMS technologies ▫ Popularly used for more than two decades ▫ Research projects: Gamma, Grace, … ▫ Commercial: multi-billion dollar industry, but access to only a privileged few ▫ Relational data model ▫ Indexing ▫ Familiar SQL interface ▫ Advanced query optimization ▫ Well understood and studied
  • 28. MapReduce (diagram): raw input <key, value> pairs → MAP → intermediate pairs <K1, V1>, <K2, V2>, <K3, V3> → REDUCE
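The map, group and reduce flow on slide 28 can be illustrated with a word count written in plain Python. This is a single-process sketch of the programming model, not Hadoop's or Google's API; all function names here are hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every <key, value> record and yield intermediate pairs."""
    for key, value in records:
        yield from mapper(key, value)


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between MAP and REDUCE."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values collected for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(_line_number, line):
    """MAP: emit <word, 1> for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(_word, counts):
    """REDUCE: add up the counts emitted for one word."""
    return sum(counts)


def word_count(lines):
    records = enumerate(lines)  # raw input as <line number, line text> pairs
    return reduce_phase(shuffle(map_phase(records, word_mapper)), sum_reducer)


counts = word_count(["big data is big", "data is data"])
print(counts)
```

The analyst writes only `word_mapper` and `sum_reducer`; in a real framework the other three functions are what runs distributed across the cluster.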
  • 29. MapReduce Advantages • Automatic parallelization: ▫ Depending on the size of the raw input data → instantiate multiple MAP tasks ▫ Similarly, depending on the number of intermediate <key, value> partitions → instantiate multiple REDUCE tasks • Run-time: ▫ Data partitioning ▫ Task scheduling ▫ Handling machine failures ▫ Managing inter-machine communication • Completely transparent to the programmer/analyst/user
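A rough sketch of the two sizing decisions on slide 29: how many MAP tasks follow from the input size, and which REDUCE partition an intermediate key lands in. The 64 MB split size and the CRC32-based partitioner are assumptions chosen for illustration; real frameworks make both configurable.

```python
import math
import zlib

ASSUMED_SPLIT_BYTES = 64 * 1024 * 1024  # hypothetical input split size (64 MB)


def num_map_tasks(input_size_bytes: int, split_bytes: int = ASSUMED_SPLIT_BYTES) -> int:
    """One MAP task per input split, with at least one task."""
    return max(1, math.ceil(input_size_bytes / split_bytes))


def partition_for(key: str, num_reduce_tasks: int) -> int:
    """Route a key to a REDUCE partition with a stable hash.

    zlib.crc32 is used instead of the built-in hash() because hash() of a
    string changes between Python processes, while every machine must agree
    on the partition for a given key.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def partition_pairs(pairs, num_reduce_tasks: int):
    """Split intermediate <key, value> pairs into one bucket per REDUCE task."""
    buckets = [[] for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        buckets[partition_for(key, num_reduce_tasks)].append((key, value))
    return buckets


map_tasks = num_map_tasks(10**9)  # a 1 GB input file
buckets = partition_pairs([("big", 1), ("data", 1), ("big", 1)], num_reduce_tasks=4)
print(map_tasks, [len(bucket) for bucket in buckets])
```

Because the partition depends only on the key, all values for one key reach the same REDUCE task, which is what lets the framework hide the distribution from the user.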
  • 30. MapReduce vs Parallel DBMS (each row: Parallel DBMS / MapReduce) ▫ Schema support: ✓ / Not out of the box ▫ Indexing: ✓ / Not out of the box ▫ Programming model: Declarative (SQL) / Imperative (C/C++, Java, …), with extensions through Pig and Hive ▫ Optimizations (compression, query optimization): ✓ / Not out of the box ▫ Flexibility: Not out of the box / ✓ ▫ Fault tolerance: Coarse-grained techniques / ✓
  • 31. Zettabyte Horizon • The total amount of global data is expected to grow to 2.7 zettabytes during 2012. This is 48% up from 2011 • As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes. This is half a zettabyte • Wrap Up: 2012 → 2020, ×50
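The growth figures on slide 31 imply a few numbers that are easy to derive. The sketch below assumes the "×50" label means a fifty-fold increase over the 2012 volume by 2020; that reading of the figure is an assumption, as are the variable names.

```python
ZB_2012 = 2.7           # zettabytes expected during 2012 (from the slide)
GROWTH_FROM_2011 = 0.48  # 48% up from 2011 (from the slide)
FACTOR_2012_TO_2020 = 50  # assumed meaning of the "x50" label
YEARS = 2020 - 2012

# Volume in 2011 implied by a 48% increase to 2.7 ZB.
zb_2011 = ZB_2012 / (1 + GROWTH_FROM_2011)

# Volume in 2020 if the 2012 figure really grows fifty-fold.
zb_2020 = ZB_2012 * FACTOR_2012_TO_2020

# Constant yearly growth factor that compounds to x50 over eight years.
annual_factor = FACTOR_2012_TO_2020 ** (1 / YEARS)

print(f"2011: {zb_2011:.2f} ZB")
print(f"2020: {zb_2020:.0f} ZB")
print(f"implied growth per year: {(annual_factor - 1) * 100:.0f}%")
```

Under that assumption, the 48% growth seen from 2011 to 2012 would have to accelerate to sustain a fifty-fold increase by 2020.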
  • 32. References 1. B. Brown, M. Chui and J. Manyika, “Are you ready for the era of Big Data?” McKinsey Quarterly, Oct 2011, McKinsey Global Institute 2. C. Bizer, P. Boncz, M. L. Brodie and O. Erling, “The Meaningful Use of Big Data: Four Perspectives – Four Challenges” SIGMOD Record, Vol. 40, No. 4, December 2011 3. D. Boyd and K. Crawford, “Six Provocations for Big Data” A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, September 2011, Oxford Internet Institute 4. D. Agrawal, S. Das and A. El Abbadi, “Big Data and Cloud Computing: Current State and Future Opportunities” EDBT 2011, Uppsala, Sweden 5. D. Agrawal, S. Das and A. El Abbadi, “Big Data and Cloud Computing: New Wine or Just New Bottles?” VLDB 2010, Vol. 3, No. 2 6. F. J. Alexander, A. Hoisie and A. Szalay, “Big Data” IEEE Computing in Science and Engineering journal 2011 7. O. Trelles, P. Prins, M. Snir and R. C. Jansen, “Big Data, but are we ready?” Nature Reviews, Feb 2011 8. K. Bakhshi, “Considerations for Big Data: Architecture and Approach” Aerospace Conference, 2012 IEEE 9. S. Lohr, “The Age of Big Data” The New York Times, February 2012 10. M. Nielsen, “A guide to the day of big data”, Nature, vol. 462, December 2009
  • 34. THINK