What
BIG data
Isn't
!= GB/TB/PB
!=
What
Everyone
Is saying
"Big Data is the amount of data that one single
machine cannot store and process"
OTN - 2014
"I have travelled the length and breadth of this
country and talked with the best people, and I
can assure you that data processing is a fad that
won't last out the year."
Editor, Prentice Hall - 1957
"Information is the oil of the 21st
century, and analytics is the
combustion engine."
Peter Søndergaard – Gartner group
"Data is the new
science. Big Data holds
the answers."
Pat Gelsinger - EMC
"Big Data is not the new oil."
Jer Thorp – Harvard Business Review
"Not everything that can be counted
counts, and not everything that counts
can be counted."
William Bruce Cameron
"You can have data without information, but
you cannot have information without data."
Daniel Keys Moran
As for "Big Data" I think that is also a concept. In living memory keeping
detailed sales by style, color, and size was too much to hold for most
retail chains and at least two that tried screwed themselves into
bankruptcy. By now we have mostly advanced to vendor managed inventory of
not only the inventory and sales, but the in store shelf locations and
dollar turn per cubic centimeter of shelf space to leverage the vendors
against each other in negotiating for shelf space. That was an epochal
change in how the world of retail works, which as a side effect helps
non-brick and mortar establishments negotiate with vendors as well.
Being able to keep even transiently more orders of magnitude of data and
analyze it in a way that even *might* give a competitive advantage is the
concept of "Big Data" that makes it something different. I completely
dislike the name, but I think the concept is extremely useful. I don't think
it has a single thing to do with the physical infrastructure that processes
the data. A big part of the concept is that it includes data collection from
non-transactional systems and behaviors where the Internet Of Things is
included in the search space.
Mark Farnham - Oaktable
My take away from OpenWorld, was that you buy $1m in gear,
harvest 27 billion tweets....do the hadoop equivalent of:
select count(*)
from shitloads_of_tweets
where text like '%you suck%'
and work from there ?
Am I missing something ?
Connor McDonald - Oaktable
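To make the quip concrete, here is a minimal plain-Java sketch of what that query computes: a filter applied to each record, followed by a count. It is not Hadoop code. The class name PhraseCount, the method countMatching, and the sample tweets are assumptions made for illustration only. On a cluster, the filter would run in parallel over splits of the data and the partial counts would be summed.

```java
import java.util.List;

public class PhraseCount {

    // Counts the records that contain the phrase, like
    // SELECT COUNT(*) ... WHERE text LIKE '%phrase%' (case-sensitive match).
    static long countMatching(List<String> tweets, String phrase) {
        return tweets.stream()
                .filter(t -> t.contains(phrase)) // the "map" side: keep matching records
                .count();                        // the "reduce" side: add them up
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the slides.
        List<String> tweets = List.of(
                "you suck",
                "great talk today",
                "honestly, you suck at parking");
        System.out.println(countMatching(tweets, "you suck")); // prints 2
    }
}
```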
Others can call Big Data whatever shit they want but these days but
the only viable Big Data stack that is somewhat guaranteed to survive (but
likely evolve a lot) the rest is Hadoop. IMHO.
And there are TWO things it let you do:
1. Due to commodity software and hardware phenomena of last years,
you can now build scalable data processing systems affordable to pretty
much ANY organization. On few TB scale you just use Linux file system
and MySQL or Postgres if needed and maybe flash storage. Beyond that -
it's Hadoop.
2. Since running scalable Hadoop cluster is so cheap, efficiency of
processing becomes secondary and value moves towards its flexibility -
how quickly you can try things, grow the system and integrate new kinds
of data into it. Agility is king - time to market is critical.
What most forget is that in its current state, Hadoop requires shit load of
really good engineering talent. This is why it's only justifiable at certain
scale because savings on h/w and s/w will trump the cost of additional
engineering getting into order of magnitude difference or two.
I'll take my coat...
--
Alex Gorbachev
Software and hardware must be affordable
at scale, or you can go home. Oracle,
EMC, Teradata, IBM, Netapp can all just
forget about it.
Jeffrey Needham – One of the hadoops
Certainly 1000 node (or 5000 node, if you like) clusters are fully automated ...
The data science pipelines are not, nor is the surrounding ecosystem
engineering, but the Hadoop cluster needs a shopping cart to operate.
Nobody "operates" or admins clusters as this scale. This would be pure
insanity. XXXX operates 8 4000 node clusters with 10 people. These people
mostly surf YouTube on their NOC screens as there isn't much for them to do
either.
My job was in production engineering - making sure all the grids worked
across all colos (and for $100, no less). However, search engineering (or data
science production engineering is probably what the new group will be called)
has their back.
Everyone on oak table should figure out how to either build or be in a data
science production group)
Don't bother learning how to operate HDFS and Yarn (and the 8 zillion plugins).
Hadoop 2.0 (be it Hwx or CDH) will be the next OS/database kernel you need to
learn.
And It's OK if you don't believe me ...
So what is
BIG data?
DATA processing that scales
DATA processing with fault tolerance
DATA accessible from everywhere
“When there is an elephant in the room
– Introduce him”
Randy Pausch – The Last Lecture
https://www.youtube.com/watch?v=ji5_MqicxSo&t=0m45s
The Hadoop Distributed File System is not a complex, feature-rich,
kitchen sink file system, but it does two things very well: it’s
economical and functional at enormous scale.
Affordable. At. Scale.
Maybe that’s all it should be. A big data reservoir should make it
possible for traditional database products to directly access HDFS
and still provide a canal for enterprises to channel their old data
sources into the new reservoir.
Big data reservoirs must allow old and new data to coexist and
intermingle. For example, DB2 currently supports table spaces on
traditional OS file systems, but when it supports HDFS directly, it could
provide customers with a built-in channel from the past to the future.
HDFS contains a feature called federation that, over time, could be
used to create a reservoir of reservoirs, which will make it possible to
create planetary file systems that can act locally but think globally.
import java.io.*;   // BufferedReader, FileReader, IOException (the slide starts at line 4, after this import)
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount extends Configured implements Tool {

    // Mapper: emits (word, 1) for every token in every input line.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

        static enum Counters { INPUT_WORDS }

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        private boolean caseSensitive = true;
        private Set<String> patternsToSkip = new HashSet<String>();

        private long numRecords = 0;
        private String inputFile;

        // Reads the job options and loads the optional skip patterns.
        public void configure(JobConf job) {
            caseSensitive = job.getBoolean("wordcount.case.sensitive", true);
            inputFile = job.get("map.input.file");

            if (job.getBoolean("wordcount.skip.patterns", false)) {
                Path[] patternsFiles = new Path[0];
                try {
                    patternsFiles = DistributedCache.getLocalCacheFiles(job);
                } catch (IOException ioe) {
                    System.err.println("Caught exception while getting cached files: " + StringUtils.stringifyException(ioe));
                }
                for (Path patternsFile : patternsFiles) {
                    parseSkipFile(patternsFile);
                }
            }
        }

        // Loads one pattern per line; try-with-resources closes the reader.
        private void parseSkipFile(Path patternsFile) {
            try (BufferedReader fis = new BufferedReader(new FileReader(patternsFile.toString()))) {
                String pattern;
                while ((pattern = fis.readLine()) != null) {
                    patternsToSkip.add(pattern);
                }
            } catch (IOException ioe) {
                System.err.println("Caught exception while parsing the cached file '" + patternsFile + "' : " + StringUtils.stringifyException(ioe));
            }
        }

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = (caseSensitive) ? value.toString() : value.toString().toLowerCase();

            // Remove everything that matches a skip pattern.
            for (String pattern : patternsToSkip) {
                line = line.replaceAll(pattern, "");
            }

            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
                reporter.incrCounter(Counters.INPUT_WORDS, 1);
            }

            if ((++numRecords % 100) == 0) {
                reporter.setStatus("Finished processing " + numRecords + " records " + "from the input file: " + inputFile);
            }
        }
    }

    // Reducer: sums the 1s emitted for each word.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        // The slide cuts the listing off here. The remainder is filled in from the
        // WordCount v2.0 example in the Apache Hadoop MapReduce tutorial, which is
        // assumed to be the source of this listing.
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        List<String> otherArgs = new ArrayList<String>();
        for (int i = 0; i < args.length; ++i) {
            if ("-skip".equals(args[i])) {
                DistributedCache.addCacheFile(new Path(args[++i]).toUri(), conf);
                conf.setBoolean("wordcount.skip.patterns", true);
            } else {
                otherArgs.add(args[i]);
            }
        }

        FileInputFormat.setInputPaths(conf, new Path(otherArgs.get(0)));
        FileOutputFormat.setOutputPath(conf, new Path(otherArgs.get(1)));

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCount(), args));
    }
}
= SELECT word, COUNT(word)
  FROM words_table
  GROUP BY word;
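The equivalence above is easier to see when the grouping is written without any cluster plumbing. The following is a minimal single-machine sketch in plain Java, not a Hadoop job: it groups tokens by word and counts each group, which is what both the MapReduce listing and the SQL statement compute. The class name WordCountSketch, the method countWords, and the sample sentence are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Groups whitespace-separated tokens by word and counts each group,
    // i.e. SELECT word, COUNT(word) ... GROUP BY word.
    static Map<String, Long> countWords(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(w -> !w.isEmpty()) // an empty input yields one empty token
                .collect(Collectors.groupingBy(
                        Function.identity(),     // group key: the word itself
                        TreeMap::new,            // sorted map for stable output
                        Collectors.counting())); // aggregate: COUNT per group
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        System.out.println(countWords("to be or not to be"));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```

What the Hadoop version adds is not the logic but the distribution: the map step runs on many nodes at once, and the framework shuffles equal keys to the same reducer.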
Most people did not think
that was smart
Welcome Hadoop platforms
Time for a demo!
So what can we do with Oracle and Hadoop?
Data Loader for Oracle
Oracle Direct Connector
Correlation != Causation
DSB vs. P3
Top artists to blame for delays:
Rihanna 3.46%
Medina 1.78%
Lady Gaga 1.26%
Others < 1%
Danish artists to blame for delays:
Medina 9.74%
Fallulah 4.31%
Panamah 2.83%
Pharfar 1.34%
Unknown artist 1.11%
Others < 1%
DSB vs. Pollen
[Bar chart "Pollen delays": percent (axis 0 to 35) by pollen type: alder, hazel, elm, birch, mugwort, grass, other]
DSB vs. Nasdaq
[Chart: monthly values from January to December, vertical axis from -5 to 20]
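The DSB comparisons above rest on the same trap: two series can move together without one causing the other. As a minimal sketch of the measurement involved, the following Java code computes the Pearson correlation coefficient of two series. The class name SpuriousCorrelation and the numbers in main are invented for illustration and are not the data behind the slides.

```java
public class SpuriousCorrelation {

    // Pearson correlation coefficient of two equally long series.
    // Returns NaN when either series is constant (zero variance).
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have equal length >= 2");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;   // co-movement of the two series
            varX += dx * dx;  // spread of x
            varY += dy * dy;  // spread of y
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Invented numbers: monthly train delays and an unrelated index.
        double[] delays = {4, 6, 5, 8, 9, 11};
        double[] index = {10, 13, 12, 15, 17, 20};
        System.out.printf("r = %.2f%n", pearson(delays, index));
    }
}
```

A coefficient close to 1 here says only that the two series rise together. It says nothing about whether pop songs, pollen, or stock prices delay trains.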
Size does not matter
It is all about the data
?

BigData primer

  • 1.
  • 2.
  • 5. ! =
  • 7. "Big Data is the amount of data that one single machine cannot store and process" OTN - 2014 "I have travelled the length and breadth of this country and talked with the best people, and I can assure you that data processing is a fad that won't last out the year." Editor, Prentice Hall - 1957 "Information is the oil of the 21st century, and analytics is the combustion engine." Peter Søndergaard – Gartner group "Data is the new science. Big Data holds the answers." Pat Gelsinger - EMC "Big Data is not the new oil." Jes Thorp – Harvard busienss review "Not everything that can be counted counts, and not everything that counts can be counted." William Bruce Cameron "You can have data without information, but you cannot have information without data." Daniel Keys Moran
  • 8. As for "Big Data" I think that is also a concept. In living memory keeping detailed sales by style, color, and size was too much to hold for most retail chains and at least two that tried screwed themselves into bankruptcy. By now we have mostly advanced to vendor managed inventory of not only the inventory and sales, but the in store shelf locations and dollar turn per cubic centimeter of shelf space to leverage the vendors against each other in negotiating for shelf space. That was an epochal change in how the world of retail works, which as a side effect helps non-brick and mortar establishments negotiate with vendors as well. Being able to keep even transiently more orders of magnitude of data and analyze it in a way that even *might* give a competitive advantage is the concept of "Big Data" that makes it something different. I completely dislike the name, but I think the concept is extremely useful. I don't think it has a single thing to do with the physical infrastructure that processes the data. A big part of the concept is that it includes data collection from non-transactional systems and behaviors where the Internet Of Things is included in the search space. Mark Farnham - Oaktable My take away from OpenWorld, was that you buy $1m in gear, harvest 27 billion tweets....do the hadoop equivalent of: select count(*) from shitloads_of_tweets where text like '%you suck%' and work from there ? Am I missing something ? Connor McDonald - Oaktable
  • 9. Others can call Big Data whatever shit they want but these days but the only viable Big Data stack that is somewhat guaranteed to survive (but likely evolve a lot) the rest is Hadoop. IMHO. And there are TWO things it let you do: 1. Due to commodity software and hardware phenomena of last years, you can now build scalable data processing systems affordable to pretty much ANY organization. On few TB scale you just use Linux file system and MySQL or Postgres if needed and maybe flash storage. Beyond that - it's Hadoop. 2. Since running scalable Hadoop cluster is so cheap, efficiency of processing becomes secondary and value moves towards its flexibility - how quickly you can try things, grow the system and integrate new kinds of data into it. Agility is king - time to market is critical. What most forget is that in its current state, Hadoop requires shit load of really good engineering talent. This is why it's only justifiable at certain scale because savings on h/w and s/w will trump the cost of additional engineering getting into order of magnitude difference or two. I'll take my coat... -- Alex Gorbachev Software and hardware must be affordable at scale, or you can go home. Oracle, EMC, Teradata, IBM, Netapp can all just forget about it. Jeffrey Needham – One of the hadoops
  • 10. Certainly 1000 node (or 5000 node, if you like) clusters are fully automated ... The data science pipelines are not, nor is the surrounding ecosystem engineering, but the Hadoop cluster needs a shopping cart to operate. Nobody "operates" or admins clusters as this scale. This would be pure insanity. XXXX operates 8 4000 node clusters with 10 people. These people mostly surf YouTube on their NOC screens as there isn't much for them to do either. My job was in production engineering - making sure all the grids worked across all colos (and for $100, no less). However, search engineering (or data science production engineering is probably what the new group will be called) has their back. Everyone on oak table should figure out how to either build or be in a data science production group) Don't bother learning how to operate HDFS and Yarn (and the 8 zillion plugins). Hadoop 2.0 (be it Hwx or CDH) will be the next OS/database kernel you need to learn. And It's OK if you don't believe me ...
  • 11. So what is BIG data?
  • 12. DATA processing that scales DATA processing with fault tolerance DATA accessible from everywhere
  • 13. “When there is an elephant in the room – Introduce him” Randy Pausch – The Last Lecture https://www.youtube.com/watch?v=ji5_MqicxSo&t=0m45s
  • 14.
  • 15. The Hadoop Distributed File System is not a complex, feature-rich, kitchen sink file system, but it does two things very well: it’s economical and functional at enormous scale. Affordable. At. Scale. Maybe that’s all it should be. A big data reservoir should make it possible for traditional database products to directly access HDFS and still provide a canal for enterprises to channel their old data sources into the new Reservoir. Big data reservoirs must allow old and new data to coexist and inter‐ mingle. For example, DB2 currently supports table spaces on tradi‐ tional OS file systems, but when it supports HDFS directly, it could provide customers with a built-in channel from the past to the future. HDFS contains a feature called federation that, over time, could be used to create a reservoir of reservoirs, which will make it possible to create planetary file systems that can act locally but think globally.
  • 16.
  • 17.
  • 18.
  • 19.

```java
import java.io.*;   // not on the slide (its listing starts at line 4); needed for IOException, BufferedReader, FileReader
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount extends Configured implements Tool {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

        static enum Counters { INPUT_WORDS }

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        private boolean caseSensitive = true;
        private Set<String> patternsToSkip = new HashSet<String>();

        private long numRecords = 0;
        private String inputFile;

        // Read job settings and, if requested, load the skip patterns from the distributed cache.
        public void configure(JobConf job) {
            caseSensitive = job.getBoolean("wordcount.case.sensitive", true);
            inputFile = job.get("map.input.file");

            if (job.getBoolean("wordcount.skip.patterns", false)) {
                Path[] patternsFiles = new Path[0];
                try {
                    patternsFiles = DistributedCache.getLocalCacheFiles(job);
                } catch (IOException ioe) {
                    System.err.println("Caught exception while getting cached files: " + StringUtils.stringifyException(ioe));
                }
                for (Path patternsFile : patternsFiles) {
                    parseSkipFile(patternsFile);
                }
            }
        }

        // Load one regex pattern per line; try-with-resources closes the reader.
        private void parseSkipFile(Path patternsFile) {
            try (BufferedReader fis = new BufferedReader(new FileReader(patternsFile.toString()))) {
                String pattern = null;
                while ((pattern = fis.readLine()) != null) {
                    patternsToSkip.add(pattern);
                }
            } catch (IOException ioe) {
                System.err.println("Caught exception while parsing the cached file '" + patternsFile + "' : " + StringUtils.stringifyException(ioe));
            }
        }

        // Emit (word, 1) for every token of the input line.
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = (caseSensitive) ? value.toString() : value.toString().toLowerCase();

            for (String pattern : patternsToSkip) {
                line = line.replaceAll(pattern, "");
            }

            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
                reporter.incrCounter(Counters.INPUT_WORDS, 1);
            }

            if ((++numRecords % 100) == 0) {
                reporter.setStatus("Finished processing " + numRecords + " records " + "from the input file: " + inputFile);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        // Sum all the counts emitted for one word.
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        // ... (the listing on the slide is cut off here)
```

    = SELECT word, COUNT(word) FROM words_Table GROUP BY word;
    Most people did not think that was smart.
  • 21.
  • 22.
  • 23. Time for a demo!
  • 24. So what can we do with Oracle and Hadoop? Data Loader for Oracle; Oracle Direct Connector.
  • 26. DSB vs. P3. Top artists to blame for delays: Rihanna 3.46%, Medina 1.78%, Lady Gaga 1.26%, others < 1%. Danish artists to blame for delays: Medina 9.74%, Fallulah 4.31%, Panamah 2.83%, Pharfar 1.34%, unknown artist 1.11%, others < 1%.
  • 27. DSB vs. pollen [bar chart: pollen delays in percent (axis 0 to 35) by pollen type: alder, hazel, elm, birch, mugwort, grass, other]
  • 28. DSB vs. Nasdaq [chart by month, January to December; value axis from -5 to 20]
  • 29. Size does not matter. It is all about the data.
  • 30. ?
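
Slide 19 sets the Java MapReduce WordCount next to a one-line SQL GROUP BY. As a rough illustration of why the two compute the same thing, the sketch below runs the map, shuffle, and reduce steps in plain Java over an in-memory list, with no Hadoop involved. The class name `LocalWordCount`, its method names, and the sample input are made up for this sketch; they are not part of the slides or of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // Map step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle step: collect all values emitted for the same key (the GROUP BY).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce step: sum the values of one group (the COUNT).
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run all three steps over the input lines and return word -> count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data is big", "data is data");
        System.out.println(wordCount(lines)); // prints {big=2, data=3, is=2}
    }
}
```

What Hadoop adds on top of this logic is the distribution: the map calls run on the nodes that hold the data, the shuffle moves pairs across the network, and failed tasks are rerun. None of that is modelled here.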

Editor's notes

  1. Big data: why is it called that? 1. Originally from DNA sequencing 2. Then the large DW installations joined in 3. Then came the internet, and search engines in particular
  2. It is not the size that does it. If it were, would we be done with Big Data once a single disk holds 1000 PB? Or what?
  3. What is unstructured data? If it is not structured, how can we save and process it?