Hadoop - Introduction to the Architecture and Application Approaches

• What is Apache Hadoop?
• A bit of history
• The MapReduce algorithm
• The architecture
• Cloudera
• Application example
• Configuration
• Administration
• Security
• Hadoop "extensions"
• Bibliography

Published in: Engineering


  1. Introduction to the architecture and application approaches. Messina, 21/03/2015. Dario Catalano
  2. About me
      Dario Catalano, dario@catalano.email (LinkedIn, Google+, Twitter)
  3. What we will cover
      • What is Apache Hadoop?
      • A bit of history
      • The MapReduce algorithm
      • The architecture
      • Cloudera
      • Application example
      • Configuration
      • Administration
      • Security
      • Hadoop "extensions"
      • Bibliography
  4. What is Hadoop?
      Keywords: Framework, Cluster, Big Data, MapReduce, Distributed File System, API, Fault tolerant, Cloud, Scalable, Cost effective, Extensible, Flexible, Java
  5. A bit of history: 2003, Google File System
  6. A bit of history: 2004, Google MapReduce
  7. A bit of history: 2005, Doug Cutting and Mike Cafarella
  8. A bit of history: 2006, 2011, 2013
  9. MapReduce » Step I: Map tasks
      [Diagram: the input data is split into partitions of records; each Mapper processes one partition and emits intermediate (key, value) pairs, e.g. (K1,Va) (K2,Vb) (K3,Vc) (K4,Vd) (K5,Ve) (K6,Vf) …]
  10. MapReduce » Step II: Shuffle, partitioning and sorting
      [Diagram: the intermediate (key, value) pairs emitted by all mappers are regrouped by key (K1 … K8), so that each key is followed by the list of its values.]
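The shuffle step can be illustrated with a small sketch in plain Java. It is not Hadoop code: the class `ShuffleSketch` and its methods are hypothetical names introduced here. The partition formula mirrors the one used by Hadoop's default `HashPartitioner` (hash of the key, sign bit masked, modulo the number of reducers), and a `TreeMap` stands in for the sort by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of partitioning and sorting; not part of the Hadoop API.
class ShuffleSketch {

    // Chooses the reducer for a key: mask the sign bit, then take the modulo.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Groups (key, value) pairs per reducer; each TreeMap keeps its keys sorted.
    static List<TreeMap<String, List<String>>> shuffle(List<String[]> pairs, int numReducers) {
        List<TreeMap<String, List<String>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String[] pair : pairs) {
            int p = partitionFor(pair[0], numReducers);
            partitions.get(p).computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // A few intermediate pairs in the style of the slide.
        List<String[]> pairs = Arrays.asList(
            new String[] {"K1", "Va"}, new String[] {"K2", "Vb"},
            new String[] {"K1", "Vc"}, new String[] {"K5", "Vf"},
            new String[] {"K2", "Vd"}, new String[] {"K1", "Ve"});
        List<TreeMap<String, List<String>>> partitions = shuffle(pairs, 2);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("reducer " + i + " <- " + partitions.get(i));
        }
    }
}
```

All values of a given key end up in the same partition, which is what lets a single reducer see the complete value list for that key.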
  11. MapReduce » Step III: Reduce tasks
      [Diagram: each Reducer receives the grouped intermediate data, one key with its list of values at a time, and writes records to the output data.]
  12. MapReduce » Example
      Word-count output: the, 3; brown, 2; fox, 2; how, 1; now, 1; quick, 1; ate, 1; mouse, 1; cow, 1
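The three MapReduce steps can be tried out without a cluster. The sketch below is plain Java, not Hadoop code; the class `WordCountSketch` and its methods are hypothetical. The input sentences are an assumption (the slide's input was not preserved), chosen so that the result matches the counts listed on the example slide.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory word count that mimics map, shuffle and reduce.
class WordCountSketch {

    // Map step: emit one (word, 1) pair per word of an input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle step: group all values by key, with keys kept in sorted order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps in sequence over all input lines.
    static TreeMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(intermediate).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input; only the output counts appear on the slide.
        List<String> lines = Arrays.asList(
            "the quick brown fox", "the fox ate the mouse", "how now brown cow");
        System.out.println(wordCount(lines));
    }
}
```

In a real job the map and reduce calls run on different nodes and the shuffle moves data over the network; here everything happens in one process, which keeps the data flow easy to follow.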
  13. Architecture » High-level view
      [Diagram: HDFS, MapReduce, Java Client]
  14. Architecture » HDFS
      • Distributed
      • Master/Slave
      • Blocks usually >= 64 MB (large volumes of data)
      • Redundant (3 copies)
      • Easily scalable
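A quick calculation shows what the block size and the three copies mean in practice. The sketch below is plain Java with hypothetical names (`HdfsBlockMath`, `blockCount`, `rawStorageBytes`); it assumes the 64 MB block size and replication factor 3 quoted on the slide, and that the last block of a file only occupies the bytes it actually holds.

```java
// Hypothetical helper for back-of-the-envelope HDFS sizing; not part of Hadoop.
class HdfsBlockMath {

    static final long MB = 1024L * 1024L;

    // Number of blocks needed for a file: ceiling of fileSize / blockSize.
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Raw bytes stored across the cluster when every block is replicated.
    static long rawStorageBytes(long fileSizeBytes, int replication) {
        return fileSizeBytes * replication;
    }

    public static void main(String[] args) {
        long fileSize = 200 * MB;   // example file size, chosen for illustration
        long blockSize = 64 * MB;   // block size from the slide
        int replication = 3;        // number of copies from the slide

        long blocks = blockCount(fileSize, blockSize);
        System.out.println("blocks: " + blocks);
        System.out.println("block replicas: " + blocks * replication);
        System.out.println("raw storage (MB): " + rawStorageBytes(fileSize, replication) / MB);
    }
}
```

For the 200 MB file this gives 4 blocks, 12 block replicas and 600 MB of raw storage, which is why redundancy has to be budgeted for when sizing a cluster.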
  15. Architecture » HDFS
  16. Architecture » HDFS » NameNode
      • Master role
      • Responsible for the metadata
          - Directory structure, files and their permissions
          - Location of the blocks
          - State of the files
          - Identity of the DataNodes, loaded at boot
          - File names of the blocks on the DataNodes' local file systems
      • Data kept in memory
  17. Architecture » HDFS » Writing a file
  18. Architecture » HDFS » Reading a file
  19. Architecture » HDFS » Secondary NameNode
  20. Architecture » HDFS » Commands
      hadoop fs -cat file:///file2
      hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
      hadoop fs -copyFromLocal <fromDir> <toDir>
      hadoop fs -put <localfile> hdfs://nn.example.com/hadoop/hadoopfile
      hadoop fs -ls /user/hadoop/dir1
      hadoop fs -cat hdfs://nn1.example.com/file1
      hadoop fs -get /user/hadoop/file <localfile>
      sudo hadoop jar <jarFileName> <method> <fromDir> <toDir>
  21. Architecture » HDFS » Reliability
      • DataNode heartbeat
      • Block replication trade-off (1 replica local, 2 in another rack)
          - Replication factor configurable per file (in heartbeat)
      • Block checksums
      • Deletion: Trash directory (6 hours) » physical deletion
      • NameNode is the bottleneck in Hadoop 1.x
          - Size of the metadata
          - Lack of replication
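The idea behind block checksums can be shown in a few lines of plain Java. This is a simplified sketch with hypothetical names (`BlockChecksumSketch`, `checksum`, `isIntact`): it checksums a whole byte array with `java.util.zip.CRC32`, which is an assumption made for illustration and not a description of how HDFS stores or verifies its checksums internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of checksum-based corruption detection; not HDFS code.
class BlockChecksumSketch {

    // Computes a CRC32 checksum over the given bytes.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // A block is considered intact when its checksum matches the stored one.
    static boolean isIntact(byte[] data, long storedChecksum) {
        return checksum(data) == storedChecksum;
    }

    public static void main(String[] args) {
        byte[] block = "some block content".getBytes(StandardCharsets.UTF_8);
        long stored = checksum(block);   // computed when the block is written
        System.out.println("intact before corruption: " + isIntact(block, stored));

        block[0] ^= 0x01;                // simulate a single flipped bit on disk
        System.out.println("intact after corruption: " + isIntact(block, stored));
    }
}
```

When a reader detects a mismatch like this, the replicated copies of the block are what make recovery possible: the data can be read from another replica.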
  22. Architecture » MapReduce v1
  23. Architecture » MapReduce v1
  24. Architecture » MapReduce v1
      • Master/Slave
      • TaskTracker:
          - Slot-based task creation
          - JVM fork
          - Heartbeat
      • JobTracker:
          - Responsible for and manager of the Job
          - Talks to the NameNode
          - Recovers failed tasks
          - Weak point of the architecture
  25. Architecture » Master/Slave
      • HDFS and MapReduce on the same node = less network traffic = better performance
  26. Architecture » YARN
      • Container
          - Computational unit
          - Controls the CPU and RAM assigned to it
      • Node Manager
          - Receives requests from the RS (Slave)
          - Manages the life cycle of the containers
          - Manages logging and auxiliary services
      • Resource Manager:
          - Receives requests from the AM
          - Schedules with configurable policies (Fair, Capacity, …)
      • Application Master
          - Depends on the type of application
          - Separation of responsibilities = scalability
  27. Architecture » YARN
      (CL = client, RM = Resource Manager, NM = Node Manager, AM = Application Master, CS = containers)
      1. CL -> RM (application start)
      2. RM -> NM (request for a new AM)
      3. AM -> RM (registration)
      4. AM -> RM (resource request)
      5. AM -> NM(s) (container start)
      6. CS -> AM (containers run the code and send checks)
      7. CL -> AM (client asks for the application status)
      8. AM -> RM (shutdown)
  28. Architecture » YARN
                                            Hadoop 1.x             Hadoop 2.x
      Processing type                       MapReduce only         Multiple implementations
      Resource and processing management    Single (JobTracker)    Separate (ResourceManager and Application Master)
      HDFS scalability                      Single NameNode        HDFS Federation
      HDFS reliability                      Single NameNode        HDFS High Availability
      Node limit                            4,000                  10,000
  29. Execution modes
      • Single process: one process on one host
      • Pseudo-distributed: NameNode, JobTracker, TaskTracker, DataNode and Secondary NameNode on the same host
      • Distributed: one host with NN and JT; further hosts each with DN and TT, running map (M) or reduce (R) tasks
  30. Before the code…
      • Services, architectures and training on Apache Hadoop
      • Apache main contributor
      • CDH (Cloudera Distribution with Hadoop)
  31. Cloudera Quickstart VM
      • CDH 5 is based on Linux CentOS 6.4
      • Contains:
          - HDFS, MapReduce, Hadoop Common, HBase, Hive, Pig, Oozie, Sqoop, Flume, ZooKeeper, Hue, Whirr, Mahout, Cloudera Manager
      • Available for VMware, KVM, Oracle VirtualBox
      • Minimum requirements:
          - 4 GB RAM (8 recommended)
          - 64-bit host OS
      • Downloadable from:
          - http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-3-x.html
  32. Word Count with the "old" API
      import java.io.IOException;
      ...
      import org.apache.hadoop.mapred.TextOutputFormat;

      public class WordCountOldAPI {

        public static void main(String[] args) throws Exception {
          // Job configuration: name, output types, mapper, combiner and reducer.
          JobConf conf = new JobConf(WordCountOldAPI.class);
          conf.setJobName("wordcount");
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(IntWritable.class);
          conf.setMapperClass(MyMapper.class);
          conf.setCombinerClass(MyReducer.class);
          conf.setReducerClass(MyReducer.class);
          conf.setNumReduceTasks(1);
          conf.setInputFormat(TextInputFormat.class);
          conf.setOutputFormat(TextOutputFormat.class);
          // args[0] is the input path, args[1] the output path.
          FileInputFormat.setInputPaths(conf, new Path(args[0]));
          FileOutputFormat.setOutputPath(conf, new Path(args[1]));
          // Submit the job and wait for it to finish.
          JobClient.runJob(conf);
        }
  33. Word Count with the "old" API (continued)
        public static class MyMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
          public void map(LongWritable key, Text value,
              OutputCollector<Text, IntWritable> output, Reporter reporter)
              throws IOException {
            // Emits the whole input line as the key with a count of 1, so this
            // counts words only when each input line holds a single word.
            output.collect(new Text(value.toString()), new IntWritable(1));
          }
        }

        public static class MyReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
          public void reduce(Text key, Iterator<IntWritable> values,
              OutputCollector<Text, IntWritable> output, Reporter reporter)
              throws IOException {
            // Sum all the counts received for this key.
            int sum = 0;
            while (values.hasNext()) {
              sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
          }
        }
      }
  34. The "new" API
      • Introduced with Hadoop 0.20 (2009)
      • New package
      • More concise and compact
      • Cleaner and more readable
      • Allows more complete and precise control of the Job
      • Do not confuse the API version with the architecture version (1.x or 2.x)
  35. Word Count, "new" API
      import java.io.IOException;
      ...
      import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

      public class WordCountNewAPI {

          public static void main(String[] args) throws Exception {
              Job job = Job.getInstance(new Configuration());
              // The class is used to locate the JAR to ship to the cluster
              job.setJarByClass(WordCountNewAPI.class);

              // Types of the (key, value) pairs written by the job
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(IntWritable.class);

              job.setMapperClass(MyMapper.class);
              job.setReducerClass(MyReducer.class);

              job.setInputFormatClass(TextInputFormat.class);
              job.setOutputFormatClass(TextOutputFormat.class);

              // args[0] = input path, args[1] = output path
              FileInputFormat.setInputPaths(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));

              // Submit the job, print progress, and exit with 0 on success
              boolean status = job.waitForCompletion(true);
              System.exit(status ? 0 : 1);
          }
  36. Word Count, "new" API (continued)
          public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

              // Emits (line, 1). The whole input line is the key, so this counts
              // words only if the input holds one word per line.
              @Override
              public void map(LongWritable key, Text value, Context context)
                      throws IOException, InterruptedException {
                  String w = value.toString();
                  context.write(new Text(w), new IntWritable(1));
              }
          }

          public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

              // Sums all the counts received for the same key
              @Override
              public void reduce(Text key, Iterable<IntWritable> values, Context context)
                      throws IOException, InterruptedException {
                  int sum = 0;
                  for (IntWritable val : values) {
                      sum += val.get();
                  }
                  context.write(key, new IntWritable(sum));
              }
          }
      }
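      The data flow behind both word-count listings can be tried without a cluster. The following is a minimal, Hadoop-free sketch in plain Java: the class and method names (WordCountSimulation, map, shuffle, reduce, wordCount) are hypothetical and are not part of the Hadoop API. Unlike the mappers on the slides, which emit the whole line as the key, this sketch assumes whitespace-separated words and splits each line into tokens.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of map -> shuffle -> reduce (not Hadoop code).
class WordCountSimulation {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values of a single key, as MyReducer does.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(Arrays.asList("to be or", "not to be")));
    }
}
```

      In a real job the shuffle step is performed by the framework between the map and reduce tasks; here it is written out only to make the grouping visible.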
  37. And now… a bit of practice
  38. Configuration
      • XML configuration on every node
      • *-default.xml files inside the Hadoop JARs, *-site.xml files in the configuration directory
      • 4 main file types:
        ◦ core-*.xml
        ◦ hdfs-*.xml
        ◦ mapred-*.xml
        ◦ yarn-*.xml
      • Precedence of properties defined in different places (highest first):
        ◦ Job or JobConf object in the code
        ◦ *-site.xml files on the client node
        ◦ *-site.xml files on the slave node
        ◦ *-default.xml files in the JARs (identical on all nodes)
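      The precedence list above amounts to a lookup through ordered layers. The following plain-Java sketch illustrates the idea under stated assumptions: the class ConfigPrecedenceSketch and its methods are hypothetical and do not use Hadoop's Configuration class, and the values in the job and *-site layers are invented for illustration. The property names are ones found in Hadoop 2.x configuration files.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of layered property resolution (not Hadoop's Configuration class).
class ConfigPrecedenceSketch {

    // Returns the value from the first layer that defines the key, or null.
    static String resolve(String key, List<Map<String, String>> layersHighestFirst) {
        for (Map<String, String> layer : layersHighestFirst) {
            String value = layer.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Builds four example layers, ordered from highest to lowest precedence.
    static List<Map<String, String>> exampleLayers() {
        Map<String, String> jobInCode = new HashMap<>();   // set on the Job/JobConf object
        jobInCode.put("mapreduce.job.reduces", "4");

        Map<String, String> clientSite = new HashMap<>();  // *-site.xml on the client (made-up values)
        clientSite.put("mapreduce.job.reduces", "2");
        clientSite.put("dfs.replication", "2");

        Map<String, String> slaveSite = new HashMap<>();   // *-site.xml on the slave (made-up values)
        slaveSite.put("dfs.replication", "1");
        slaveSite.put("io.file.buffer.size", "65536");

        Map<String, String> defaults = new HashMap<>();    // *-default.xml shipped in the JARs
        defaults.put("mapreduce.job.reduces", "1");
        defaults.put("dfs.replication", "3");
        defaults.put("io.file.buffer.size", "4096");
        defaults.put("fs.defaultFS", "file:///");

        return Arrays.asList(jobInCode, clientSite, slaveSite, defaults);
    }

    public static void main(String[] args) {
        List<Map<String, String>> layers = exampleLayers();
        System.out.println(resolve("mapreduce.job.reduces", layers)); // 4 (from the code)
        System.out.println(resolve("dfs.replication", layers));       // 2 (client *-site.xml)
        System.out.println(resolve("io.file.buffer.size", layers));   // 65536 (slave *-site.xml)
        System.out.println(resolve("fs.defaultFS", layers));          // file:/// (default)
    }
}
```

      This sketch does not model properties that an administrator marks as final in a *-site.xml file, which later layers are not allowed to override.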
  39. Administration and Monitoring
      • Command-line interface
      • Log files
      • Web interfaces for every process
      • YARN REST API
      • JMX
      • Management tools:
        ◦ Cloudera Manager
        ◦ Ambari
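      As an illustration of the JMX entry in the list above: Hadoop daemons are JVM processes, so their metrics can be read through the standard Java management API. The sketch below is an assumption-laden example that only reads the standard java.lang:type=Memory MBean of the local JVM. The class name JmxHeapProbe is hypothetical, and reading a remote daemon would additionally require a JMX connector and the daemon-specific MBean names, which are not shown here.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical probe: reads one attribute of a standard MBean in the local JVM.
class JmxHeapProbe {

    // Returns the heap memory currently used by this JVM, in bytes.
    static long usedHeapBytes() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            // HeapMemoryUsage is exposed as composite data with init/used/committed/max
            CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the Memory MBean", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Used heap (bytes): " + usedHeapBytes());
    }
}
```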
  40. Security
      • Initially neglected (only public data was processed)
      • Hadoop ran only on private networks
      • Third-party software was developed: Cloudera Sentry, IBM InfoSphere Optim Data Masking, Intel's secure Hadoop distribution, DataStax Enterprise, DataGuise for Hadoop, etc.
      • Since version 0.20.x:
        ◦ Kerberos authentication between services
        ◦ Customizable web console authentication
        ◦ HDFS permissions and ACLs
        ◦ Token-based authentication to reduce overhead
        ◦ Optional encryption of connections
      • Problems still to be solved:
        ◦ HDFS is not encrypted
        ◦ Difficult integration in non-Kerberos environments
        ◦ Authorization rules are not flexible enough
        ◦ The overall security model is complicated
      • Intel Project Rhino
  41. Extensions
  42. HBase
      • NoSQL datastore
      • Multidimensional keys
      • Dynamic schema
      • Goal: maximum performance
      • Logical view: tables, rows, columns, and column families
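      The logical view and the multidimensional keys listed above can be pictured as nested sorted maps. This is a minimal plain-Java sketch of that picture, not HBase client code: the class HBaseLogicalModelSketch and its put/get methods are hypothetical, and the timestamp dimension is an assumption added to show versioned cells. For simplicity the sketch also creates column families on the fly.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one table: row key -> family -> qualifier -> timestamp -> value.
class HBaseLogicalModelSketch {

    private final NavigableMap<String,
            NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    // Stores one cell version; rows and columns are created when first written.
    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Returns the most recent version of a cell, or null if the cell does not exist.
    String get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) {
            return null;
        }
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(qualifier);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        HBaseLogicalModelSketch table = new HBaseLogicalModelSketch();
        table.put("user1", "info", "name", 1L, "Ada");
        table.put("user1", "info", "name", 2L, "Ada L.");  // newer version of the same cell
        table.put("user2", "info", "email", 1L, "bob@example.org"); // a column user1 lacks
        System.out.println(table.get("user1", "info", "name"));  // Ada L.
        System.out.println(table.get("user1", "info", "email")); // null
    }
}
```

      Each row can hold a different set of columns, which is what the dynamic schema bullet refers to.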
  43. Hive
      • Database
      • Data warehouse and BI
      • Declarative language
      • Tables -> files on HDFS
      • SQL-like queries -> MapReduce
      • Tables (managed and external), views, partitions, … = organizational flexibility
  44. Pig
      • Scripting
      • Data flow and pipelining
      • ETL oriented
      • Procedural language
      • LOAD, FILTER, JOIN, GROUP, STORE, … = step-by-step control over the data
  45. HCatalog
      • Integrates several Hadoop-based technologies (Hive, Pig, MapReduce)
      • Abstraction layer that makes BI and ETL uniform
      • REST API
  46. Sqoop
  47. Hama
      • Bulk Synchronous Parallel (BSP)
      • YARN-based
      • Phases:
        ◦ Processing
        ◦ Message exchange
        ◦ Barrier synchronization
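      The three BSP phases listed above can be sketched with plain Java threads. This is a hypothetical illustration, not Hama code: the class BspRingSumSketch, the ring topology, and the sum computation are all assumptions chosen to keep the example small. Each peer forwards a value to its right neighbour, waits at a barrier, and then consumes the message it received. After n - 1 supersteps every peer holds the sum of all initial values.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical BSP illustration with threads as peers (not the Hama API).
class BspRingSumSketch {

    static int[] ringSum(int[] initial) {
        final int n = initial.length;
        final int[] totals = new int[n];
        if (n == 0) {
            return totals;
        }
        // Two inbox buffers, alternated by superstep, so a fast peer never
        // overwrites a message that its neighbour has not read yet.
        final int[][] inbox = new int[2][n];
        final CyclicBarrier barrier = new CyclicBarrier(n);
        Thread[] peers = new Thread[n];
        for (int p = 0; p < n; p++) {
            final int id = p;
            peers[p] = new Thread(() -> {
                int carry = initial[id];   // value to forward in the next superstep
                int total = initial[id];   // running sum held by this peer
                try {
                    for (int step = 0; step < n - 1; step++) {
                        int message = carry;                        // 1. processing
                        inbox[step % 2][(id + 1) % n] = message;    // 2. message exchange
                        barrier.await();                            // 3. barrier synchronization
                        carry = inbox[step % 2][id];                // message is now delivered
                        total += carry;
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                totals[id] = total;
            });
            peers[p].start();
        }
        try {
            for (Thread peer : peers) {
                peer.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Every peer ends with 1 + 2 + 3 + 4
        System.out.println(java.util.Arrays.toString(ringSum(new int[] {1, 2, 3, 4}))); // [10, 10, 10, 10]
    }
}
```

      The barrier is what makes the model "bulk synchronous": no peer starts the next superstep until every message of the current one has been written.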
  48. Spark
      • Goal: performance (10x faster than MapReduce)
      • Scala based (Java, Scala, and Python APIs)
      • Resilient Distributed Dataset (Scala Seq)
      • Runs on Hadoop, on Mesos, or standalone
  49. Mahout
      • Machine learning (AI):
        ◦ Classification
        ◦ Clustering
        ◦ Fuzzy logic
        ◦ Neural networks
        ◦ …
      • Data mining
      • 2 phases:
        ◦ Learning
        ◦ Application
