Goodbye Hadoop
Hello Spark!
@dhiguero
dhiguero@stratio.com
Daniel Higuero
Agenda
•  Introduction
•  Spark
§  Basic concepts
§  Ecosystem
VIEWER DISCRETION IS ADVISED
All elephants are innocent until proven guilty in a court of development.
Opinions expressed are solely my own and do not express the views or opinions of my employer.
Introduction
Timeline
#t3chfest2015
[Timeline figure with a year axis from 2002 to 2015 and the following milestones]
o  Google MapReduce paper
o  Google GFS paper
o  Hive
o  HBase
o  Hadoop: 1 TB, 910 nodes, < 4 min
o  Spark: 100 TB, 206 nodes, 23 min
o  Hadoop: 103 TB, 2100 nodes, 72 min
o  alpha-0.1
o  Spark 0.7
o  Spark 1.2+
Introduction
o  What is Spark?
o  Parallel processing framework
o  History
https://spark.apache.org/
Apache Software Foundation
Map-reduce
o  A functional programming concept
o  Popularized by Google
(map 'list (lambda (x) (+ x 10)) '(1 2 3 4))
        => (11 12 13 14)
(reduce #'+ '(1 2 3 4)) => 10
Jeff Dean and Sanjay Ghemawat. "MapReduce: Simplified Data Processing on Large Clusters." OSDI (2004)
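For readers more at home in Scala than in Lisp, the two expressions on this slide can be sketched with Scala's standard collections. This is only an illustration in plain Scala (no Spark involved); the names `numbers`, `plusTen`, and `total` are made up for the example.

```scala
// Plain Scala collections; no Spark required.
val numbers = List(1, 2, 3, 4)

// map: apply a function to every element, yielding a new list.
val plusTen = numbers.map(x => x + 10)   // List(11, 12, 13, 14)

// reduce: combine all elements pairwise with an associative operator.
val total = numbers.reduce(_ + _)        // 10
```

The same two operations, applied to partitions of a large dataset on many machines, are the core of the MapReduce model described in the cited paper.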
Map-Reduce
[Diagram: Input data -> Map (x4) -> Reduce (x3) -> result]
Map-Reduce
val wordCounts = textFile.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)
Input text:
Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms
Source: Wikipedia
After flatMap, Array[String]:
Apache, Spark, is, an, open-source, cluster, …
After map, Array[(String, Int)]:
(Apache, 1), (Spark, 1), (is, 1), …, (Spark, 1), (is, 1), …
After reduceByKey, Array[(String, Int)]:
(Apache, 1), (Spark, 2), (is, 2), …, (to, 4), (the, 1), …
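The three stages of the word count can also be tried without a cluster. The following is a minimal sketch in plain Scala, assuming a small in-memory `lines` sequence in place of the `textFile` RDD; `groupBy` followed by a sum is a local stand-in for `reduceByKey(_ + _)`, not Spark's implementation.

```scala
// Plain-Scala analogue of the Spark word count; no cluster needed.
// `lines` stands in for the RDD that `textFile` would provide (an assumption).
val lines = Seq("to be or not to be", "that is the question")

// flatMap: one line becomes many words.
val words: Seq[String] = lines.flatMap(line => line.split(" "))

// map: pair every word with an initial count of 1.
val pairs: Seq[(String, Int)] = words.map(word => (word, 1))

// Local stand-in for reduceByKey(_ + _): group by word, then sum the counts.
val wordCounts: Map[String, Int] =
  pairs.groupBy(_._1).map { case (word, ones) => (word, ones.map(_._2).sum) }
```

Here `wordCounts("to")` and `wordCounts("be")` are both 2, and every other word maps to 1. In Spark the per-key sums are computed in parallel across partitions instead of in a single local `Map`.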
  
Advantages of Spark
o  More flexibility in defining transformations
o  Less use of disk storage
o  Better use of memory
o  Fault tolerance
o  Community traction
Basic concepts
RDD (Resilient Distributed Dataset)
o  The basic abstraction in Spark
o  Holds the transformations to be applied to a dataset
•  Immutable
•  Lazy evaluation
•  State can be recovered after a failure
•  Control over persistence and partitioning
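The properties listed for RDDs (immutable, lazily evaluated, recoverable after a failure) can be illustrated with a toy model. The sketch below is a hypothetical `ToyRDD` class in plain Scala, not Spark's RDD: it stores only the recipe for producing its data, so transformations cost nothing until an action runs, and the result can always be rebuilt by replaying the recipe.

```scala
// Toy stand-in for an RDD (hypothetical, not Spark's implementation):
// it stores only the recipe for producing its data, never the data itself.
final case class ToyRDD[A](compute: () => Seq[A]) {
  // Transformations return a new ToyRDD and run nothing (immutable + lazy).
  def map[B](f: A => B): ToyRDD[B] = ToyRDD(() => compute().map(f))
  def filter(p: A => Boolean): ToyRDD[A] = ToyRDD(() => compute().filter(p))
  // An action walks the whole recipe and returns concrete results.
  def collect(): Seq[A] = compute()
}

var sourceReads = 0 // counts how often the source is actually read
val source = ToyRDD(() => { sourceReads += 1; Seq(1, 2, 3, 4) })

val bigOnes = source.map(_ * 10).filter(_ > 15) // only builds the recipe
val readsBeforeAction = sourceReads              // still 0

val first = bigOnes.collect()  // Seq(20, 30, 40); the source is read once
val second = bigOnes.collect() // recomputed from the recipe, as after a failure
```

After both actions `sourceReads` is 2, because nothing was cached between them. Avoiding that kind of recomputation is what the persistence control mentioned on the slide is for.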
Ecosystem
Spark ecosystem
[Figure: Spark ecosystem diagram, © databricks]
Spark core engine
o  Provides the basic abstractions and handles scheduling
[Diagram components: RDD, DAG scheduling, cluster manager, threads, block manager, task scheduling, worker]
Spark Streaming
o  Turns a streaming source into a sequence of mini-batches
•  Definition of a window
§  Time-based
Spark Streaming
[Diagram: Window = 5 over mini-batches batch0 to batch7, drawn along a time axis]
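The window over mini-batches can be mimicked with ordinary Scala collections. The sketch below is an analogy, not Spark Streaming code: the batch contents are made-up integers, and a slide step of one batch is an assumption, since the diagram only gives the window size.

```scala
// Each inner Seq is one mini-batch of events (here: made-up integer readings).
val batches: Seq[Seq[Int]] = Seq(
  Seq(1), Seq(2), Seq(3), Seq(4), Seq(5), Seq(6), Seq(7), Seq(8)
) // batch0 .. batch7

val windowSize = 5 // "Window = 5" on the slide, counted in batches

// sliding(5) yields batch0-4, batch1-5, batch2-6, batch3-7
// (advancing one batch at a time is an assumption).
val windows: List[Seq[Int]] = batches.sliding(windowSize).map(_.flatten).toList

// One aggregate per window, e.g. the sum of all readings it covers.
val windowSums: List[Int] = windows.map(_.sum) // List(15, 20, 25, 30)
```

Eight batches with a window of five give four windows. In Spark Streaming the window is defined in time units over a DStream, but the grouping of consecutive mini-batches follows the same idea.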
MLlib
o  Machine learning library
o  Useful abstractions for computation
o  Vectors, sparse matrices
o  Implementations of well-known algorithms
o  Classification, regression, collaborative filtering, and clustering
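To make the idea of a sparse abstraction concrete, here is a minimal sketch of a sparse vector in plain Scala. `SparseVec` is a hypothetical class written for this example; MLlib ships its own vector types, which this sketch does not reproduce.

```scala
// Hypothetical sparse vector: only non-zero entries are stored.
// `indices` must be strictly increasing and line up with `values`.
final case class SparseVec(size: Int, indices: Vector[Int], values: Vector[Double]) {
  require(indices.length == values.length, "indices and values must align")

  // Expand to a dense representation (useful for small vectors and debugging).
  def toDense: Vector[Double] = {
    val entries = indices.zip(values).toMap
    Vector.tabulate(size)(i => entries.getOrElse(i, 0.0))
  }

  // Dot product with a dense vector touches only the stored entries.
  def dot(dense: Vector[Double]): Double = {
    require(dense.length == size, "dimension mismatch")
    indices.zip(values).map { case (i, v) => v * dense(i) }.sum
  }
}

val sparse = SparseVec(6, Vector(1, 4), Vector(3.0, 2.0))
val dense = sparse.toDense // Vector(0.0, 3.0, 0.0, 0.0, 2.0, 0.0)
val product = sparse.dot(Vector(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)) // 3.0*2.0 + 2.0*5.0 = 16.0
```

Storing two entries instead of six is a small saving here, but feature vectors in machine learning often have millions of dimensions with only a handful of non-zero values.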
SparkSQL
o  SQL access layer for running operations on RDDs
o  DataFrame (formerly SchemaRDD)
val people = sqlContext.parquetFile("...")
val department = sqlContext.parquetFile("...")
people.filter(people("age") > 30)
      .join(department, people("deptId") === department("id"))
      .groupBy(department("name"), people("gender"))
© databricks
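What the DataFrame query computes can be followed step by step on in-memory data. The sketch below uses plain Scala collections with hypothetical `Person` and `Department` rows; the sample values are invented, and counting rows per group is an added assumption, because the query on the slide stops at `groupBy`.

```scala
// Hypothetical rows; the field names mirror the columns used in the query.
final case class Person(name: String, age: Int, gender: String, deptId: Int)
final case class Department(id: Int, name: String)

val people = Seq(
  Person("Ana", 35, "F", 1),
  Person("Luis", 28, "M", 1),
  Person("Marta", 41, "F", 2),
  Person("Pablo", 52, "M", 2),
  Person("Eva", 33, "F", 2)
)
val departments = Seq(Department(1, "Engineering"), Department(2, "Sales"))

// filter(age > 30), then join on deptId == id.
val joined: Seq[(Person, Department)] = for {
  p <- people if p.age > 30
  d <- departments if p.deptId == d.id
} yield (p, d)

// groupBy(department name, gender); counting rows per group is an added assumption.
val headcount: Map[(String, String), Int] =
  joined.groupBy { case (p, d) => (d.name, p.gender) }.map { case (k, rows) => (k, rows.size) }
```

Four people pass the age filter, giving one group for Engineering and two for Sales. A DataFrame expresses the same logic declaratively, which lets the engine plan the filter and join itself.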
First steps
$ wget http://www.apache.org/.../spark-1.2.0-bin-hadoop2.4.tgz
$ tar xvzf spark-1.2.0-bin-hadoop2.4.tgz
$ cd spark-1.2.0-bin-hadoop2.4
$ cp conf/spark-env.sh.template conf/spark-env.sh
$ ./bin/spark-shell
…
15/02/09 15:47:50 INFO HttpServer: Starting HTTP Server
15/02/09 15:47:50 INFO Utils: Successfully started service 'HTTP class server' on port 60416.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
scala>
http://localhost:4040
WE ARE HIRING!
Java, Scala, Ping pong, Nerf, Big Data, Spark, Hadoop, Cassandra, MongoDB, NoSQL, Passion
BIG DATA CHILD'S PLAY
@dhiguero
dhiguero@stratio.com
Daniel Higuero
Acknowledgements: This work has been partially funded by
the Spanish Ministry of Economy and Competitiveness under
grant PTQ-13-05997

Contenu connexe

Tendances

Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and SparkCassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and SparkDataStax Academy
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet odsc
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connectorDuyhai Doan
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSpark Summit
 
Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedinYukti Kaura
 
Lightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache SparkLightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache SparkManish Gupta
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideWhizlabs
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupFrens Jan Rumph
 
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsDatabricks
 
Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan Wisely chen
 
Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014cdmaxime
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaSpark Summit
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and SparkEvan Chan
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Spark Summit
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overviewMartin Zapletal
 

Tendances (20)

Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and SparkCassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Spark tutorial
Spark tutorialSpark tutorial
Spark tutorial
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connector
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedin
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Lightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache SparkLightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache Spark
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
 
Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan
 
Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and Spark
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
 

En vedette

¿Por que cambiar de Apache Hadoop a Apache Spark?
¿Por que cambiar de Apache Hadoop a Apache Spark?¿Por que cambiar de Apache Hadoop a Apache Spark?
¿Por que cambiar de Apache Hadoop a Apache Spark?Socialmetrix
 
Tutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtimeTutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtimeSocialmetrix
 
Introduccion a Apache Spark
Introduccion a Apache SparkIntroduccion a Apache Spark
Introduccion a Apache SparkGustavo Arjones
 
Introducción a Apache Spark a través de un caso de uso cotidiano
Introducción a Apache Spark a través de un caso de uso cotidianoIntroducción a Apache Spark a través de un caso de uso cotidiano
Introducción a Apache Spark a través de un caso de uso cotidianoSocialmetrix
 
Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014Abel Alejandro Coronado Iruegas
 
7 Disparadores de Engagement para o mercado de consumo massivo
7 Disparadores de Engagement para o mercado de consumo massivo7 Disparadores de Engagement para o mercado de consumo massivo
7 Disparadores de Engagement para o mercado de consumo massivoSocialmetrix
 
Bases de Datos NoSQL - Riak
Bases de Datos NoSQL - Riak Bases de Datos NoSQL - Riak
Bases de Datos NoSQL - Riak Andrei Amador
 
Primeros pasos con Apache Spark - Madrid Meetup
Primeros pasos con Apache Spark - Madrid MeetupPrimeros pasos con Apache Spark - Madrid Meetup
Primeros pasos con Apache Spark - Madrid Meetupdhiguero
 
Guia practica de_gestion_de_riesgos
Guia practica de_gestion_de_riesgosGuia practica de_gestion_de_riesgos
Guia practica de_gestion_de_riesgosMM CO
 
24 HOP edición Español - Machine learning - Cesar Oviedo
24 HOP edición Español - Machine learning - Cesar Oviedo24 HOP edición Español - Machine learning - Cesar Oviedo
24 HOP edición Español - Machine learning - Cesar OviedoSpanishPASSVC
 
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...Diego López-de-Ipiña González-de-Artaza
 
09 gestion de los riesgos
09 gestion de los riesgos09 gestion de los riesgos
09 gestion de los riesgosRuben Rodriguez
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Julien Le Dem
 

En vedette (20)

¿Por que cambiar de Apache Hadoop a Apache Spark?
¿Por que cambiar de Apache Hadoop a Apache Spark?¿Por que cambiar de Apache Hadoop a Apache Spark?
¿Por que cambiar de Apache Hadoop a Apache Spark?
 
Tutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtimeTutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtime
 
Introduccion a Apache Spark
Introduccion a Apache SparkIntroduccion a Apache Spark
Introduccion a Apache Spark
 
Spark Hands-on
Spark Hands-onSpark Hands-on
Spark Hands-on
 
Introducción a Apache Spark a través de un caso de uso cotidiano
Introducción a Apache Spark a través de un caso de uso cotidianoIntroducción a Apache Spark a través de un caso de uso cotidiano
Introducción a Apache Spark a través de un caso de uso cotidiano
 
Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014
 
Hypertable Nosql
Hypertable NosqlHypertable Nosql
Hypertable Nosql
 
7 Disparadores de Engagement para o mercado de consumo massivo
7 Disparadores de Engagement para o mercado de consumo massivo7 Disparadores de Engagement para o mercado de consumo massivo
7 Disparadores de Engagement para o mercado de consumo massivo
 
Bases de Datos NoSQL - Riak
Bases de Datos NoSQL - Riak Bases de Datos NoSQL - Riak
Bases de Datos NoSQL - Riak
 
Cloud or not to Cloud? That’s the question Businesses need an answer for!
Cloud or not to Cloud? That’s the question Businesses need an answer for!Cloud or not to Cloud? That’s the question Businesses need an answer for!
Cloud or not to Cloud? That’s the question Businesses need an answer for!
 
Primeros pasos con Apache Spark - Madrid Meetup
Primeros pasos con Apache Spark - Madrid MeetupPrimeros pasos con Apache Spark - Madrid Meetup
Primeros pasos con Apache Spark - Madrid Meetup
 
Guia practica de_gestion_de_riesgos
Guia practica de_gestion_de_riesgosGuia practica de_gestion_de_riesgos
Guia practica de_gestion_de_riesgos
 
24 HOP edición Español - Machine learning - Cesar Oviedo
24 HOP edición Español - Machine learning - Cesar Oviedo24 HOP edición Español - Machine learning - Cesar Oviedo
24 HOP edición Español - Machine learning - Cesar Oviedo
 
Big data big opportunities
Big data big opportunitiesBig data big opportunities
Big data big opportunities
 
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
 
Curso Cloud Computing, Parte 1: Amazon Web Services
Curso Cloud Computing, Parte 1: Amazon Web ServicesCurso Cloud Computing, Parte 1: Amazon Web Services
Curso Cloud Computing, Parte 1: Amazon Web Services
 
Cloud Computing: una perspectiva tecnológica
Cloud Computing: una perspectiva tecnológicaCloud Computing: una perspectiva tecnológica
Cloud Computing: una perspectiva tecnológica
 
09 gestion de los riesgos
09 gestion de los riesgos09 gestion de los riesgos
09 gestion de los riesgos
 
MongoDB: la BBDD NoSQL más popular del mercado
MongoDB: la BBDD NoSQL más popular del mercadoMongoDB: la BBDD NoSQL más popular del mercado
MongoDB: la BBDD NoSQL más popular del mercado
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 

Similaire à Adios hadoop, Hola Spark! T3chfest 2015

Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in SparkDatabricks
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersDatabricks
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksDatabricks
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkC4Media
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax EnablementVincent Poncet
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache SparkAmir Sedighi
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introductionHektor Jacynycz García
 
«Почему Spark отнюдь не так хорош»
«Почему Spark отнюдь не так хорош»«Почему Spark отнюдь не так хорош»
«Почему Spark отнюдь не так хорош»Olga Lavrentieva
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Djamel Zouaoui
 
Scala Meetup Hamburg - Spark
Scala Meetup Hamburg - SparkScala Meetup Hamburg - Spark
Scala Meetup Hamburg - SparkIvan Morozov
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkCloudera, Inc.
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkVince Gonzalez
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Holden Karau
 

Similaire à Adios hadoop, Hola Spark! T3chfest 2015 (20)

Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax Enablement
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introduction
 
«Почему Spark отнюдь не так хорош»
«Почему Spark отнюдь не так хорош»«Почему Spark отнюдь не так хорош»
«Почему Spark отнюдь не так хорош»
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
Scala Meetup Hamburg - Spark
Scala Meetup Hamburg - SparkScala Meetup Hamburg - Spark
Scala Meetup Hamburg - Spark
 
Big data clustering
Big data clusteringBig data clustering
Big data clustering
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - Spark
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016
 

Dernier

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Adiós Hadoop, Hola Spark! T3chfest 2015

  • 1. Adiós Hadoop, Hola Spark! Daniel Higuero, @dhiguero, dhiguero@stratio.com
  • 2. Agenda. Introduction. Spark: basic concepts, ecosystem.
  • 3. VIEWER DISCRETION IS ADVISED. All elephants are innocent until proven guilty in a court of development. Opinions expressed are solely my own and do not express the views or opinions of my employer.
  • 5. Timeline, 2002 to 2015. Google GFS paper; Google MapReduce paper.
  • 6. Timeline (continued). Adds: Hive; HBase; Hadoop: 1 TB, 910 nodes, < 4 min.
  • 7. Timeline (continued). Adds: alpha-0.1; Spark 0.7.
  • 8. Timeline (continued). Adds: Hadoop: 103 TB, 2100 nodes, 72 min.
  • 9. Timeline (continued). Adds: Spark: 100 TB, 206 nodes, 23 min; Spark 1.2+.
  • 10. Introduction. What is Spark? A parallel processing framework. History. https://spark.apache.org/ (Apache Software Foundation)
  • 11. Map-reduce. A functional programming concept, popularized by Google.
        (map 'list (lambda (x) (+ x 10)) '(1 2 3 4))  =>  (11 12 13 14)
        (reduce #'+ '(1 2 3 4))  =>  10
    Jeff Dean and Sanjay Ghemawat. "MapReduce: Simplified Data Processing on Large Clusters." OSDI (2004)
  • 12. Map-Reduce. Diagram: the input data feeds four Map tasks, whose output feeds three Reduce tasks, which produce the result.
  • 13. Map-Reduce. Word count in Spark:
        val wordCounts = textFile.flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
    Input text (source: Wikipedia): "Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms"
    Array[String]: Apache, Spark, is, an, open-source, cluster, …
    Array[(String, Int)]: (Apache, 1), (Spark, 1), (is, 1), … (Spark, 1), (is, 1), …
    Array[(String, Int)]: (Apache, 1), (Spark, 2), (is, 2), … (to, 4), (the, 1), …
  • 14. Advantages of Spark. More flexibility in defining transformations. Less use of disk storage. Better use of memory. Fault tolerance. Community traction.
  • 16. RDD. The basic abstraction in Spark. It holds the transformations to be applied to a data set. Properties: immutable; lazily evaluated; its state can be recovered after a failure; control over persistence and partitioning.
  • 18. Spark ecosystem. (Diagram © databricks)
  • 19. Spark core engine. Provides the basic abstractions and takes care of scheduling. Diagram components: RDD, DAG scheduling, cluster manager, threads, block manager, task scheduling, worker.
  • 20. Spark Streaming. Turns a streaming source into a set of mini-batches. Definition of a window: time-based.
  • 21. Spark Streaming. Diagram: window = 5 over batch0, batch1, batch2, batch3, batch4, batch5, batch6, batch7 along the time axis.
  • 22. MLlib. A library for machine learning. Useful abstractions for computation: vectors, sparse matrices. Implementations of well-known algorithms: classification, regression, collaborative filtering, and clustering.
  • 23. SparkSQL. An SQL access layer for running operations on RDDs. DataFrame (formerly SchemaRDD).
        val people = sqlContext.parquetFile("...")
        val department = sqlContext.parquetFile("...")
        // Column comparison: people("age") is a Column, whereas "age" > 30 would compare a String with an Int
        people.filter(people("age") > 30)
          .join(department, people("deptId") === department("id"))
          .groupBy(department("name"), people("gender"))
    (© databricks)
  • 24. Getting started.
        $ wget http://www.apache.org/.../spark-1.2.0-bin-hadoop2.4.tgz
        $ tar xvzf spark-1.2.0-bin-hadoop2.4.tgz
        $ cd spark-1.2.0-bin-hadoop2.4
        $ cp conf/spark-env.sh.template conf/spark-env.sh
        $ ./bin/spark-shell
        …
        15/02/09 15:47:50 INFO HttpServer: Starting HTTP Server
        15/02/09 15:47:50 INFO Utils: Successfully started service 'HTTP class server' on port 60416.
        Welcome to
              ____              __
             / __/__  ___ _____/ /__
            _\ \/ _ \/ _ `/ __/  '_/
           /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
              /_/
        Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
        Type in expressions to have them evaluated.
        scala>
    Web UI: http://localhost:4040
  • 25. WE ARE HIRING! Java, Scala, Ping pong, Nerf, Big Data, Spark, Hadoop, Cassandra, MongoDB, NoSQL, Passion.
  • 26. BIG DATA CHILD'S PLAY. Daniel Higuero, @dhiguero, dhiguero@stratio.com. Acknowledgements: This work has been partially funded by the Spanish Ministry of Economy and Competitiveness under grant PTQ-13-05997.
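
The word-count pipeline from slide 13 can be tried without a Spark installation. The sketch below is an approximation on plain Scala collections, not Spark code: the object name `WordCountSketch` and the function `wordCounts` are made up for illustration, and `reduceByKey(_ + _)` is emulated with `groupBy` followed by a per-key `reduce`, since local collections have no `reduceByKey`. Empty strings produced by `split` are also dropped, which the slide does not do.

```scala
object WordCountSketch {
  // Local stand-in for the slide's flatMap -> map -> reduceByKey chain.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))      // one element per word
      .filter(_.nonEmpty)                    // assumption: ignore empty tokens
      .map(word => (word, 1))                // pair each word with a count of 1
      .groupBy { case (word, _) => word }    // gather the pairs that share a key
      .map { case (word, pairs) =>
        (word, pairs.map(_._2).reduce(_ + _)) // sum the counts, like reduceByKey(_ + _)
      }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "Apache Spark is an open-source cluster computing framework",
      "Spark is well suited to machine learning algorithms"
    )
    // Print the counts sorted by word so the output order is stable.
    wordCounts(lines).toSeq.sortBy(_._1).foreach { case (word, n) =>
      println(s"$word\t$n")
    }
  }
}
```

In Spark itself the same three steps run on a distributed RDD, and `reduceByKey` combines counts per key across partitions instead of grouping everything in local memory.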