SlideShare une entreprise Scribd logo
1  sur  22
HUG France - 22 Sept 2014 - @Criteo 
Data Management platform for Hadoop 
Cédric Carbone, Talend CTO 
@carbone 
Jean-Baptiste Onofré, Falcon Committer 
@jbonofre 
Ce support est mis à disposition selon les termes de la Licence Creative 
Commons Attribution - Pas d’Utilisation Commerciale - Pas de Modification 2.0 
France. - http://creativecommons.org/licenses/by-nc-nd/2.0/fr/ 
© Talend 2014 1
Overview 
• Falcon is a Data Management solution for Hadoop 
• Falcon in production at InMobi since 2012 
• InMobi gave Falcon to ASF in April 2013 
• Falcon is in Apache incubation 
• Falcon embedded per default inside HDP 
• Falcon leverages a lot of Apache components 
- Oozie, Ambari, ActiveMQ, HCat, Sqoop… 
• Committer/PPMC/IPMC: 
- #8 InMobi 
- #5 Hortonworks 
- #1 Talend 
© Talend 2014 2
Why Falcon? 
© Talend 2014 3
Why Falcon? 
© Talend 2014 4
What is Falcon? 
• Data Motion Import, Export, CDC 
• Policy-based Lifecycle Management Retention, Replication, 
Archival, Anonymization of PII data 
• Process orchestration and scheduling Late data handling, 
reprocessing, dependency checking, etc. Multi-cluster management to 
support Local/Global Aggregations, Rollups, etc. 
• Data Governance Lineage, Audit, SLA 
© Talend 2014 5
Falcon - The Solution! 
• Introduces a higher layer of abstraction – Data Set Decouples a 
data location and its properties from workflows Understanding the life-time 
of a feed will allow for implicit validation of the processing rules 
• Provides the key services for data processing apps Common data 
services are simple directives, No need to define them verbosely in 
each job Allows process owners to keep their processing specific to 
their application logic Sits in the execution path, intercepts to handle 
OOB data / retries etc. 
• Promotes Polyglot Programming Does not do any heavy lifting but 
delegates to tools with in the Hadoop ecosystem 
© Talend 2014 6
Falcon Basic Concepts : Data Pipelines 
• Cluster: : Represents the Hadoop cluster 
• Feed: Defines a “dataset” 
• Process: Consumes feeds, invokes processing logic & produces feeds 
• Entity Actions: submit, list, dependency, schedule, suspend, resume, status, 
definition, delete, update 
© Talend 2014 7
Falcon Entity Relationships 
© Talend 2014 8
Cluster Entity 
<?xml version="1.0"?> 
<cluster colo=”talend-datacenter" description="" name=”prod-cluster"> 
<interfaces> 
Needed by distcp for replications 
Writing to HDFS 
<interface type="readonly" endpoint="hftp://nn:50070" version="2.2.0" /> 
<interface type="write" endpoint="hdfs://nn:8020" version="2.2.0" /> 
<interface type="execute" endpoint=”rm:8050" version="2.2.0" /> 
<interface type="workflow" endpoint="http://os:11000/oozie/" version="4.0.0" /> 
<interface type=”registry" endpoint=”thrift://hms:9083" version=”0.12.0" /> 
<interface type="messaging" endpoint="tcp://mq:61616?daemon=true" version="5.1.6" /> 
</interfaces> 
<locations> 
Used to submit processes as MR 
Submit Oozie jobs 
<location name="staging" path="/apps/falcon/prod-cluster/staging" /> 
<location name="temp" path="/tmp" /> 
<location name="working" path="/apps/falcon/prod-cluster/working" /> 
</locations> 
</cluster> 
Hive metastore to register/ 
deregister partitions and 
get events on partition availability 
Used For alerts 
HDFS directories used by Falcon server 
© Talend 2014 9
Feed Entity 
Feed run frequency in mins/hrs/days/mths 
<?xml version="1.0"?> 
<feed description=“" name=”testFeed” xmlns="uri:falcon:feed:0.1"> 
<frequency>hours(1)</frequency> 
<late-arrival cut-off="hours(6)”/> 
<groups>churnAnalysisFeeds</groups> 
<clusters> 
Late arrival cutoff 
<cluster name=”cluster-primary" type="source"> 
Feeds can belong to multiple groups 
<validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/> 
<retention limit="days(2)" action="delete"/> 
One or more source & target clusters for retention & replication 
</cluster> 
<cluster name=”cluster-secondary" type="target"> 
<validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/> 
<location type="data” path="/churn/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR} "/> 
<retention limit=”days(7)" action="delete"/> 
</cluster> 
</clusters> 
<locations> 
Global location across clusters - HDFS paths or Hive tables 
<location type="data” path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR} "/> 
</locations> 
<ACL owner=”hdfs" group="users" permission="0755"/> 
<schema location="/none" provider="none"/> 
</feed> 
Access Permissions 
© Talend 2014 10
Process Entity 
<process name="process-test" xmlns="uri:falcon:process:0.1”> 
<clusters> 
<cluster name="cluster-primary"> 
Which cluster should the process run on and when 
<validity start="2011-11-02T00:00Z" end="2011-12-30T00:00Z" /> 
</cluster> 
How frequently does the process run , how many instances can be run in parallel and in what order 
</clusters> 
<parallel>1</parallel> 
<order>FIFO</order> 
<frequency>days(1)</frequency> 
<inputs> 
<input end="today(0,0)" start="today(0,0)" feed="feed-clicks-raw" name="input" /> 
</inputs> 
<outputs> 
Input & output feeds for process 
<output instance="now(0,2)" feed="feed-clicks-clean" name="output" /> 
</outputs> 
<workflow engine="pig" path="/apps/clickstream/clean-script.pig" /> 
<retry policy="periodic" delay="minutes(10)" attempts="3"/> 
<late-process policy="exp-backoff" delay="hours(1)"> 
<late-input input="input" workflow-path="/apps/clickstream/late" /> 
</late-process> 
</process> 
The processing logic. 
Retry policy on failure Handling late input feeds 
© Talend 2014 11
High Level Architecture 
© Talend 2014 12
Demo #1 : my first feed 
• Start Falcon on a single cluster 
• Submission of one simple feed 
• Check into Oozie the generated job 
© Talend 2014 13
Replication & Retention 
• Sophisticated retention policies expressed in one place 
• Simplify data retention for audit, compliance, or for data re-processing 
© Talend 2014 14
Demo #2: processes notification 
 Motivation: being able to be notified about processes activity “outside” 
of the cluster 
 Falcon manages workflow, send JMS notification, and a Camel route 
react to the notification 
 Result: trigger an action (Camel route) when a notification is sent by 
Falcon (eviction, late-arrival, process execution, …) 
 Mix BigData/Hadoop and ESB technologies 
© Talend 2014 15
Demo #2: workflow 
Falcon ActiveMQ 
Camel 
2014-03-19 11:25:43,273 | INFO | LCON.my-process] | process-listener | rg.apache.camel.util.CamelLogger 
176 | 74 - org.apache.camel.camel-core - 2.13.0.SNAPSHOT | Exchange[ExchangePattern: InOnly, BodyType: 
java.util.HashMap, Body: {brokerUrl=tcp://localhost:61616, timeStamp=2014-03-19T10:24Z, status=SUCCEEDED, 
logFile=hdfs://localhost:8020/falcon/staging/falcon/workflows/process/my-process/logs/instancePaths-2013-11-15-06- 
05.csv, feedNames=output, runId=0, entityType=process, nominalTime=2013-11-15T06:05Z, brokerTTL=4320, 
workflowUser=null, entityName=my-process, feedInstancePaths=hdfs://localhost:8020/data/output, 
operation=GENERATE, logDir=null, workflowId=0000026-140319105443372-oozie-jbon-W, cluster=local, 
brokerImplClass=org.apache.activemq.ActiveMQConnectionFactory, topicName=FALCON.my-process}] 
© Talend 2014 16
Demo #2 : Data Notification 
• All notification will be in ActiveMQ 
• Subscribers : Camel routes 
• Add some files and see the notification system working! 
• http://blog.nanthrax.net/2014/03/hadoop-cdc-and-processes-notification-with- 
apache-falcon-apache-activemq-and-apache-camel/ 
© Talend 2014 17
Data Pipeline Tracing 
© Talend 2014 18
Topologies 
• STANDALONE 
– Single Data Center 
– Single Falcon Server 
– Hadoop jobs and relevant processing involves only one cluster 
• DISTRIBUTED 
– Multiple Data Centers 
– Falcon Server per DC 
– Multiple instances of hadoop clusters and workflow schedulers 
© Talend 2014 19
Multi cluster failover 
 Motivation: replicate subset of data from one cluster to another, 
guarantee the different eviction depending of the data subset 
 Falcon manages workflow on primary cluster, and replication on failover 
cluster 
 Result: support business continuity without requiring full data 
reprocessing 
© Talend 2014 20
Roadmap 
 Improvement on the late-arrival messages in FALCON.ENTITY.TOPIC 
(ActiveMQ) 
 Feed implicit processes (no need of process) providing “native” CDC 
 Straight forward MR usage (without pig, oozie workflow XML, …) 
 More data acquisition 
 Monitoring/Management/Designer dashboard 
 TLP !!! 
© Talend 2014 21
Questions? 
Data Management platform for Hadoop 
Cédric Carbone, Talend CTO 
@carbone 
Jean-Baptiste Onofré, Falcon Committer 
@jbonofre 
Ce support est mis à disposition selon les termes de la Licence Creative 
Commons Attribution - Pas d’Utilisation Commerciale - Pas de Modification 2.0 
France. - http://creativecommons.org/licenses/by-nc-nd/2.0/fr/ 
© Talend 2014 22

Contenu connexe

Tendances

Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - finalHortonworks
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentDataWorks Summit/Hadoop Summit
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopHortonworks
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNDataWorks Summit
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksData Con LA
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitDataWorks Summit
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleDataWorks Summit/Hadoop Summit
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsDataWorks Summit/Hadoop Summit
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureDataWorks Summit
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 

Tendances (20)

Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
 
Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
Toward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFSToward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFS
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service Deployment
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and Improvements
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 

En vedette

Big Data Paris: Etude de Cas: KPMG, l’innovation continue grâce au Data Lake ...
Big Data Paris: Etude de Cas: KPMG, l’innovation continue grâce au Data Lake ...Big Data Paris: Etude de Cas: KPMG, l’innovation continue grâce au Data Lake ...
Big Data Paris: Etude de Cas: KPMG, l’innovation continue grâce au Data Lake ...MongoDB
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopDataWorks Summit
 
Big Data Paris - Air France: Stratégie BigData et Use Cases
Big Data Paris - Air France: Stratégie BigData et Use CasesBig Data Paris - Air France: Stratégie BigData et Use Cases
Big Data Paris - Air France: Stratégie BigData et Use CasesMongoDB
 
Tiered Compilation in Hotspot JVM
Tiered Compilation in Hotspot JVMTiered Compilation in Hotspot JVM
Tiered Compilation in Hotspot JVMIgor Veresov
 
Uccellini Uccellacci
Uccellini UccellacciUccellini Uccellacci
Uccellini Uccellaccifedericoneri
 
Das Prinzip Coworking - Coworking in Köln
Das Prinzip Coworking - Coworking in KölnDas Prinzip Coworking - Coworking in Köln
Das Prinzip Coworking - Coworking in KölnPeter Schreck
 
Liebe Dein Brot / Love your bread
Liebe Dein Brot / Love your breadLiebe Dein Brot / Love your bread
Liebe Dein Brot / Love your breadPeter Schreck
 
Speech videogiochi-pistoia
Speech videogiochi-pistoiaSpeech videogiochi-pistoia
Speech videogiochi-pistoiaLuca De Biase
 
Moviendo Las IDEAS
Moviendo Las IDEASMoviendo Las IDEAS
Moviendo Las IDEAScaaraya
 
Os Rrcc As Bases Do Estado Moderno
Os Rrcc As Bases Do Estado ModernoOs Rrcc As Bases Do Estado Moderno
Os Rrcc As Bases Do Estado ModernoAulagalicia Hxg
 
Il treno della vita
Il treno della vitaIl treno della vita
Il treno della vitafedericoneri
 
Five Attributes to a Successful Big Data Strategy
Five Attributes to a Successful Big Data StrategyFive Attributes to a Successful Big Data Strategy
Five Attributes to a Successful Big Data StrategyPerficient, Inc.
 
Meet the experts dwo bde vds v7
Meet the experts dwo bde vds v7Meet the experts dwo bde vds v7
Meet the experts dwo bde vds v7mmathipra
 
2015 Find You - Fuzzy Big Data - A Case Study
2015 Find You - Fuzzy Big Data - A Case Study2015 Find You - Fuzzy Big Data - A Case Study
2015 Find You - Fuzzy Big Data - A Case StudyPhilip Topham
 
Power Big Data Analytics with Informatica Cloud Integration for Redshift, Kin...
Power Big Data Analytics with Informatica Cloud Integration for Redshift, Kin...Power Big Data Analytics with Informatica Cloud Integration for Redshift, Kin...
Power Big Data Analytics with Informatica Cloud Integration for Redshift, Kin...Amazon Web Services
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAmazon Web Services
 

En vedette (20)

Big Data Paris: Etude de Cas: KPMG, l’innovation continue grâce au Data Lake ...
Big Data Paris: Etude de Cas: KPMG, l’innovation continue grâce au Data Lake ...Big Data Paris: Etude de Cas: KPMG, l’innovation continue grâce au Data Lake ...
Big Data Paris: Etude de Cas: KPMG, l’innovation continue grâce au Data Lake ...
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
 
Big Data Paris - Air France: Stratégie BigData et Use Cases
Big Data Paris - Air France: Stratégie BigData et Use CasesBig Data Paris - Air France: Stratégie BigData et Use Cases
Big Data Paris - Air France: Stratégie BigData et Use Cases
 
Tiered Compilation in Hotspot JVM
Tiered Compilation in Hotspot JVMTiered Compilation in Hotspot JVM
Tiered Compilation in Hotspot JVM
 
Uccellini Uccellacci
Uccellini UccellacciUccellini Uccellacci
Uccellini Uccellacci
 
Das Prinzip Coworking - Coworking in Köln
Das Prinzip Coworking - Coworking in KölnDas Prinzip Coworking - Coworking in Köln
Das Prinzip Coworking - Coworking in Köln
 
Piringundines
PiringundinesPiringundines
Piringundines
 
Liebe Dein Brot / Love your bread
Liebe Dein Brot / Love your breadLiebe Dein Brot / Love your bread
Liebe Dein Brot / Love your bread
 
Speech videogiochi-pistoia
Speech videogiochi-pistoiaSpeech videogiochi-pistoia
Speech videogiochi-pistoia
 
Coworking Germany
Coworking Germany Coworking Germany
Coworking Germany
 
Moviendo Las IDEAS
Moviendo Las IDEASMoviendo Las IDEAS
Moviendo Las IDEAS
 
Os Rrcc As Bases Do Estado Moderno
Os Rrcc As Bases Do Estado ModernoOs Rrcc As Bases Do Estado Moderno
Os Rrcc As Bases Do Estado Moderno
 
ESKibana
ESKibanaESKibana
ESKibana
 
Il treno della vita
Il treno della vitaIl treno della vita
Il treno della vita
 
Big Data - uma visão executiva
Big Data - uma visão executivaBig Data - uma visão executiva
Big Data - uma visão executiva
 
Five Attributes to a Successful Big Data Strategy
Five Attributes to a Successful Big Data StrategyFive Attributes to a Successful Big Data Strategy
Five Attributes to a Successful Big Data Strategy
 
Meet the experts dwo bde vds v7
Meet the experts dwo bde vds v7Meet the experts dwo bde vds v7
Meet the experts dwo bde vds v7
 
2015 Find You - Fuzzy Big Data - A Case Study
2015 Find You - Fuzzy Big Data - A Case Study2015 Find You - Fuzzy Big Data - A Case Study
2015 Find You - Fuzzy Big Data - A Case Study
 
Power Big Data Analytics with Informatica Cloud Integration for Redshift, Kin...
Power Big Data Analytics with Informatica Cloud Integration for Redshift, Kin...Power Big Data Analytics with Informatica Cloud Integration for Redshift, Kin...
Power Big Data Analytics with Informatica Cloud Integration for Redshift, Kin...
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 

Similaire à Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)

Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...li David
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDataWorks Summit
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lakeEMC
 
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Courtney Llamas
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Seetharam Venkatesh
 
Cloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCodemotion
 
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics WorkbenchPivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics WorkbenchEMC
 
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsAlluxio, Inc.
 
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)jeckels
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupThomas Weise
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspectivePriit Piipuu
 
Building Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and KafkaBuilding Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and KafkaNenad Bogojevic
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?DataWorks Summit
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019alanfgates
 

Similaire à Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo) (20)

Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
How does Apache DolphinScheduler (Incubator) support scheduling 100,000-level...
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
 
Cloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platform
 
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics WorkbenchPivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
 
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
 
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
Oracle Coherence Strategy and Roadmap (OpenWorld, September 2014)
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application Meetup
 
OpenStack Murano
OpenStack MuranoOpenStack Murano
OpenStack Murano
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspective
 
Building Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and KafkaBuilding Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and Kafka
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 

Dernier

Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Sheetaleventcompany
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.soniya singh
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...singhpriety023
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)Delhi Call girls
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...Neha Pandey
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445ruhi
 
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Onlineanilsa9823
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Servicegwenoracqe6
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGAPNIC
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Delhi Call girls
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLimonikaupta
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...Escorts Call Girls
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 

Dernier (20)

Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
 
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Lucknow Lucknow best sexual service Online
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOG
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 

Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)

  • 1. HUG France - 22 Sept 2014 - @Criteo Data Management platform for Hadoop Cédric Carbone, Talend CTO @carbone Jean-Baptiste Onofré, Falcon Committer @jbonofre Ce support est mis à disposition selon les termes de la Licence Creative Commons Attribution - Pas d’Utilisation Commerciale - Pas de Modification 2.0 France. - http://creativecommons.org/licenses/by-nc-nd/2.0/fr/ © Talend 2014 1
  • 2. Overview • Falcon is a Data Management solution for Hadoop • Falcon in production at InMobi since 2012 • InMobi gave Falcon to ASF in April 2013 • Falcon is in Apache incubation • Falcon embedded per default inside HDP • Falcon leverages a lot of Apache components - Oozie, Ambari, ActiveMQ, HCat, Sqoop… • Committer/PPMC/IPMC: - #8 InMobi - #5 Hortonworks - #1 Talend © Talend 2014 2
  • 3. Why Falcon? © Talend 2014 3
  • 4. Why Falcon? © Talend 2014 4
  • 5. What is Falcon? • Data Motion Import, Export, CDC • Policy-based Lifecycle Management Retention, Replication, Archival, Anonymization of PII data • Process orchestration and scheduling Late data handling, reprocessing, dependency checking, etc. Multi-cluster management to support Local/Global Aggregations, Rollups, etc. • Data Governance Lineage, Audit, SLA © Talend 2014 5
  • 6. Falcon - The Solution! • Introduces a higher layer of abstraction – Data Set Decouples a data location and its properties from workflows Understanding the life-time of a feed will allow for implicit validation of the processing rules • Provides the key services for data processing apps Common data services are simple directives, No need to define them verbosely in each job Allows process owners to keep their processing specific to their application logic Sits in the execution path, intercepts to handle OOB data / retries etc. • Promotes Polyglot Programming Does not do any heavy lifting but delegates to tools with in the Hadoop ecosystem © Talend 2014 6
  • 7. Falcon Basic Concepts : Data Pipelines • Cluster: : Represents the Hadoop cluster • Feed: Defines a “dataset” • Process: Consumes feeds, invokes processing logic & produces feeds • Entity Actions: submit, list, dependency, schedule, suspend, resume, status, definition, delete, update © Talend 2014 7
  • 8. Falcon Entity Relationships © Talend 2014 8
  • 9. Cluster Entity <?xml version="1.0"?> <cluster colo=”talend-datacenter" description="" name=”prod-cluster"> <interfaces> Needed by distcp for replications Writing to HDFS <interface type="readonly" endpoint="hftp://nn:50070" version="2.2.0" /> <interface type="write" endpoint="hdfs://nn:8020" version="2.2.0" /> <interface type="execute" endpoint=”rm:8050" version="2.2.0" /> <interface type="workflow" endpoint="http://os:11000/oozie/" version="4.0.0" /> <interface type=”registry" endpoint=”thrift://hms:9083" version=”0.12.0" /> <interface type="messaging" endpoint="tcp://mq:61616?daemon=true" version="5.1.6" /> </interfaces> <locations> Used to submit processes as MR Submit Oozie jobs <location name="staging" path="/apps/falcon/prod-cluster/staging" /> <location name="temp" path="/tmp" /> <location name="working" path="/apps/falcon/prod-cluster/working" /> </locations> </cluster> Hive metastore to register/ deregister partitions and get events on partition availability Used For alerts HDFS directories used by Falcon server © Talend 2014 9
  • 10. Feed Entity Feed run frequency in mins/hrs/days/mths <?xml version="1.0"?> <feed description=“" name=”testFeed” xmlns="uri:falcon:feed:0.1"> <frequency>hours(1)</frequency> <late-arrival cut-off="hours(6)”/> <groups>churnAnalysisFeeds</groups> <clusters> Late arrival cutoff <cluster name=”cluster-primary" type="source"> Feeds can belong to multiple groups <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/> <retention limit="days(2)" action="delete"/> One or more source & target clusters for retention & replication </cluster> <cluster name=”cluster-secondary" type="target"> <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/> <location type="data” path="/churn/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR} "/> <retention limit=”days(7)" action="delete"/> </cluster> </clusters> <locations> Global location across clusters - HDFS paths or Hive tables <location type="data” path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR} "/> </locations> <ACL owner=”hdfs" group="users" permission="0755"/> <schema location="/none" provider="none"/> </feed> Access Permissions © Talend 2014 10
  • 11. Process Entity <process name="process-test" xmlns="uri:falcon:process:0.1”> <clusters> <cluster name="cluster-primary"> Which cluster should the process run on and when <validity start="2011-11-02T00:00Z" end="2011-12-30T00:00Z" /> </cluster> How frequently does the process run , how many instances can be run in parallel and in what order </clusters> <parallel>1</parallel> <order>FIFO</order> <frequency>days(1)</frequency> <inputs> <input end="today(0,0)" start="today(0,0)" feed="feed-clicks-raw" name="input" /> </inputs> <outputs> Input & output feeds for process <output instance="now(0,2)" feed="feed-clicks-clean" name="output" /> </outputs> <workflow engine="pig" path="/apps/clickstream/clean-script.pig" /> <retry policy="periodic" delay="minutes(10)" attempts="3"/> <late-process policy="exp-backoff" delay="hours(1)"> <late-input input="input" workflow-path="/apps/clickstream/late" /> </late-process> </process> The processing logic. Retry policy on failure Handling late input feeds © Talend 2014 11
  • 12. High Level Architecture © Talend 2014 12
  • 13. Demo #1 : my first feed • Start Falcon on a single cluster • Submission of one simple feed • Check into Oozie the generated job © Talend 2014 13
  • 14. Replication & Retention • Sophisticated retention policies expressed in one place • Simplify data retention for audit, compliance, or for data re-processing © Talend 2014 14
  • 15. Demo #2: processes notification  Motivation: being able to be notified about processes activity “outside” of the cluster  Falcon manages workflow, send JMS notification, and a Camel route react to the notification  Result: trigger an action (Camel route) when a notification is sent by Falcon (eviction, late-arrival, process execution, …)  Mix BigData/Hadoop and ESB technologies © Talend 2014 15
  • 16. Demo #2: workflow Falcon ActiveMQ Camel 2014-03-19 11:25:43,273 | INFO | LCON.my-process] | process-listener | rg.apache.camel.util.CamelLogger 176 | 74 - org.apache.camel.camel-core - 2.13.0.SNAPSHOT | Exchange[ExchangePattern: InOnly, BodyType: java.util.HashMap, Body: {brokerUrl=tcp://localhost:61616, timeStamp=2014-03-19T10:24Z, status=SUCCEEDED, logFile=hdfs://localhost:8020/falcon/staging/falcon/workflows/process/my-process/logs/instancePaths-2013-11-15-06- 05.csv, feedNames=output, runId=0, entityType=process, nominalTime=2013-11-15T06:05Z, brokerTTL=4320, workflowUser=null, entityName=my-process, feedInstancePaths=hdfs://localhost:8020/data/output, operation=GENERATE, logDir=null, workflowId=0000026-140319105443372-oozie-jbon-W, cluster=local, brokerImplClass=org.apache.activemq.ActiveMQConnectionFactory, topicName=FALCON.my-process}] © Talend 2014 16
  • 17. Demo #2 : Data Notification • All notification will be in ActiveMQ • Subscribers : Camel routes • Add some files and see the notification system working! • http://blog.nanthrax.net/2014/03/hadoop-cdc-and-processes-notification-with- apache-falcon-apache-activemq-and-apache-camel/ © Talend 2014 17
  • 18. Data Pipeline Tracing © Talend 2014 18
  • 19. Topologies • STANDALONE – Single Data Center – Single Falcon Server – Hadoop jobs and relevant processing involves only one cluster • DISTRIBUTED – Multiple Data Centers – Falcon Server per DC – Multiple instances of hadoop clusters and workflow schedulers © Talend 2014 19
  • 20. Multi cluster failover  Motivation: replicate subset of data from one cluster to another, guarantee the different eviction depending of the data subset  Falcon manages workflow on primary cluster, and replication on failover cluster  Result: support business continuity without requiring full data reprocessing © Talend 2014 20
  • 21. Roadmap  Improvement on the late-arrival messages in FALCON.ENTITY.TOPIC (ActiveMQ)  Feed implicit processes (no need of process) providing “native” CDC  Straight forward MR usage (without pig, oozie workflow XML, …)  More data acquisition  Monitoring/Management/Designer dashboard  TLP !!! © Talend 2014 21
  • 22. Questions? Data Management platform for Hadoop Cédric Carbone, Talend CTO @carbone Jean-Baptiste Onofré, Falcon Committer @jbonofre Ce support est mis à disposition selon les termes de la Licence Creative Commons Attribution - Pas d’Utilisation Commerciale - Pas de Modification 2.0 France. - http://creativecommons.org/licenses/by-nc-nd/2.0/fr/ © Talend 2014 22