SlideShare une entreprise Scribd logo
1  sur  23
Télécharger pour lire hors ligne
Building a data pipeline to ingest
data into Hadoop in minutes
using Streamsets Data Collector
Guglielmo Iozzia,
Big Data Infrastructure Engineer @ IBM Ireland
Data Ingestion for Analytics: a real scenario
In the business area (cloud applications) to which my team belongs there were so
many questions to be answered. They were related to:
● Defect analysis
● Outage analysis
● Cyber-Security
“Data is the second
most important
thing in analytics”
Data Ingestion: multiple sources...
● Legacy systems
● DB2
● Lotus Domino
● MongoDB
● Application logs
● System logs
● New Relic
● Jenkins pipelines
● Testing tools output
● RESTful Services
… and so many tools available to get the data
What are we going to do with all those data?
Issues
● The need to collect data from multiple sources introduces redundancy, which
costs additional disk space and increases query times.
● A small team.
● Lack of skills and experience across the team (and the business area in
general) in managing Big Data tools.
● Low budget.
Alternatives
#1 Panic
Alternatives
#2 Cloning team members
Alternatives
#3 Find a smart way to simplify the data ingestion
process
A single tool needed...
● Design complex data flows with minimal coding and the maximum flexibility.
● Provide real-time data flow statistics, metrics for each flow stage.
● Automated error handling and alerting.
● Easy to use by everyone.
● Zero-downtime when upgrading the infrastructure due to logical isolation of
each flow stage.
● Open Source
… something like this
Streamsets Data Collector
Streamsets Data Collector
Streamsets Data Collector: supported origins
Streamsets Data Collector: available destinations
Streamsets Data Collector: available processors
● Base64 Field Decoder
● Base64 Field Encoder
● Expression Evaluator
● Field Converter
● JavaScript Evaluator
● JSON Parser
● Jython Evaluator
● Log Parser
● Stream Selector
● XML Parser
...and many others
Streamsets Data Collector
Demo
Streamsets DC: performance and reliability
● Two available execution modes: standalone or cluster
● Implemented in Java: so any performance best practice/recommendation for
Java applications applies here
● REST services for performance monitoring available
● Rules and alerts (metric and data both)
Streamsets Data Collector: security
● You can authenticate user accounts based on LDAP
● Authorization: the Data Collector provides several roles (admin, manager,
creator, guest)
● You can use Kerberos authentication to connect to origin and destination
systems
● Follow the usual security best practices in terms of iptables, networking, etc.
for Java web applications running on Linux machines.
Useful Links
Streamsets Data Collector:
https://streamsets.com/product/
Thanks!
My contacts:
Linkedin: https://ie.linkedin.com/in/giozzia
Blog: http://googlielmo.blogspot.ie/
Twitter: https://twitter.com/guglielmoiozzia
Building a data pipeline to ingest data into Hadoop in minutes using Streamsets Data Collector

Contenu connexe

Tendances

Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsWebinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsKinetica
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data IntegrationsPat Patterson
 
Dealing with Drift: Building an Enterprise Data Lake
Dealing with Drift: Building an Enterprise Data LakeDealing with Drift: Building an Enterprise Data Lake
Dealing with Drift: Building an Enterprise Data LakePat Patterson
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Pat Patterson
 
HUGIreland_CronanMcNamara_DataScience_ExpertModels.pdf
HUGIreland_CronanMcNamara_DataScience_ExpertModels.pdfHUGIreland_CronanMcNamara_DataScience_ExpertModels.pdf
HUGIreland_CronanMcNamara_DataScience_ExpertModels.pdfJohn Mulhall
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scaledatamantra
 
Introduction to basic data analytics tools
Introduction to basic data analytics toolsIntroduction to basic data analytics tools
Introduction to basic data analytics toolsNascenia IT
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on EverythingDavid Phillips
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Data Con LA
 
From Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexFrom Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexDataWorks Summit
 
Obfuscating LinkedIn Member Data
Obfuscating LinkedIn Member DataObfuscating LinkedIn Member Data
Obfuscating LinkedIn Member DataDataWorks Summit
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSpark Summit
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Spark Summit
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup Omid Vahdaty
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsMars Lan
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisJen Aman
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Flink Forward
 

Tendances (20)

Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsWebinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data Integrations
 
Dealing with Drift: Building an Enterprise Data Lake
Dealing with Drift: Building an Enterprise Data LakeDealing with Drift: Building an Enterprise Data Lake
Dealing with Drift: Building an Enterprise Data Lake
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
HUGIreland_CronanMcNamara_DataScience_ExpertModels.pdf
HUGIreland_CronanMcNamara_DataScience_ExpertModels.pdfHUGIreland_CronanMcNamara_DataScience_ExpertModels.pdf
HUGIreland_CronanMcNamara_DataScience_ExpertModels.pdf
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
 
Introduction to basic data analytics tools
Introduction to basic data analytics toolsIntroduction to basic data analytics tools
Introduction to basic data analytics tools
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on Everything
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
From Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexFrom Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache Apex
 
Obfuscating LinkedIn Member Data
Obfuscating LinkedIn Member DataObfuscating LinkedIn Member Data
Obfuscating LinkedIn Member Data
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph Analysis
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 

En vedette

A REPORT OF PROFESSIONAL TRAINING WORKSHOP
A REPORT OF PROFESSIONAL TRAINING WORKSHOPA REPORT OF PROFESSIONAL TRAINING WORKSHOP
A REPORT OF PROFESSIONAL TRAINING WORKSHOPEzekiel Tunde ADEBAMIWI
 
Propiedad intelectual y Proteccion Juridica del Software
Propiedad intelectual y Proteccion Juridica del SoftwarePropiedad intelectual y Proteccion Juridica del Software
Propiedad intelectual y Proteccion Juridica del Softwarejorge quispe
 
Informatica
InformaticaInformatica
Informaticaxgmikeh
 
JSIL Print Media Presentation_PR Lipper Award 2016
JSIL Print Media Presentation_PR Lipper Award 2016JSIL Print Media Presentation_PR Lipper Award 2016
JSIL Print Media Presentation_PR Lipper Award 2016Ahmad butt
 
Presentación Rol del estudiante y de los tutores en la educación a distancia
Presentación  Rol del estudiante y de los tutores en la educación a distanciaPresentación  Rol del estudiante y de los tutores en la educación a distancia
Presentación Rol del estudiante y de los tutores en la educación a distanciaMONICA CALDERON
 
Derecho informatico
Derecho informaticoDerecho informatico
Derecho informaticojorge quispe
 
Contratacion Electronica & Contratacion Informatica
Contratacion Electronica & Contratacion InformaticaContratacion Electronica & Contratacion Informatica
Contratacion Electronica & Contratacion Informaticajorge quispe
 
Teletrabajo en la administración pública
Teletrabajo en la administración públicaTeletrabajo en la administración pública
Teletrabajo en la administración públicajorge quispe
 
Delitos informaticos
Delitos informaticosDelitos informaticos
Delitos informaticosjorge quispe
 
Analisis economico del derecho
Analisis economico del derechoAnalisis economico del derecho
Analisis economico del derechojorge quispe
 
Comercio electrónico
Comercio electrónicoComercio electrónico
Comercio electrónicojorge quispe
 
CISSP Prep: Ch 1: Security Governance Through Principles and Policies
CISSP Prep: Ch 1: Security Governance Through Principles and PoliciesCISSP Prep: Ch 1: Security Governance Through Principles and Policies
CISSP Prep: Ch 1: Security Governance Through Principles and PoliciesSam Bowne
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudStreamsets Inc.
 
Practical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorSam Bowne
 
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgPractical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgSam Bowne
 

En vedette (20)

Base de datos
Base de datosBase de datos
Base de datos
 
A REPORT OF PROFESSIONAL TRAINING WORKSHOP
A REPORT OF PROFESSIONAL TRAINING WORKSHOPA REPORT OF PROFESSIONAL TRAINING WORKSHOP
A REPORT OF PROFESSIONAL TRAINING WORKSHOP
 
Propiedad intelectual y Proteccion Juridica del Software
Propiedad intelectual y Proteccion Juridica del SoftwarePropiedad intelectual y Proteccion Juridica del Software
Propiedad intelectual y Proteccion Juridica del Software
 
Informatica
InformaticaInformatica
Informatica
 
JSIL Print Media Presentation_PR Lipper Award 2016
JSIL Print Media Presentation_PR Lipper Award 2016JSIL Print Media Presentation_PR Lipper Award 2016
JSIL Print Media Presentation_PR Lipper Award 2016
 
Presentación Rol del estudiante y de los tutores en la educación a distancia
Presentación  Rol del estudiante y de los tutores en la educación a distanciaPresentación  Rol del estudiante y de los tutores en la educación a distancia
Presentación Rol del estudiante y de los tutores en la educación a distancia
 
Housing Academy 2
Housing Academy 2Housing Academy 2
Housing Academy 2
 
Derecho informatico
Derecho informaticoDerecho informatico
Derecho informatico
 
Contratacion Electronica & Contratacion Informatica
Contratacion Electronica & Contratacion InformaticaContratacion Electronica & Contratacion Informatica
Contratacion Electronica & Contratacion Informatica
 
Chan
ChanChan
Chan
 
Teletrabajo en la administración pública
Teletrabajo en la administración públicaTeletrabajo en la administración pública
Teletrabajo en la administración pública
 
tieguy brochure
tieguy brochuretieguy brochure
tieguy brochure
 
Delitos informaticos
Delitos informaticosDelitos informaticos
Delitos informaticos
 
Tic
TicTic
Tic
 
Analisis economico del derecho
Analisis economico del derechoAnalisis economico del derecho
Analisis economico del derecho
 
Comercio electrónico
Comercio electrónicoComercio electrónico
Comercio electrónico
 
CISSP Prep: Ch 1: Security Governance Through Principles and Policies
CISSP Prep: Ch 1: Security Governance Through Principles and PoliciesCISSP Prep: Ch 1: Security Governance Through Principles and Policies
CISSP Prep: Ch 1: Security Governance Through Principles and Policies
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
 
Practical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware Behavior
 
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgPractical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
 

Similaire à Building a data pipeline to ingest data into Hadoop in minutes using Streamsets Data Collector

Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...PerformanceVision (previously SecurActive)
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupJelena Zanko
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
Challenges of monitoring distributed systems
Challenges of monitoring distributed systemsChallenges of monitoring distributed systems
Challenges of monitoring distributed systemsNenad Bozic
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...In-Memory Computing Summit
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari
 
Log aggregation and analysis
Log aggregation and analysisLog aggregation and analysis
Log aggregation and analysisDhaval Mehta
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Codemotion
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Demi Ben-Ari
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Hernan Costante
 
Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013ScaleOut Software
 
Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017Deepu K Sasidharan
 
Devoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipsterDevoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipsterJulien Dubois
 
IBM IoT Architecture and Capabilities at the Edge and Cloud
IBM IoT Architecture and Capabilities at the Edge and Cloud IBM IoT Architecture and Capabilities at the Edge and Cloud
IBM IoT Architecture and Capabilities at the Edge and Cloud Pradeep Natarajan
 
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16AppDynamics
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for StreamSplunk
 

Similaire à Building a data pipeline to ingest data into Hadoop in minutes using Streamsets Data Collector (20)

Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid Meetup
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Challenges of monitoring distributed systems
Challenges of monitoring distributed systemsChallenges of monitoring distributed systems
Challenges of monitoring distributed systems
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
 
Log aggregation and analysis
Log aggregation and analysisLog aggregation and analysis
Log aggregation and analysis
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
 
Monitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp DockerMonitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp Docker
 
Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017
 
Devoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipsterDevoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipster
 
IBM IoT Architecture and Capabilities at the Edge and Cloud
IBM IoT Architecture and Capabilities at the Edge and Cloud IBM IoT Architecture and Capabilities at the Edge and Cloud
IBM IoT Architecture and Capabilities at the Edge and Cloud
 
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for Stream
 

Dernier

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 

Dernier (20)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 

Building a data pipeline to ingest data into Hadoop in minutes using Streamsets Data Collector

  • 1. Building a data pipeline to ingest data into Hadoop in minutes using Streamsets Data Collector Guglielmo Iozzia, Big Data Infrastructure Engineer @ IBM Ireland
  • 2. Data Ingestion for Analytics: a real scenario In the business area (cloud applications) to which my team belongs there were so many questions to be answered. They were related to: ● Defect analysis ● Outage analysis ● Cyber-Security
  • 3. “Data is the second most important thing in analytics”
  • 4. Data Ingestion: multiple sources... ● Legacy systems ● DB2 ● Lotus Domino ● MongoDB ● Application logs ● System logs ● New Relic ● Jenkins pipelines ● Testing tools output ● RESTful Services
  • 5. … and so many tools available to get the data
  • 6. What are we going to do with all those data?
  • 7. Issues ● The need to collect data from multiple sources introduces redundancy, which costs additional disk space and increases query times. ● A small team. ● Lack of skills and experience across the team (and the business area in general) in managing Big Data tools. ● Low budget.
  • 10. Alternatives #3 Find a smart way to simplify the data ingestion process
  • 11. A single tool needed... ● Design complex data flows with minimal coding and the maximum flexibility. ● Provide real-time data flow statistics, metrics for each flow stage. ● Automated error handling and alerting. ● Easy to use by everyone. ● Zero-downtime when upgrading the infrastructure due to logical isolation of each flow stage. ● Open Source
  • 15. Streamsets Data Collector: supported origins
  • 16. Streamsets Data Collector: available destinations
  • 17. Streamsets Data Collector: available processors ● Base64 Field Decoder ● Base64 Field Encoder ● Expression Evaluator ● Field Converter ● JavaScript Evaluator ● JSON Parser ● Jython Evaluator ● Log Parser ● Stream Selector ● XML Parser ...and many others
  • 19. Streamsets DC: performance and reliability ● Two available execution modes: standalone or cluster ● Implemented in Java: so any performance best practice/recommendation for Java applications applies here ● REST services for performance monitoring available ● Rules and alerts (metric and data both)
  • 20. Streamsets Data Collector: security ● You can authenticate user accounts based on LDAP ● Authorization: the Data Collector provides several roles (admin, manager, creator, guest) ● You can use Kerberos authentication to connect to origin and destination systems ● Follow the usual security best practices in terms of iptables, networking, etc. for Java web applications running on Linux machines.
  • 21. Useful Links Streamsets Data Collector: https://streamsets.com/product/
  • 22. Thanks! My contacts: Linkedin: https://ie.linkedin.com/in/giozzia Blog: http://googlielmo.blogspot.ie/ Twitter: https://twitter.com/guglielmoiozzia