Ingesting Data from Kafka to JDBC with Transformation and Enrichment


Presenter: Dr. Sandeep Deshmukh, Committer, Apache Apex; Engineer, DataTorrent

Abstract: Ingesting and extracting data from Hadoop can be a frustrating, time-consuming activity for many enterprises. Apache Apex Data Ingestion is a standalone big data application that simplifies the collection, aggregation, and movement of large amounts of data to and from Hadoop for a more efficient data processing pipeline. It makes configuring and running Hadoop data ingestion and extraction a point-and-click process, enabling a smooth, easy path to your Hadoop-based big data project. This series of talks covers how Apache Apex makes Hadoop ingestion easy. The third talk in the series focuses on ingesting unbounded data from Kafka to JDBC with a couple of processing operators: Transform and Enrich.

Meetup Series on
Hadoop Ingestion Made Easy
Part 3: Ingestion of Unbounded Data
(Kafka -> Enrich -> Transform -> JDBC)
Dr. Sandeep Deshmukh
Committer@Apache Apex
Engineer@DataTorrent
Yogi Devendra
Committer@Apache Apex
Engineer@DataTorrent

What is Ingestion
Data ingestion
• A process of obtaining, importing, and analyzing data for later use or storage in a database
Big Data Ingestion
• Discovering the data sources
• Importing the data
• Processing data to produce intermediate data
• Sending data out to durable data stores
ETL + Big Data => Data ingestion

Requirements
• Read from stream
○ Kafka
○ JMS
○ JDBC
○ ….
• Enrich
• Transform
• Store
○ JDBC
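The Store requirement above could be met with a batched JDBC write. The following is a minimal sketch in plain Java, not the Malhar JDBC output operator: the class name, the table name `transactions`, and the column names are all hypothetical. Only standard `java.sql` calls (`prepareStatement`, `setObject`, `addBatch`, `executeBatch`) are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Apex or Malhar.
final class JdbcStoreSketch {

    // Builds a parameterized INSERT with one "?" placeholder per column.
    static String insertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Writes all rows in a single batch; each row holds values in column order.
    static void writeBatch(Connection conn, String table, List<String> columns,
                           List<Object[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table, columns))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        // Table and column names are assumptions, not taken from the talk.
        List<String> columns = Arrays.asList("city", "state", "amount", "category");
        System.out.println(insertSql("transactions", columns));
    }
}
```

Batching is one common way to reduce round trips to the database; whether it suits a given pipeline depends on its latency and failure-recovery needs.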

Use Case
Kafka Reader -> Enrich -> Transform -> JDBC Output
● Kafka Reader
○ Kafka 0.9
○ Multi-topic
○ Multi-partition
● Enrich
○ Enrich State using City
● Transform
○ Mark a transaction as HIGH or LOW value
● JDBC Output
○ Store in DB
Parallel partitions can scale to large volumes
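The Enrich and Transform stages of this use case can be sketched in plain Java. This is an illustration under stated assumptions, not the operators shown in the talk: the `Transaction` type, its field names, the in-memory city-to-state table, and the 1000.0 threshold are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record type; the field names are assumptions for illustration.
final class Transaction {
    final String city;
    final double amount;
    String state; // filled in by the enrich step
    String value; // "HIGH" or "LOW", filled in by the transform step

    Transaction(String city, double amount) {
        this.city = city;
        this.amount = amount;
    }
}

final class EnrichTransformSketch {
    // Assumed cut-off between LOW and HIGH; the talk does not state one.
    static final double HIGH_VALUE_THRESHOLD = 1000.0;

    // Enrich: look up the state for the record's city in an in-memory table.
    static Transaction enrich(Transaction t, Map<String, String> cityToState) {
        t.state = cityToState.getOrDefault(t.city, "UNKNOWN");
        return t;
    }

    // Transform: mark the transaction as HIGH or LOW value by amount.
    static Transaction transform(Transaction t) {
        t.value = t.amount >= HIGH_VALUE_THRESHOLD ? "HIGH" : "LOW";
        return t;
    }

    public static void main(String[] args) {
        Map<String, String> cityToState = new HashMap<>();
        cityToState.put("Pune", "Maharashtra");
        cityToState.put("San Jose", "California");

        Transaction t = transform(enrich(new Transaction("Pune", 2500.0), cityToState));
        System.out.println(t.city + ", " + t.state + ", " + t.amount + ", " + t.value);
    }
}
```

In a real pipeline the lookup table would likely come from an external store rather than a hard-coded map; the map keeps the sketch self-contained.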

Let us walk through the code and appreciate how simple it is to use Apex
● Code walk-through for a data ingestion pipeline using
○ Operators/Modules in Malhar
○ Drag-n-drop user interface
● Validate
○ Partitioning
○ Scalability
Let us create and run the app
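To make the partitioning item concrete, here is a minimal sketch of one common scheme, hash-based key partitioning, in plain Java. It is an assumption for illustration and does not show how Apex itself assigns tuples to partitions; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of hash partitioning; not Apex's partitioner.
final class PartitionSketch {

    // Maps a key to a partition index in [0, partitionCount).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Distributes keys across partitions; equal keys land in the same partition.
    static List<List<String>> distribute(List<String> keys, int partitionCount) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, partitionCount)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("Pune", "Mumbai", "San Jose", "Austin", "Pune");
        List<List<String>> partitions = distribute(cities, 3);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Because each partition handles only its share of the keys, adding partitions spreads the load, which is the intuition behind validating scalability by changing the partition count.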

Resources
• http://apex.apache.org/
• Learn more: http://apex.apache.org/docs.html
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups – http://www.meetup.com/pro/apacheapex/
• More examples: https://github.com/DataTorrent/examples
• Slideshare: http://www.slideshare.net/ApacheApex/presentations
• https://www.youtube.com/results?search_query=apache+apex
• Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/

Q&A
