Integrating data stored in RDBMS and Hadoop
leorick lin
Using Apache Spark to access data stored in RDBMS & Hadoop.
1.
Integrating data stored in RDBMS and Hadoop
leoricklin@gmail.com
2.
Problem: How can we process data stored in an RDBMS (master tables) and in Hadoop (log tables) within one platform?
3.
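Solution 1: Spark with JdbcRDD
Pros:
● Simple process: read the RDBMS table into an RDD, then operate on it.
Cons:
● Complex system setup: every Spark node must be able to connect to the RDBMS.
● JdbcRDD only supports range queries, and the range-condition column must be of type Long; that range is what JdbcRDD uses to distribute the reads across the Spark nodes.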
4.
MySQL table schema
mysql> create table calendar (date_cd int not null, wd_flag int not null) engine=csv;
mysql> select date_cd, wd_flag from calendar where date_cd >= 20130101 and date_cd <= 20141031;
+----------+---------+
| date_cd  | wd_flag |
+----------+---------+
| 20141001 |       0 |
| 20141002 |       0 |
....
62 rows in set (0.05 sec)
5.
Reading MySQL with JdbcRDD
$SPARK_HOME/bin/spark-shell --jars /usr/share/java/mysql-connector-java-5.1.17.jar \
  --master yarn-client --num-executors 4 --executor-cores 2 \
  --executor-memory 4g --driver-memory 1g

scala> import org.apache.spark.rdd.JdbcRDD
import java.sql.{Connection, DriverManager, ResultSet}

val url = "jdbc:mysql://mysqlserver.com:3306/db"
val username = "user"
val password = "pwd"
val driverName = "com.mysql.jdbc.Driver"

case class CAL(date_cd: String, wd_flag: Int)

// JdbcRDD binds the two "?" placeholders to sub-ranges of [20130101, 20141031],
// one sub-range per partition (12 partitions here).
val calrdd = new JdbcRDD(
  sc,
  () => {
    Class.forName(driverName)  // make sure the MySQL driver is registered on each executor
    DriverManager.getConnection(url, username, password)
  },
  "select date_cd, wd_flag from calendar where date_cd >= ? and date_cd <= ?",
  20130101, 20141031, 12,
  (r: ResultSet) => CAL(r.getString(1), r.getInt(2))
)
// calrdd.count: Long = 62
// scheduler.DAGScheduler: Job 0 finished: ... took 1.429489 s
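JdbcRDD turns the lower bound, upper bound and partition count into one inclusive sub-range per partition and binds each sub-range to the two `?` placeholders. The plain-Scala sketch below mimics that split without Spark; the arithmetic is modeled on Spark's JdbcRDD but should be read as an approximation, and `JdbcRangeSplit` is a hypothetical name. It also hints at a practical consequence of using integer date codes as the range column: many sub-ranges cover values such as 20131301 that are not valid dates, so the partitions can be very uneven.

```scala
object JdbcRangeSplit {
  // Split the inclusive range [lower, upper] into numPartitions contiguous,
  // inclusive sub-ranges. BigInt avoids overflow for very wide ranges.
  def split(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
    val length = BigInt(upper) - BigInt(lower) + 1
    val n = BigInt(numPartitions)
    (0 until numPartitions).map { i =>
      val start = BigInt(lower) + (BigInt(i) * length) / n
      val end = BigInt(lower) + (BigInt(i + 1) * length) / n - 1
      (start.toLong, end.toLong)
    }
  }

  def main(args: Array[String]): Unit = {
    // Same bounds and partition count as the JdbcRDD call above.
    for (((start, end), i) <- split(20130101L, 20141031L, 12).zipWithIndex)
      println(s"partition $i: date_cd >= $start and date_cd <= $end")
  }
}
```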
6.
Reading HDFS with HadoopRDD
case class LOG(req_date: String, operation: String, http_code: Int)
val path = "hdfs://mycluster/tmp/log.CSV"
// Helper.tokenize, Helper.replaceChar and toLOG are the author's helpers (not shown in the slides).
val logrdd = (sc.textFile(path, 12)               // ret: RDD[String]
  .map(i => Helper.tokenize(i, ",", true))        // ret: RDD[Array[String]]
  .map(ary => Helper.replaceChar(ary, "-", "0"))  // ret: RDD[Array[String]]
  .map(ary => toLOG(ary)))                        // ret: RDD[LOG]
// logrdd.count: Long = 4000000
// scheduler.DAGScheduler: Job 1 finished: ... took 10.404451 s
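The slide relies on `Helper.tokenize`, `Helper.replaceChar` and `toLOG` without showing them. The sketch below gives hypothetical stand-ins so the parsing step can be tried locally; the behavior of each helper is an assumption: the third `tokenize` argument is taken to mean "trim fields", `replaceChar` is assumed to replace fields that are exactly `-` (a common missing-value marker in logs), and the column order is assumed to be req_date, operation, http_code.

```scala
// Hypothetical stand-ins for the author's helpers; the real ones are not shown.
case class LOG(req_date: String, operation: String, http_code: Int)

object Helper {
  // Split a CSV line on a literal delimiter; optionally trim each field.
  def tokenize(line: String, delim: String, trim: Boolean): Array[String] = {
    val fields = line.split(java.util.regex.Pattern.quote(delim), -1)
    if (trim) fields.map(_.trim) else fields
  }

  // Assumption: a field equal to `from` (e.g. "-" for a missing value)
  // is replaced by `to`, so numeric fields still parse.
  def replaceChar(fields: Array[String], from: String, to: String): Array[String] =
    fields.map(f => if (f == from) to else f)
}

object LogParsing {
  // Assumption: columns are req_date, operation, http_code, in that order.
  def toLOG(fields: Array[String]): LOG =
    LOG(fields(0), fields(1), fields(2).toInt)

  // The same chain of steps the slide applies to every line of the file.
  def parse(line: String): LOG =
    toLOG(Helper.replaceChar(Helper.tokenize(line, ",", true), "-", "0"))

  def main(args: Array[String]): Unit =
    println(parse("20130129231106, Put, 200")) // prints LOG(20130129231106,Put,200)
}
```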
7.
Performing the RDD join
import org.apache.spark.SparkContext._  // pair-RDD functions such as join (required before Spark 1.3)
val logkv = logrdd.map(i => (i.req_date.substring(0, 8), i))  // ret: RDD[(String, LOG)]
val calkv = calrdd.map(i => (i.date_cd, i))                   // ret: RDD[(String, CAL)]
val joinrdd = logkv.join(calkv)
/* joinrdd.first: (String, (LOG, CAL)) =
     (20130129, (LOG(20130129231106,Put,200), CAL(20130129,1)))
   scheduler.DAGScheduler: Job 20 finished ... took 97.416966 s */
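To make the semantics of `join` concrete, here is a plain-Scala model on local collections (no Spark): an inner join on the key, producing one `(key, (left, right))` pair per matching combination. It shows the result shape only; Spark's `join` additionally shuffles both RDDs so that equal keys meet in the same partition, which is the expensive part of the 97-second job above. `JoinSemantics` and `innerJoin` are hypothetical names, and the second log record is made-up sample data.

```scala
object JoinSemantics {
  case class LOG(req_date: String, operation: String, http_code: Int)
  case class CAL(date_cd: String, wd_flag: Int)

  // Inner join on the key: keys missing from either side are dropped,
  // and every matching (left, right) combination is emitted.
  def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2) }
    for {
      (k, v) <- left
      w <- rightByKey.getOrElse(k, Seq.empty)
    } yield (k, (v, w))
  }

  def main(args: Array[String]): Unit = {
    val logs = Seq(
      LOG("20130129231106", "Put", 200),
      LOG("20130130080000", "Get", 404) // made-up record with no calendar match
    )
    val logkv = logs.map(l => (l.req_date.substring(0, 8), l))
    val calkv = Seq(CAL("20130129", 1)).map(c => (c.date_cd, c))
    innerJoin(logkv, calkv).foreach(println)
  }
}
```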
8.
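Solution 2: Spark with a JDBC client
Pros:
● Simple system setup: only the driver node needs access to the RDBMS.
● Supports native SELECT syntax.
● The SQL query result can be turned into either an RDD or a broadcast variable.
Cons:
● Slightly more complex process: the ResultSet must be converted into an RDD (or a broadcast variable) on the driver node before it can be used.
● The amount of RDBMS data is limited by the driver node's memory.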
9.
See: MySQL table schema (slide 4)
10.
Reading MySQL with the JDBC driver
// SqlHelper and calToTup are the author's helpers (not shown in the slides).
val conn: java.sql.Connection = SqlHelper.getConn(driverName, url, username, password)
val stmt: java.sql.Statement = SqlHelper.getStmt(conn)
val sql = "select date_cd, wd_flag from calendar where date_cd >= 20130101 and date_cd <= 20141031"
val ret: java.sql.ResultSet = SqlHelper.getResult(stmt, sql)
// The whole result set is read into memory on the driver node.
val rows = new scala.collection.mutable.ListBuffer[(String, Int)]()
while (ret.next) { rows += calToTup(ret) }
// rows.size: Int = 62
// rows(0): (String, Int) = (20141001,0)
// Returns immediately: parallelize is lazy.
val calrdd = sc.parallelize(rows, 12)
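Since `SqlHelper` and `calToTup` are not shown, the sketch below offers a hypothetical version built only on standard `java.sql` calls (`Class.forName`, `DriverManager.getConnection`, `createStatement`, `executeQuery`). The column mapping in `calToTup` is an assumption based on the query. It also adds a `withResource` helper, not present in the slides, because the code above never closes the statement or result set; `readAll` drains a query into a list on the driver and closes both afterwards.

```scala
import java.sql.{Connection, DriverManager, ResultSet, Statement}
import scala.collection.mutable.ListBuffer

// Hypothetical SqlHelper: bodies are assumptions built on plain JDBC.
object SqlHelper {
  def getConn(driverName: String, url: String, user: String, pwd: String): Connection = {
    Class.forName(driverName) // loading the class registers the driver with DriverManager
    DriverManager.getConnection(url, user, pwd)
  }
  def getStmt(conn: Connection): Statement = conn.createStatement()
  def getResult(stmt: Statement, sql: String): ResultSet = stmt.executeQuery(sql)

  // Assumed mapping: column 1 = date_cd (read as String), column 2 = wd_flag.
  def calToTup(rs: ResultSet): (String, Int) = (rs.getString(1), rs.getInt(2))

  // Run f and always close the resource, even if f throws.
  def withResource[R <: AutoCloseable, A](r: R)(f: R => A): A =
    try f(r) finally r.close()

  // Read a whole query result into driver memory, then close the result set and statement.
  def readAll[A](conn: Connection, sql: String)(row: ResultSet => A): List[A] =
    withResource(conn.createStatement()) { stmt =>
      withResource(stmt.executeQuery(sql)) { rs =>
        val buf = ListBuffer.empty[A]
        while (rs.next()) buf += row(rs)
        buf.toList
      }
    }
}

// Tiny AutoCloseable used only to show that withResource closes its resource.
class TrackedResource extends AutoCloseable {
  var closed = false
  override def close(): Unit = closed = true
}

object SqlHelperDemo {
  def main(args: Array[String]): Unit = {
    val t = new TrackedResource
    val n = SqlHelper.withResource(t)(_ => 62)
    println(s"result=$n closed=${t.closed}") // prints result=62 closed=true
  }
}
```

With a live database, `SqlHelper.withResource(SqlHelper.getConn(driverName, url, username, password)) { c => SqlHelper.readAll(c, sql)(SqlHelper.calToTup) }` would replace the `while` loop above and close the connection too.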
11.
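See: Reading HDFS with HadoopRDD (slide 6) and Performing the RDD join (slide 7).
Here calrdd already holds (date_cd, wd_flag) pairs, so it can be joined directly: logkv.join(calrdd).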
12.
Performing a map-side join with a broadcast variable
...
val sqlkv = new scala.collection.mutable.HashMap[String, Int]()
while (ret.next) { sqlkv += calToTup(ret) }
val sqlbc = sc.broadcast(sqlkv)  // ship the small RDBMS table to every executor once
val logkv = logrdd.map( ... )    /* read HDFS with HadoopRDD and convert to RDD[(String, LOG)] */
// No shuffle is needed: each partition looks its keys up in the broadcast map.
val result = logkv.mapPartitions { iter =>
  val sqlkv = sqlbc.value
  for {
    (key, value) <- iter
    if sqlkv.contains(key)         // keep only keys present in the RDBMS table (inner join)
  } yield (key, (value, sqlkv(key)))
}
/* result.first: (String, (LOG, Int)) =
     (20130129, (LOG(20130129231106,Put,200), 1))
   scheduler.DAGScheduler: Job 0 finished ... took 2.051693 s */
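The per-partition logic of the map-side join can be modeled without Spark: the small RDBMS table is a `Map` (what `sqlbc.value` holds on each executor), and every record of the large side is looked up by key, dropping records with no match. The sketch below is a plain-Scala illustration with hypothetical names (`MapSideJoin`, `lookupJoin`) and one made-up log record; inside Spark it could be used as, for example, `logkv.mapPartitions(iter => MapSideJoin.lookupJoin(iter, sqlbc.value))`.

```scala
object MapSideJoin {
  case class LOG(req_date: String, operation: String, http_code: Int)

  // Inner-join semantics against a small in-memory table: records whose key
  // is missing from `small` are dropped; no grouping of the large side is needed.
  def lookupJoin[K, V, W](partition: Iterator[(K, V)],
                          small: collection.Map[K, W]): Iterator[(K, (V, W))] =
    partition.flatMap { case (k, v) => small.get(k).map(w => (k, (v, w))) }

  def main(args: Array[String]): Unit = {
    val calendar = Map("20130129" -> 1) // date_cd -> wd_flag
    val logs = Iterator(
      ("20130129", LOG("20130129231106", "Put", 200)),
      ("20130130", LOG("20130130080000", "Get", 404)) // made-up record with no match
    )
    lookupJoin(logs, calendar).foreach(println)
  }
}
```

Compared with the shuffle-based `join`, this only works while the RDBMS side fits in memory on the driver and on every executor, which matches the driver-memory limit listed for Solution 2.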
13.
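Summary
● JdbcRDD approach
  o Pros: simpler to code
  o Cons: more complex system setup
● JDBC client approach
  o Pros: simpler system setup
  o Cons: more complex to code (but allows performance optimizations, such as the broadcast map-side join)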