Moving Data Between Exadata and Hadoop
- 2. www.enkitec.com
Intro: About me
• Tanel Põder
• Former Oracle Database Performance geek
• Present Exadata Performance geek
• Future Hadoop Performance geek
• My Exadata experience
• 2009 ... 2013
• Exadata V1 … X3
• Multi-rack Exadatas
• Mixed-rack Exadatas
• My Hadoop experience
• Ask again next year ;-)

Expert Oracle Exadata book
(with Kerry Osborne and Randy Johnson of Enkitec)
- 3. About Enkitec
• Enkitec
• North America
• EMEA

• 100+ staff
• In US, Europe
• Consultants with Oracle experience of 15+ years on average
• What makes us so awesome
• 200+ Exadata implementations to date

• Enkitec Exa-Lab
• We have 3 Exadatas (V2, X2-2, X3-2)
• Full-Rack Big Data Appliance
• Exalytics
• ODA
Everything Exa

Planning/PoC
Implementation
Consolidation
Migration
Backup/Recovery
Patching
Troubleshooting
Performance
Capacity
Training
- 7. Oracle SQL Connector for HDFS
CREATE TABLE "TANEL"."TERASORT_1T_100"
( "TOKEN_TYPE" VARCHAR2(4000),
"DATE_MONTH" VARCHAR2(4000),
"TOKEN_COUNT" VARCHAR2(4000),
"TOKEN_VALUE" VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "EXT_HDFS_TEST_DIR"
ACCESS PARAMETERS
( RECORDS DELIMITED BY 0X'0A'
PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
FIELDS TERMINATED BY 0X'3058273927'
( "TOKEN_TYPE" CHAR(4000),
"DATE_MONTH" CHAR(4000),
"TOKEN_COUNT" CHAR(4000),
"TOKEN_VALUE" CHAR(4000)
)
)
LOCATION
( 'osch-tanel-00000',
'osch-tanel-00001',
'osch-tanel-00002',
'osch-tanel-00003'
)
) ...
Visible to Oracle as an External Table.
Parallelizable. Insert select, CTAS
The PREPROCESSOR program hdfs_stream is a Java program capable of reading/streaming files from HDFS
The Oracle SQL Connector data "location pointer" files point to 1 TB of data
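The hex literals in the ACCESS PARAMETERS clause are easier to reason about once decoded. The sketch below is only an illustration: decode_hex_literal is a hypothetical helper, not part of Oracle or OSCH, and its tolerance for an odd number of hex digits is an assumption. The record delimiter decodes to a newline, while the field terminator decodes to the five ASCII characters 0X'9', which themselves look like a hex literal for a tab. Whether that was intended is not stated on the slide.

```python
def decode_hex_literal(literal):
    """Decode an ORACLE_LOADER-style hex literal such as 0X'0A' into raw bytes.

    Hypothetical helper for illustration; not an Oracle or OSCH API.
    """
    if not (literal.upper().startswith("0X'") and literal.endswith("'")):
        raise ValueError(f"not a hex literal: {literal!r}")
    digits = literal[3:-1]
    if len(digits) % 2:
        # Assumption: treat an odd digit count (e.g. 0X'9') as a zero-padded byte.
        digits = "0" + digits
    return bytes.fromhex(digits)


# Values copied from the CREATE TABLE statement on the slide.
record_delim = decode_hex_literal("0X'0A'")
field_delim = decode_hex_literal("0X'3058273927'")

print(repr(record_delim))
print(repr(field_delim))

# Decoding the decoded terminator once more, to show what it may have been meant to be.
print(repr(decode_hex_literal(field_delim.decode("ascii"))))
```

If the terminator really is the literal text 0X'9' rather than a tab, rows would not split on tabs, so it seems worth checking against the actual data before relying on the column values.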
- 8. OSCH data location files
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<locationFile>
<header>
<version>1.0</version>
<fileName>osch-20130708020324-4644-1</fileName>
<createDate>2013-07-08T14:03:24</createDate>
<publishDate>2013-07-08T02:03:24</publishDate>
<productName>Oracle SQL Connector for HDFS Release 2.1.0 - Production</productName>
<productVersion>2.1.0</productVersion>
</header>
<uri_list>
<uri_list_item size="10000000000" compressionCodec="">
hdfs://enkbda-ns/user/acolvin/terasort/part-00000
</uri_list_item>
<uri_list_item size="10000000000" compressionCodec="">
hdfs://enkbda-ns/user/acolvin/terasort/part-00006
</uri_list_item>
<uri_list_item size="10000000000" compressionCodec="">
hdfs://enkbda-ns/user/acolvin/terasort/part-00008
</uri_list_item>
<uri_list_item size="10000000000" compressionCodec="">
hdfs://enkbda-ns/user/acolvin/terasort/part-00014
</uri_list_item>
<uri_list_item size="10000000000" compressionCodec="">
hdfs://enkbda-ns/user/acolvin/terasort/part-00016
</uri_list_item>
...
Each "location pointer" file the external table loader uses points to one or more actual HDFS files

(this config file is edited for formatting purposes)
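To see which HDFS files a given location file points to, the XML can be read with a few lines of Python. This is a minimal sketch, assuming the element and attribute names shown on the slide (uri_list_item, size); the sample document is shortened to two entries with a trimmed header, and list_hdfs_targets is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on the slide; only two URIs and a trimmed header.
SAMPLE_LOCATION_FILE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<locationFile>
  <header>
    <version>1.0</version>
    <fileName>osch-20130708020324-4644-1</fileName>
  </header>
  <uri_list>
    <uri_list_item size="10000000000" compressionCodec="">
      hdfs://enkbda-ns/user/acolvin/terasort/part-00000
    </uri_list_item>
    <uri_list_item size="10000000000" compressionCodec="">
      hdfs://enkbda-ns/user/acolvin/terasort/part-00006
    </uri_list_item>
  </uri_list>
</locationFile>
"""


def list_hdfs_targets(xml_text):
    """Return (uri, size_in_bytes) for every uri_list_item in a location file."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (item.text.strip(), int(item.get("size")))
        for item in root.iter("uri_list_item")
    ]


targets = list_hdfs_targets(SAMPLE_LOCATION_FILE)
total_bytes = sum(size for _, size in targets)

for uri, size in targets:
    print(uri, size)
print("total bytes behind this pointer file:", total_bytes)
```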
- 13. Increase Max Allowed External Table Parallelism
CREATE TABLE terasort_1t_100 (
...
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "EXT_HDFS_TEST_DIR"
...
PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
...
LOCATION
(
'osch-tanel-00000'
, 'osch-tanel-00001'
, 'osch-tanel-00002'
, 'osch-tanel-00003'
, 'osch-tanel-00004'
, 'osch-tanel-00005'
, 'osch-tanel-00006'
, 'osch-tanel-00007'
, 'osch-tanel-00008'
, 'osch-tanel-00009'
, 'osch-tanel-00010'
...
, 'osch-tanel-00098'
, 'osch-tanel-00099'
)
...
Solution: Create more "location pointer" files.
100 "location pointer" files, each pointing to a single HDFS file (in my test)
This allows up to 100 slaves in parallel, accessing one HDFS stream each.
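The one-pointer-per-file layout can be sketched as follows. This is not the connector's own tooling, only an illustration of the idea: build_location_file is a hypothetical helper, the header is reduced to two fields (a real file presumably carries the full header shown on the slides), and the mapping of osch-tanel-000NN to part-000NN is an assumption extrapolated from the one complete example in the deck.

```python
import xml.etree.ElementTree as ET

HDFS_DIR = "hdfs://enkbda-ns/user/acolvin/terasort"  # path taken from the slides
FILE_SIZE = 10_000_000_000                           # size attribute from the slides


def build_location_file(file_name, hdfs_uri, size_bytes):
    """Build a location-file XML string that points at exactly one HDFS file."""
    root = ET.Element("locationFile")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "version").text = "1.0"
    ET.SubElement(header, "fileName").text = file_name
    uri_list = ET.SubElement(root, "uri_list")
    item = ET.SubElement(
        uri_list, "uri_list_item", size=str(size_bytes), compressionCodec=""
    )
    item.text = hdfs_uri
    return ET.tostring(root, encoding="unicode")


# 100 pointer files, each referencing a single HDFS part file (assumed mapping).
location_files = {}
for i in range(100):
    name = f"osch-tanel-{i:05d}"
    location_files[name] = build_location_file(
        name, f"{HDFS_DIR}/part-{i:05d}", FILE_SIZE
    )

# The matching entries for the LOCATION clause of the external table.
location_clause = ",\n".join(f"'{name}'" for name in location_files)
print(location_clause.splitlines()[0])
print(len(location_files), "location files")
```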
- 14. More "fine-grained" OSCH data location files
$ cat osch-tanel-00099
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<locationFile>
<header>
<version>1.0</version>
<fileName>osch-tanel-00099</fileName>
<createDate>2013-07-08T14:03:24</createDate>
<publishDate>2013-07-08T02:03:24</publishDate>
<productName>Oracle SQL Connector for HDFS Release 2.1.0 - Production</productName>
<productVersion>2.1.0</productVersion>
</header>
<uri_list>
<uri_list_item size="10000000000" compressionCodec="">
hdfs://enkbda-ns/user/acolvin/terasort/part-00099
</uri_list_item>
</uri_list>
</locationFile>
$ ls -l osch-tanel*
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00000
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00001
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00002
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00003
...
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00099
100 files, allowing up to 100 HDFS streams in parallel.

With fewer PX slaves, each slave can access multiple files sequentially.
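The relationship between the number of location files and the number of PX slaves can be illustrated with a toy model. It assumes one location file is the smallest unit of work a slave can pick up, as the slide implies. The round-robin assignment is purely an assumption for illustration, since the slides do not describe how Oracle actually hands files to slaves, and both function names are hypothetical.

```python
def concurrent_streams(n_location_files, px_slaves):
    """Upper bound on simultaneous HDFS streams: one per busy slave."""
    return min(n_location_files, px_slaves)


def assign_round_robin(location_files, px_slaves):
    """Toy assignment: each slave gets a queue of files to read one after another."""
    queues = [[] for _ in range(px_slaves)]
    for i, name in enumerate(location_files):
        queues[i % px_slaves].append(name)
    return queues


names = [f"osch-tanel-{i:05d}" for i in range(100)]

# With only 4 location files, extra slaves have nothing to read.
print(concurrent_streams(4, 100))
# With 100 location files, all 100 slaves can stream at once.
print(concurrent_streams(len(names), 100))

# With 16 slaves, each one works through 6 or 7 files sequentially.
queues = assign_round_robin(names, 16)
print(sorted(len(q) for q in queues))
```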
- 18. Drilling deeper into the CPU usage
SQL> @ostackprof 788 0.1 100
Below is the stack prefix common to all samples:
------------------------------------------------------------------------
Frame->function()
------------------------------------------------------------------------
# 49 ->main()
.... some lines snipped .....
# 11 ->pextproc()
# 10 ->spefmccallstd()
# 9 ->spefcpfa()
# 8 ->qxxqFetch()
# 7 ->kpxsFetch()
# 6 ->kpxsFetchField()
# 5 ->kpxsFetchDriver()
.... some lines snipped .....
# -#--------------------------------------------------------------------
# - Num.Samples -> in call stack()
# ----------------------------------------------------------------------
35 ->kudmxfe()->kudmdtp()->lxoSchPat()
25 ->kudmxfe()->kudmdtp()->lxmfwdx()
23 ->kudmxfe()->kudmdtp()->
4 ->kpxsDoConvert()->OCIDirPathColArrayToStream()->kpudpcs_colArrayToStream()->kpudpcsf_intColArrayToStream()
3 ->kudmxfe()->lxmfwdx()
3 ->kudmxfe()->kudmrn()->kudmrt()
2 ->qerxtCBFetch()->qerxtProcessRows()->qeaeCn1Serial()
2 ->qerxtCBFetch()->qerxtProcessRows()->klxprParseRow()
1 ->OCIDirPathColArrayReset()
83% of time spent in datatype conversion (kudm)

60% in lx* functions – string/datatype processing
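The percentages can be cross-checked by tallying the sample counts from the profile output. The sketch below copies the rows as they appear on the slide (including the one stack that ends in a dangling "->") and counts samples by function-name prefix. The helper names are hypothetical, and reading the last ostackprof argument (100) as the requested sample count is an assumption. The listed rows add up to 98 samples, so the counts are close to percentages either way.

```python
# (sample count, call stack below the common prefix), copied from the slide.
SAMPLES = [
    (35, "kudmxfe()->kudmdtp()->lxoSchPat()"),
    (25, "kudmxfe()->kudmdtp()->lxmfwdx()"),
    (23, "kudmxfe()->kudmdtp()->"),
    (4, "kpxsDoConvert()->OCIDirPathColArrayToStream()"
        "->kpudpcs_colArrayToStream()->kpudpcsf_intColArrayToStream()"),
    (3, "kudmxfe()->lxmfwdx()"),
    (3, "kudmxfe()->kudmrn()->kudmrt()"),
    (2, "qerxtCBFetch()->qerxtProcessRows()->qeaeCn1Serial()"),
    (2, "qerxtCBFetch()->qerxtProcessRows()->klxprParseRow()"),
    (1, "OCIDirPathColArrayReset()"),
]


def frames(stack):
    """Split a stack string into function names, dropping empty trailing parts."""
    return [f for f in stack.split("->") if f]


def samples_matching(samples, predicate):
    """Sum the counts of all stacks containing at least one matching frame."""
    return sum(n for n, stack in samples if any(predicate(f) for f in frames(stack)))


total = sum(n for n, _ in SAMPLES)
in_kudmdtp = samples_matching(SAMPLES, lambda f: f == "kudmdtp()")
in_kudm = samples_matching(SAMPLES, lambda f: f.startswith("kudm"))
in_lx = samples_matching(SAMPLES, lambda f: f.startswith("lx"))

print("samples listed:      ", total)
print("through kudmdtp():   ", in_kudmdtp)
print("through any kudm*(): ", in_kudm)
print("through any lx*():   ", in_lx)
```

On these numbers, the 83% figure corresponds to the stacks passing through kudmdtp(), and the roughly 60% figure to the stacks that end up in lx* functions.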