Leveraging Hadoop for Legacy Systems

   Mathias Herberts - @herberts
Crédit Mutuel Arkéa Key Facts & Figures (as of 2011-06-30)
A Regional Bank with a National Network
Why Hadoop?
▪ Ever increasing volume of data

▪ Very regulated sector (Basel II/III, Solvency II)

    ▪ Need to produce compliance reports

▪ Competitive sector

    ▪ Need to create value, data identified as a great source of it

▪ Keep costs under control
▪ Fond of Open Source
▪ Engineers like big challenges
What Challenge?
Storing Data
Types of logical storage

▪ VSAM — Virtual Storage Access Method
    Record-oriented (fixed or variable length) indexed datasets

▪ PS — Physically Sequential
    Record-oriented (fixed or variable length) datasets, not indexed
    Can exist on different types of media

▪ DB2 — IBM DB2 Relational Model Database Server
Types of binary records stored

▪ COBOL Records (conform to a COPYBOOK)

▪ DB2 'UNLOAD' Records (conform to a DDL statement)
Types of data stored in HDFS

▪ {Tab, Comma, ...} Separated Values
    One line records of multiple columns

▪ Text
    Line-oriented (e.g. logs)

▪ Hadoop SequenceFiles
    Block compressed
     ▪ Mostly BytesWritable key/value
       ▪ COBOL records
       ▪ DB2 unloaded records
       ▪ Serialized Thrift structures
     ▪ Use of DefaultCodec (pure Java)
Moving Data
Standard data transfer process

  ▪ On the fly charset conversion
  ▪ Loss of notion of records

Hadoop data transfer process

  ▪ On the fly compression
  ▪ Keep original charset
  ▪ Preserved notion of records
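Preserving the notion of records matters because z/OS sequential datasets carry record boundaries out-of-band. As an illustrative sketch (not the actual Arkéa transfer code), a RECFM=VB stream prefixes each record with a 4-byte Record Descriptor Word — a 2-byte big-endian length that includes the RDW itself, plus 2 reserved bytes — so records can be split without touching the EBCDIC payload:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split a z/OS RECFM=VB byte stream into records using the
// Record Descriptor Word, leaving the original EBCDIC bytes untouched.
public class VariableRecordReader {
    public static List<byte[]> split(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        int pos = 0;
        while (pos + 4 <= data.length) {
            // 2-byte big-endian record length, RDW included
            int len = ((data[pos] & 0xFF) << 8) | (data[pos + 1] & 0xFF);
            // bytes pos+2..pos+3 are reserved; payload follows the RDW
            records.add(Arrays.copyOfRange(data, pos + 4, pos + len));
            pos += len;
        }
        return records;
    }
}
```

Each returned byte[] can then become one BytesWritable value in a SequenceFile record.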
Staging Server



▪ Gateway In & Out of an HDFS Cell
▪ Reads/Writes to /hdfs/staging/{in,out}/... (runs as hdfs)
▪ HTTP Based (POST/GET)

▪ Upload to http://hadoop-staging/put[/hdfs/staging/in/...]
   Stores directly in HDFS, no intermediary storage
   Multiple files support
   Random target directory created if none specified
   Parameters user, group, perm, suffix
   curl -F "file=@local;filename=remote" "http://../put?user=foo&group=bar&perm=644&suffix=.test"


▪ Download from http://hadoop-staging/get/hdfs/staging/out/...
   Ability to unpack SequenceFile records (unpack={base64,hex}) as key:value lines
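The key:value line format behind unpack (and its sfpack counterpart, see below) can be sketched as follows — an illustration of the Base64 variant under my own naming, not the actual staging server code:

```java
import java.util.Base64;

// Sketch of the unpack=base64 idea: each SequenceFile record becomes one
// "key:value" line with both parts Base64-encoded, so arbitrary binary
// records survive line-oriented transport.
public class RecordLine {
    public static String pack(byte[] key, byte[] value) {
        Base64.Encoder enc = Base64.getEncoder();
        // ':' never occurs in the Base64 alphabet, so it is a safe separator
        return enc.encodeToString(key) + ":" + enc.encodeToString(value);
    }

    public static byte[][] unpack(String line) {
        int sep = line.indexOf(':');
        Base64.Decoder dec = Base64.getDecoder();
        return new byte[][] {
            dec.decode(line.substring(0, sep)),
            dec.decode(line.substring(sep + 1))
        };
    }
}
```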
fileutil



▪   Swiss Army Knife for SequenceFiles, HDFS Staging Server, ZooKeeper
▪   Written in Java, single jar
▪   Works in all our environments (z/OS, Unix, Windows, ...)
▪   Can be run using TWS/OPC on z/OS (via a JCL), $Universe on Unix, cron ...
▪   Multiple commands
      sfstage            Convert a z/OS dataset to a SF and push it to the staging server
      {stream,file}stage Push a stream or files to the staging server
      filesfstage        Convert a file to a SF (one record per block) and stage it
      sfpack             Pack key:value lines (cf unpack) in a SequenceFile
      sfarchive          Create a SequenceFile, one record per input file
      zk{ls,lsr,cat,stat} Read data from ZooKeeper
      get                Retrieve data via URI
      ...
Accessing Data
Data Organization


▪ Use of a directory structure that mimics the dataset names

      PR0172.PVS00.F7209588

      Environment / Silo / Application

      /hdfs/data/si/PR/01/72/PR0172.PVS00.F7209588.SUFFIX

▪   Group ACLs at the Environment/Silo/Application levels
▪   Suffix is mainly used to add .YYYYMM to Generation Data Groups
▪   Suffix added by the staging server
▪   DB2 Table unloads follow similar rules

      P11DBA.T90XXXX
      S4SDWH11.T4S02CTSC_H
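A hypothetical helper (names are mine, not Arkéa's) showing how a dataset name could map to its HDFS location, assuming the first qualifier splits 2+2+2 into Environment/Silo/Application as in the PR0172 example above:

```java
// Hypothetical sketch: map a z/OS dataset name to its HDFS path,
// assuming the first qualifier encodes Environment (2 chars),
// Silo (2 chars) and Application (2 chars), as in PR0172.PVS00.F7209588.
public class DatasetPath {
    public static String toHdfsPath(String dataset, String suffix) {
        // First qualifier of the dataset name, e.g. "PR0172"
        String q = dataset.substring(0, dataset.indexOf('.'));
        return "/hdfs/data/si/" + q.substring(0, 2) + "/" + q.substring(2, 4)
                + "/" + q.substring(4, 6) + "/" + dataset + suffix;
    }
}
```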
Bastion Hosts



▪   Hadoop Cells are isolated, all accesses MUST go through a bastion host
▪   All accesses to the bastion hosts are authenticated via SSH keys
▪   Users log in using their own user
▪   No SSH port forwarding allowed
▪   All shell commands are logged
▪   Batches scheduled on bastion hosts by $Universe (use of ssh-agent)

▪ Bastion hosts can interact with their HDFS cell (hadoop fs commands)
▪ Bastion hosts can launch jobs

▪ Admin tasks, user provisioning done on NameNode

▪ Kerberos Security not used (yet?)
▪ Need for pluggable security mechanism, using SSH signed tokens
Working With Data
We are a Piggy bank ...
                      Attribution: www.seniorliving.org
Why Pig?



▪ We <3 the '1 relation per line' approach, « no SQHell™ »




▪ No metadata service to maintain
▪ Ability to add UDFs
    ▪ A whole lot already added, more on this later...

▪ Batch scheduling
▪ Can handle all the data we store in HDFS

▪ Still open to other tools (Hive, Zohmg, ...)
com.arkea.commons.pig.SequenceFileLoadFunc




▪ Generic load function for our BytesWritable SequenceFiles
▪ Relies on Helper classes to interpret the record bytes
    SequenceFileLoadFunc('HelperClass', 'param', ...)
▪ Helper classes can also be used in regular MapReduce jobs

▪ SequenceFileLoadFunc outputs the following schema

{
    key: bytearray,
    value: bytearray,
    parsed: (
      Helper dependent schema
    )
}
Helper Classes



▪ COBOL – com.arkea.commons.pig.COBOLBinaryHelper
        ▪ COPYBOOK
▪ Thrift – com.arkea.commons.pig.ThriftBinaryHelper
        ▪ .class
▪ DB2 Unload – com.arkea.commons.pig.DB2UnloadBinaryHelper
        ▪ DDL + load script
▪ MySQL – com.arkea.commons.pig.MySQLBinaryHelper
        ▪ DDL
▪ ...
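Most of the heavy lifting in such a helper is field decoding. As an illustrative sketch (not the actual COBOLBinaryHelper), here is how a COMP-3 packed-decimal field such as PIC S9(13)V9(2) can be turned into a BigDecimal — each byte holds two BCD digits, and the low nibble of the last byte is the sign (0xC/0xF positive, 0xD negative):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch: decode a COBOL COMP-3 (packed decimal) field into a BigDecimal.
public class PackedDecimal {
    public static BigDecimal decode(byte[] b, int scale) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            digits.append((b[i] >> 4) & 0x0F);   // high nibble: a digit
            if (i < b.length - 1) {
                digits.append(b[i] & 0x0F);      // low nibble: a digit...
            }
        }
        // ...except the last low nibble, which carries the sign
        int sign = (b[b.length - 1] & 0x0F) == 0x0D ? -1 : 1;
        return new BigDecimal(new BigInteger(digits.toString()), scale)
                .multiply(BigDecimal.valueOf(sign));
    }
}
```

For PIC S9(13)V9(2), scale is 2 (two digits after the implied decimal point).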
Initial Pig Target




▪ The existing 'proc sql' SAS Corpus

▪ From sample to population

Need to give users tools that can reproduce what they did in their scripts
Groovy Closure Pig UDF



DEFINE InlineGroovyUDF cac.pig.udf.GroovyClosure(SCHEMA, CODE);

DEFINE FileGroovyUDF cac.pig.udf.GroovyClosure(SCHEMA, '/path/to/closure.groovy');




SCHEMA uses the standard Pig Schema syntax, e.g. 'str: chararray'

CODE is a short Groovy Closure, e.g. '{ a,b,c -> return a.replaceAll(b,c); }'

closure.groovy must be in a REGISTERed jar under path/to
//
// Import statements
//

import ....;

//
// Constants definitions
//

/**
 * Documentation for XXX
 */
final def XXX = ....;

//
// Closure definition
//

/**
 * Documentation for CLOSURE
 *
 * @param a ...
 * @param b ...
 * @param ...
 *
 * @return ...
 */
final def CLOSURE = {
    a,b,... ->
    ...
    ...
    return ...;
}

//
// Unit Tests
//

// Test specific comment ...
assert CLOSURE('A') == ...;

//
// Return Closure for usage in Pig
//

return CLOSURE;
Pig to Groovy

bag -> java.util.List
tuple -> Object[]
map -> java.util.Map
int -> int
long -> long
float -> float
double -> double
chararray -> java.lang.String
bytearray -> byte[]

Groovy to Pig

groovy.lang.Tuple -> tuple
Object[] -> tuple
java.util.List -> bag
java.util.Map -> map
byte/short/int -> int
long/BigInteger -> long
float -> float
double/BigDecimal -> double
java.lang.String -> chararray
byte[] -> bytearray
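The Groovy-to-Pig table above can be encoded as a simple type dispatch. The sketch below is illustrative only (groovy.lang.Tuple, a java.util.List subclass, is left out to stay dependency-free):

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.List;
import java.util.Map;

// Sketch encoding the Groovy -> Pig coercion table as plain Java.
// Order matters: List before Map, and note byte[] is NOT an Object[].
public class GroovyToPig {
    public static String pigTypeOf(Object o) {
        if (o instanceof Object[])                              return "tuple";
        if (o instanceof List)                                  return "bag";
        if (o instanceof Map)                                   return "map";
        if (o instanceof Byte || o instanceof Short
                || o instanceof Integer)                        return "int";
        if (o instanceof Long || o instanceof BigInteger)       return "long";
        if (o instanceof Float)                                 return "float";
        if (o instanceof Double || o instanceof BigDecimal)     return "double";
        if (o instanceof String)                                return "chararray";
        if (o instanceof byte[])                                return "bytearray";
        return "unknown";
    }
}
```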
Wrap Up
⊕

▪ Fast and rich data pipeline between z/OS and Hadoop

▪ Pig Toolbox to analyze COBOL/DB2 data alongside Thrift/MySQL/xSV/...

▪ Groovy Closure support for rapid extension


▪ Still some missing features
    Pure Java compression codecs (JNI on z/OS anyone?)
    Pig support for BigInteger / BigDecimal (2⁶³ might not be enough)
    SSH(RSA) based auth tokens



▪ And yet another hard challenge: Cultural Change
http://www.arkea.com/



      @herberts
Appendix
com.arkea.commons.pig.COBOLBinaryHelper
REGISTER copybook.jar;
A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.COBOLBinaryHelper','[PREFIX:]COPYBOOK');

        COPYBOOK (Y7XRRDC):

        000010*GAR* OS Y7XRRDC         DESCRIPTION RRDC NOUVAU FORMAT               30000020
        000020* LG=00328, ESD MAJ LE 04/12/98, ELS MAJ LE 26/01/01 PAR   C98310     30000030
        000030* GENERE LE 26/01/01 A 17H01, PFX : Y7XRRD-     MEMBRE :   Y7XRRDC    30000040
        000040 01         Y7XRRD-Y7XRRDC.                                           30000050
        000050*             DESCRIPTION RRDC NOUVAU FORMAT           1   04/12/98   30000060
        000060   03       Y7XRRD-ARTDS-CLE-SECD.                                    30000070
        000070*             CLE SECONDAIRE ARCHIVAGE TENU DE SOLDE   1   11/02/98   30000080
        000080     05     Y7XRRD-NO-CCM       PIC X(4).                             30000090
        000090*             NUMERO CAISSE                            1   28/12/94   30000100
        000100     05     Y7XRRD-NO-PSE       PIC X(8).                             30000110
        000110*             NUMERO PERSONNE                          5   10/07/97   30000120
        000120     05     Y7XRRD-CATEGORIE    PIC X(2).                             30000130
        000130*             CATéGORIE DU COMPTE                     13   09/01/01   30000140
        000140     05     Y7XRRD-RANG         PIC X(2).                             30000150
        010010*             RANG                                    15   22/01/01   30000160
        010020     05     Y7XRRD-NO-ORDRE     PIC X(2).                             30000170
        010030*             Numéro d'ordre                          17   28/12/94   30000180
        010040     05     Y7XRRD-DA-TT-C2     PIC X(8).                             30000190
        010050*             DATE TRAITEMENT                 SX:-C2 19     -   -     30000200
        010060     05     Y7XRRD-NO-ORDRE-ENR-C2 PIC 9(6).                          30000210
        010070*             Numéro d'ordre enregistrement   SX:-C2 27     -   -     30000220
        010080   03       Y7XRRD-MT-OPE-TDS   PIC S9(13)V9(2) COMP-3.               30000230
        010090*             MONTANT OPERATION TENUE-DE-SOLDE        33   03/02/98   30000240
        010100   03       Y7XRRD-CD-DVS-ORI-OPE PIC X(4).                           30000250
        010110*             CODE DEVISE ORIGINE OPERATION           41    -   -     30000260
        010120   03       Y7XRRD-CD-DVS-GTN-TDS PIC X(4).                           30000270
        010130*             CODE DEVISE GESTION TENUE-DE-SOLDE      45    -   -     30000280
        010140   03       Y7XRRD-MT-CNVS-OPE PIC S9(13)V9(2) COMP-3.                30000290
        020010*             MONTANT CONVERTI OPERATION              49    -   -     30000300
        020020   03       Y7XRRD-IDC-ATN-ORI-MT PIC X(1).                           30000310
        020030*             INDICATEUR AUTHENTICITE ORIGINE MONTAN 57    05/12/97   30000320
        020040   03       Y7XRRD-SLD-AV-IMPT PIC S9(13)V9(2) COMP-3.                30000330
        020050*             SOLDE AVANT IMPUTATION                  58   03/02/98   30000340
        020060   03       Y7XRRD-DA-OPE-TDS   PIC X(8).                             30000350
        020070*             DATE OPERATION TENUE-DE-SOLDE           66    -   -     30000360
        020080   03       Y7XRRD-DA-VLR       PIC X(8).                             30000370
        020090*             DATE VALEUR                             74   28/12/94   30000380
        020100   03       Y7XRRD-DA-ARR       PIC X(8).                             30000390
        020110*             DATE ARRETE                             82    -   -     30000400
        020120   03       Y7XRRD-NO-STR-OPE   PIC X(6).                             30000410
        020130*             NUMERO STRUCTURE OPERATIONNELLE         90    -   -     30000420
        020140   03       Y7XRRD-NO-REF-TNL-MED PIC X(4).                           30000430
        030010*             NUMERO REFERENCE TERMINAL MEDIA         96   03/02/98   30000440
        030020   03       Y7XRRD-NO-LOT       PIC X(3).                             30000450
        030030*             NUMéRO DE LOT                          100   13/10/97   30000460
        030040   03       Y7XRRD-TDS-LIBELLES.                                      30000470
        030050*             FAMILLE MONTANTS OPERATION T.DE.SOLDE 103    05/02/98   30000480
        030060     05     Y7XRRD-LIB-CLI-OPE-1 PIC X(50).                           30000490
        030070*             LIBELLE CLIENT OPERATION        SX:-1 103    03/02/98   30000500
        030080     05     Y7XRRD-LIB-ITE-OPE PIC X(32).                             30000510
        030090*             LIBELLE INTERNE OPERATION              153    -   -     30000520
        030100     05     Y7XRRD-LIB-CT-CLI   PIC X(32).                            30000530
        030110*             LIBELLE COURT CLIENT                   185    -   -     30000540
        030120   03       Y7XRRD-CD-UTI-LIB-CPL PIC X(1).                           30000550
        030130*             Code utilisation libellés compl.       217   28/12/94   30000560
        030140   03       Y7XRRD-IDC-COM-OPE PIC X(1).                              30000570
        040010*             INDICATEUR COMMISSION OPERATION        218   03/02/98   30000580
        040020   03       Y7XRRD-CD-TY-OPE-NIV-1 PIC X(1).                          30000590
        040030*             CODE TYPE OPERATION NIVEAU UN          219    -   -     30000600
        040040   03       Y7XRRD-CD-TY-OPE-NIV-2 PIC X(2).                          30000610
        040050*             CODE TYPE OPERATION NIVEAU DEUX        220    -   -     30000620
        040060   03       FILLER              PIC X(7).                             30000630
        040070*                                                    222              30000640
        040080   03       Y7XRRD-TDS-LIB-SUPPL.                                     30000650
        040090*             FAMILLE LIBELLES COMPLEMENTAIRES T.D.S 229   17/02/98   30000660
        040100     05     Y7XRRD-LIB-CLI-OPE-02 PIC X(50).                          30000670
        040110*             LIBELLE CLIENT OPERATION        SX:-02 229   03/02/98   30000680
        040120     05     Y7XRRD-LIB-CLI-OPE-03 PIC X(50).                          30000690
        040130*             LIBELLE CLIENT OPERATION        SX:-03 279    -   -     30000700

        Resulting Pig schema:

        A: {
            key: bytearray,
            value: bytearray,
            parsed: (
                Y7XRRD_Y7XRRDC: bytearray,
                Y7XRRD_ARTDS_CLE_SECD: bytearray,
                Y7XRRD_NO_CCM: chararray,
                Y7XRRD_NO_PSE: chararray,
                Y7XRRD_CATEGORIE: chararray,
                Y7XRRD_RANG: chararray,
                Y7XRRD_NO_ORDRE: chararray,
                Y7XRRD_DA_TT_C2: chararray,
                Y7XRRD_NO_ORDRE_ENR_C2: long,
                Y7XRRD_MT_OPE_TDS: double,
                Y7XRRD_CD_DVS_ORI_OPE: chararray,
                Y7XRRD_CD_DVS_GTN_TDS: chararray,
                Y7XRRD_MT_CNVS_OPE: double,
                Y7XRRD_IDC_ATN_ORI_MT: chararray,
                Y7XRRD_SLD_AV_IMPT: double,
                Y7XRRD_DA_OPE_TDS: chararray,
                Y7XRRD_DA_VLR: chararray,
                Y7XRRD_DA_ARR: chararray,
                Y7XRRD_NO_STR_OPE: chararray,
                Y7XRRD_NO_REF_TNL_MED: chararray,
                Y7XRRD_NO_LOT: chararray,
                Y7XRRD_TDS_LIBELLES: bytearray,
                Y7XRRD_LIB_CLI_OPE_1: chararray,
                Y7XRRD_LIB_ITE_OPE: chararray,
                Y7XRRD_LIB_CT_CLI: chararray,
                Y7XRRD_CD_UTI_LIB_CPL: chararray,
                Y7XRRD_IDC_COM_OPE: chararray,
                Y7XRRD_CD_TY_OPE_NIV_1: chararray,
                Y7XRRD_CD_TY_OPE_NIV_2: chararray,
                FILLER: chararray,
                Y7XRRD_TDS_LIB_SUPPL: bytearray,
                Y7XRRD_LIB_CLI_OPE_02: chararray,
                Y7XRRD_LIB_CLI_OPE_03: chararray
            )
        }
com.arkea.commons.pig.DB2UnloadBinaryHelper
 REGISTER ddl-load.jar;
 A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.DB2UnloadBinaryHelper','[PREFIX:]TABLE');



        .ddl:

        CREATE TABLE SHDBA.TBDCOLS
        (COL_CHAR CHAR(4) FOR SBCS DATA WITH DEFAULT NULL,
        COL_DECIMAL DECIMAL(15, 2) WITH DEFAULT NULL,
        COL_NUMERIC DECIMAL(15, 0) WITH DEFAULT NULL,
        COL_SMALLINT SMALLINT WITH DEFAULT NULL,
        COL_INTEGER INTEGER WITH DEFAULT NULL,
        COL_VARCHAR VARCHAR(50) FOR SBCS DATA WITH DEFAULT NULL,
        COL_DATE DATE WITH DEFAULT NULL,
        COL_TIME TIME WITH DEFAULT NULL,
        COL_TIMESTAMP TIMESTAMP WITH DEFAULT NULL) ;

        .load:

        TEMPLATE DFEM8ERT
        DSN('XXXXX.PPSDR.B99BD02.SBDCOLS.REC')
        DISP(OLD,KEEP,KEEP)
        LOAD DATA INDDN DFEM8ERT LOG NO RESUME YES
        EBCDIC CCSID(01147,00000,00000)
        INTO TABLE "SHDBA"."TBDCOLS"
        WHEN(00001:00002) = X'003F'
        ( "COL_CHAR" POSITION( 00004:00007) CHAR(00004) NULLIF(00003)=X'FF',
        "COL_DECIMAL" POSITION( 00009:00016) DECIMAL NULLIF(00008)=X'FF',
        "COL_NUMERIC" POSITION( 00018:00025) DECIMAL NULLIF(00017)=X'FF',
        "COL_SMALLINT" POSITION( 00027:00028) SMALLINT NULLIF(00026)=X'FF',
        "COL_INTEGER" POSITION( 00030:00033) INTEGER NULLIF(00029)=X'FF',
        "COL_VARCHAR" POSITION( 00035:00086) VARCHAR NULLIF(00034)=X'FF',
        "COL_DATE" POSITION( 00088:00097) DATE EXTERNAL NULLIF(00087)=X'FF',
        "COL_TIME" POSITION( 00099:00106) TIME EXTERNAL NULLIF(00098)=X'FF',
        "COL_TIMESTAMP" POSITION( 00108:00133) TIMESTAMP EXTERNAL NULLIF(00107)=X'FF'
        )

        Resulting Pig schema:

        A: {
            key: bytearray,
            value: bytearray,
            parsed: (
                COL_CHAR: chararray,
                COL_DECIMAL: double,
                COL_NUMERIC: long,
                COL_SMALLINT: long,
                COL_INTEGER: long,
                COL_VARCHAR: chararray,
                COL_DATE: chararray,
                COL_TIME: chararray,
                COL_TIMESTAMP: chararray
            )
        }



Can also handle DB2 UDB unloads (done using hpu)
we're Thrifty too...
   REGISTER thrift-generated.jar;
   A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.ThriftBinaryHelper','CLASS');



        struct Redirection {
          1: string alias,
          2: string url,
          3: string email,
          4: i64 timestamp,
          5: i64 lastupdate,
          6: list<string> params,
          7: bool external = 1,
          8: i64 owner,
          9: string user,
        }

        Resulting Pig schema:

        A: {
            key: bytearray,
            value: bytearray,
            parsed: (
                alias: chararray,
                url: chararray,
                email: chararray,
                timestamp: long,
                lastupdate: long,
                params: (),
                external: long,
                owner: long,
                user: chararray
            )
        }




... and also use MySQL ...

   REGISTER mysql-ddl.jar;
   A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.MySQLBinaryHelper','TABLE');



... etc etc etc ...

DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
Vectorization on x86: all you need to know
Vectorization on x86: all you need to knowVectorization on x86: all you need to know
Vectorization on x86: all you need to know
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11g
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01
 
Quic illustrated
Quic illustratedQuic illustrated
Quic illustrated
 
Cryptography and secure systems
Cryptography and secure systemsCryptography and secure systems
Cryptography and secure systems
 
Direct SGA access without SQL
Direct SGA access without SQLDirect SGA access without SQL
Direct SGA access without SQL
 
InfluxDB IOx Tech Talks: A Rusty Introduction to Apache Arrow and How it App...
InfluxDB IOx Tech Talks:  A Rusty Introduction to Apache Arrow and How it App...InfluxDB IOx Tech Talks:  A Rusty Introduction to Apache Arrow and How it App...
InfluxDB IOx Tech Talks: A Rusty Introduction to Apache Arrow and How it App...
 
Gaztea Tech Robotica 2016
Gaztea Tech Robotica 2016Gaztea Tech Robotica 2016
Gaztea Tech Robotica 2016
 
Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3
 
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
 
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
 
OpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developersOpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developers
 
Designing High Performance RTC Signaling Servers
Designing High Performance RTC Signaling ServersDesigning High Performance RTC Signaling Servers
Designing High Performance RTC Signaling Servers
 
REST made simple with Java
REST made simple with JavaREST made simple with Java
REST made simple with Java
 
CONFidence 2017: Hacking embedded with OpenWrt (Vladimir Mitiouchev)
CONFidence 2017: Hacking embedded with OpenWrt (Vladimir Mitiouchev)CONFidence 2017: Hacking embedded with OpenWrt (Vladimir Mitiouchev)
CONFidence 2017: Hacking embedded with OpenWrt (Vladimir Mitiouchev)
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
 

More from Mathias Herberts

2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...Mathias Herberts
 
20170516 hug france-warp10-time-seriesanalysisontopofhadoop
20170516 hug france-warp10-time-seriesanalysisontopofhadoop20170516 hug france-warp10-time-seriesanalysisontopofhadoop
20170516 hug france-warp10-time-seriesanalysisontopofhadoopMathias Herberts
 
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentationIoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentationMathias Herberts
 
Big Data - Open Coffee Brest - 20121121
Big Data - Open Coffee Brest - 20121121Big Data - Open Coffee Brest - 20121121
Big Data - Open Coffee Brest - 20121121Mathias Herberts
 
WebScale Computing and Big Data a Pragmatic Approach
WebScale Computing and Big Data a Pragmatic ApproachWebScale Computing and Big Data a Pragmatic Approach
WebScale Computing and Big Data a Pragmatic ApproachMathias Herberts
 

More from Mathias Herberts (9)

2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
 
20170516 hug france-warp10-time-seriesanalysisontopofhadoop
20170516 hug france-warp10-time-seriesanalysisontopofhadoop20170516 hug france-warp10-time-seriesanalysisontopofhadoop
20170516 hug france-warp10-time-seriesanalysisontopofhadoop
 
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentationIoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
 
Big Data - Open Coffee Brest - 20121121
Big Data - Open Coffee Brest - 20121121Big Data - Open Coffee Brest - 20121121
Big Data - Open Coffee Brest - 20121121
 
Big Data Tribute
Big Data TributeBig Data Tribute
Big Data Tribute
 
Hadoop Pig Syntax Card
Hadoop Pig Syntax CardHadoop Pig Syntax Card
Hadoop Pig Syntax Card
 
Hadoop Pig
Hadoop PigHadoop Pig
Hadoop Pig
 
WebScale Computing and Big Data a Pragmatic Approach
WebScale Computing and Big Data a Pragmatic ApproachWebScale Computing and Big Data a Pragmatic Approach
WebScale Computing and Big Data a Pragmatic Approach
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 

Recently uploaded

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Recently uploaded (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Leveraging Hadoop for Legacy Systems

  • 1. Leveraging Hadoop for Legacy Systems Mathias Herberts - @herberts
  • 2. Crédit Mutuel Arkéa key Facts & Figures (as of 2011-06-30)
  • 3. A Regional Bank with a National Network
  • 5. Why Hadoop?
     ▪ Ever increasing volume of data
     ▪ Very regulated sector (Basel II/III, Solvency II)
        ▪ Need to produce compliance reports
     ▪ Competitive sector
        ▪ Need to create value, data identified as a great source of it
     ▪ Keep costs under control
     ▪ Fond of Open Source
     ▪ Engineers like big challenges
  • 7.
  • 9. Types of logical storage
     Virtual Storage Access Method: record-oriented (fixed or variable length) indexed datasets
     Physically Sequential: record-oriented (fixed or variable length) datasets, not indexed; can exist on different types of media
     IBM DB2: relational model database server
  • 10. Types of binary records stored
     COBOL records (conform to a COPYBOOK)
     DB2 'UNLOAD' records (conform to a DDL statement)
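Numeric fields in those COBOL records are typically COMP-3 (packed decimal): two BCD digits per byte, with the last nibble holding the sign. A minimal decoding sketch, as a hypothetical helper rather than the actual Arkéa code:

```java
import java.math.BigDecimal;

// Toy COMP-3 (packed decimal) decoder: each byte carries two BCD digits,
// the final nibble is the sign (0xC or 0xF positive, 0xD negative).
public class Comp3 {
    public static BigDecimal decode(byte[] b, int scale) {
        long v = 0;
        for (int i = 0; i < b.length; i++) {
            int hi = (b[i] >> 4) & 0x0F;
            int lo = b[i] & 0x0F;
            v = v * 10 + hi;                 // high nibble is always a digit
            if (i < b.length - 1) {
                v = v * 10 + lo;             // low nibble: digit, except in last byte
            } else if (lo == 0x0D) {
                v = -v;                      // last low nibble: sign
            }
        }
        // scale shifts the implied decimal point, e.g. V9(2) -> scale 2
        return BigDecimal.valueOf(v, scale);
    }
}
```

For a field like PIC S9(13)V9(2) COMP-3 (seen later in this deck) the scale would be 2; production decoders also need to handle invalid nibbles and unsigned variants.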
  • 11. Types of data stored in HDFS
     {Tab, Comma, ...} Separated Values: one-line records of multiple columns
     Text: line-oriented (e.g. logs)
     Hadoop SequenceFiles: block compressed
        ▪ Mostly BytesWritable key/value
           ▪ COBOL records
           ▪ DB2 unloaded records
           ▪ Serialized Thrift structures
        ▪ Use of DefaultCodec (pure Java)
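DefaultCodec matters here because it is backed by zlib's deflate, available in pure Java through java.util.zip, so no native Hadoop libraries are needed. A rough sketch of the underlying roundtrip, illustrative only and not Hadoop's actual codec wiring:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Deflate/inflate roundtrip using only the JDK, the same zlib algorithm
// that Hadoop's DefaultCodec applies to SequenceFile blocks.
public class DeflateRoundtrip {
    public static byte[] deflate(byte[] in) {
        Deflater d = new Deflater();
        d.setInput(in);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    public static byte[] inflate(byte[] in) {
        try {
            Inflater f = new Inflater();
            f.setInput(in);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!f.finished()) out.write(buf, 0, f.inflate(buf));
            f.end();
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
    }
}
```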
  • 13. Standard data transfer process
     ▪ On the fly charset conversion
     ▪ Loss of notion of records
  • 14. Hadoop data transfer process
     ▪ On the fly compression
     ▪ Keep original charset
     ▪ Preserved notion of records
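Keeping the original charset means records land in HDFS still encoded in EBCDIC, and decoding happens later on the Hadoop side (in practice via the proper code page, e.g. Charset.forName with the right CCSID such as the 1147 seen on slide 34). A toy sketch covering only uppercase letters, digits and space, just to make the deferred decoding step concrete:

```java
import java.util.HashMap;
import java.util.Map;

// Toy EBCDIC decoder for a tiny subset (A-Z, 0-9, space) of code page 1047.
// Real code should use a full Charset for the dataset's actual CCSID.
public class EbcdicToy {
    private static final Map<Integer, Character> M = new HashMap<>();
    static {
        for (int i = 0; i < 9; i++)  M.put(0xC1 + i, (char) ('A' + i)); // A-I
        for (int i = 0; i < 9; i++)  M.put(0xD1 + i, (char) ('J' + i)); // J-R
        for (int i = 0; i < 8; i++)  M.put(0xE2 + i, (char) ('S' + i)); // S-Z
        for (int i = 0; i < 10; i++) M.put(0xF0 + i, (char) ('0' + i)); // 0-9
        M.put(0x40, ' ');                                               // space
    }
    public static String decode(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(M.getOrDefault(x & 0xFF, '?'));
        return sb.toString();
    }
}
```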
  • 15. Staging Server
     ▪ Gateway in & out of an HDFS cell
     ▪ Reads/writes to /hdfs/staging/{in,out}/... (runs as hdfs)
     ▪ HTTP based (POST/GET)
     ▪ Upload to http://hadoop-staging/put[/hdfs/staging/in/...]
        Stores directly in HDFS, no intermediary storage
        Multiple files support
        Random target directory created if none specified
        Parameters user, group, perm, suffix
        curl -F "file=@local;filename=remote" "http://../put?user=foo&group=bar&perm=644&suffix=.test"
     ▪ Download from http://hadoop-staging/get/hdfs/staging/out/...
        Ability to unpack SequenceFile records (unpack={base64,hex}) as key:value lines
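A client can assemble the upload URL with its user/group/perm/suffix parameters as sketched below; this is a hypothetical helper mirroring the documented parameters, not the real tooling:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical builder for the staging server's /put URL, with the four
// query parameters documented on the slide (user, group, perm, suffix).
public class StagingUrl {
    public static String put(String base, String hdfsDir,
                             String user, String group, String perm, String suffix) {
        StringBuilder sb = new StringBuilder(base).append("/put");
        if (hdfsDir != null) sb.append(hdfsDir); // e.g. /hdfs/staging/in/...
        sb.append("?user=").append(enc(user))
          .append("&group=").append(enc(group))
          .append("&perm=").append(enc(perm))
          .append("&suffix=").append(enc(suffix));
        return sb.toString();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

The resulting string would then be the target of a multipart POST (curl's -F in the slide's example).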
  • 16. fileutil
     ▪ Swiss Army knife for SequenceFiles, HDFS Staging Server, ZooKeeper
     ▪ Written in Java, single jar
     ▪ Works in all our environments (z/OS, Unix, Windows, ...)
     ▪ Can be run using TWS/OPC on z/OS (via a JCL), $Universe on Unix, cron, ...
     ▪ Multiple commands
        sfstage              Convert a z/OS dataset to a SequenceFile and push it to the staging server
        {stream,file}stage   Push a stream or files to the staging server
        filesfstage          Convert a file to a SequenceFile (one record per block) and stage it
        sfpack               Pack key:value lines (cf. unpack) in a SequenceFile
        sfarchive            Create a SequenceFile, one record per input file
        zk{ls,lsr,cat,stat}  Read data from ZooKeeper
        get                  Retrieve data via URI
        ...
  • 18. Data Organization
     ▪ Use of a directory structure that mimics the dataset names
        PR0172.PVS00.F7209588 (Environment / Silo / Application)
        /hdfs/data/si/PR/01/72/PR0172.PVS00.F7209588.SUFFIX
     ▪ Group ACLs at the Environment/Silo/Application levels
     ▪ Suffix is mainly used to add .YYYYMM to Generation Data Groups
     ▪ Suffix added by the staging server
     ▪ DB2 table unloads follow similar rules
        P11DBA.T90XXXX
        S4SDWH11.T4S02CTSC_H
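The naming convention above can be sketched as a small mapping function; the helper name and the .201106 suffix are illustrative, only the directory layout comes from the slide:

```java
// Hypothetical sketch of the slide's convention: the first qualifier of a
// z/OS dataset name (e.g. PR0172) is split into Environment/Silo/Application
// path components under /hdfs/data/si.
public class DatasetPath {
    public static String toHdfs(String dataset, String suffix) {
        String q = dataset.substring(0, dataset.indexOf('.')); // e.g. "PR0172"
        return "/hdfs/data/si/"
             + q.substring(0, 2) + "/"   // Environment, e.g. PR
             + q.substring(2, 4) + "/"   // Silo, e.g. 01
             + q.substring(4, 6) + "/"   // Application, e.g. 72
             + dataset + suffix;
    }
}
```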
  • 19. Bastion Hosts
     ▪ Hadoop cells are isolated, all accesses MUST go through a bastion host
     ▪ All accesses to the bastion hosts are authenticated via SSH keys
        ▪ Users log in using their own user
        ▪ No SSH port forwarding allowed
        ▪ All shell commands are logged
     ▪ Batches scheduled on bastion hosts by $Universe (use of ssh-agent)
     ▪ Bastion hosts can interact with their HDFS cell (hadoop fs commands)
     ▪ Bastion hosts can launch jobs
     ▪ Admin tasks, user provisioning done on NameNode
     ▪ Kerberos security not used (yet?)
        ▪ Need for pluggable security mechanism, using SSH signed tokens
  • 21. We are a Piggy bank ... Attribution: www.seniorliving.org
  • 22. Why Pig?
     ▪ We <3 the '1 relation per line' approach, « no SQHell™ »
     ▪ No metadata service to maintain
     ▪ Ability to add UDFs
        ▪ A whole lot already added, more on this later...
     ▪ Batch scheduling
     ▪ Can handle all the data we store in HDFS
     ▪ Still open to other tools (Hive, Zohmg, ...)
  • 23. com.arkea.commons.pig.SequenceFileLoadFunc
     ▪ Generic load function for our BytesWritable SequenceFiles
     ▪ Relies on Helper classes to interpret the record bytes
        SequenceFileLoadFunc('HelperClass', 'param', ...)
     ▪ Helper classes can also be used in regular MapReduce jobs
     ▪ SequenceFileLoadFunc outputs the following schema
        { key: bytearray, value: bytearray, parsed: ( Helper dependent schema ) }
  • 24. Helper Classes
     ▪ COBOL: com.arkea.commons.pig.COBOLBinaryHelper (COPYBOOK)
     ▪ Thrift: com.arkea.commons.pig.ThriftBinaryHelper (.class)
     ▪ DB2 Unload: com.arkea.commons.pig.DB2UnloadBinaryHelper (DDL + load script)
     ▪ MySQL: com.arkea.commons.pig.MySQLBinaryHelper (DDL)
     ▪ ...
  • 25. Initial Pig Target
     'proc sql' SAS corpus
     From sample to population
     Need to give users tools that can reproduce what they did in their scripts
  • 26. Groovy Closure Pig UDF
     DEFINE InlineGroovyUDF cac.pig.udf.GroovyClosure(SCHEMA, CODE);
     DEFINE FileGroovyUDF cac.pig.udf.GroovyClosure(SCHEMA, '/path/to/closure.groovy');
     SCHEMA uses the standard Pig schema syntax, i.e. 'str: chararray'
     CODE is a short Groovy closure, i.e. '{ a,b,c -> return a.replaceAll(b,c); }'
     closure.groovy must be in a REGISTERed jar under path/to
  • 27. Closure file template
     // Import statements
     import ....;

     // Constants definitions
     /** Documentation for XXX */
     final def XXX = ....;

     // Closure definition
     /**
      * Documentation for CLOSURE
      * @param a ...
      * @param b ...
      * @return ...
      */
     final def CLOSURE = { a,b,... ->
        ...
        return ...;
     }

     // Unit Tests
     // Test specific comment ...
     assert CLOSURE('A') == ...;

     // Return Closure for usage in Pig
     return CLOSURE;
  • 28. Type mappings
     Pig to Groovy
        bag       -> java.util.List
        tuple     -> Object[]
        map       -> java.util.Map
        int       -> int
        long      -> long
        float     -> float
        double    -> double
        chararray -> java.lang.String
        bytearray -> byte[]
     Groovy to Pig
        groovy.lang.Tuple    -> tuple
        Object[]             -> tuple
        java.util.List       -> bag
        java.util.Map        -> map
        byte/short/int       -> int
        long/BigInteger      -> long
        float                -> float
        double/BigDecimal    -> double
        java.lang.String     -> chararray
        byte[]               -> bytearray
  • 30. ⊕
     ▪ Fast and rich data pipeline between z/OS and Hadoop
     ▪ Pig toolbox to analyze COBOL/DB2 data alongside Thrift/MySQL/xSV/...
     ▪ Groovy Closure support for rapid extension
     ▪ Still some missing features
        Pure Java compression codecs (JNI on z/OS anyone?)
        Pig support for BigInteger / BigDecimal (245 might not be enough)
        SSH(RSA) based auth tokens
     ▪ And yet another hard challenge: Cultural Change
  • 33. com.arkea.commons.pig.COBOLBinaryHelper
     REGISTER copybook.jar;
     A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.COBOLBinaryHelper','[PREFIX:]COPYBOOK');
     The slide shows a full COBOL COPYBOOK (record Y7XRRD-Y7XRRDC, LG=00328, generated 26/01/01) side by side with the Pig schema derived from it:
        A: { key: bytearray, value: bytearray,
             parsed: ( Y7XRRD_Y7XRRDC: bytearray, Y7XRRD_ARTDS_CLE_SECD: bytearray,
                       Y7XRRD_NO_CCM: chararray, Y7XRRD_NO_PSE: chararray,
                       Y7XRRD_CATEGORIE: chararray, Y7XRRD_RANG: chararray,
                       Y7XRRD_NO_ORDRE: chararray, Y7XRRD_DA_TT_C2: chararray,
                       Y7XRRD_NO_ORDRE_ENR_C2: long, Y7XRRD_MT_OPE_TDS: double,
                       ... ) }
     Mapping rules visible on the slide: PIC X(n) fields become chararray, PIC 9(n) becomes long, PIC S9(13)V9(2) COMP-3 becomes double, group items and FILLERs become bytearray/chararray.
  • 34. com.arkea.commons.pig.DB2UnloadBinaryHelper
     REGISTER ddl-load.jar;
     A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.DB2UnloadBinaryHelper','[PREFIX:]TABLE');
     .ddl:
        CREATE TABLE SHDBA.TBDCOLS
          (COL_CHAR      CHAR(4) FOR SBCS DATA WITH DEFAULT NULL,
           COL_DECIMAL   DECIMAL(15, 2) WITH DEFAULT NULL,
           COL_NUMERIC   DECIMAL(15, 0) WITH DEFAULT NULL,
           COL_SMALLINT  SMALLINT WITH DEFAULT NULL,
           COL_INTEGER   INTEGER WITH DEFAULT NULL,
           COL_VARCHAR   VARCHAR(50) FOR SBCS DATA WITH DEFAULT NULL,
           COL_DATE      DATE WITH DEFAULT NULL,
           COL_TIME      TIME WITH DEFAULT NULL,
           COL_TIMESTAMP TIMESTAMP WITH DEFAULT NULL) ;
     .load:
        TEMPLATE DFEM8ERT DSN('XXXXX.PPSDR.B99BD02.SBDCOLS.REC') DISP(OLD,KEEP,KEEP)
        LOAD DATA INDDN DFEM8ERT LOG NO RESUME YES EBCDIC CCSID(01147,00000,00000)
        INTO TABLE "SHDBA"."TBDCOLS" WHEN(00001:00002) = X'003F'
          ( "COL_CHAR"      POSITION( 00004:00007) CHAR(00004) NULLIF(00003)=X'FF',
            "COL_DECIMAL"   POSITION( 00009:00016) DECIMAL NULLIF(00008)=X'FF',
            "COL_NUMERIC"   POSITION( 00018:00025) DECIMAL NULLIF(00017)=X'FF',
            "COL_SMALLINT"  POSITION( 00027:00028) SMALLINT NULLIF(00026)=X'FF',
            "COL_INTEGER"   POSITION( 00030:00033) INTEGER NULLIF(00029)=X'FF',
            "COL_VARCHAR"   POSITION( 00035:00086) VARCHAR NULLIF(00034)=X'FF',
            "COL_DATE"      POSITION( 00088:00097) DATE EXTERNAL NULLIF(00087)=X'FF',
            "COL_TIME"      POSITION( 00099:00106) TIME EXTERNAL NULLIF(00098)=X'FF',
            "COL_TIMESTAMP" POSITION( 00108:00133) TIMESTAMP EXTERNAL NULLIF(00107)=X'FF' )
     Resulting schema:
        A: { key: bytearray, value: bytearray,
             parsed: ( COL_CHAR: chararray, COL_DECIMAL: double, COL_NUMERIC: long,
                       COL_SMALLINT: long, COL_INTEGER: long, COL_VARCHAR: chararray,
                       COL_DATE: chararray, COL_TIME: chararray, COL_TIMESTAMP: chararray ) }
     Can also handle DB2 UDB unloads (done using hpu)
  • 35. we're Thrifty too...
     REGISTER thrift-generated.jar;
     A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.ThriftBinaryHelper','CLASS');
     struct Redirection {
       1: string alias,
       2: string url,
       3: string email,
       4: i64 timestamp,
       5: i64 lastupdate,
       6: list<string> params,
       7: bool external = 1,
       8: i64 owner,
       9: string user,
     }
     Resulting schema:
        A: { key: bytearray, value: bytearray,
             parsed: ( alias: chararray, url: chararray, email: chararray,
                       timestamp: long, lastupdate: long, params: (),
                       external: long, owner: long, user: chararray ) }
     ... and also use MySQL ...
     REGISTER mysql-ddl.jar;
     A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.MySQLBinaryHelper','TABLE');
     ... etc etc etc ...