Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData

Hadoop’s Life in Enterprise Systems. Y Masatani, OSS Professional Services, System Platform Sector, NTT DATA CORPORATION. Hadoop World 2011, Nov 8th

Agenda

Company Overview

Size of IT Services Market by Sectors <FY ended March 31, 2011> [Moderate Case] <2010>
Source: Gartner, "Forecast: IT Services Japan by Industry, 1Q 2011", Tsuyoshi Ebina, 20 May 2011. Note: Chart created by NTT Data based on Gartner data.
IT Services Market in Japan: JPY 9.83 trillion. NTT DATA’s Consolidated Net Sales: JPY 1.16 trillion.
Government and healthcare: 15.2% of the IT services market in Japan; 20.4% of NTT DATA’s net sales; our share in the market approx. 15.9%.
Financial: 23.4% of the IT services market in Japan; 42.2% of NTT DATA’s net sales; our share in the market approx. 21.3%.
Enterprise, services, etc.: 61.5% of the IT services market in Japan; 31.7% of NTT DATA’s net sales; our share in the market approx. 6.1%.
Other: 5.7% of NTT DATA’s net sales.
Percent of our net sales accounted for by each customer field/service when results are totaled using the criteria below.
Government and healthcare: Central Government and Related Agencies, Overseas Public Institutions, etc. / Local Government and Community-based Business / Healthcare.
Financial: Banks / Financial Unions / Insurance, Security and Credit Corporations / Settlement Services.
Enterprise, services, etc.: Global IT Services Company.
Other: Sales not included in the above.

Positioning in NTT Group
Sales Breakdown of NTT Group
NTT Holdings: USD 103 billion
NTT EAST (regional telephone company): USD 20 billion
NTT WEST (regional telephone company): USD 18 billion
NTT COMMUNICATIONS (network, international telecommunications company): USD 10 billion
NTT DOCOMO (mobile/network company): USD 42 billion
NTT DATA (IT solutions and integration company): USD 11 billion
Dimension Data: IT communication of enterprises and service providers
* “Fortune Global 500 July 2010” (USD 1 = JPY 100)

Hadoop and NTT DATA

Hadoop is Getting Hot in Japan

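Popularity of Hadoop ~ 2011 Fall
[Chart: length of Hadoop experience among attendees. Categories: none, < 3 months, 3 to 6 months, 6 to 12 months, 1 to 3 years, 3+ years.]
About 50% of attendees are still at the research stage. About 30% started within the last 6 months.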

Archetype of Enterprise Hadoop

Data Processing Domains and Engines
[Diagram: axes are Latency (sec, min, hour, day) and Size (GB, TB, PB). Processing domains: Online Processing, Online Batch Processing, Query & Search Processing, Enterprise Batch Processing, Big Data Processing. Engines: RDBMS; Low-Latency Serving Systems; DWH, Search Engine, etc.; Hadoop.]

Fit also to “Enterprise Batch Processing” * http://www.asakusafw.com/

Data Processing Domains and Engines “Revised”
[Diagram: axes are Latency (sec, min, hour, day) and Size (GB, TB, PB). Processing domains: Online Processing, Online Batch Processing, Query & Search Processing, Enterprise Batch Processing, Big Data Processing. Engines: RDBMS; Low-Latency Serving Systems; DWH, Search Engine, etc.; Hadoop.]

Customers Fit into Two Areas
[Diagram: the same Latency (sec, min, hour, day) by Size (GB, TB, PB) map with the domains Online Processing, Online Batch Processing, Query & Search Processing, Enterprise Batch Processing, and Big Data Processing. Customers are plotted in the Big Data Processing and Enterprise Batch Processing areas: financial, media (2), public (2), telecom (3).]

Hadoop Cluster’s Life-Cycle
[Diagram: the same customer map (financial, media, public, telecom) over the Big Data Processing and Enterprise Batch Processing areas, annotated with two directions of growth: “Expansion” and “Involvement”.]

“Expansion”

“Involvement”
[Diagram: Hadoop sits between two Data Processing steps, with a “Conversion from HDFS to POSIX” step on each side.]

Archetype of Integration between Engines
[Diagram: the same Latency (sec, min, hour, day) by Size (GB, TB, PB) map with customers (financial, media, public, telecom) and engines (RDBMS; Low-Latency Serving Systems; DWH, Search Engine, etc.; Hadoop). Integration paths are labeled: Raw Data Source Input, Coherent Import and Export, Reduction.]

Large Raw Data Source Input

Coherent Import and Export with RDBMS

Enhanced PostgreSQL connector for Sqoop
Sqoop (baseline implementation): HDFS file splits -> map tasks -> staging table -> destination table (RDBMS). One element of the diagram is labeled “Optional”. Issue INSERT for each chunk of records:
  INSERT INTO stg VALUES (?, ?), (?, ?), ...
  INSERT INTO stg VALUES (?, ?), (?, ?), ...
  ...
  INSERT INTO dest ( SELECT * FROM stg)
Specialized implementation for PostgreSQL: HDFS file splits -> map tasks, each running pg_bulkload -> tables tmp1, tmp2, tmp3 -> reduce task -> destination table (RDBMS). Exclude error records into a separate file. Each temporary table is created from the destination table's definition:
  CREATE TABLE tmp3(LIKE dest INCLUDING CONSTRAINTS)
The reduce task merges the temporary tables in a single transaction:
  BEGIN
  INSERT INTO dest ( SELECT * FROM tmp1 )
  DROP TABLE tmp1
  INSERT INTO dest ( SELECT * FROM tmp2 )
  DROP TABLE tmp2
  INSERT INTO dest ( SELECT * FROM tmp3 )
  DROP TABLE tmp3
  COMMIT
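The per-task temporary tables and the single merging transaction on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for PostgreSQL, plain INSERTs stand in for pg_bulkload, and the function names (load_split, merge_splits, export) and the two-column schema are hypothetical, not part of Sqoop or the connector.

```python
import sqlite3


def load_split(conn, table, rows):
    """Stand-in for one map task: load one file split into its own table.

    Assumption: the real connector runs pg_bulkload here; this sketch
    uses ordinary INSERTs against SQLite instead.
    """
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def merge_splits(conn, tables, dest="dest"):
    """Stand-in for the reduce task: move every temporary table into dest
    inside one transaction, so dest sees all splits or none of them."""
    conn.execute("BEGIN")
    try:
        for table in tables:
            conn.execute(f"INSERT INTO {dest} SELECT * FROM {table}")
            conn.execute(f"DROP TABLE {table}")
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Undo the partial merge; the temporary tables are restored too.
        conn.execute("ROLLBACK")
        raise


def export(splits):
    """Run the whole flow on an in-memory database and return the connection."""
    # isolation_level=None lets the code issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE dest (id INTEGER PRIMARY KEY, val TEXT)")
    tables = []
    for i, rows in enumerate(splits, start=1):
        name = f"tmp{i}"
        load_split(conn, name, rows)
        tables.append(name)
    merge_splits(conn, tables)
    return conn
```

Because the merge runs in one transaction, a failure on any split (for example a duplicate key) leaves the destination table unchanged, which is the property the slide's BEGIN ... COMMIT block suggests.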

Features of Sqoop PostgreSQL Connector
1 Robust and efficient direct export using “pg_bulkload”
2 Tune export using PostgreSQL COPY
3 Import using “ctid” for string type key value
4 Tune deletion method for staging table
5 Balanced import using statistical information
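Only the title of “Balanced import using statistical information” survives here, so the following Python sketch is one plausible reading rather than the connector's actual method. It assumes the statistics are equi-depth histogram bounds of the split column (PostgreSQL exposes such bounds as histogram_bounds in the pg_stats view) and compares them with cutting the column's min-to-max range into equal widths, which is how Sqoop splits by default. The function names and sample bounds are hypothetical.

```python
def even_range_splits(lo, hi, n):
    """Baseline: cut [lo, hi] into n ranges of equal width.

    Skewed data leaves some ranges nearly empty and others overloaded.
    """
    step = (hi - lo) / n
    edges = [lo + step * i for i in range(n)] + [hi]
    return list(zip(edges[:-1], edges[1:]))


def histogram_splits(bounds, n):
    """Cut along equi-depth histogram bounds.

    Each histogram bucket holds roughly the same number of rows, so taking
    every (buckets / n)-th bound gives n ranges of similar row count.
    """
    buckets = len(bounds) - 1
    if n < 1 or n > buckets:
        raise ValueError("need 1 <= n <= number of histogram buckets")
    edges = [bounds[i * buckets // n] for i in range(n + 1)]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical bounds for a heavily skewed key column (8 buckets).
sample_bounds = [0, 1, 2, 3, 4, 10, 100, 1000, 10000]
balanced = histogram_splits(sample_bounds, 4)
naive = even_range_splits(sample_bounds[0], sample_bounds[-1], 4)
```

With the sample bounds, the equal-width split puts almost every row into the first of four ranges, while the histogram-based split gives each map task about two buckets' worth of rows.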

Integration with Low-Latency Serving System * http://www.ntt.co.jp/RD/OFIS/index_en.html

Prototype of Hadoop and GPGPU Integration
Stages: Data Collection, Feature Data Extraction, Feature Data Compression, Kmeans Clustering, Result of Clustering.
Components: Raw Data Source, Flume Collector, Flume Master / ZooKeeper, Hadoop Master, Hadoop Slave, GPU Server.
Input Data (Query Log): Unique User: 30,000 [UU/H]; SIZE: order of ~TB.
Feature Data: ROWs: 1000~100000; COLs: 10000~100000; SIZE: order of ~10GB.
Compressed Feature Data: ROWs: 1000~100000; COLs: 100~1000; SIZE: order of ~GB.
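For readers unfamiliar with the clustering stage, here is a minimal CPU sketch of k-means (Lloyd's algorithm) in plain Python. It is not the prototype's GPGPU implementation, which the slide does not show; the function names, the fixed iteration count, and the caller-supplied initial centroids are assumptions made for illustration.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update.

    points and centroids are sequences of equal-length numeric tuples.
    Returns the final centroids and the cluster label of each point.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        members = [[] for _ in centroids]
        for p in points:
            members[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members;
        # a centroid with no members stays where it was.
        centroids = [
            tuple(sum(col) / len(m) for col in zip(*m)) if m else old
            for m, old in zip(members, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The assignment step is independent per point, which is why this stage is a natural candidate for a many-core GPU once the feature data has been compressed to a size that fits on one server.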

Breakdown of Elapsed Time for K-means
[Chart comparing two configurations: 24 cores on 3 nodes, and 256 cores on 1 node.]

Connector Integration Beyond
Common development in and
[Diagram: Hadoop Cluster for Enterprise Batch Processing; Backup and Recovery; POSIX; HDFS APIs; System A; System B.]

Enhanced Storage Architecture (Copyright 2011 FUJITSU LIMITED)
Established storage management technology (memory caching and disk I/O scheduling) and an enhanced dedicated network enable boosted HDFS performance.
Pros: Achieves 5x read and 10x write performance compared to local-disk HDFS, based on a financial enterprise batch benchmark case.
Cons: Limited scalability (up to 40~50 nodes based on the prototype configuration; will be extended to ~120).
[Diagram: compute nodes (Mem, CPU, Local FS) connected by a meshed network (40Gb b/w) to storage. Labels: Extract Disk I/O bandwidth as of Locality; Enhanced Bandwidth between Nodes and Storage; Storage File system supports HDFS APIs.]

Connector Integration Beyond
Common development in and
[Diagram: Hadoop; Conversion from HDFS to POSIX; Can Eliminate This Overhead.]

Hadoop with Enterprise Market

Thank you contact: hadoop at kits.nttdata.co.jp
