Hadoop –
Lessons Learned from Enterprise Clusters
Shinichi Yamada
EVP & CTO NTT DATA CORPORATION
Copyright © 2010 NTT DATA CORPORATION
Company Overview
•Name: NTT DATA CORPORATION
•Headquarters: Tokyo, Japan
•Revenue: USD 11.4 billion (March 2010; USD 1 = JPY 100)
•Employees: 34,543 (March 2010)
•Business Areas: Broad range of IT services
  •Systems integration
  •IT consulting
  •IT outsourcing
•History:
  •1967 - established as a division of NTT
  •1988 - spun off from NTT and incorporated (May 23, 1988)
  •1995 - went public (Tokyo Stock Exchange: 9613)
Net Sales by Sector
Transition: consolidated net sales (USD million; USD 1 = JPY 100)
Sector                                       FY2005  FY2006  FY2007  FY2008  FY2009  FY2010 (forecast)
Public Administration Sector                  3,265   3,442   3,005   2,564   2,327   2,160
Financial Sector                              2,745   3,245   4,210   4,737   4,942   5,110
Industrial Sector                             2,382   3,482   3,248   3,774   3,826   3,990
Others (maintenance and operations, etc.)       679     278     279     313     332     740
Ratio: consolidated, FY ended March 31, 2011 (USD 1 = JPY 100)
Public Administration Sector 18%, Financial Sector 43%, Industrial Sector 33%, Others 6%
Positioning in NTT Group
• NTT Group is one of the 50 largest companies in the world*, specializing in IT & telecommunications, with USD 104 billion in revenue.
• NTT DATA is the IT solutions arm of the NTT Group, specializing in IT solutions and systems integration services.
• NTT Group regards the IT business as one of its most important domains and emphasizes NTT DATA’s growth as the telecom industry faces commoditization.
Sales Breakdown of NTT Group
NTT Holdings: USD 104 bil
• NTT EAST (regional telephone company): USD 20 bil
• NTT WEST (regional telephone company): USD 18 bil
• NTT DATA (IT solutions and integration company): USD 11 bil
• NTT COMMUNICATIONS (network, international telecommunications company): USD 11 bil
• NTT DOCOMO (mobile / network company): USD 44 bil
• ...
Best Fitting Strategic Partnership
• NTT DATA is a leading IT service provider and already has more than 3 years of experience and production cases with Hadoop
• Helps enterprise customers design, integrate, deploy, and run large clusters in the range of 100 to 1000+ nodes
• Deep and wide experience introducing open source software technologies to enterprise customers; for data management, 8 years with PostgreSQL, including mission-critical cases
• Cloudera is the leading provider of Hadoop-based software, services, and education, and CDH is the best-qualified Hadoop distribution
• Has a strong relationship with the Hadoop OSS community and aggressively promotes Hadoop’s ecosystem
The Objective of Partnership
Jointly Promote and Accelerate Hadoop Business in Japan / APAC
Deliverables of Partnership
• NTT DATA Delivers Cloudera’s Product in Japan
  • Promote CDH and provide support in Japanese with local staff
  • Promote Cloudera’s training in Japan and provide a knowledge base in Japanese
• Qualified Professional Services for Hadoop
  • Enhance and extend NTT DATA’s Hadoop professional services by sharing experience and resources with Cloudera’s team
• Common Development and Feedback from NTT DATA’s Enhancements
  • Utilize open source tools (Heartbeat, Puppet, etc.) to improve reliability and optimize cluster operation
  Some enhancements are publicly available via:
  http://www.meti.go.jp/policy/mono_info_service/joho/downloadfiles/2010software_research/clou_dist_software.pdf (in Japanese only so far)
Construction of the Hadoop Environment
• Established a fully automated Hadoop environment construction system using OSS
• Utilizes Puppet and Kickstart (based on commodity functions)
• Developed scripts to set up a cluster consisting of heterogeneous hardware
• IP addresses and hostnames are assigned to satisfy operational and maintenance rules. For example, each hostname represents the server’s topological location: its rack and the switch port it is connected to.
• Install 100 servers: 90 minutes / Update the configuration of 100 servers: 3 minutes
Detailed Flow of Construction
[Diagram: operators only wire and power on the slave servers; the phases below run with no human intervention.]
(1) Install OS and packages (DHCP, TFTP, and HTTP servers): give IP and stage_1 boot loader; get stage_1 boot loader; get OS installer and config files; get install packages
(2) Configure servers: DHCP server, notify hostname made from topology/location; DNS server, register name
(3) Configure applications: Puppet server, notify machine spec; give config files according to spec
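The hostname rule described on this slide (a name that encodes rack and switch port) can be sketched as follows. The slides do not give the actual naming scheme, so the `slave-rNN-pNN` format, the two-digit field widths, and all function names here are hypothetical.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    rack: int         # rack number in the data center
    switch_port: int  # port on the rack switch the server is wired to


# Hypothetical pattern: "slave-r<rack>-p<port>", e.g. "slave-r03-p17".
_HOSTNAME_RE = re.compile(r"^slave-r(\d{2})-p(\d{2})$")


def hostname_for(loc: Location) -> str:
    """Build a hostname that encodes rack and switch port."""
    if not (0 <= loc.rack <= 99 and 0 <= loc.switch_port <= 99):
        raise ValueError("rack and switch_port must fit in two digits")
    return f"slave-r{loc.rack:02d}-p{loc.switch_port:02d}"


def location_of(hostname: str) -> Location:
    """Recover rack and switch port from a hostname built by hostname_for."""
    match = _HOSTNAME_RE.match(hostname)
    if match is None:
        raise ValueError(f"not a topology hostname: {hostname!r}")
    return Location(rack=int(match.group(1)), switch_port=int(match.group(2)))


def rack_path(hostname: str) -> str:
    """Return a rack path such as "/rack03" for grouping servers by rack."""
    return f"/rack{location_of(hostname).rack:02d}"
```

With a scheme like this, an operator can find a failed server's rack and switch port from its name alone, and the same name can be used to group servers by rack.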
Master Server Redundancy of the Hadoop Environment
• The Heartbeat-DRBD method is already known to the Hadoop community.
  • There is downtime during failover from the active server to the stand-by.
  • Jobs need to be retried after the failover.
• The Kemari-DRBD method (experimental)
  • Kemari is fault-tolerance software developed by NTT Laboratories.
  • No downtime and no need to retry jobs
[Diagram: Kemari-DRBD configuration with an Active and a Stand-by machine. Each machine has a system disk and a data disk (VM image), and runs Xen with DRBD, Heartbeat, and a Kemari process in the OS (Dom-0) and the NameNode in a virtual machine (OS, Dom-U). The Active side shows the Kemari RA and xc_kemari_save; the Stand-by side shows xc_kemari_restore. Links between the machines are labeled: storage sync; memory sync between virtual machines; monitoring each node; start stand-by machine.]
• Kemari synchronizes the state of Dom-U, such as memory
• A preliminary Kemari prototype was implemented on Xen
• It is now under development for KVM / QEMU
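With the Heartbeat-DRBD method, a job that fails during failover has to be submitted again. A minimal sketch of such a client-side retry loop follows, assuming a hypothetical `submit` callable that raises `ConnectionError` while the master server is unavailable; the attempt count and delay are illustrative values, not figures from the talk.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    submit: Callable[[], T],
    attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call submit(), retrying while the master server is failing over."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return submit()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up: the stand-by never became reachable
            sleep(delay_seconds)  # wait for the stand-by to take over
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the waiting can be replaced in tests; the Kemari-DRBD method aims to make this kind of wrapper unnecessary.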
Hadoop in Japan
• Early Adopters, i.e. Web/Internet Service Companies
  • Process phenomenal amounts of data of various types daily, and the volume is growing steadily
  • Have in-house engineering resources and start Hadoop projects as skunkworks
  • Clusters are typically around 20 to 50 nodes; these days, experienced companies are consolidating scattered clusters
• An Optimistic Attitude Is Not the Majority
  • Japanese enterprises are sophisticated about emerging technology and have high expectations, but are conservative about deployment
  • They want “Best Practices” from the beginning, in every respect: quality, robustness, sustainability, and economy of the platform
  • What are the “Best Practices” in Hadoop?
Working with Japanese enterprises, we observe two types of opportunities from a system integrator’s viewpoint
Lessons Learned from Enterprise Customers
“Frontiers” expect “Scalability is an Objective”
• Several enterprises already have excessive amounts of data that are not yet analyzed effectively and economically, typically in the telecom and telemetry industries
• Hadoop is the inevitable choice for scaling their big data, so deployments immediately exceed 100-node clusters, and system integration on top of the Hadoop cluster becomes the major concern
• “Best Practice” calls for knowledge and experience in
- tuned integration with data collectors/sensors, i.e. a custom Hadoop cluster
- specialized custom analytic applications
- design for operational economy, reducing management complexity
Lessons Learned from Enterprise Customers
“Establishment” expects “Scalability is a Requirement”
• The growing amount of data becomes a burden, typically on large batch jobs that have been processed by mainframes or UNIX enterprise servers
• They start from small clusters, so they need consulting that begins with evaluating a POC and comparing with other technologies, then moves on to planning the migration
• “Best Practice” calls for handy deployment (up to 20 nodes) and standard tools that support planning the off-load and migrating existing applications
• Scalability means elastic deployment from the user’s viewpoint
• The challenge is migrating applications, which sometimes requires refactoring data and algorithms. The changes should be minimal but bold
Hadoop in RECRUIT
Oct 12, 2010
RECRUIT CO.,LTD.
Executive Manager, Osamu YONETANI
Company Information and Data
RECRUIT CO.,LTD
Founded: March 31, 1960 (incorporated August 26, 1963)
Financial Information:
• Recruit Group
Consolidated Sales: about 9 billion dollars (※1)
Consolidated Ordinary Income: about 831 million dollars (※1)
• Recruit Co., Ltd.
Capital: 30 million dollars (since March 1, 1995)
Number of Employees: 5,929 (male: 2,659, female: 3,270)
Sales: about 3.7 billion dollars (※1)
Ordinary Income: about 623 million dollars (※1)
(※1) April 1 2009 - March 31, 2010
Affiliated Companies: 86 (as of March 31, 2010)
Web site: http://www.recruit.co.jp/corporate/english/
Company Information and Data
Products & Services
Human Resources
When you want to get a job!
We provide a large amount of top-quality job information through various media
such as information magazines and websites.
For Clients.
We support "Strategic Human Resources Management" from recruitment through
evaluation, remuneration, and staff training to placement.
In the area of "Human Resources Recruitment," we offer business solutions such as
human resource arrangement and effective staffing through outsourcing.
Products & Services
Coupons
Support ladies in their 20s and 30s.
We provide a locally based service, aimed mainly at women, that encourages
them to try different shops and restaurants.
For Clients.
Our staff members visit each participating business to gather information and
suggest the most effective coupon approach.
Products & Services
Housing
Publication and sales of "SUUMO", "HOUSING" etc.
Operation of "SUUMO", "SUUMO mobile," etc.
Further education and Learning
Publication and sales of "KEIKO TO MANABU", "RECRUIT SHINGAKU BOOK",
"COLLEGE MANAGEMENT," etc.
Operation of "KEIKO TO MANABU.net", "Career Guidance.net," etc.
Products & Services
Travel
Publication and sales of "JALAN" etc.
Operation of "jalan.net", "AB-ROAD" and mobile sites etc.
Bridal
Publication and sales of "ZEXY", "ZEXY INTERIOR", "ZEXY Anhelo," etc.
Operation of "ZEXY net", "ZEXY net mobile," etc.
Our division
MIT = "Marketing and IT" Division.
Information systems division for the entire company.
Cost management
Checking project budget spending.
Project Solution Group (a.k.a. PMO)
Reviewing major website development projects.
Infrastructure Solution Group
Sharing infrastructure.
Operates over 1,500 servers.
The group exploring new technology is here!
[Org chart: Board; CEO; divisions: Job Div., MIT, Car Div., ...]
Comparison of 4 DWH Middlewares
Needs
Prolonged processing time and growing needs for analysis.
Our data size grows as access and user actions increase.
Evolution of shared-nothing technology.
Shared-everything technology tends to be expensive.
Products verified.
• Proprietary RDBMS (DWH version)
• Proprietary RDBMS with RAM disk
• Brand-new commercial RDBMS (like a PostgreSQL cluster)
• Hadoop + HIVE
Comparison of 4 DWH Middlewares
Graph with their features
[Charts: features of the four products (labeled I, Hadoop HIVE, O, and G) on ten axes: Offline Perf., Serv. for Ope., Reliability, Scalability, Availability, Serv. for Dev., Ease of Migr., Online Perf., Economy, Flex./Opp.]
Comparison of 4 DWH Middlewares
Evaluation model
Model 1: Short Term Target (For EUC platform)
No changes to program code.
Focus on Availability and Ease of Migration.
Online performance is needed.
Model 2: Short / Middle Term Target (For offline processing)
Small changes are acceptable.
Focus on Reliability.
Offline performance is needed.
Model 3: Long Term Target (For new needs)
Can be built from scratch.
Focus on Economy, Scalability and Flexibility.
TB- or PB-class data size.
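The three evaluation models can be read as different point allocations over the same ten criteria. A minimal sketch of that kind of weighted scoring follows; the weights and ratings in it are made-up illustrations, not the figures RECRUIT used, and the identifier names are assumptions.

```python
from typing import Dict

CRITERIA = (
    "offline_perf", "serv_for_ope", "reliability", "scalability",
    "availability", "serv_for_dev", "ease_of_migr", "online_perf",
    "economy", "flex_opp",
)


def total_score(weights: Dict[str, int], ratings: Dict[str, float]) -> float:
    """Sum weight * rating over all criteria.

    weights: points allocated to each criterion (must sum to 100).
    ratings: how well a product meets each criterion, from 0.0 to 1.0.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 points")
    return sum(weights[c] * ratings[c] for c in CRITERIA)


# Hypothetical point allocation for a model that stresses economy,
# scalability, and flexibility, in the spirit of Model 3.
long_term_weights = {
    "offline_perf": 10, "serv_for_ope": 5, "reliability": 5,
    "scalability": 20, "availability": 5, "serv_for_dev": 5,
    "ease_of_migr": 5, "online_perf": 5, "economy": 20, "flex_opp": 20,
}

# Hypothetical ratings for an imaginary product: average everywhere,
# but excellent on scalability and economy.
example_ratings = {c: 0.5 for c in CRITERIA}
example_ratings.update(scalability=1.0, economy=1.0)

print(total_score(long_term_weights, example_ratings))  # 70.0
```

Changing only the weights while keeping the ratings fixed shows how the same product can rank differently under a short-term and a long-term model.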
Comparison of 4 DWH Middlewares
Model 1: Short Term Target (For EUC platform)
[Bar chart: score comparison by product, on a scale of 0 to 100 points. Bars: point allocation, Greenplum, InfoSphere, Hadoop+HIVE, RailGun (overlaid labels: points distribution, I, Hadoop HIVE, O, G). Each bar is broken down by criterion: Offline Perf. (batch processing performance), Serv. for Ope. (ease of platform operation), Reliability (product reliability), Scalability, Availability, Serv. for Dev. (ease of application development), Ease of Migr., Online Perf. (suitability for online processing), Economy, Flex./Opp. (innovativeness / future potential). Scores shown: 47p, 71p, 26p, 79p.]
Comparison of 4 DWH Middlewares
Model 2: Short / Middle Term Target (For offline processing)
[Bar chart: score comparison by product, on a scale of 0 to 100 points. Bars: point allocation, Greenplum, InfoSphere, Hadoop+HIVE, RailGun (overlaid labels: points distribution, I, Hadoop HIVE, O, G), each broken down by the ten evaluation criteria. Scores shown: 54p, 46p, 62p, 62p.]
Comparison of 4 DWH Middlewares
Model 3: Long Term Target (For new needs)
[Bar chart: score comparison by product, on a scale of 0 to 100 points. Bars: point allocation, Greenplum, InfoSphere, Hadoop+HIVE, RailGun (overlaid labels: points distribution, I, Hadoop HIVE, O, G), each broken down by the ten evaluation criteria. Scores shown: 66p, 53p, 35p, 69p.]
Next Step
To start Hadoop...
With small, but real data.
Replace a small part of our system with a Hadoop-based one.
Produce the same output with logic written for the new distributed architecture.
To take advantage of Hadoop...
For software development
Get know-how and tips through a small project.
Share this knowledge with other teams so it can be applied to other projects.
For operating Hadoop
Must have an infrastructure engineer familiar with the Hadoop architecture.
To save cost, share infrastructure and engineers across Hadoop projects.
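Checking that a Hadoop-based replacement produces the same output as the existing system can be sketched as an order-insensitive comparison of result rows, since a distributed job generally does not emit rows in the original order. The line-per-record format and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def diff_outputs(
    legacy_rows: Iterable[str], hadoop_rows: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Compare two outputs as multisets of lines, ignoring row order.

    Returns (missing, extra): rows found only in the legacy output and
    rows found only in the new output. Both lists are empty on a match.
    """
    legacy = Counter(line.rstrip("\n") for line in legacy_rows)
    hadoop = Counter(line.rstrip("\n") for line in hadoop_rows)
    missing = sorted((legacy - hadoop).elements())
    extra = sorted((hadoop - legacy).elements())
    return missing, extra
```

Using multisets rather than sets means a row that is duplicated or dropped by the new logic is still reported.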
Future Challenges
To create value for the relevant businesses...
Improve our business.
The power of Hadoop will free us from the limits of our thinking.
Example: Through our web page for clients.
1. Better suggestions for selling their products, using recommendation logic.
2. Near-realtime reports for specific markets.
Contribute to the community.
Share our experience of building systems for non-specialist users.
Share our library and operation tools (maybe!).
Elephant Ear Cookies
contact: hadoop at kits.nttdata.co.jp
Thank you

Contenu connexe

Tendances

Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveGeekNightHyderabad
 
Scalable data pipeline
Scalable data pipelineScalable data pipeline
Scalable data pipelineGreenM
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storagehybrid cloud
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsCloudera, Inc.
 
Cloud Data Warehousing presentation by Rogier Werschkull, including tips, bes...
Cloud Data Warehousing presentation by Rogier Werschkull, including tips, bes...Cloud Data Warehousing presentation by Rogier Werschkull, including tips, bes...
Cloud Data Warehousing presentation by Rogier Werschkull, including tips, bes...Patrick Van Renterghem
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXBMC Software
 
Breakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data StoreBreakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data StoreCloudera, Inc.
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionDataWorks Summit
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonMapR Technologies
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architectureMilos Milovanovic
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dwelephantscale
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesDataWorks Summit
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Ontico
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
Partners 2013 LinkedIn Use Cases for Teradata Connectors for HadoopPartners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
Partners 2013 LinkedIn Use Cases for Teradata Connectors for HadoopEric Sun
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopCloudera, Inc.
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 

Tendances (20)

Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's Perspective
 
Scalable data pipeline
Scalable data pipelineScalable data pipeline
Scalable data pipeline
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
 
Cloud Data Warehousing presentation by Rogier Werschkull, including tips, bes...
Cloud Data Warehousing presentation by Rogier Werschkull, including tips, bes...Cloud Data Warehousing presentation by Rogier Werschkull, including tips, bes...
Cloud Data Warehousing presentation by Rogier Werschkull, including tips, bes...
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
 
Breakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data StoreBreakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data Store
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architecture
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
Partners 2013 LinkedIn Use Cases for Teradata Connectors for HadoopPartners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 

En vedette

20150630_データ分析に最適な基盤とは? -コスト/スピードでビジネスバリューを得るために- by 株式会社インサイトテクノロジー CTO 石川雅也
20150630_データ分析に最適な基盤とは? -コスト/スピードでビジネスバリューを得るために- by 株式会社インサイトテクノロジー CTO 石川雅也20150630_データ分析に最適な基盤とは? -コスト/スピードでビジネスバリューを得るために- by 株式会社インサイトテクノロジー CTO 石川雅也
20150630_データ分析に最適な基盤とは? -コスト/スピードでビジネスバリューを得るために- by 株式会社インサイトテクノロジー CTO 石川雅也Insight Technology, Inc.
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseDataWorks Summit
 

En vedette (6)

Queues, Pools and Caches paper
Queues, Pools and Caches paperQueues, Pools and Caches paper
Queues, Pools and Caches paper
 
Big disasters
Big disastersBig disasters
Big disasters
 
Visualization
VisualizationVisualization
Visualization
 
Big data rmoug
Big data rmougBig data rmoug
Big data rmoug
 
20150630_データ分析に最適な基盤とは? -コスト/スピードでビジネスバリューを得るために- by 株式会社インサイトテクノロジー CTO 石川雅也
20150630_データ分析に最適な基盤とは? -コスト/スピードでビジネスバリューを得るために- by 株式会社インサイトテクノロジー CTO 石川雅也20150630_データ分析に最適な基盤とは? -コスト/スピードでビジネスバリューを得るために- by 株式会社インサイトテクノロジー CTO 石川雅也
20150630_データ分析に最適な基盤とは? -コスト/スピードでビジネスバリューを得るために- by 株式会社インサイトテクノロジー CTO 石川雅也
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 

Similaire à NTT Data - Shinichi Yamada - Hadoop World 2010

Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTDataHadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTDataCloudera, Inc.
 
Sourav_Giri_Resume_2015
Sourav_Giri_Resume_2015Sourav_Giri_Resume_2015
Sourav_Giri_Resume_2015sourav giri
 
Toyota tsusho africa case study
Toyota tsusho africa case studyToyota tsusho africa case study
Toyota tsusho africa case studyCisco Case Studies
 
OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014Seth Fox
 
The Implacable advance of the data
The Implacable advance of the dataThe Implacable advance of the data
The Implacable advance of the dataDataWorks Summit
 
Shanish_SQL_PLSQL_Profile
Shanish_SQL_PLSQL_ProfileShanish_SQL_PLSQL_Profile
Shanish_SQL_PLSQL_ProfileShanish Jain
 
Cisco Domain Ten solution overview
Cisco Domain Ten solution overviewCisco Domain Ten solution overview
Cisco Domain Ten solution overviewCharles Malkiel
 
Cloud Integration and Management - CA World 2013
Cloud Integration and Management - CA World 2013Cloud Integration and Management - CA World 2013
Cloud Integration and Management - CA World 2013Fujitsu Global
 
Valuing Information Management and IT Architecture
Valuing Information Management and IT ArchitectureValuing Information Management and IT Architecture
Valuing Information Management and IT ArchitectureGoutama Bachtiar
 
MS Inspire Update | Data Flex and Return to Workspace Solution
MS Inspire Update | Data Flex and Return to Workspace SolutionMS Inspire Update | Data Flex and Return to Workspace Solution
MS Inspire Update | Data Flex and Return to Workspace SolutionManoj Mittal
 
Q3 fy12 company presentation_red hat
Q3 fy12 company presentation_red hatQ3 fy12 company presentation_red hat
Q3 fy12 company presentation_red hatArmstrong WANG
 
Challenges of applying Blockchain to enterprise systems in NTTDATA
Challenges of applying Blockchain to enterprise systems in NTTDATAChallenges of applying Blockchain to enterprise systems in NTTDATA
Challenges of applying Blockchain to enterprise systems in NTTDATAHyperleger Tokyo Meetup
 
What and How to Cloud - A new way to plan and migrate apps and servers to cl...
What and How to Cloud -  A new way to plan and migrate apps and servers to cl...What and How to Cloud -  A new way to plan and migrate apps and servers to cl...
What and How to Cloud - A new way to plan and migrate apps and servers to cl...SoftwareONEPresents
 
Running Cognos on Hadoop
Running Cognos on HadoopRunning Cognos on Hadoop
Running Cognos on HadoopSenturus
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsjdijcks
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Barijaxconf
 

Similaire à NTT Data - Shinichi Yamada - Hadoop World 2010 (20)

Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTDataHadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
 
Sourav_Giri_Resume_2015
Sourav_Giri_Resume_2015Sourav_Giri_Resume_2015
Sourav_Giri_Resume_2015
 
Soma_Chakraborty (1)
Soma_Chakraborty (1)Soma_Chakraborty (1)
Soma_Chakraborty (1)
 
Toyota tsusho africa case study
Toyota tsusho africa case studyToyota tsusho africa case study
Toyota tsusho africa case study
 
OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014
 
The Implacable advance of the data
The Implacable advance of the dataThe Implacable advance of the data
The Implacable advance of the data
 
Shivaprasada_Kodoth
Shivaprasada_KodothShivaprasada_Kodoth
Shivaprasada_Kodoth
 
Shanish_SQL_PLSQL_Profile
Shanish_SQL_PLSQL_ProfileShanish_SQL_PLSQL_Profile
Shanish_SQL_PLSQL_Profile
 
Cisco Domain Ten solution overview
Cisco Domain Ten solution overviewCisco Domain Ten solution overview
Cisco Domain Ten solution overview
 
Cloud Integration and Management - CA World 2013
Cloud Integration and Management - CA World 2013Cloud Integration and Management - CA World 2013
Cloud Integration and Management - CA World 2013
 





NTT Data - Shinichi Yamada - Hadoop World 2010

  • 1. Hadoop – Lessons Learned from Enterprise Clusters Shinichi Yamada EVP & CTO NTT DATA CORPORATION
  • 2. Copyright © 2010 NTT DATA CORPORATION. Company Overview. Name: NTT DATA CORPORATION. Headquarters: Tokyo, Japan. Revenue: USD 11.4 billion (March 2010; USD 1 = JPY 100). Employees: 34,543 (March 2010). Business areas: broad range of IT services (systems integration, IT consulting, IT outsourcing). History: 1967, established as a division of NTT; 1988, spun off from NTT and incorporated (May 23, 1988); 1995, went public (Tokyo Stock Exchange: 9613).
  • 3. Net Sales by Sector: ratio and transition (consolidated, USD million; USD 1 = JPY 100). Ratio: Public Administration Sector 18%, Financial Sector 43%, Industrial Sector 33%, Others (maintenance and operations, etc.) 6%. Transition, FY2005 to FY2010 (FY2010 is a forecast for the consolidated FY ended March 31, 2011): Public Administration Sector 3265, 3442, 3005, 2564, 2327, 2160; Financial Sector 2745, 3245, 4210, 4737, 4942, 5110; Industrial Sector 2382, 3482, 3248, 3774, 3826, 3990; Others 679, 278, 279, 313, 332, 740.
  • 4. Positioning in NTT Group. NTT Group is one of the 50 largest companies in the world*, specializing in IT & Telecommunications with USD 104 billion in revenue. NTT DATA is the IT solutions arm of the NTT Group, specializing in IT solutions and systems integration services. NTT Group regards IT business as one of its most important domains, and emphasizes NTT DATA’s growth as the telecom industry faces commoditization. Sales breakdown of NTT Group: NTT Holdings, USD 104 bil; NTT EAST (regional telephone company), USD 20 bil; NTT WEST (regional telephone company), USD 18 bil; NTT DATA (IT solutions and integration company), USD 11 bil; NTT COMMUNICATIONS (network and international telecommunications company), USD 11 bil; NTT DOCOMO (mobile / network company), USD 44 bil; ・・・
  • 5. Best Fitting Strategic Partnership. NTT DATA is a leading IT service provider and already has over 3 years of experience and production cases on Hadoop. It helps enterprise customers design, integrate, deploy and run large clusters in the range of 100 to 1000+ nodes. It has deep and wide experience introducing open source software technologies to enterprise customers; in data management, 8 years with PostgreSQL, including mission-critical cases. Cloudera is the leading provider of Hadoop-based software, services and education, and CDH is the best qualified Hadoop distribution. It has a strong relationship with the Hadoop OSS community and aggressively promotes Hadoop’s ecosystem.
  • 6. The Objective of Partnership: Jointly Promote and Accelerate Hadoop Business in Japan/APAC
  • 7. Deliverables of Partnership. (1) NTT DATA delivers Cloudera’s product in Japan: promote CDH and provide support in Japanese with local staff; promote Cloudera’s training in Japan and provide a knowledge base in Japanese. (2) Qualified professional services for Hadoop: enhance and extend NTT DATA’s Hadoop professional services by sharing experience and resources with Cloudera’s team. (3) Common development and feedback from NTT DATA’s enhancements: utilize open source tools (Heartbeat, Puppet, etc.) to improve reliability and to optimize cluster operation. Some enhancements are publicly available via: http://www.meti.go.jp/policy/mono_info_service/joho/downloadfiles/2010software_research/clou_dist_software.pdf (Japanese only for now)
  • 8. Construction of the Hadoop Environment. Established a fully automated Hadoop environment construction system built from OSS. It utilizes Puppet and Kickstart (based on commodity functions). Scripts were developed to set up a cluster consisting of heterogeneous hardware. IP addresses and hostnames are assigned to fulfill operational and maintenance rules; for example, each hostname represents the server’s topological location: the rack and the port of the switch it is connected to. Installing 100 servers takes 90 minutes; updating the configuration of 100 servers takes 3 minutes. Diagram (Detailed Flow of Construction): phases are (1) install OS and packages, (2) configure servers, (3) configure applications. Operators only wire and power on the slave servers; no human intervention is required. Steps shown: the DHCP server gives an IP address and the stage_1 boot loader location; the slave server gets the stage_1 boot loader (TFTP server), then the OS installer, config files and install packages (HTTP server); the hostname made from topology/location is notified to the DHCP server and the name is registered with the DNS server; the machine spec is notified to the Puppet server, which gives config files according to the spec.
  • 9. Master Server Redundancy of the Hadoop Environment. The Heartbeat-DRBD method is already known to the Hadoop community: there is downtime during failover from the active to the stand-by server, and the job must be retried after the failover. The Kemari-DRBD method (experimental): Kemari is fault-tolerance software developed by NTT Laboratories; it has no downtime and jobs do not need to be retried. Diagram: an active and a stand-by Xen virtual machine host, each with a system disk, a data disk (VM image), OS (Dom-0) running DRBD, Heartbeat and a Kemari process, and OS (Dom-U) running the NameNode; the active side runs Kemari RA and xc_kemari_save, the stand-by side runs xc_kemari_restore. Labels: storage sync; memory sync between virtual machines; monitoring each node; start stand-by machine. Kemari synchronizes the state of Dom-U, such as memory. A preliminary Kemari prototype was implemented on Xen; a version for KVM / QEMU is now under development.
  • 10. Hadoop in Japan. Working with Japanese enterprises, we observe two types of opportunities from a system integrator’s viewpoint. Early adopters, i.e. Web/Internet service companies: they process various types of phenomenal data daily, and that data is growing steadily; they have in-house engineering resources and start Hadoop projects as skunkworks; clusters are typically around 20 to 50 nodes, and these days experienced companies are starting to consolidate scattered clusters. Optimistic attitudes are not the majority: Japanese enterprises are knowledgeable about emerging technology and have high expectations, but are conservative about deployment; they want “Best Practices” from the beginning, in every scope: quality, robustness, sustainability and economy of the platform. What are the “Best Practices” in Hadoop?
  • 11. Copyright © 2010 NTT DATA CORPORATION “Frontiers” expects “Scalability is an Objective”  There are several enterprises, who already have Excessive Amount of Data not being effectively and economically analyzed yet. Typically in telecom, telemetries industries  Hadoop is inevitable choice for scalability on their big data, thus deployment immediately goes over 100 nodes clusters, then System Integration on top of Hadoop cluster will be major concern  “Best Practice” expects knowledge and experience for - tuned integration with data collectors/sensors, i.e. custom Hadoop cluster - specialized custom analytic application, - and design for operational economies for reducing management complexity Lessons Learned from Enterprise Customers
  • 12. Copyright © 2010 NTT DATA CORPORATION “Establishment” expects “Scalability is a Requirement”  Growing amount of data becomes a burden typically on large batch jobs, which has been processed by mainframes or UNIX enterprise servers  Starts from small clusters, then need consulting starting from evaluating POC, comparing with other technologies, then planning for migration  “Best Practice” expects handy deployment (up to 20 nodes) and standard tools, which support planning off-load and migrating existing applications  Scalability means elastic deployment from user’s viewpoint  Challenge is the migration of application, which sometimes require re-factoring data and algorithm. It shall be minimal but bold changes Lessons Learned from Enterprise Customers
  • 13. Hadoop in RECRUIT Oct 12, 2010 RECRUIT CO.,LTD. Executive Manager, Osamu YONETANI
  • 15. Company Information and Data. RECRUIT CO.,LTD. Founded: March 31, 1960 (incorporated August 26, 1963). Financial information: Recruit Group: consolidated sales about 9 billion dollars (※1), consolidated ordinary income about 831 million dollars (※1). Recruit Co., Ltd.: capital 30 million dollars (since March 1, 1995); number of employees 5,929 (male: 2,659, female: 3,270); sales about 3.7 billion dollars (※1); ordinary income about 623 million dollars (※1). (※1) April 1, 2009 - March 31, 2010. Affiliated companies: 86 (as of March 31, 2010). Web site: http://www.recruit.co.jp/corporate/english/
  • 16. Products & Services: Human Resources. When you want to get a job! We provide a large amount of top-quality job information through various media such as information magazines and websites. For clients: we support "Strategic Human Resources Management" from recruitment through evaluation, remuneration, and staff training to placement. In the area of "Human Resources Recruitment," we offer business solutions such as human resource arrangement and effective staffing by outsourcing.
  • 17. Products & Services: Coupons. Supporting ladies in their 20s and 30s. We provide a service based on each local area that mainly targets women, encouraging them to try different shops and restaurants. For clients: our staff members visit each participating business to gather information and suggest the most effective coupon approach.
  • 18. Products & Services Housing Publication and sales of "SUUMO", "HOUSING" etc. Operation of "SUUMO", "SUUMO mobile," etc. Further education and Learning Publication and sales of "KEIKO TO MANABU", "RECRUIT SHINGAKU BOOK", "COLLEGE MANAGEMENT," etc. Operation of "KEIKO TO MANABU.net", "Career Guidance.net," etc.
  • 19. Products & Services. Travel: publication and sales of "JALAN" etc.; operation of "jalan.net", "AB-ROAD" and mobile sites etc. Bridal: publication and sales of "ZEXY", "ZEXY INTERIOR", "ZEXY Anhelo," etc.; operation of "ZEXY net", "ZEXY net mobile," etc.
  • 20. Our division: MIT = "Marketing and IT" Division, the information systems division for the whole company. Cost management: checking project budget spending. Project Solution Group (a.k.a. PMO): reviewing major development projects of web sites. Infrastructure Solution Group: sharing infrastructure; operates over 1500 servers. The group exploring new technology is here! Org chart: Board, CEO, Job Div., Car Div., MIT, ・・・
  • 21. Comparison of 4 DWH Middlewares
  • 22. Comparison of 4 DWH Middlewares. Needs: prolonged processing time and growing needs for analysis; our data size increases with increasing access and actions. Evolution of shared-nothing technology: shared-everything technology tends to be expensive. Products verified: proprietary RDBMS (DWH version); proprietary RDBMS with RAM disk; brand new commercial RDBMS (like a PostgreSQL cluster); Hadoop + HIVE. (Legend: I, Hadoop HIVE, O, G)
  • 23. Comparison of 4 DWH Middlewares: graph with their features (legend: I, Hadoop HIVE, O, G). Axes: Offline Perf., Serv. for Ope., Reliability, Scalability, Availability, Serv. for Dev., Ease of Migr., Online Perf., Economy, Flex./Opp.
  • 24. Comparison of 4 DWH Middlewares: evaluation model. Model 1: Short Term Target (for EUC platform): without changing program code; focus on Availability and Ease of Migration; online performance is needed. Model 2: Short / Middle Term Target (for offline processing): small changes are acceptable; focus on Reliability; offline performance is needed. Model 3: Long Term Target (for new needs): can be built from scratch; focus on Economy, Scalability and Flexibility; TB or PB class data size.
  • 25. Comparison of 4 DWH Middlewares, Model 1: Short Term Target (For EUC platform). [Chart: score comparison by product against the points distribution, scale 0 to 100; series: points distribution, Greenplum, InfoSphere, Hadoop+HIVE, RailGun; legend: I, Hadoop HIVE, O, G; criteria: Offline Perf., Serv. for Ope., Reliability, Scalability, Availability, Serv. for Dev., Ease of Migr., Online Perf., Economy, Flex./Opp. Totals shown on the slide: 47p, 71p, 26p, 79p.]
  • 26. Comparison of 4 DWH Middlewares, Model 2: Short / Middle Term Target (For offline processing). [Chart: score comparison by product against the points distribution, scale 0 to 100; series: points distribution, Greenplum, InfoSphere, Hadoop+HIVE, RailGun; legend: I, Hadoop HIVE, O, G; criteria: Offline Perf., Serv. for Ope., Reliability, Scalability, Availability, Serv. for Dev., Ease of Migr., Online Perf., Economy, Flex./Opp. Totals shown on the slide: 54p, 46p, 62p, 62p.]
  • 27. Comparison of 4 DWH Middlewares, Model 3: Long Term Target (For new needs). [Chart: score comparison by product against the points distribution, scale 0 to 100; series: points distribution, Greenplum, InfoSphere, Hadoop+HIVE, RailGun; legend: I, Hadoop HIVE, O, G; criteria: Offline Perf., Serv. for Ope., Reliability, Scalability, Availability, Serv. for Dev., Ease of Migr., Online Perf., Economy, Flex./Opp. Totals shown on the slide: 66p, 53p, 35p, 69p.]
  • 29. Next Step. To start Hadoop: begin with small but real data; replace a small part of our system with a Hadoop one; produce the same output with the new distributed architecture logic. To take advantage of Hadoop: for software development, gain know-how and tips through a small project, and share this knowledge with other teams so it can be applied to other projects; for operating Hadoop, we must have an infrastructure engineer familiar with the Hadoop architecture, and to save cost, share infrastructure and engineers across Hadoop projects.
  • 31. Future Challenges. To create value for the relevant businesses: improve our business; the power of Hadoop will free us from the limits of our thinking. Example, through our web pages for clients: 1. better suggestions for selling their products, using recommendation logic; 2. near-realtime reports for specific markets. Contribute to the community: share our experience of building systems for non-specialist users; share our library and operation tools (maybe!).
  • 32. Elephant Ear Cookies. Contact: hadoop at kits.nttdata.co.jp
  • 33. Thank you
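Slide 8 says that each hostname represents the server's rack and the switch port it is connected to. The slides do not give the actual naming format, so the following is only a minimal sketch of such a convention. The `hd-rNN-pNN` pattern and the names `Location`, `hostname_for`, and `location_of` are hypothetical.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    """Physical position of a server: rack number and switch port."""
    rack: int
    switch_port: int


# Hypothetical naming scheme: <prefix>-r<rack>-p<port>, zero-padded to 2 digits.
_HOSTNAME_PATTERN = re.compile(r"^(?P<prefix>[a-z]+)-r(?P<rack>\d+)-p(?P<port>\d+)$")


def hostname_for(loc: Location, prefix: str = "hd") -> str:
    """Encode a rack/port location into a hostname."""
    return f"{prefix}-r{loc.rack:02d}-p{loc.switch_port:02d}"


def location_of(hostname: str) -> Location:
    """Decode a hostname back into its rack/port location."""
    match = _HOSTNAME_PATTERN.match(hostname)
    if match is None:
        raise ValueError(f"hostname does not follow the naming scheme: {hostname!r}")
    return Location(rack=int(match["rack"]), switch_port=int(match["port"]))


if __name__ == "__main__":
    name = hostname_for(Location(rack=3, switch_port=17))
    print(name, location_of(name))
```

With a reversible scheme like this, an operator can locate a failed machine from its hostname alone, which is presumably the operational rule the slide refers to.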
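Slides 24 to 27 score each product against a points distribution that changes with the evaluation model. The slides give only the totals, so the sketch below shows one plausible way such a weighted score could be computed. The weights, ratings, and product names are hypothetical and are not taken from the slides.

```python
from typing import Dict, List, Tuple


def weighted_score(weights: Dict[str, int], ratings: Dict[str, int]) -> float:
    """Score a product on a 0-100 scale.

    weights: points allocated to each criterion (must sum to 100).
    ratings: the product's rating per criterion on a 0-10 scale;
             criteria missing from ratings count as 0.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 points")
    return sum(w * ratings.get(criterion, 0) for criterion, w in weights.items()) / 10


def rank(weights: Dict[str, int],
         ratings_by_product: Dict[str, Dict[str, int]]) -> List[Tuple[str, float]]:
    """Return (product, score) pairs, best score first."""
    scored = [(name, weighted_score(weights, r)) for name, r in ratings_by_product.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical points distribution emphasizing the Model 1 focus areas.
MODEL_1_WEIGHTS = {"Availability": 30, "Ease of Migr.": 30, "Online Perf.": 30, "Economy": 10}

# Hypothetical ratings for two made-up products.
RATINGS = {
    "product_a": {"Availability": 8, "Ease of Migr.": 9, "Online Perf.": 7, "Economy": 3},
    "product_b": {"Availability": 4, "Ease of Migr.": 2, "Online Perf.": 3, "Economy": 10},
}

if __name__ == "__main__":
    for name, score in rank(MODEL_1_WEIGHTS, RATINGS):
        print(f"{name}: {score:.0f}p")
```

Changing only the weights re-ranks the same products, which is why one product can do well under one model and poorly under another.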