Apache Hadoop runs best on bare metal, but there are many cases where it makes sense to run clusters on cloud infrastructure. In this talk I discuss the advantages, disadvantages and common problems.
At Axemblr.com our mission is to make provisioning a fully functional cluster on any cloud provider as simple as possible. We are building on Cloudera Manager, and the API we provide is fully compatible with Amazon Elastic MapReduce.
Get in touch with us at hello@axemblr.com
Apache Hadoop On-Demand on Cloud Infrastructure at Java2Days
1. Apache™ Hadoop® On-Demand
on Cloud Infrastructure
Why? How? Tools?
Andrei Savu @ Cloud2Days 2012
asavu@axemblr.com
2. Me
• Co-Founder of Axemblr.com
• Co-Organiser of Bucharest JUG (bjug.ro)
• OSS contributor (Apache Whirr, jclouds)
• Worked at Adobe, Facebook
• Connect with me on LinkedIn
4. Apache™ Hadoop®
• Stack of Components (HDFS, YARN, MR, Hive, Pig, Mahout etc.)
• Distributed Data Storage & Processing
• Designed to Scale to 1000s of nodes
• Highly available at application level
5. Useful for
• Click stream analysis, transactions etc.
• General Business Intelligence
• Advanced Analytics
• Machine Learning (data driven features)
and many more (data crunching)
6. IaaS
• Infrastructure As A Service
• Programmatic access to virtual machines, servers, storage, load balancers, network etc.
• Pay-as-you-go (PAYG). Good hardware utilisation. Flexible
• Public & Private
7. IaaS + Hadoop = HaaS
Hadoop-As-A-Service
Parallel Data Processing as a Service
29. Resources
• Hadoop in Cloud Infrastructure (Steve Loughran)
http://steveloughran.blogspot.ro/2012/03/hadoop-in-cloud-infrastructures.html
• What factors justify the use of Hadoop
http://redmonk.com/sogrady/2011/01/13/apache-hadoop/