4. What we’re not going to talk about.
• Replacing your existing servers with Hadoop
• How Hadoop compares to other databases
• How to write MapReduce or Java
6. What is Hadoop?
• Open source Apache project
• Written in Java
• Distributed system:
– Shares large workloads
– Commodity servers
– Scales effectively
7. [Diagram: Hadoop core components]
• Compute:
– MapReduce (Java-based distributed programming model)
– YARN (Yet Another Resource Negotiator)
• Storage:
– HDFS (Hadoop Distributed File System)
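As a rough illustration of the MapReduce programming model named on this slide, the sketch below runs a word count in plain Python on a single machine. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example. On a real cluster the framework would spread the map and reduce steps across many servers.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

The point of the split is that each map call sees only one line and each reduce call sees only one key, which is what lets the work be shared across commodity servers.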
16. How to get started now:
• Download & install a sandbox:
– Hortonworks Sandbox - http://bit.ly/1gkkCte
– Cloudera QuickStart VM - http://bit.ly/19eOwR3
– MapR Sandbox - http://bit.ly/TWZynR
• Fire it up, import some data with HDFS Explorer - http://bit.ly/1ivuSz5
• Create a table
• Run a query…
17. To sum up…
• Hadoop is a distributed data storage and computation engine
• Hadoop enables you to do things which were impossible with SQL Server… (and get paid more!)
• Get started by downloading a Sandbox – it’s easy!