I've had a lot of people reach out asking for resources and pointers to get started with Hadoop and BigData concepts. This presentation is a brain dump of the resources and links I have gathered over the years. I hope you find it helpful.
Dump of Hadoop and BigData Resources to get you started
1. Hadoop and BigData Dump
Links, Articles, Books, Blogs, Videos
and Other resources for getting
started with Hadoop and BigData
Compiled by: Fru N.
@FruLouis
https://www.linkedin.com/in/frulouis
Spring 2015
2. Read – Hadoop online tutorials <Beginners>
Hadoop - YDN - Yahoo! Developer Network
Hadoop-Skills: Big Data and Hadoop Essentials
MapR Academy: Hadoop Tutorial and Training Videos Online
Big Data University: Learn Hadoop & Big Data with Free Courses
IBM Hadoop tutorial
Coreservlets – hadoop tutorial
3. Read – Hadoop online tutorials (cont.)
Hadoop in General
– History of Hadoop - a lot has changed since this article was written but it’s a
great history lesson.
• http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop/
– Video - MinneAnalytics Conference Presentation - January 2013
– It is now 1.5 years old (ancient) - it does not deal with YARN, but is still useful
to understand the concepts of Hadoop.
• http://mediasite.csom.umn.edu/Mediasite/Play/66945ed10c2f42399dcfc51468cea4b11d?catalog=9acfe417-9ead-4a1e-8707-d35d315afd66
– HortonWorks Whitepaper – Apache Hadoop Patterns of Use
• http://hortonworks.com/resources/?did=71&cat=1
– HortonWorks Whitepaper – Business Value of Hadoop
• http://hortonworks.com/resources/?did=94&cat=1
– Brad Hedlund - Understanding Hadoop Clusters and the
Network (Technical) *** I love his web page
• http://bradhedlund.com/category/hadoop/
• http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network
4. Books and Literature
• If there were a Bible for Hadoop, I was
told this would be it.
– Hadoop: The Definitive Guide – Tom White. A
must-read.
• Others
– Data-Intensive Text Processing with MapReduce
by Jimmy Lin and Chris Dyer. Great free book (PDF download).
– Alex Holmes’s Hadoop in Practice
5. Hadoop Ecosystem and Specific Components
• Kafka -
– https://www.packtpub.com/big-data-and-business-inteliigence/apache-kafka
• Hive -
– http://shop.oreilly.com/product/0636920023555.do
• Cascading -
– http://docs.cascading.org/cascading/2.1/userguide/html/
• Solr -
– http://www.manning.com/grainger/?a_aid=1&a_bid=39472865
• Pig -
– http://shop.oreilly.com/product/0636920018087.do
• Scala -
– Most of the new tooling and components are being written in Scala, so you should be aware of
it and, ideally, learn it.
– http://pragprog.com/book/vsscala/programming-scala
• Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. (Explains data science job
roles better than any other book or article I've ever seen.)
– http://cdn.oreillystatic.com/oreilly/radarreport/0636920029014/Analyzing_the_Analyzers.pdf
• Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, by Foster Provost and Tom Fawcett
– http://www.amazon.com/Data-Science-Business-data-analytic-thinking/dp/1449361323
6. Hadoop In General
• HDP 2.1/YARN Resources
– Apache Hadoop 2/YARN
• http://hortonworks.com/blog/apache-hadoop-2-is-ga/
• http://hortonworks.com/blog/introducing-apache-hadoop-yarn/ *** 5-part blog series - excellent intro
– What’s new in HDP 2.1 - contains eight 30-minute
videos in the right-hand navigation
– http://hortonworks.com/hdp/whats-new/
7. Hadoop In General - Hive
• Hive on Tez, a performance deep dive - very
technical… an excellent presentation with A
LOT of detail
– http://www.slideshare.net/t3rmin4t0r/hivetez-a-performance-deep-dive
• Hive Cheat Sheet - a little out-of-date, but still
very useful
– http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
8. Hadoop In General - Security
• Halfway down the page are links to 5 security
presentations from the Hadoop Summit 2014
– http://hortonworks.com/blog/hadoop-summit-curated-content-apache-hadoop-security/
9. Hadoop In General - Training
• Classroom Training - Free
– Free Course - Introduction to Data Science
• https://www.coursera.org/course/datasci
• Classroom for a fee
– Hortonworks Training:
– http://hortonworks.com/training/
• Hortonworks Self-Paced Learning Classes
10. Skepticism and Challenges
• Article challenging Hadoop. Apparently not
everybody is pleased with the MapReduce
paradigm. A must-read.
• http://cacm.acm.org/blogs/blog-cacm/177467-hadoop-at-a-crossroads/fulltext
11. Articles and blogs
• Great Article: Running Hadoop on Ubuntu Linux (Multi-Node Cluster)
– http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
• HDFS Slurper
– https://github.com/alexholmes/hdfs-file-slurper
• Hadoop Tutorials
– https://developer.yahoo.com/hadoop/tutorial/
– http://www.infoq.com/articles/apps-with-hadoop
– https://www.cs.rutgers.edu/~pxk/417/notes/content/mapreduce.html
• More links
– http://prezi.com/jfpgkdbj-1if/data-in-and-data-out-using-hadoop-to-create-data-products-at-linkedin/
– http://inmaps.linkedinlabs.com/
– http://data.linkedin.com/publications
– http://search-hadoop.com/
– https://www.openhub.net/
– www.javavids.com - Java Tutorials
– http://data.linkedin.com/
• Even more
– http://hadooped.blogspot.com/2013/06/apache-oozie-part-1-workflow-with-hdfs.html
– http://tresata.com/
– http://www.bigdataconsultants.blogspot.com/
12. Videos, Playlists and Links
• How To Become A Data Scientist -- SF Data Science
– https://www.youtube.com/watch?v=c52IOlnPw08&list=PLlE6hb-4NUuQwtq1A97ZJEFDQD1X068Ca&index=18
• Easiest way to install / setup hadoop | Hadoop tutorial
– https://www.youtube.com/watch?v=_qLTMpdP7H4&list=PLlE6hb-4NUuQwtq1A97ZJEFDQD1X068Ca&index=21
• More to be added to this list…
13. Hadoop and BigData Outlook
• Take a look at the Apache Incubator projects.
They should give you a good sense of what is
coming.
• http://incubator.apache.org/projects/
15. Job / Resume and Career
• Job and Resume
– Sample Hadoop Developer Resume. I really like
this one. Take a look at the skill sets listed in it: very
diverse and Linux-heavy.
• http://news.dice.com/2014/04/15/sample-resume-hadoop-developer/
16. Subscriptions
• Subscriptions
– Hadoop Weekly
• One of the best ways to stay up to date. Subscribe to this for
weekly newsletters about everything Hadoop. Very handy.
– http://www.hadoopweekly.com/
– Allthings Hadoop Podcasts
• http://allthingshadoop.com/podcast/
• http://feeds.feedburner.com/allthingshadoop/kjGc
• Tip: There are many feed readers on the market, but I use Feedly and bring all my
subscriptions in there for easy access when I want to read. http://feedly.com/
17. NOT THE END
I’ve collected these links over the years. I apologize if any of them are
broken by now.
I will definitely keep updating this as I get more information and as the
platforms evolve.
Also, please suggest more to me if you have some of your own. Thanks.
Fru N.
@FruLouis
https://www.linkedin.com/in/frulouis