Data scientists drive data as a platform to answer previously unimaginable questions. These multi-talented data professionals are in demand like never before because they identify or create some of the most exciting and potentially profitable business opportunities across industries. However, a scarcity of existing external talent will require companies of all sizes to find, develop, and train their people with backgrounds in software engineering, statistics, or traditional business intelligence as the next generation of data scientists.
In this video, Cloudera's Senior Director of Data Science, Josh Wills, discusses what data scientists do, how they think about problems, the relationship between data science and Hadoop, and how Cloudera training can help you join this increasingly important profession. Following the video, Josh answers questions about machine learning, analytics platforms, applications of data science in different industries, and Cloudera's Introduction to Data Science course.
12. Big Data Economics
• No individual record is particularly valuable
• Having every record is incredibly valuable
• Value = f(Bytes)
• Web index
• Recommendation systems
• Sensor data
• Market basket analysis
• Online advertising
14. The Hadoop Distributed File System
• Based on the Google File System
• Data stored in large files
• Large block size: 64MB to 256MB per block
• Blocks are replicated to multiple nodes in the cluster
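To make the block-size and replication points concrete, here is a rough back-of-the-envelope sketch of how a file maps onto blocks. It is not taken from the talk: the function names are hypothetical, the 128 MB default is just one value inside the 64MB to 256MB range on the slide, and the replication factor of 3 is the usual HDFS default rather than something the slide states.

```python
MB = 1024 * 1024


def block_count(file_bytes, block_bytes=128 * MB):
    """Number of blocks needed to hold a file (integer ceiling division)."""
    return -(-file_bytes // block_bytes)


def replica_count(file_bytes, block_bytes=128 * MB, replication=3):
    """Total block replicas stored across the cluster.

    replication=3 is an assumed default, not a figure from the slide.
    """
    return block_count(file_bytes, block_bytes) * replication


if __name__ == "__main__":
    one_gib = 1024 * MB
    for size_mb in (64, 128, 256):
        blocks = block_count(one_gib, size_mb * MB)
        replicas = replica_count(one_gib, size_mb * MB)
        print(f"{size_mb} MB blocks: {blocks} blocks, {replicas} replicas")
```

Larger blocks mean fewer blocks per file, which keeps the amount of per-block bookkeeping small for very large files.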
15. Simple, Reliable, Distributed Processing: MapReduce
• Map Stage: Embarrassingly parallel
• Shuffle Stage: Large-scale distributed sort
• Reduce Stage: Process all the values that have the same key in a single step
• Process the data where it is stored
• Write once and you’re done.
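The three stages can be illustrated with a small, single-process word count. This is only a sketch of the programming model under simplifying assumptions: it runs in memory on one machine, the function names are made up for illustration, and it does not use the Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_stage(record):
    """Map: turn one input record into (key, value) pairs.

    Each record is handled independently, which is what makes this
    stage embarrassingly parallel.
    """
    return [(word.lower(), 1) for word in record.split()]


def shuffle_stage(pairs):
    """Shuffle: group values by key and return the groups in key order.

    A local stand-in for the large-scale distributed sort.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_stage(key, values):
    """Reduce: process all values that share a key in a single step."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over an iterable of text records."""
    mapped = chain.from_iterable(map_stage(r) for r in records)
    return [reduce_stage(key, values) for key, values in shuffle_stage(mapped)]


if __name__ == "__main__":
    print(run_job(["big data big value", "process the data"]))
```

In a real cluster the map calls would run on the nodes that already hold the input blocks, and the shuffle would move the grouped pairs across the network to the reducers.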
29. Train Like a Data Scientist
• Introduction to Data Science
• Hive and Pig Training
• Hadoop Developer Training
30. Introduction to Data Science: Building Recommender Systems
http://university.cloudera.com/
32. • Submit questions in the Q&A panel
• Watch on-demand video of this webinar at http://cloudera.com
• Follow Josh on Twitter @josh_wills
• Follow Cloudera University @ClouderaU
• Thank you for attending!
Register now for Cloudera training at http://university.cloudera.com
Use discount code DSvideo_10 to save 10% on new enrollments in Cloudera-delivered training classes until June 1
Use discount code 15off2 to save 15% on enrollments in two or more Cloudera-delivered training classes until June 1