  1. 1. Data Skills for the Digital Era: The Top Data Skills You Need to Get Hired
  2. 2. Main Focus: Data Science, Business Intelligence, Big Data, Data Engineering. Mohtat@ut.ac.ir
  3. 3. Data Science (Math & Statistics, Computer Science, Subject Matter Expertise): Data science is an interdisciplinary field concerned with the processes and systems used to extract knowledge or insights from data. It continues earlier data analysis fields such as statistics, data mining, and predictive analytics, and is similar to Knowledge Discovery in Databases (KDD).
  4. 4. Types of Analytics: Descriptive, Diagnostic, Predictive, Prescriptive
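To make the four types of analytics concrete, here is a minimal sketch that applies each one to a tiny data set. The figures, the names `ad_spend`, `sales`, `fit_line`, and `pearson_r`, and the choice of a straight-line model are all assumptions made for illustration; none of them come from the slides.

```python
from statistics import mean

# Hypothetical monthly figures, invented for illustration only.
ad_spend = [10, 12, 15, 18, 20]
sales = [100, 118, 150, 182, 200]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Descriptive: what happened? Summarize the past.
avg_sales = mean(sales)

# Diagnostic: why did it happen? A crude check of how closely
# sales track ad spend (correlation, not proof of causation).
r = pearson_r(ad_spend, sales)

# Predictive: what is likely to happen? Extrapolate the fitted line.
slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 25 + intercept

# Prescriptive: what should we do? Solve the model for the spend
# that would reach a chosen sales target.
target = 240
required_spend = (target - intercept) / slope

print(avg_sales, round(r, 3), round(forecast, 1), round(required_spend, 1))
```

Real projects would usually replace the hand-written helpers with a library such as NumPy or scikit-learn; the standard library is used here only to keep the sketch self-contained.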
  5. 5. Data Science Technology Application
  6. 6. Critical Skills for Data Scientists: Python; R; SQL; Data Mining Tools (KNIME, RapidMiner, IBM SPSS Modeler); Excel; BI Tools (Tableau, Power BI, Qlik)
  7. 7. Top Python Libraries in Data Science. TensorFlow: "TensorFlow is an open source software library for numerical computation using data flow graphs." PyTorch: "PyTorch is a Python package that provides deep neural networks built on a tape-based autograd system." NumPy: "NumPy is the fundamental package needed for scientific computing with Python." Scikit-Learn: "scikit-learn is a Python module for machine learning built on NumPy, SciPy and matplotlib." Keras: "Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano." SciPy: "SciPy is open-source software for mathematics, science, and engineering." Pandas: "pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive." Matplotlib: "Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms." Scrapy: "Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages."
  8. 8. Top Skills Every Data Scientist Needs to Master: TensorFlow, Keras, Hadoop, Spark, Hive, Java, MATLAB
  9. 9. Most Essential Skills for Data Scientists: Complex Problem Solving, Teamwork, Emotional Intelligence, Creativity, Critical Thinking, Negotiation
  10. 10. Applied Data Science with Python, University of Michigan (Coursera). Topics: Basics, Data Visualization, Machine Learning, Text Mining, Social Network Analysis (SNA). Courses: Introduction to Data Science in Python; Applied Plotting, Charting & Data Representation in Python; Applied Machine Learning in Python; Applied Text Mining in Python; Applied Social Network Analysis in Python
  11. 11. Data Science Books
  12. 12. The Long Road To Become a Data Scientist
  13. 13. Business Intelligence (BI) encompasses a wide variety of tools, applications and methodologies that enable organizations to collect data from internal systems and external sources; prepare it for analysis; develop and run queries against that data; and create reports, dashboards and data visualizations that make the analytical results available to corporate decision-makers as well as operational workers. Business Skills: Link to Business Strategy, Define Priorities, Define BI Vision, Lead Organization / BPR. Analytics Skills: Data Mining, Social BI. IT Skills: Infrastructure, Build Technology, Data Integration & Quality
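The BI workflow described above (collect data, prepare it, query it, report the result) can be sketched in a few lines. This is only an illustration: the `sales` table, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Collect: load a few hypothetical sales records into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("North", 120.0),
        ("South", 80.0),
        ("North", 60.0),
        ("South", 40.0),
        ("East", 100.0),
    ],
)

# Query: aggregate revenue per region, largest first.
report = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

# Report: print a plain-text table for decision-makers.
print(f"{'Region':<8}{'Total':>10}")
for region, total in report:
    print(f"{region:<8}{total:>10.2f}")
```

In practice the query would run against a warehouse and the result would feed a dashboard tool such as Tableau, Power BI, or Qlik rather than a printed table.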
  14. 14. Business Intelligence Architect Simple is what it needs in business
  15. 15. Top Business Intelligence Skills: SQL, Data Warehousing, Data Analysis, Tableau, ETL. Chart values: 23%, 85%, 28%, 41%, 65%
  16. 16. Top Business Intelligence Skills: Business Analyst, Oracle, SQL Server BI, Business Process, Data Modeling. Chart values: 17%, 85%, 19%, 21%, 22%
  17. 17. Top Business Intelligence Tools: Tableau, Power BI, Qlik. Your Choice Is Clear
  18. 18. Big Data. Volume: Terabytes, Distributed, Big Table. Velocity: Real-time, Stream Processing. Variety: Structured, Unstructured (Text, Image, Video). Big data is a term for data sets that are too large or complex for traditional data-processing application software to handle adequately. What matters is what organizations do with the data: big data can be analyzed for insights that lead to better decisions and strategic business moves.
  19. 19. Hadoop Ecosystem
  20. 20. 3 Types of Big Data Jobs: (1) Big Data Developer, (2) Big Data Administration, (3) Big Data Analytics
  21. 21. Top Big Data Programming Languages. Java: Hadoop is written in Java, and many other big data tools such as Storm, Spark, and Kafka also run on the JVM. Python: Python is a simple, open-source, general-purpose language that is easy to learn. With its rich set of utilities and libraries and its easy-to-use features, it works well for big data processing and analysis. Scala: Scala is a rival to Java and Python in data science and is becoming more popular because of the extensive use of Apache Spark in the big data and Hadoop industry.
  22. 22. Pathway to Success (from Start to Success): Apache Hadoop, Apache Spark, NoSQL Database, Data Analytics, Data Visualization
  23. 23. Big Data Companies & Vendors. Cloudera: Cloudera, Inc. is a US-based software company that provides a software platform for data engineering, data warehousing, machine learning and analytics that runs in the cloud or on premises. MapR: MapR is a business software company headquartered in Santa Clara, California. It provides access to a variety of data sources from a single computer cluster, including big data workloads. Hortonworks: Hortonworks is a data software company based in Santa Clara, California, that develops, supports, and provides expertise on a set of open-source software designed to manage data and processing for things such as IoT, single view of X, and advanced analytics and machine learning.
  24. 24. Installing and Running Big Data Infrastructure
  25. 25. Installing and Running Big Data Infrastructure
  26. 26. Big Data Specialization, University of California San Diego (Coursera). Courses: Introduction to Big Data; Big Data Modeling and Management Systems; Big Data Integration and Processing; Machine Learning With Big Data; Graph Analytics for Big Data
  27. 27. Apache Spark, University of California, Berkeley
  28. 28. Big Data Book
  29. 29. Data Scientist vs. Data Engineer (Data Engineering, Data Scientist): Data Pipelines, Visualization & Storytelling, Programming, Modeling & Advanced Analytics, Math & Statistics, System Implementation
  30. 30. Data Engineering: Data engineers develop, maintain, test and evaluate data solutions within organizations. ... A data engineer builds large-scale data processing systems, is an expert in data warehousing solutions, and should be able to work with the latest (NoSQL) database technologies. Clean and wrangle data into a usable state.
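As a small illustration of what "clean and wrangle data into a usable state" can mean, the sketch below normalizes a messy CSV extract. The sample records, the column names, and the cleaning rules (trim whitespace, normalize case, drop rows with an unusable age, drop duplicates) are assumptions chosen for the example, not rules from the slides.

```python
import csv
import io

# Hypothetical raw extract with stray spaces, a missing value,
# a non-numeric value, and a duplicate that differs only in case.
RAW = """name,age,city
 Alice ,30,Tehran
Bob,,Shiraz
alice,30,Tehran
Carol,forty,Tabriz
Dave,25, tabriz
"""


def clean(raw_text):
    """Return de-duplicated records with tidy names, cities, and integer ages."""
    rows, seen = [], set()
    for rec in csv.DictReader(io.StringIO(raw_text)):
        name = rec["name"].strip().title()
        city = rec["city"].strip().title()
        age_text = rec["age"].strip()
        if not age_text.isdigit():
            continue  # drop rows whose age is missing or not a number
        key = (name, city)
        if key in seen:
            continue  # drop duplicates after normalization
        seen.add(key)
        rows.append({"name": name, "age": int(age_text), "city": city})
    return rows


cleaned = clean(RAW)
for row in cleaned:
    print(row)
```

Whether to drop or repair a bad row is a design decision that depends on the downstream use; dropping is used here only because it keeps the example short.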
  31. 31. How To Become A Data Engineer: Linux, NoSQL & SQL, Python / Java / Scala, Agile Development, Data Ingestion, Processing Frameworks
  32. 32. Best Data Processing Frameworks. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. Apache Storm is a free and open source distributed realtime computation system. The core of Apache Flink is a distributed streaming dataflow engine written in Java and Scala.
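The MapReduce programming model mentioned above can be sketched on a single machine with the classic word-count example. This is a toy illustration of the map, shuffle, and reduce steps only; the function names and sample documents are hypothetical, and a real framework such as Hadoop would distribute each phase across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Hypothetical input documents.
documents = ["big data big insights", "data pipelines move data"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the calls can run in parallel on different machines, which is what makes the model suitable for large data sets.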
  33. 33. Cassandra: Best NoSQL Database
  34. 34. Data Ingestion Tools: Apache Kafka, SSIS & ODI, Apache NiFi, Logstash
