INTRODUCTION TO
DATA SCIENCE
BY: ELSAYED FATHY KHALF
ABOUT DATA SCIENCE
• Data science can be described as the interdisciplinary field
concerned with generating business insights and value by
processing large amounts of data. On its own, that definition
does not convey the complexity and nuance of the discipline,
nor does it explain how data science is shaping the
technological and business landscapes.
1 - The different data science fields
Traditional data: Data in the form of tables containing numeric or text
values; it is structured and stored in databases.
Big data: Extremely large volumes of data. It can
be in various formats:
- structured
- semi-structured
- unstructured
2 - Analysis vs analytics
• Analysis – Dividing data into digestible components that
are easier to understand and examining how different
parts relate to each other. Performed on past data,
explaining why the story ended in the way that it did. We
want to explain ‘how’ and ‘why’ something happened.
• Analytics – Explores the future. The application of logical
and computational reasoning to the component parts
obtained in an analysis. In doing this, you are looking for
patterns and exploring what you can do with them in the
future.
3 – Qualitative vs Quantitative analytics
• Qualitative analytics – using intuition and experience in
conjunction with analysis to plan your next business move
• Quantitative analytics – applying formulas and algorithms
to numbers that you have gathered from your analysis.
4 -Business Intelligence
• Business Intelligence: can be seen as the preliminary step
of predictive analytics. First, you analyze past data; the
inferences you draw then allow you to create
appropriate models that could predict the future of your
business accurately.
• Business Intelligence (BI): is the process of analyzing and
reporting historical business data. After reports and
dashboards have been prepared, end-users such as the
general manager can use them to make informed strategic
and business decisions. Concisely put, business
intelligence aims to explain past events using business
data.
5 -Machine learning
• AI simulates human knowledge and decision making with
computers. So far, humans have reached AI mainly through
machine learning and deep learning.
• Machine learning is the ability of machines to predict
outcomes without being explicitly programmed to do so. It is
about creating and implementing algorithms that let
machines receive data and use this data to:
Make predictions | Analyze patterns | Give recommendations
• Symbolic reasoning is the exception: a type of AI that
does not use ML or deep learning.
6 - Traditional data: Techniques
• Class labelling: Assigning each data point to the correct data
type (or arranging data by category).
• Data cleansing (also ‘data cleaning’ or ‘data scrubbing’): dealing with
inconsistent data. For example, working on a dataset
containing US states and finding that some of the names are
misspelled.
• Data shuffling: Shuffling the observations from your dataset
just like shuffling a deck of cards. This helps keep your
dataset free from unwanted patterns caused by
problematic data collection.
• Data balancing: Ensuring that the sample gives equal
priority to each class. For example, if you work with a
dataset that contains 80% male and 20% female data, and
you know that the population contains approximately 50%
men and 50% women, then you need to apply a balancing
technique to counteract this problem (using an equal
number of observations from each group).
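The cleansing, shuffling, and balancing techniques above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the records, the `CORRECTIONS` lookup table, and the choice of undersampling as the balancing technique are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical raw records: (state name as typed, class label).
# 8 of 10 rows are "male", mirroring the 80% / 20% example.
records = [
    ("Texas", "male"), ("Texsa", "male"), ("Ohio", "male"),
    ("Ohoi", "male"), ("Texas", "male"), ("Ohio", "male"),
    ("Texas", "male"), ("Ohio", "male"), ("Texas", "female"),
    ("Ohio", "female"),
]

# Data cleansing: map known misspellings to the canonical spelling.
# The lookup table is an assumption; real data needs its own rules.
CORRECTIONS = {"Texsa": "Texas", "Ohoi": "Ohio"}
cleaned = [(CORRECTIONS.get(state, state), label) for state, label in records]

# Data shuffling: reorder the observations; the fixed seed only makes
# the example reproducible.
rng = random.Random(0)
shuffled = cleaned[:]
rng.shuffle(shuffled)

# Data balancing by undersampling: keep the same number of rows per class.
by_class = defaultdict(list)
for row in shuffled:
    by_class[row[1]].append(row)
smallest = min(len(rows) for rows in by_class.values())
balanced = [row for rows in by_class.values() for row in rows[:smallest]]
```

Undersampling discards data from the larger class; other balancing techniques exist, and the right choice depends on the dataset.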
6.1 - Traditional data: Examples
• Numerical variable: Numbers that can be meaningfully manipulated
(for example, added), which gives us useful information.
• Categorical variable: Numbers that hold no numerical value
can be considered categorical data. Dates are also often
treated as categorical data.
7 - Big data: Techniques
• Text data mining: The process of deriving valuable
information from unstructured text.
• Data masking: As a business, when you work with users’
private data, you must be able to preserve confidential
information. However, this doesn’t mean that the data can’t
be touched or used for analysis. Instead, you must apply
some data masking techniques to utilize the information
without compromising private details. In essence, data
masking conceals the original data with random and false
data, allowing you to conduct analysis and keep confidential
information in a secure place.
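As a rough illustration of data masking, the sketch below replaces each user name with a random fake token while keeping the mapping consistent, so that per-user analysis still works. The function name `mask_column`, the sample data, and the token format are hypothetical; a seeded generator is used only for reproducibility and would not be appropriate for protecting real private data.

```python
import random
import string

def mask_column(values, seed=0):
    """Replace each distinct value with a random fake token, consistently."""
    rng = random.Random(seed)  # seeded for the example; not secure
    mapping = {}
    masked = []
    for value in values:
        if value not in mapping:
            token = "".join(rng.choices(string.ascii_lowercase, k=8))
            mapping[value] = "user_" + token
        masked.append(mapping[value])
    return masked

# Hypothetical private data: customer names and purchase amounts.
names = ["Alice", "Bob", "Alice", "Carol"]
purchases = [20, 35, 15, 50]

masked_names = mask_column(names)

# The analysis runs on masked names: total spend per (masked) user.
totals = {}
for name, amount in zip(masked_names, purchases):
    totals[name] = totals.get(name, 0) + amount
```

Because the same original value always maps to the same token, totals per user are preserved even though the real names never appear in the analysis.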
7.1 - Big data: Examples
• We find big data in a growing number of industries and
companies.
• Probably the most notable example of a company leveraging
the true potential of big data is Facebook. The company
keeps track of its users’ names, personal data, photos,
videos, recorded messages and so on, so its data
has a lot of variety. And with 2 billion users worldwide, the
volume of data stored on its servers is tremendous.
8 - Business Intelligence: Techniques
• Measure: simple descriptive statistics of past performance.
• Metric: a value that derives from the measures you
have obtained and aims at gauging business performance or
progress. It has a business meaning attached to it.
Metric = Measure + Business meaning
• KPIs: It doesn’t make sense to keep track of all metrics, so
companies choose to focus on the most important ones.
KPIs = Metrics + Business objective
Filtering out the boring metrics and turning the interesting and informative
KPIs into easily understood and comparable visualizations is an important part
of the business intelligence analyst job.
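The step from measure to metric to KPI can be shown with a small worked example. All figures, the choice of revenue per customer as the metric, and the `TARGET` value are made up for illustration.

```python
# Measures: simple descriptive statistics of past performance (hypothetical).
monthly_revenue = {"Jan": 12000.0, "Feb": 15000.0, "Mar": 18000.0}
monthly_customers = {"Jan": 300, "Feb": 350, "Mar": 400}

# Metric = measure + business meaning: average revenue per customer.
revenue_per_customer = {
    month: monthly_revenue[month] / monthly_customers[month]
    for month in monthly_revenue
}

# KPI = metric + business objective: a hypothetical target of at least
# 42.0 in revenue per customer each month.
TARGET = 42.0
kpi_met = {month: value >= TARGET for month, value in revenue_per_customer.items()}
```

Here January falls short of the assumed target (40.0 per customer), while February and March meet it.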
8.1 - Business Intelligence Examples
• BI allows you to adjust your strategy to past data as soon
as it is available. For example, if done right, Business
Intelligence can help you manage your shipment logistics
efficiently and, in turn, reduce costs and increase profit.
9 - Traditional methods: Techniques
• Traditional methods are classical statistical methods for forecasting.
• Clustering: grouping the data in neighborhoods to analyze
meaningful patterns
• Time series: used in economics and finance, showing the
development of certain values over time, such as stock
prices or sales volume.
9.1 - Traditional methods : Examples
• The application of traditional methods is extremely broad.
Two relevant examples:
• Forecasting sales data: using time series data to predict a
firm’s future expected sales
• UX: plot customer satisfaction and customer revenue to find
that each cluster represents a different geographical
location
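To make the sales forecasting example concrete, here is a minimal sketch that uses a moving average, one of the simplest time series techniques. The source does not name a specific method, so the moving average, the function name, and the sales figures are assumptions.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window

# Hypothetical monthly sales volume, oldest first.
monthly_sales = [100, 110, 120, 130, 140, 150]

# Forecast for the next month from the last three months.
next_month = moving_average_forecast(monthly_sales, window=3)
```

A moving average lags behind a steady trend like the one in this series, which is one reason more elaborate time series models are used in practice.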
10 - Machine Learning: Techniques
• There are four ingredients for machine learning: data,
model, objective function, optimization algorithm
• Model: the computer uses an algorithm to recognize certain
types of patterns
• Objective function: specification of the machine learning
problem; a function to be maximized or minimized
depending on the task at hand
• Optimization algorithm: A process in which previous
solutions to the problem are compared until an
optimal solution is reached
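The four ingredients can be seen side by side in a small example. The sketch below fits a straight line with gradient descent; the data, the linear model, the mean squared error objective, and the learning rate are all choices made for illustration.

```python
# Data: hypothetical (x, y) pairs that follow y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Model: a straight line with two adjustable parameters, w and b.
def predict(w, b, x):
    return w * x + b

# Objective function: mean squared error, to be minimized.
def mse(w, b):
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Optimization algorithm: plain gradient descent. Each step moves the
# parameters a little in the direction that reduces the error.
w, b = 0.0, 0.0
learning_rate = 0.05
for _ in range(5000):
    grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
```

After training, `w` and `b` end up close to 2 and 1, the values used to generate the data.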
10.1 - Machine Learning: Types
• Three main types of machine learning:
• Supervised learning: Training an algorithm resembles a
teacher supervising her students. The teacher provides
feedback every step of the way, telling students whether
they did well or whether they need to improve their
performance. With supervised learning you use labelled
data (in our example, every data point is categorized as
‘good’ performance or as ‘performance that needs improvement’).
• Unsupervised learning: In this case, the algorithm trains
itself. There isn’t a teacher who provides feedback. The
algorithm uses unlabelled data that is not categorized as
‘good’ or as ‘performance that needs improvement’. The
unsupervised ML model simply takes the data and sorts it into
different groups. In our example, it will be able to show us
two groups, ‘good performing’ and ‘performance that
needs to be improved’. However, the ML model would not be
able to tell us which one is which.
• Reinforcement learning: A reward system is introduced.
Every time a student does a task better than they did in the
past, they receive a reward (and nothing if the task is
not performed better). Instead of minimizing an error, we
maximize a reward; in other words, we maximize the
objective function.
• Deep learning – the modern state-of-the-art approach to
machine learning – leverages the power of neural networks
and can be placed in both categories – supervised and
unsupervised learning.
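The difference between supervised and unsupervised learning in the student example can be sketched on the same data. In this illustration the scores, the nearest-centroid classifier, and the simple two-group clustering routine are all assumptions chosen to keep the code short.

```python
# Hypothetical exam scores; labels are available only in the supervised case.
scores = [35.0, 40.0, 45.0, 80.0, 85.0, 90.0]
labels = ["needs improvement"] * 3 + ["good"] * 3

# Supervised: learn one centroid per label from the labelled data,
# then classify a new point by its nearest centroid.
def fit_centroids(xs, ys):
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def classify(centroids, x):
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit_centroids(scores, labels)
prediction = classify(centroids, 78.0)

# Unsupervised: split the same scores into two groups without labels.
# Assumes the data contains at least two distinct values.
def two_means(xs, iterations=10):
    c0, c1 = min(xs), max(xs)  # simple deterministic starting centroids
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1

group_a, group_b = two_means(scores)
```

The supervised model returns a named label for a new score, while the clustering step only returns two unnamed groups; deciding which group is the ‘good’ one is left to the analyst.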
11 - Common data science tools
• There are two main types of tools
• Programming languages: enable you to devise programs
that can execute specific operations. Moreover, you can
reuse these programs whenever you need to execute the
same action. The most popular programming language for
data science is Python, followed by R.
- Python and R have their limitations. They are not well
suited to problems specific to some domains. One
example is working with relational database management
systems. In these instances, SQL works best.
• Software: Excel plays an important role. It is
able to do relatively complex computations and good
visualizations quickly. SPSS is another popular tool for
working with traditional data and applying statistical
analysis.
- There is a significant amount of software designed for
working with big data, such as Apache Hadoop, Apache HBase,
and MongoDB.
- Power BI, Qlik, and Tableau are top-notch examples of
software designed for business intelligence visualizations.
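To illustrate the point about SQL and relational databases, the sketch below runs a SQL aggregation from Python using the standard-library `sqlite3` module and an in-memory database. The `sales` table, its columns, and the figures are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0)],
)

# The grouping and summing are expressed in SQL and executed by the
# database engine; Python only sends the query and receives the rows.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

In practice the two are often combined in this way: SQL retrieves and aggregates the data, and Python or R takes over for further analysis.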
12 - Data science job positions
• Data architect – designs the way data will be retrieved,
processed, and consumed
• Data engineer – processes the obtained data so that it is ready
for analysis
• Database administrator – handles the control of data;
works with traditional data
• BI analyst – performs analyses and reporting of
historical data
• BI consultant – an ‘external BI analyst’
• BI developer – performs analyses specifically designed for
the company
• Data scientist – employs traditional statistical methods or
unconventional machine learning techniques for making
predictions
• Data analyst – prepares advanced analyses
• Machine learning engineer – applies state-of-the-art ML
techniques
Resource : Martin Ganchev
