Introduction to Data Science
Lecture#1
Program: BS(DS)-Fall 2019
Instructor: Konpal Darakshan
Books:
1. Doing Data Science: Straight Talk from the Frontline
by Cathy O'Neil and Rachel Schutt.
2. The R Primer by Claus Thorn Ekstrom.
Other:
1-An Introduction to Data Science by Jeffrey M. Stanton and
Jeffrey S. Saltz.
2-Learn R for Applied Statistics: With Data Visualizations,
Regressions, and Statistics by Eric Goh Ming Hui.
3-Practical Statistics for Data Scientists: 50 Essential Concepts by
Andrew Bruce and Peter C. Bruce.
4-Data Analysis for the Life Sciences with R by Michael I. Love and
Rafael A. Irizarry.
5-R Programming for Data Science by Roger D. Peng.
Marking Scheme
• Exams
Final Exam 40 Marks
1st Hourly 15 Marks
2nd Hourly 15 Marks
• Sessional Marks
Lab Manual 5 Marks
Presentation 5 Marks
Assignments 10 Marks
Quizzes 10 Marks
Chapter # 01
Introduction: What is Data Science?
• Big Data and Data Science hype
• Getting past the hype
• Why now?
• Datafication
• Current landscape of perspectives
• Data Science Jobs
• What is a Data Scientist?
-In Academia
-In Industry
Basic Terminologies
• Data
• It can be
-generated
-collected
-retrieved.
Simulation
Similarity Measures
Data Structures
Algorithms
• Data: raw facts that carry no meaning on their own.
• Information: what is learned from facts.
• Knowledge: practical understanding of a subject.
• Understanding: the ability to absorb knowledge and learn to reason.
• Wisdom: the quality of having experience and good judgment; ability to
think and foresee.
• Validity: ways to confirm truth.
• Cross-sectional data: observations taken at a single point in time, with no time dimension.
• Temporal data: observations ordered in time (time series).
• Spatial data: considers location, e.g. coordinate determination on touchscreen phones.
• Temporal cum spatial data (GIS): considers how location-based data changes with the passage of time, e.g. population density.
• Scales of Measurement
There are 4 scales of measurement:
• Nominal: classifies data into categories that have no order, e.g. male/female.
• Ordinal: gives the order of the data, which can be numerical or non-numerical, e.g. time of day (dawn, morning, noon, afternoon, evening, night).
• Interval: differences between values are meaningful but there is no true zero, e.g. temperature in degrees Celsius.
• Ratio: has a true zero, so ratios of values are meaningful, e.g. weight, height, number of children.
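One practical consequence of the four scales is that each one supports a different set of summary statistics. The Python sketch below illustrates this with small made-up samples; the values and variable names are hypothetical and chosen only for illustration.

```python
from statistics import mean, mode

# Hypothetical samples, invented for illustration only.
gender = ["male", "female", "female", "male", "female"]        # nominal
time_of_day = ["morning", "night", "noon", "morning", "dawn"]  # ordinal
temperature_c = [10.0, 20.0, 30.0]                             # interval
weight_kg = [40.0, 60.0, 80.0]                                 # ratio

# Nominal: categories can only be counted, so the mode is the usable summary.
nominal_summary = mode(gender)

# Ordinal: categories have an order, so a median is meaningful once each
# category is mapped to its rank in that order.
ORDER = ["dawn", "morning", "noon", "afternoon", "evening", "night"]
ranks = sorted(ORDER.index(t) for t in time_of_day)
ordinal_summary = ORDER[ranks[len(ranks) // 2]]

# Interval: differences and means are meaningful, but 20 degrees Celsius is
# not "twice as hot" as 10 because 0 degrees Celsius is not a true zero.
interval_mean = mean(temperature_c)
interval_diff = temperature_c[1] - temperature_c[0]

# Ratio: a true zero exists, so ratios are meaningful as well.
ratio_of_weights = weight_kg[2] / weight_kg[0]

print(nominal_summary, ordinal_summary, interval_mean, interval_diff, ratio_of_weights)
```

Each scale also supports the statistics of the scales listed before it: a mode can be computed for ratio data, but a ratio cannot be computed for nominal data.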
Big Data and Data Science Hype:
Reasons to be skeptical about data science:
• Is data science only the stuff going on at companies like Google, Facebook, and other tech companies?
• There’s a distinct lack of respect for the researchers in academia and industry labs who have been working on this kind of stuff for years, and whose work builds on decades of earlier research.
• The hype is crazy. In general, hype masks reality and increases the noise-to-signal ratio.
• Statisticians already feel that they are studying and working on the “Science of Data.”
Getting Past the Hype
• Rachel’s experience going from getting a PhD in statistics to
working at Google. In her words:
We have a couple of replies to this:
• Sure, there is a difference between industry and academia. But does it really
have to be that way? Why do many courses in school have to be so intrinsically
out of touch with reality?
• Even so, the gap doesn’t represent simply a difference between industry
statistics and academic statistics. The general experience of data scientists
is that, at their job, they have access to a larger body of knowledge and
methodology, as well as a process, which we now define as the data
science process, that has foundations in both statistics and computer
science.
Around all the hype, in other words, there is a ring
of truth: this is something new.
Why Now?
• We have massive amounts of data about many aspects of our lives. What people might not know is that the “datafication” of our offline behavior has started as well.
• On the Internet, this means Amazon recommendation systems, friend recommendations on Facebook, film and music recommendations, and so on.
• In finance, this means credit ratings, trading algorithms, and models.
• In education, this is starting to mean dynamic personalized learning and assessments coming out of places like Knewton and Khan Academy.
• In government, this means policies based on data.
Datafication
• In the May/June 2013 issue of Foreign Affairs, Kenneth Neil Cukier and Viktor Mayer-Schoenberger wrote an article called “The Rise of Big Data”. In it they discuss the concept of datafication.
• They define datafication as a process of “taking all aspects of life and turning them into data.”
• They follow up their definition in the article with a line that speaks volumes about their perspective: “Once we datafy things, we can transform their purpose and turn the information into new forms of value.”
Examples:
• We quantify friendships with “likes”.
• Google’s augmented-reality glasses datafy the gaze.
• Twitter datafies stray thoughts.
• LinkedIn datafies professional networks.
How intentional the datafication is varies:
• When we “like” someone or something online, we are intending to be datafied.
• When we browse the Web, we are datafied unintentionally, through cookies.
• When we walk around in a store, or even on the street, we are being datafied via sensors, cameras, or Google glasses.
• The spectrum runs from taking part in a social media experiment to all-out surveillance and stalking.
But it’s all datafication.
Current landscape of perspectives
• On Quora there’s a discussion from 2010 about “What is Data Science?”, and here’s Metamarket CEO Mike Driscoll’s answer: “Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.”
• Driscoll then refers to Drew Conway’s Venn diagram of data science from 2010.
• Nathan Yau’s 2009 post, “Rise of the Data Scientist”, lists skills that include:
1. Statistics (traditional analysis you’re used to thinking about)
2. Data munging (parsing, scraping, and formatting data)
3. Visualization (graphs, tools, etc.)
• ASA President Nancy Geller’s 2011 Amstat News article, “Don’t shun the ‘S’ word”, defends statistics.
• DJ Patil and Jeff Hammerbacher, then at LinkedIn and Facebook respectively, coined the term “data scientist” in 2008.
• Wikipedia finally gained an entry on data science in 2012.
• In 2001, William Cleveland wrote a position paper about data science called “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”
• In 2012, Harvard Business Review declared data scientist to be the “Sexiest Job of the 21st Century”.
So data science existed before data scientists? Is this semantics, or does it make sense?
Data Science Jobs
• For three years running, data science has been dubbed “the best job in America.” According to Stack Overflow, it is one of the highest-paying jobs in the software sector.
• The GDPR increased companies’ reliance on data scientists because of the need for real-time analytics and for storing data responsibly.
• There are 465 job openings for data scientists in New York City alone.
• LinkedIn recently picked data scientist as its most promising career of 2019. One of the reasons it got the top spot was that the average salary for people in the role is $130,000.
• The January report from Indeed, one of the top job sites, showed a 29% increase in demand for data scientists year over year and a 344% increase since 2013, a dramatic upswing. But while demand, in the form of job postings, continues to rise sharply, searches by job seekers skilled in data science grew at a slower pace (14%), suggesting a gap between supply and demand.
The growth in data scientist job postings on Indeed, from December 2016 to December 2018
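To make the percentages quoted from the Indeed report concrete, the short Python sketch below converts them into multipliers. It uses only the figures given above; `growth_factor` is a hypothetical helper name introduced for illustration.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier (29% -> 1.29)."""
    return 1.0 + percent_increase / 100.0

# Figures quoted above from the Indeed report.
postings_yoy = growth_factor(29)          # job postings, year over year
postings_since_2013 = growth_factor(344)  # job postings since 2013
searches_yoy = growth_factor(14)          # job-seeker searches, year over year

# A 344% increase means postings are about 4.44 times the 2013 level,
# not 3.44 times.
print(f"Postings vs. 2013: {postings_since_2013:.2f}x")

# Postings per search, relative to a year earlier. A value above 1 means
# demand (postings) grew faster than supply (searches).
relative_gap = postings_yoy / searches_yoy
print(f"Postings per search vs. a year earlier: {relative_gap:.2f}x")
```

Read this way, there were roughly 13% more postings per job-seeker search than a year earlier, which is the gap between supply and demand that the report points to.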
OK, So What Is a Data Scientist, Really?
Perhaps the most concrete approach is to define data science by its usage.
• In Academia
• An academic data scientist is a scientist, trained in anything from social science to
biology, who works with large amounts of data, and must grapple with
computational problems posed by the structure, size, messiness, and the
complexity and nature of the data, while simultaneously solving a real-world
problem.
• In Industry
More generally, a data scientist is someone who knows how to:
• Design experiments.
• Collect, clean, and munge data.
• Understand biases in the data and debug logging output from code.
• Carry out exploratory data analysis, which combines visualization and data sense.
• Find patterns and build models and algorithms.
• Use analyses for decision making.
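The steps above can be sketched on a toy example. The records, field names, and helper functions below are hypothetical and use only the Python standard library; this is a minimal illustration of cleaning, exploring, and checking for bias, and a real project would look quite different.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from a log or a form:
# inconsistent formatting, a missing value, and an unparseable entry.
raw_records = [
    {"city": " Karachi", "age": "23"},
    {"city": "lahore", "age": ""},
    {"city": "Karachi ", "age": "31"},
    {"city": "Lahore", "age": "n/a"},
    {"city": "Lahore", "age": "40"},
]

def clean(records):
    """Cleaning and munging: normalise text, drop rows with unusable ages."""
    cleaned = []
    for row in records:
        age = row["age"].strip()
        if not age.isdigit():
            continue  # missing or malformed value
        cleaned.append({"city": row["city"].strip().title(), "age": int(age)})
    return cleaned

def explore(records):
    """Exploratory analysis: group ages by city and summarise each group."""
    groups = {}
    for row in records:
        groups.setdefault(row["city"], []).append(row["age"])
    return {city: {"n": len(ages), "mean_age": mean(ages)}
            for city, ages in groups.items()}

rows = clean(raw_records)
summary = explore(rows)

# Understanding bias: two of the five rows were dropped, both from the same
# city, so the cleaned sample under-represents it.
dropped = len(raw_records) - len(rows)
print(dropped, summary)
```

Even in this tiny case, the cleaning step changes what the summary can say: any decision based on it should take the dropped rows into account.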
• Data engineers are the data professionals who prepare the “big data” infrastructure to be analyzed by data scientists.
• A data analyst is someone who merely curates meaningful insights from data.
• A data scientist is a professional with the capabilities to gather large amounts of data to analyze and synthesize the information into actionable plans for companies and other organizations.

Introduction to Data Science fundamentals

  • 1. Introduction to Data Science Lecture#1 Program: BS(DS)-Fall 2019 Instructor: Konpal Darakshan
  • 2. Books: 1. Doing Data Science: Straight Talk from the Frontline by Cathy O'Neil and Rachel Schutt. 2. The R Primer by Claus Thorn Ekstrom. Other: 1-An Introduction to Data Science by Jeffrey M. Stanton and Jeffrey S. Saltz. 2-Learn R for Applied Statistics: With Data Visualizations, Regressions, and Statistics by Eric Goh Ming Hui. 3-Practical Statistics for Data Scientists: 50 Essential Concepts by Andrew Bruce and Peter C. Bruce. 4-Data Analysis for the Life Sciences with R by Michael I. Love and Rafael A. Irizarry. 5-R Programming for Data Science by Roger D. Peng.
  • 3. Marking Scheme • Exams Final Exam 40 Marks 1st Hourly 15 Marks 2nd Hourly 15 Marks • Sessional Marks Lab Manual 5 Marks Presentation 5 Marks Assignments 10 Marks Quizzes 10 Marks
  • 4. Chapter # 01 Introduction: What is Data Science? • Big Data and Data Science hype • Getting past the hype • Why now? • Datafication • Current landscape of perspectives • Data Science Jobs • What is data Scientist -In Academia -In Industry
  • 5. Basic Terminologies • Data • It can be -generated -collected -retrieved. Simulation Similarity Measures Data Structures Algorithms
  • 6. • Data: facts with no meanings. • Information: learning from facts. • Knowledge: practical understanding of a subject. • Understanding: the ability to absorb knowledge and learn to reason. • Wisdom: the quality of having experience and good judgment; ability to think and foresee. • Validity: ways to confirm truth.
  • 7. • Cross-sectional data: applied on data without time. • Temporal data: applied on time series. • Spatial: considers location i.e. coordinate determination in touch phones. • Temporal cum Spatial (GIS): considers change with passage of time for example population density. • Measurements of Scales There are 4 scales of measurement • Nominal: determines classification of data i.e. male/female. • Ordinal: determines order of data and can be numerical or non-numerical i.e. time of day (dawn, morning, noon, afternoon, evening, night). • Interval: gives the interval of a measurement i.e. temperature interval. • Ratio: gives ratio of the measurement i.e. weight, height, number of children.
  • 8. Big Data and Data Science Hype: reasons to be skeptical about data science. • Is data science only the stuff going on at companies like Google and Facebook and other tech companies? • There’s a distinct lack of respect for the researchers in academia and industry labs who have been working on this kind of stuff for years, and whose work builds on decades of earlier work. • The hype is crazy. In general, hype masks reality and increases the noise-to-signal ratio. • Statisticians already feel that they are studying and working on the “Science of Data.”
  • 9. Getting Past the Hype • Rachel’s experience going from getting a PhD in statistics to working at Google. In her words:
  • 10. Getting Past the Hype • We have a couple of replies to this: • Sure, there’s a difference between industry and academia. But does it really have to be that way? Why do many courses in school have to be so intrinsically out of touch with reality? • Even so, the gap doesn’t represent simply a difference between industry statistics and academic statistics. The general experience of data scientists is that, at their job, they have access to a larger body of knowledge and methodology, as well as a process, which we now define as the data science process, that has foundations in both statistics and computer science. Around all the hype, in other words, there is a ring of truth: this is something new.
  • 11. Why Now? • We have massive amounts of data about many aspects of our lives. What people might not know is that the “datafication” of our offline behavior has started as well. • On the Internet, this means Amazon recommendation systems, friend recommendations on Facebook, film and music recommendations, and so on. • In finance, this means credit ratings, trading algorithms, and models. • In education, this is starting to mean dynamic personalized learning and assessments coming out of places like Knewton and Khan Academy. • In government, this means policies based on data.
  • 12. Datafication • In the May/June 2013 issue of Foreign Affairs, Kenneth Neil Cukier and Viktor Mayer-Schoenberger wrote an article called “The Rise of Big Data”. In it they discuss the concept of datafication, which they define as a process of “taking all aspects of life and turning them into data.” • They follow up their definition in the article with a line that speaks volumes about their perspective: “Once we datafy things, we can transform their purpose and turn the information into new forms of value.”
  • 13. Datafication: Examples • We quantify friendships with “likes”. • Google’s augmented-reality glasses datafy the gaze. • Twitter datafies stray thoughts. • LinkedIn datafies professional networks. • When we “like” someone or something online, we are intending to be datafied. • When we browse the Web, we are datafied unintentionally, through cookies. • When we walk around in a store, or even on the street, we are being datafied via sensors, cameras, or Google glasses. • Our involvement ranges from taking part in a social media experiment to all-out surveillance and stalking. But it’s all datafication.
  • 14. Current landscape of perspectives • For example, on Quora there’s a discussion from 2010 about “What is Data Science?”, and here’s Metamarket CEO Mike Driscoll’s answer: “Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.” • Driscoll then refers to Drew Conway’s Venn diagram of data science from 2010.
  • 15. Current landscape of perspectives • Nathan Yau’s 2009 post, “Rise of the Data Scientist”, lists the skills of data geeks, which include: 1. Statistics (traditional analysis you’re used to thinking about) 2. Data munging (parsing, scraping, and formatting data) 3. Visualization (graphs, tools, etc.) • In her 2011 Amstat News article, “Don’t shun the ‘S’ word”, ASA President Nancy Geller defends statistics. • DJ Patil and Jeff Hammerbacher, then at LinkedIn and Facebook, respectively, coined the term “data scientist” in 2008. • Wikipedia finally gained an entry on data science in 2012.
  • 16. Current landscape of perspectives • In 2001, William Cleveland wrote a position paper about data science called “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” • In 2012, Harvard Business Review declared data scientist to be the “Sexiest Job of the 21st Century”. • So data science existed before data scientists? Is this semantics, or does it make sense?
  • 17. Data Science Jobs • For three years running, data science has been dubbed “the best job in America.” According to Stack Overflow, it is one of the highest paying jobs in the software sector. • The GDPR increased the reliance companies have on data scientists due to the need for real-time analytics and storing data responsibly. • There are 465 job openings in New York City alone for data scientists. • LinkedIn recently picked data scientist as its most promising career of 2019. One of the reasons it got the top spot was that the average salary for people in the role is $130,000. • The January report from Indeed, one of the top job sites, showed a 29% increase in demand for data scientists year over year and a 344% increase since 2013, a dramatic upswing. But while demand, in the form of job postings, continues to rise sharply, searches by job seekers skilled in data science grew at a slower pace (14%), suggesting a gap between supply and demand.
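The percentages quoted from the Indeed report can be turned into growth multipliers to make the supply and demand gap concrete. The sketch below is illustrative arithmetic only; the function name `growth_multiplier` is hypothetical, and the only inputs are the 29%, 344% and 14% figures stated on the slide.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier (e.g. 29% -> 1.29)."""
    return 1.0 + percent_increase / 100.0

# Figures quoted on the slide (Indeed report).
postings_yoy = growth_multiplier(29)          # job postings, year over year
postings_since_2013 = growth_multiplier(344)  # job postings since 2013
searches_yoy = growth_multiplier(14)          # job-seeker searches, year over year

# How much postings grew relative to searches in one year:
# a value above 1 means demand outpaced supply.
postings_per_search_change = postings_yoy / searches_yoy

print(f"Postings are {postings_since_2013:.2f}x their 2013 level")
print(f"Postings per search grew by a factor of {postings_per_search_change:.3f}")
```

Note that a 344% increase means postings are 4.44 times their 2013 level, not 3.44 times.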
  • 18. The growth in data scientist job postings on Indeed, from December 2016 to December 2018
  • 20. OK, So What Is a Data Scientist, Really? Perhaps the most concrete approach is to define data science by its usage. • In Academia: an academic data scientist is a scientist, trained in anything from social science to biology, who works with large amounts of data, and must grapple with computational problems posed by the structure, size, messiness, and the complexity and nature of the data, while simultaneously solving a real-world problem. • In Industry: more generally, a data scientist is someone who knows how to: • design experiments; • carry out the process of collecting, cleaning, and munging data, which requires skills that are also necessary for understanding biases in the data and for debugging logging output from code; • do exploratory data analysis, which combines visualization and data sense; • find patterns and build models and algorithms; • use analyses for decision making.
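The industry skills listed above (collecting, cleaning and munging, exploratory analysis, modelling, and decision making) can be sketched end to end on a toy dataset. This is a minimal illustration under stated assumptions, not a method from the slides: the records, the variable meanings (advertising spend and sales), the helper names `clean` and `fit_line`, and the decision rule are all hypothetical, and the model is an ordinary least-squares line written out by hand.

```python
from statistics import mean

# Collect: hypothetical raw (ad_spend, sales) records, as strings with blanks and junk.
raw = [("1", "2.1"), ("2", "3.9"), ("", "5.0"), ("3", "6.2"), ("4", "n/a"), ("5", "9.8")]

def clean(records):
    """Clean and munge: keep only rows where both fields parse as numbers."""
    cleaned = []
    for x, y in records:
        try:
            cleaned.append((float(x), float(y)))
        except ValueError:
            continue  # drop rows with missing or malformed values
    return cleaned

data = clean(raw)
xs = [x for x, _ in data]
ys = [y for _, y in data]

# Explore: a few summary numbers before modelling.
summary = {"n": len(data), "mean_x": mean(xs), "mean_y": mean(ys)}

def fit_line(xs, ys):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = fit_line(xs, ys)

# Decide: a deliberately simple, hypothetical rule based on the fitted trend.
decision = "increase spend" if slope > 0 else "hold spend"

print(summary, round(slope, 3), round(intercept, 3), decision)
```

In practice each step is far larger than shown here; the point is only the order of the steps and that cleaning happens before any analysis.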
  • 22. What Is a Data Scientist • Data engineers are the data professionals who prepare the “big data” infrastructure to be analyzed by data scientists. • A data analyst is someone who curates meaningful insights from data. • A data scientist is a professional with the capabilities to gather large amounts of data, analyze it, and synthesize the information into actionable plans for companies and other organizations.