In this graduate-level Data Science course, I focus on understanding business intelligence (BI) systems and on helping future managers use and understand analytics. The course emphasizes the applications and implementations behind the concepts, giving a solid foundation in BI that is reinforced with hands-on practice. The course is also designed as an introduction to programming and statistics for students from many different majors. It teaches practical techniques that apply across many disciplines and also serves as the technical foundation for more advanced courses in data science, statistics, and computer science.
Data science unit 1 By: Professor Lili Saghafi
1. Data Science
Unit 1
By : Professor Lili Saghafi
https://professorlilisaghafiquantumcomputing.wordpress.com
proflilisaghafi@gmail.com
https://sites.google.com/site/professorlilisaghafi/home
@Lili_PLS
2. Introduction
• The course is designed as an introduction
to programming and statistics for students
from many different majors.
• It teaches practical techniques that apply
across many disciplines and also serves
as the technical foundation for more
advanced courses in data
science, statistics, and computer science.
3. Programming Prerequisite
• No prior programming experience is necessary,
but many of the programming techniques
covered in this course do not appear in a typical
introduction to programming.
• The programming content of this course focuses
on manipulating data tables, rather than
building software applications.
• Students who take the course after taking other
programming courses often learn a new
approach to programming that they haven't
encountered before.
5. Statistics Prerequisite
• No prior statistics experience is necessary, but
many of the statistical inference
techniques covered in this course do not appear
in an undergraduate statistics curriculum.
• The techniques in this course rely heavily on
sampling and simulation, and they require
computers to carry out.
• Students who have taken statistics courses
before often learn new methods to complement
what they already know.
6. Understanding problem domains
Prerequisite
• Data science is more than just a combination
of programming and statistics.
• Effective data science requires understanding
problem domains and correctly interpreting
domain-specific approaches.
• The examples in this course are largely drawn
from real-world data sets, and one of the main
goals of this course is to develop the ability to
apply analysis and prediction techniques to real-
world scenarios.
7. NO Prerequisite
• This course is designed specifically
for those who have not previously
taken statistics or computer science
courses.
8. Equipment and Supplies
• A computer
• RStudio (R itself is available from https://cran.r-project.org/)
• Math Player
• NVDA reader
• SAS or Python
• MS Azure
• A browser that supports Jupyter. (Project Jupyter exists to develop
open-source software, open standards, and services for interactive
computing across dozens of programming languages.)
https://jupyter.org/
• Jupyter notebooks to complete lab assignments.
• We highly recommend using Google Chrome to complete Jupyter
notebook lab assignments.
9. Using Jupyter Notebooks on
Microsoft Azure
• https://notebooks.azure.com
• An overview of using Jupyter Notebooks
with Python 3.
• For further information on Jupyter
Notebooks, see the documentation
at https://jupyter.org.
21. Jupyter notebook
• This is a Jupyter notebook, and it's not running on your computer. Instead,
Google has generously donated cloud compute credits so that we
can run your code on Google's machines in order to execute
whatever examples you want, including all of the labs for the
course. So, thanks, Google!
• You'll learn about how to use this Jupyter environment in the labs.
https://jupyter.org/try OR https://jupyter.org/install.html
• For now, all you need to know is that you can run whatever
examples you want by clicking on a cell, holding down shift, and
pressing return or enter. So in this case, we told the computer to add
two and two together and that made four.
• Now the examples are going to get a lot more interesting soon, and
you'll learn how to use this environment, which is one of the most
popular environments for data science work out there in the world
today.
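The cell described above can be sketched as follows. Python is an assumption here, since the slide only says that the computer was told to add two and two; the variable name `result` is chosen for illustration.

```python
# A notebook cell: click on it, then hold Shift and press Enter to run it.
result = 2 + 2
print(result)  # prints 4
```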
22. Jupyter notebook
• Thanks to Google's support, all of the
software relevant to the course is already
pre-installed on their systems that you
have access to, so you can start working
on examples right away without having to install
anything. https://jupyter.org/try
27. Why Data science?
• It's about taking large data sets and trying
to make them useful or
informative, especially for understanding
the world or making informed decisions.
• We need to use ideas from computing,
ideas from statistics and also domain
knowledge that informs what the data
really represents.
28. Domain knowledge
• You can't do an analysis in the legal
domain without understanding something
about the law, so that's what we mean by
domain knowledge.
• It's that you really have to understand
when you have a data set, some big table
of numbers and descriptions, what's really
going on behind those numbers and what
they represent about the world.
29. So what is data science?
• What do you get when you combine
computing and statistics and domain
knowledge together?
• You get a science that's about drawing
useful conclusions from data using
computation as our primary tool.
31. 1-Exploration
• Exploration is figuring out what patterns exist in
the data.
• When you have many observations about some
phenomenon, what can you conclude about the
phenomenon itself?
• Instead of just looking at large tables of
numbers, we'll draw data visualizations because
it's much easier to interpret a lot of information at
once if it's portrayed in some kind of visual way.
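As a small illustration of why a visual summary is easier to read than raw values, the sketch below turns a list of observations into a text bar chart. The survey data and the helper name `text_bar_chart` are made up for this example, and the text bars only stand in for the graphical visualizations the course will draw.

```python
from collections import Counter

# Hypothetical observations: the favorite fruit of 12 survey respondents.
observations = ["apple", "banana", "apple", "cherry", "apple", "banana",
                "apple", "cherry", "banana", "apple", "apple", "banana"]

def text_bar_chart(values):
    """Return one line per distinct value, with a bar as long as its count."""
    counts = Counter(values)
    lines = []
    # most_common() lists the values from most to least frequent.
    for value, count in counts.most_common():
        lines.append(f"{value:<8}{'#' * count} {count}")
    return lines

chart = text_bar_chart(observations)
print("\n".join(chart))
```

Reading the raw list takes some effort, while the bars show at a glance which answer dominates.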
32. 2-Statistical Inference
• Once we've found a pattern, we need to perform
statistical inference, and that's because some
patterns are there just by chance and some are
there because they're a reflection of some
underlying process that's really interesting about
the world.
• The goal of statistical inference is to quantify
whether the patterns that we observe during
the exploration phase are reliable.
• If we collected more data, would we see this
pattern again or not?
33. Randomization
• The primary tool we have is
randomization because by simulating
random processes, we can see what kinds
of patterns appear just by chance.
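A minimal sketch of this idea, assuming a made-up scenario: we observed 60 heads in 100 coin tosses and want to know how often a fair coin would produce a result at least that extreme just by chance. The function names, the scenario, and the number of trials are illustrative assumptions, not part of the course material.

```python
import random

def simulate_heads(num_flips, rng):
    """Count heads in num_flips tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(num_flips))

def chance_fraction(observed_heads, num_flips, num_trials, seed=0):
    """Fraction of simulated fair-coin experiments with at least observed_heads heads."""
    rng = random.Random(seed)  # seeded so that reruns give the same answer
    at_least = sum(
        simulate_heads(num_flips, rng) >= observed_heads
        for _ in range(num_trials)
    )
    return at_least / num_trials

fraction = chance_fraction(observed_heads=60, num_flips=100, num_trials=10_000)
print(f"Fraction of fair-coin runs with 60+ heads: {fraction:.3f}")
```

The printed fraction should be small (roughly 3 percent), which suggests that 60 heads out of 100 is unlikely to come from a fair coin by chance alone.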
34. 3-Prediction
• And if the pattern we observe is not the kind of
thing that could just appear by chance, then we
can conclude that it's because of some robust or
reliable pattern in the underlying phenomenon
we want to study.
• Then we'll perform prediction.
• This is where we have partial information about
something we want to know, and we want to
guess about the things we don't know yet.
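One simple way to make such a guess, chosen here purely for illustration, is to fit a least-squares line to the values we know and read the unknown value off the line. The data (hours studied and exam scores) and the helper name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical known data: hours studied and the resulting exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

slope, intercept = fit_line(hours, scores)

# Guess the score of a student who studied 6 hours, which we have not observed.
predicted = slope * 6 + intercept
print(f"Predicted score after 6 hours: {predicted:.1f}")
```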
35. Machine Learning
• We are making informed guesses, quantitative
guesses using a discipline called machine
learning.
• Normally when we write programs, we just focus
on the particular logic of what the computer
should do, but machine learning is about not
programming every detail, but instead using
the data to make decisions or choices within
that program.
36. A form of prediction
• So when we write a program, for instance, to
recognize speech or automatically translate
languages or control a car or a robot, we don't
actually write down all the details of what to do,
but instead use examples from the world to help
computers automatically learn how to behave.
• And that's a form of prediction, one that we'll
talk about in this course.
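To make "learning from examples" concrete, here is a minimal sketch of a nearest-neighbor classifier. The technique is picked here for illustration and is not necessarily the one the course uses; the measurements and labels are invented. The program contains no rule that says what a cat or a dog looks like. Its behavior comes entirely from the labeled examples.

```python
import math

# Hypothetical labeled examples: (height in cm, weight in kg) and the animal.
examples = [
    ((25.0, 4.0), "cat"),
    ((23.0, 3.5), "cat"),
    ((28.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
    ((65.0, 35.0), "dog"),
]

def classify(point, labeled_examples):
    """Return the label of the example closest to point (1-nearest-neighbor)."""
    _, nearest_label = min(
        labeled_examples,
        key=lambda example: math.dist(point, example[0]),
    )
    return nearest_label

print(classify((26.0, 4.5), examples))
print(classify((58.0, 28.0), examples))
```

Changing the examples changes the predictions without touching the code in `classify`, which is the sense in which the data, not the programmer, decides.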
37. Three stages in this course
• And these three stages correspond to
how we'll approach the material in this
course.
1. First, we'll talk about how to identify patterns,
2. then we'll talk about quantifying whether
those patterns are reliable.
3. And finally, based on the patterns we've
discovered, the reliable ones can help us
make informed guesses about the
information that we wish we knew.
38. On the way to becoming a data
scientist
• Once you can do all that, you're well on
your way to being a data scientist.
• Now in the process of doing all these
things, it's important that you learn how to
program a computer, because computing
underlies each step of the way and
learning to program is just an essential
part of participating in this discipline.