BECKER COLLEGE
Introduction to Big Data and
Data Science
Prof Feyzi R. Bagirov
Becker College
Agenda
• What is Big Data?
• What is Data Science?
• Who are Data Scientists?
• What do Data Scientists do?
• What are the job prospects for Data Scientists?
• How happy are Data Scientists with their jobs?
• Becker’s BS in Data Science
• Becker’s Big Data Analytics concentration
What is Big Data?
How much data do we use
• Every day, people send 150 billion new email messages
• Every 4 minutes, a terabyte of data (72 hours of video) is uploaded to YouTube
• Facebook’s databases ingest 500 terabytes of new data per day
• The CERN Large Hadron Collider generates 1 petabyte per second
• Sensors from a Boeing 787 jet create 40 terabytes of data per hour
• An Oil & Gas off-shore rig operation generates 8 terabytes a day
• A self-driving car generates 1 gigabyte per second
• General Electric gas turbines generate 500 gigabytes per day
• The proposed Square Kilometer Array telescope will generate an exabyte of data per day
• 90% of the data in the world today has been created in the last two years alone
• 80% of data captured today is unstructured
4,000,000,000,000,000,000,000 bytes
Zetta  Exa  Peta  Tera  Giga  Mega  Kilo
How much data do we use
According to IBM, 90% of the data in the world today was created in the last 2
years alone.
“Big Data: Getting Ready For The 2013 Big Bang”, Forbes Magazine, May 1, 2013
4,000,000,000,000,000,000,000 bytes
Zetta  Exa  Peta  Tera  Giga  Mega  Kilo
In 2013, the world will produce 4 zettabytes (or 4 million petabytes) of new data.
Gartner, 2013
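The unit arithmetic behind the 4 zettabyte figure can be checked with a short sketch. It assumes decimal (SI) prefixes, where each step is a factor of 1,000; the names PREFIXES, BYTES_PER and convert are illustrative and not part of the original slides.

```python
# Decimal (SI) prefixes, smallest to largest; each step up is a factor of 1,000.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]
BYTES_PER = {name: 1000 ** (i + 1) for i, name in enumerate(PREFIXES)}


def convert(value, from_unit, to_unit):
    """Convert a quantity of bytes from one SI-prefixed unit to another."""
    return value * BYTES_PER[from_unit] / BYTES_PER[to_unit]


four_zb_in_bytes = 4 * BYTES_PER["zetta"]      # a 4 followed by 21 zeros
four_zb_in_pb = convert(4, "zetta", "peta")    # petabytes in 4 zettabytes

print(f"{four_zb_in_bytes:,} bytes")
print(f"{four_zb_in_pb:,.0f} petabytes")
```

Under that assumption, 4 zettabytes is 4 x 10^21 bytes, which matches the "4 million petabytes" quoted on the slide.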
Definition of Big Data
• Big Data – tools that process and analyze
complex data at speeds and scales that were
previously not cost-effective.
History of Big Data
• c. 18,000 BCE – Humans use tally sticks to record data for the first time, to track trading activity and record inventory
• c. 2400 BCE – The abacus is developed and the first libraries are built in Babylonia
• c. 300 BCE – The Library of Alexandria is the world’s largest storage center
• c. 100-200 BCE – The Antikythera mechanism, the first mechanical computer, is developed in Greece
• 1663 – John Graunt conducts the first statistical analysis experiments to curb the spread of bubonic plague in Europe
• 1865 – The term “Business Intelligence” is first used
• 1928 – Fritz Pfleumer creates a method of storing data magnetically, which forms the basis of modern digital data storage
• 1965 – The US government plans the world’s first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tape
• 1969 – Birth of the ARPANET, which later led to the creation of the Internet (October 29, 1969, 22:30)
• 1970 – The relational database model is developed by IBM mathematician Edgar F. Codd. Everyone, not just computer scientists, can now use databases.
• 1989 – Early use of the term Big Data in a magazine article by Erik Larson
• 1991 – Birth of the WWW. Anyone can upload their own data
History of Big Data
• 1996 – The price of digital storage makes it more cost-effective than paper
• 1997 – Google launched the world’s most popular search engine
• 1997 – First use of the term Big Data in an academic paper
• 2001 – The 3 Vs of Big Data (Volume, Velocity and Variety) are defined by Doug Laney
• 2005 – Hadoop, an open source Big Data framework, is developed
• 2009 – The average US company with over 1,000 employees is storing more than 200 TB of data, according to the McKinsey Global Institute
• 2010 – Every two days, as much data is being created as was created from the beginning of human civilization to the year 2003 (Eric Schmidt, Google)
• 2011 – By 2018, the US will face a shortfall of 140,000-190,000 data scientists (McKinsey)
• 2014 – Mobile internet use overtakes desktop for the first time
• 2015 – The Internet of Things is being adopted by industries
• 2020 – Some 30 billion objects may be connected to the Internet of Things
4 V’s of Big Data
• Volume – a terabyte? A petabyte? More?
• Variety – a web log? A Twitter feed? A YouTube video?
• Velocity – does new data arrive every hour? Minute? Second?
• Veracity – how much do I trust this data? 40%? 100%? 0%?
History of Big Data
IBM delivers an HDD weighing over a ton and storing 5 MB of data
(September 1956)
How Big is Big?
4,000,000,000,000,000,000,000 bytes
Zetta  Exa  Peta  Tera  Giga  Mega  Kilo
Unstructured vs Structured
Unstructured Data
• Refers to information that does not have a
pre-defined data model or is not organized in
a pre-defined manner.
• Examples: social network feeds, customer
reviews or comments, YouTube videos, etc.
Structured Data
• Refers to information that has a pre-defined data model and is organized in
a pre-defined manner.
• Examples: rows and columns in a relational database or a spreadsheet.
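To make the contrast concrete, here is a minimal sketch. The order records and the review text are invented for illustration, and the regular expression is just one simple way to pull a value out of free text.

```python
import re

# Structured: every record follows the same pre-defined fields (a schema),
# so a question like "what is the total revenue?" is a direct lookup by field.
orders = [
    {"order_id": 1, "customer": "C-17", "total_usd": 42.50},
    {"order_id": 2, "customer": "C-03", "total_usd": 13.00},
]
revenue = sum(row["total_usd"] for row in orders)

# Unstructured: free text has no fields, so information has to be
# extracted (here with a regular expression) before it can be analyzed.
review = "Loved the pizza, but delivery took 55 minutes. Would order again!"
minutes = [int(m) for m in re.findall(r"(\d+) minutes", review)]

print(revenue, minutes)
```

The structured half needs no interpretation step, while the unstructured half only works because the extraction rule happens to match how this particular review is worded.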
Structured or Unstructured?
What is Data Science?
*http://en.wikipedia.org/wiki/Data_science
• 1960 – The term "data science" (originally used interchangeably with
"datalogy") has existed for over thirty years and was used initially as a
substitute for computer science by Peter Naur in 1960.
• 2002 – The International Council for Science: Committee on Data for
Science and Technology started the Data Science Journal.
• 2004 – Usama Fayyad became the first Chief Data Officer (CDO) at Yahoo.
• 2008 – DJ Patil and Jeff Hammerbacher coined the term “data scientist”
to define their jobs at LinkedIn and Facebook, respectively.
http://www.datasciencecentral.com/profiles/blogs/data-scientist-core-skills
What is Data Science?
What is Data Science?
Math & Statistics
• Discrete
• Finite
• Linear Algebra
• Multivariate
Computer Science
• Programming
• Business Intelligence
Soft Skills
• Oral Communications
• Creativity
• Project Management
• Team play
• Presentation
What’s in the name?
Data Science vs Data Analytics vs …
• Business Intelligence – covers data analysis and relies heavily on aggregation, focusing on business information
• Statistics – the study of collection, analysis, interpretation, presentation and organization of data.
• Data Mining – a technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive
purposes
• Data Analytics – a process of inspecting, cleaning, transforming and modeling data with the goal of discovering useful
information, suggesting conclusions, and supporting decision-making.
• Business Analytics - practices for continuous iterative exploration and investigation of past business performance to gain
insight and drive business planning
– Descriptive Analytics – analyzes the past performance and understands that performance by mining historical data to look for the
reasons behind past success or failure
– Predictive Analytics - encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that
analyze current and historical facts to make predictions about future or otherwise unknown events.
– Prescriptive Analytics - automatically synthesizes big data, multiple disciplines of mathematical sciences and computational sciences,
and business rules, to make predictions and then suggests decision options to take advantage of the predictions.
• Data Science – an interdisciplinary field about processes and systems to extract knowledge or insights from data in various
forms, either structured or unstructured, which is a continuation of some of the data analysis fields, such as statistics, data
mining, and predictive analytics.
• https://en.wikipedia.org/wiki/Data_science
• https://en.wikipedia.org/wiki/Data_analysis
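The difference between descriptive, predictive and prescriptive analytics can be sketched on a toy series. The monthly sales figures, the stock level and the reorder rule below are all invented for illustration, and a straight-line fit stands in for whatever model a real team would use.

```python
# Hypothetical monthly sales figures, invented for illustration.
sales = [100.0, 110.0, 120.0, 130.0]
months = list(range(len(sales)))

# Descriptive: summarize what already happened.
average_sales = sum(sales) / len(sales)

# Predictive: fit a least-squares line and extrapolate one month ahead.
mean_x = sum(months) / len(months)
mean_y = average_sales
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x
forecast = intercept + slope * len(sales)

# Prescriptive: turn the prediction into a suggested action
# (the threshold rule is an assumption made up for this sketch).
stock_on_hand = 125.0
action = "reorder" if forecast > stock_on_hand else "hold"

print(average_sales, forecast, action)
```

Each step builds on the previous one: the summary describes the past, the fitted line projects it forward, and the rule converts the projection into a decision option.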
Who are Data Scientists?
What Do Data Scientists Do?
In a nutshell, a data scientist creates data products. This can mean a lot of
things, but we can generalize it as having the ability to create interfaces for
people and machines that use data of any kind.
Responsibilities vary a lot. They can include running experiments, creating
interfaces using machine learning, and providing insights from complex
datasets.
Data scientists work with hypotheses. For instance, the experiments we run
at Minclip are becoming full-fledged randomised controlled trials, but I think
that is the most similar case. I believe the term scientist appeared when
data itself became a field of study. The way machine learning treats data is
highly empirical. The process of improving and validating a model, while
not using the traditional statistical methods of scientific research, is
nevertheless highly empirical, skeptical and pragmatic. Sometimes more so than
some papers that are published.
• Quora http://qr.ae/RUWYc8
What Do Data Scientists Do?
• “There are multiple communities of data scientists throughout the
Amazon offices which are easily approachable”
• “They mostly work on the vertical like ad space optimization or marketing.
People have in depth understanding of domain and some of the best
minds in the industry”
• “There is a Data Science Toolkit, which contains almost every kind of tools
for Data Scientists… Biggest data warehouse (Datanet) to play with,
Extended internal wiki of almost every possible topic in the universe of
Data; mentorship of data science wizards”
– Quora, http://qr.ae/RUPSv4
What Do Data Scientists Do?
• Netflix Prize – was an open competition for
the best collaborative filtering algorithm to
predict user rating for films, based on previous
ratings without any other information about
users or films.
What Do Data Scientists Do?
On September 21, 2009, the $1 million grand prize was awarded to the BellKor's Pragmatic Chaos
team, which improved prediction accuracy by 10.06%.
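Collaborative filtering, the technique the Netflix Prize asked for, can be illustrated with a minimal user-based sketch. This is not the winning team's method; the ratings are invented toy data, and cosine similarity with a weighted average is just one simple variant.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {film: rating on a 1-5 scale}); not Netflix data.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the films both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[f] * v[f] for f in shared)
    norm_u = sqrt(sum(u[f] ** 2 for f in shared))
    norm_v = sqrt(sum(v[f] ** 2 for f in shared))
    return dot / (norm_u * norm_v)


def predict(user, film):
    """Similarity-weighted average of other users' ratings for `film`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or film not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        num += sim * their_ratings[film]
        den += sim
    return num / den if den else None


prediction = predict("ann", "D")
print(prediction)
```

Only previous ratings are used, with no other information about users or films: ann's tastes resemble bob's more than cat's, so the predicted rating for film D leans toward bob's high score.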
What Do Data Scientists Do?
• We work on core ML, on computer vision, on computational photography and on language
technologies.
• In computer vision we have a system that processes every single image and video uploaded
to Facebook, totaling well over 1B items per day. We predict the content of an image for
example in order to generate captions for the blind, or to automatically detect and take down
offensive content, improve media search results, automate visual captcha among many other
use cases.
• In language technology, one thing we are trying to do is eliminate language barriers on
Facebook. In order to do this we translate over 2B posts every single day, with over 1800
language directions representing more than 40 unique languages.
• In core ML, we focus on researching and shipping large scale and realtime ML/AI algorithms
for some of the biggest ML applications in the world. Whenever a user logs into Facebook,
these models are used to rank news feed stories (1B users every day, 1.5K stories per user
per day on average), ads, search results (1B+ queries a day), trending news, friend
recommendations and even rank notifications that a user receives, or rank the comments on
a post.
– Quora (http://qr.ae/RZ3JBx)
What Do Data Scientists Do?
• There are multiple analytics teams at Facebook
• The team of Data Scientists working on Ads is probably the largest and most centralized
analytics team at Facebook
• Our goal is to come up with data backed insights which will result in informing the product
road-map or move key metrics that our product teams track. We sometimes also build
infrastructure (less common in my world) that is used by other Data Scientists and
engineers. We work in close concert with Engineering and Product and we often wear
Engineering or Product management hats in addition to our Data Scientist responsibilities.
We spend our time in:
– Analyzing and designing experiments to optimize product features or move key metrics
– Data mining/analysis to come up with business opportunities to pursue or product
feature suggestions or sometimes to understand metric movements.
– Building production ML models (though this is mostly done by SW Engineering)
• The multidisciplinary nature of the role, access to one of the largest troves of data, brilliant
colleagues and ability to create a huge impact in a very short time period make this an
exciting job.
– Quora (http://qr.ae/RUPJbx)
What Do Data Scientists Do?
• Predicting the past – let's say you want to determine the gender of Jason Lemkin.
If you are a human, that's easy (hint: he's a man). If you are a computer, it is more
difficult. But you might have a large dataset of genders and first names and see
that 99% of Jasons are men so your algorithm says he is a man. This would be
much more difficult with me ("Auren" is a more gender neutral name) and so you
might not be confident enough to make a gender pronouncement and thus might
need more data (like doing natural language processing on articles about me that
refer to me as "he" and "him").
• Predicting the future – figuring out what posts should be shown to the right
person.
– Quora: http://qr.ae/RUgn33
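The "predicting the past" example can be sketched as a lookup with a confidence threshold. The name counts and the 95% threshold below are invented for illustration and are not real data.

```python
# Hypothetical name counts, invented for illustration (not real census data).
NAME_COUNTS = {
    "jason": {"male": 990, "female": 10},
    "auren": {"male": 55, "female": 45},
}


def predict_gender(first_name, min_confidence=0.95):
    """Return (label, confidence); label is None when the data is too ambiguous."""
    counts = NAME_COUNTS.get(first_name.lower())
    if not counts:
        return None, 0.0
    label = max(counts, key=counts.get)
    confidence = counts[label] / sum(counts.values())
    if confidence < min_confidence:
        # Not confident enough: more data (e.g. text analysis) would be needed.
        return None, confidence
    return label, confidence


print(predict_gender("Jason"))
print(predict_gender("Auren"))
```

With these made-up counts, "Jason" clears the threshold, while "Auren" does not, which is the case where the quote suggests gathering more data.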
What Do Data Scientists Do?
• Airbnb wrangles a lot of data—roughly 11 petabytes. Much of it, such as a guest’s
lodging preferences and whether a host likes to be continuously booked or prefers
having a few days free between visitations, helps the online marketplace’s search
algorithm determine the most likely match between guest and host.
• Preferences of this sort fall into one of four data categories:
– Behavioral, which describes user behavior as they interact with the Airbnb website;
– Dimensional, which covers user attributes including access device used, language and location;
– Sentiment, which reflects lodging reviews, ratings and survey results;
– Imputed, which infers user behaviors, such as “this guest always travels to big cities, whereas this
other guest always travels to small coastal towns.”
• To collect, process and analyze all this data, Airbnb relies on a team of about 100
people. These include around 20 engineers who support the computing
infrastructure and Newman's 80-person data science team.
– http://www.information-management.com/news/big-data-analytics/how-airbnb-uses-big-data-to-better-match-guests-rooms-10028582-1.html
What Do Data Scientists Do?
• Data captured through all its channels – text message, Twitter, Pebble, Android, Amazon Echo – to name
just a fraction – is fed into the Domino’s Information Management Framework. There it’s combined with
enrichment data from a large number of third party sources such as the United States Postal Service as
well as geocode information, demographic and competitor data, to allow in depth customer segmentation.
• “We have the ability to not only look at a consumer as an individual and assess their buying patterns, but
also look at the multiple consumers residing within a household, understand who is the dominant buyer,
who reacts to our coupons, and, foremost, understand how they react to the channel that they’re coming
to us on.”
– http://www.forbes.com/sites/bernardmarr/2016/04/06/big-data-driven-decision-making-at-dominos-pizza/#5c668fd4647f
What Do Data Scientists Do?
(Finance)
Source: Hortonworks
What Do Data Scientists Do?
(Government)
• Fraud, Waste and Abuse (FWA)
– Fraud and Abuse occur when there are loopholes
created by complex interactions between business
controls, regulatory requirements and day-to-day
processes. Recognizing these control-point loopholes
is hard, and manual review is difficult.
Source: KPMG
What Do Data Scientists Do?
(Government)
• FWA in Other Sectors
Source: KPMG
What Do Data Scientists Do?
(Game industry)
• Data Analysts/Scientists in Games are
concerned with how to:
– Engage the gamer
– Monetize the gamer
What Do Data Scientists Do?
(Game industry)
• Pre-launch data simulation
– Simulating loot drop rules and preferences in Call
of Duty before launching the game
Source: Activision
What Do Data Scientists Do?
(Game industry)
• In-Game analytics:
– Why are people leaving?
– Investigating churn, building a churn prediction
model and influencing behavior before players quit
Source: Activision
What Do Data Scientists Do?
(Game industry)
• Game Feature Research
Source: Activision
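A pre-launch loot drop simulation can be sketched as a small Monte Carlo run. The drop table below is invented for illustration and has nothing to do with Activision's actual rules; DROP_RATES and simulate_drops are hypothetical names.

```python
import random
from collections import Counter

# Hypothetical drop table (rarity -> probability), invented for illustration.
DROP_RATES = {"common": 0.70, "rare": 0.25, "legendary": 0.05}


def simulate_drops(n_drops, seed=42):
    """Draw n_drops items according to DROP_RATES and count each rarity."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    items = rng.choices(
        list(DROP_RATES), weights=list(DROP_RATES.values()), k=n_drops
    )
    return Counter(items)


n = 100_000
counts = simulate_drops(n)
observed = {item: counts[item] / n for item in DROP_RATES}
print(observed)
```

Running many simulated drops before launch shows whether the observed frequencies behave as the designers intended, and the table can be adjusted and re-run without touching live players.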
What Do Data Scientists Do?
(Non profit)
Use-case: DataKind.org
Source: DataKind
What are the job prospects?
[By 2018] “The United States alone faces a shortage of
140,000 to 190,000 people with deep analytical skills
as well as 1.5 million managers and analysts to analyze
big data and make decisions based on their findings.”
• http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation
What are the job prospects?
• http://www.indeed.com/salary?q1=%22Data+Scientist%22&l1=
What are the job prospects?
• https://www.glassdoor.com/Best-Jobs-in-America-LST_KQ0,20.htm
What are the job prospects?
• https://www.dezyre.com/article/data-scientist-salary-report-of-100-top-tech-companies-/218
How Happy Are Data Scientists?
Machine Learning Developers are Happy!
StackOverflow survey
Bachelor of Science in Data Science
• Building Foundations
• 120 credits
• Foundations in:
– Math
– Statistics and Multivariate Statistics
– Machine Learning
– Computer Programming
– Practicum
Q&A?
58

Contenu connexe

Tendances

Data Analytics and Business Intelligence
Data Analytics and Business IntelligenceData Analytics and Business Intelligence
Data Analytics and Business IntelligenceChris Ortega, MBA
 
如何快速实现数据编织架构
如何快速实现数据编织架构如何快速实现数据编织架构
如何快速实现数据编织架构Denodo
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&MDatabricks
 
(The life of a) Data engineer
(The life of a) Data engineer(The life of a) Data engineer
(The life of a) Data engineerAlex Chalini
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaScyllaDB
 
Building A Bi Strategy
Building A Bi StrategyBuilding A Bi Strategy
Building A Bi Strategylarryzagata
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Edureka!
 
Rahat Yasir: Enterprise Data & AI Strategy & Platform Designing
Rahat Yasir: Enterprise Data & AI Strategy & Platform DesigningRahat Yasir: Enterprise Data & AI Strategy & Platform Designing
Rahat Yasir: Enterprise Data & AI Strategy & Platform DesigningLviv Startup Club
 
Big Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation SlideBig Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation SlideSlideTeam
 
Best practices data management
Best practices data managementBest practices data management
Best practices data managementSherry Lake
 
Enterprise Data Governance Framework With Change Management
Enterprise Data Governance Framework With Change ManagementEnterprise Data Governance Framework With Change Management
Enterprise Data Governance Framework With Change ManagementSlideTeam
 
Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015Carl Anderson
 

Tendances (20)

Data Analytics and Business Intelligence
Data Analytics and Business IntelligenceData Analytics and Business Intelligence
Data Analytics and Business Intelligence
 
如何快速实现数据编织架构
如何快速实现数据编织架构如何快速实现数据编织架构
如何快速实现数据编织架构
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&M
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
BI Introduction
BI IntroductionBI Introduction
BI Introduction
 
(The life of a) Data engineer
(The life of a) Data engineer(The life of a) Data engineer
(The life of a) Data engineer
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Building A Bi Strategy
Building A Bi StrategyBuilding A Bi Strategy
Building A Bi Strategy
 
Creating a Data Driven Culture
Creating a Data Driven Culture Creating a Data Driven Culture
Creating a Data Driven Culture
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
 
Data science
Data scienceData science
Data science
 
Rahat Yasir: Enterprise Data & AI Strategy & Platform Designing
Rahat Yasir: Enterprise Data & AI Strategy & Platform DesigningRahat Yasir: Enterprise Data & AI Strategy & Platform Designing
Rahat Yasir: Enterprise Data & AI Strategy & Platform Designing
 
Data Governance
Data GovernanceData Governance
Data Governance
 
Big Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation SlideBig Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation Slide
 
Best practices data management
Best practices data managementBest practices data management
Best practices data management
 
Enterprise Data Governance Framework With Change Management
Enterprise Data Governance Framework With Change ManagementEnterprise Data Governance Framework With Change Management
Enterprise Data Governance Framework With Change Management
 
Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 

Similaire à Introduction to Big Data and Data Science

The Ai & I at Work
The Ai & I at WorkThe Ai & I at Work
The Ai & I at WorkTarek Hoteit
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Jisc
 
Introduction Data Science.pptx
Introduction Data Science.pptxIntroduction Data Science.pptx
Introduction Data Science.pptxAkhirulAminulloh2
 
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la IglesiaBIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la IglesiaMaria de la Iglesia
 
Data Analytics and Industry-Academic Partnerships: An Irish Perspective
Data Analytics and Industry-Academic Partnerships: An Irish PerspectiveData Analytics and Industry-Academic Partnerships: An Irish Perspective
Data Analytics and Industry-Academic Partnerships: An Irish PerspectiveJohn Breslin
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013University of Washington
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?Daniel S. Katz
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...datacite
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving UpPaco Nathan
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...jybufgofasfbkpoovh
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesRukshan Batuwita
 
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...Carole Goble
 
06 e science-bio diversity@ pacc 18.07.2014
06 e science-bio diversity@ pacc 18.07.201406 e science-bio diversity@ pacc 18.07.2014
06 e science-bio diversity@ pacc 18.07.2014VinothkumaR Ramu
 
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016Jisc
 

Similaire à Introduction to Big Data and Data Science (20)

Big Data
Big Data Big Data
Big Data
 
The Ai & I at Work
The Ai & I at WorkThe Ai & I at Work
The Ai & I at Work
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Introduction Data Science.pptx
Introduction Data Science.pptxIntroduction Data Science.pptx
Introduction Data Science.pptx
 
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la IglesiaBIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
 
Data Analytics and Industry-Academic Partnerships: An Irish Perspective
Data Analytics and Industry-Academic Partnerships: An Irish PerspectiveData Analytics and Industry-Academic Partnerships: An Irish Perspective
Data Analytics and Industry-Academic Partnerships: An Irish Perspective
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
 
Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
 
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...
 
06 e science-bio diversity@ pacc 18.07.2014
06 e science-bio diversity@ pacc 18.07.201406 e science-bio diversity@ pacc 18.07.2014
06 e science-bio diversity@ pacc 18.07.2014
 
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
 


Introduction to Big Data and Data Science

  • 1. BECKER COLLEGE Introduction to Big Data and Data Science Prof Feyzi R. Bagirov Becker College
  • 2. Agenda • What is Big Data? • What is Data Science? • Who are Data Scientists? • What do Data Scientists do? • What are the job prospects for Data Scientists? • How happy are Data Scientists with their jobs? • Becker’s BS in Data Science • Becker’s Big Data Analytics concentration
  • 3. What is Big Data?
  • 4. How much data do we use • Every day, people send 150 billion new email messages • Every 4 minutes, a terabyte of data (72 hours of video) is uploaded to YouTube • Facebook’s databases ingest 500 terabytes of new data per day • The CERN Large Hadron Collider generates 1 petabyte per second • Sensors from a Boeing 787 jet create 40 terabytes of data per hour • An Oil & Gas off-shore rig operation generates 8 terabytes a day • A self-driving car generates 1 gigabyte per second • General Electric gas turbines generate 500 gigabytes per day • The proposed Square Kilometer Array telescope will generate an exabyte of data per day • 90% of the data in the world today has been created in the last two years alone • 80% of data captured today is unstructured • 4,000,000,000,000,000,000,000 bytes (scale: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta)
  • 5. How much data do we use According to IBM, 90% of the data in the world today was created in the last 2 years alone. “Big Data: Getting Ready For The 2013 Big Bang”, Forbes Magazine, May 1, 2013 4,000,000,000,000,000,000,000 bytes
  • 6. 4,000,000,000,000,000,000,000 bytes (scale: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta) • In 2013, the world will produce 4 zettabytes (or 4 million petabytes) of new data. Gartner, 2013
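The byte count on this slide can be checked with a little arithmetic. The sketch below assumes decimal (SI) prefixes, where each step up the scale is a factor of 1,000; the names `PREFIXES`, `UNIT` and `total_bytes` are illustrative and not part of the original slides.

```python
# Decimal (SI) prefixes, smallest to largest; each step is a factor of 1000.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]

# Map each prefix to its size in bytes, e.g. UNIT["kilo"] == 1000.
UNIT = {name: 1000 ** (i + 1) for i, name in enumerate(PREFIXES)}

# The slide's figure: 4 zettabytes, written out in bytes.
total_bytes = 4 * UNIT["zetta"]

# Integer division keeps the result exact for numbers this large.
petabytes = total_bytes // UNIT["peta"]

print(f"{total_bytes:,} bytes")
print(f"{petabytes:,} petabytes")
```

Under these assumptions the 22-digit number on the slide and the "4 million petabytes" in the caption are the same quantity.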
  • 7. Definition of Big Data • Big Data – tools that process and analyze complex data at speeds and scales that were previously not cost-effective.
  • 8. History of Big Data • 18,000 BCE: Humans use tally sticks to record data for the first time, to track trading activity and record inventory • 2,400 BCE: The abacus is developed and the first libraries are built in Babylonia • 300 BCE: The Library of Alexandria is the world’s largest storage center • 100-200 BCE: The Antikythera mechanism, the first mechanical computer, is developed in Greece • 1663: John Graunt conducts the first statistical analysis experiments to curb the spread of bubonic plague in Europe • 1865: The term “Business Intelligence” is first used • 1928: Fritz Pfleumer creates a method of storing data magnetically, which forms the basis of modern digital data storage • 1965: The US Government plans the world’s first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tape • 1969: Birth of the ARPANET, which later led to the creation of the Internet (October 29, 1969, 22:30) • 1970: The relational database model is developed by IBM mathematician Edgar F. Codd, so that everyone, not just computer scientists, can use databases • 1989: Early use of the term Big Data in a magazine article by Erik Larson • 1991: Birth of the WWW. Anyone can upload their own data
  • 9. History of Big Data • 1996: The price of digital storage makes it more cost-effective than paper • 1997: Google launches the world’s most popular search engine • 1997: First use of the term Big Data in an academic paper • 2001: The 3 Vs of Big Data (Volume, Velocity and Variety) are defined by Doug Laney • 2005: Hadoop, an open source Big Data framework, is developed • 2009: The average US company with over 1,000 employees is storing more than 200 TB of data, according to the McKinsey Global Institute • 2010: Every two days, as much data is being created as was created from the beginning of human civilization to the year 2003 (Eric Schmidt, Google) • 2011: By 2018, the US will face a shortfall of 140,000-190,000 data scientists (McKinsey) • 2014: Mobile internet use overtakes desktop for the first time • 2015: The Internet of Things is being adopted by industries • 2020: Some 30 billion objects may be connected to the Internet of Things
  • 11. 4 V’s of Big Data
  • 12. 4 V’s of Big Data • Volume – a Terabyte? a Petabyte? More?... • Variety – a Web Log? A Twitter feed? A YouTube video? • Velocity – New data comes every hour? Minute? Second? • Veracity – how much do I trust this data? 40%? 100%? 0%?
  • 13. History of Big Data IBM delivers an HDD, weighing over a ton, storing 5 MB of data (September, 1956)
  • 15. How Big is Big? 4,000,000,000,000,000,000,000 bytes (scale: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta)
  • 17. Unstructured Data • Refers to information that does not have a pre-defined data model or is not organized in a pre-defined manner. • Examples: social network feeds, customer reviews or comments, YouTube videos, etc.
  • 18. Structured Data • Refers to information that has a pre-defined data model or is organized in a pre-defined manner.
  • 23. What is Data Science?
  • 24. What is Data Science? • 1960: The term "data science" (originally used interchangeably with "datalogy") has existed for over thirty years and was used initially as a substitute for computer science by Peter Naur in 1960. • 2002: The International Council for Science: Committee on Data for Science and Technology started the Data Science Journal. • 2004: Usama Fayyad became the first CDO at Yahoo. • 2008: DJ Patil and Jeff Hammerbacher coined the term “data scientist” to define their jobs at LinkedIn and Facebook, respectively. • Source: http://en.wikipedia.org/wiki/Data_science
  • 26. What is Data Science? Math & Statistics • Discrete • Finite • Linear Algebra • Multivariate Computer Science • Programming • Business Intelligence Soft Skills • Oral Communications • Creativity • Project Management • Team play • Presentation
  • 28. Data Science vs Data Analytics vs … • Business Intelligence – covers data analysis and relies heavily on aggregation, focusing on business information • Statistics – the study of collection, analysis, interpretation, presentation and organization of data. • Data Mining – a technique that focuses on modeling and knowledge discovery for predictive rather than prescriptive purposes • Data Analytics – a process of inspecting, cleaning, transforming and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. • Business Analytics - practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning – Descriptive Analytics – analyzes the past performance and understands that performance by mining historical data to look for the reasons behind past success or failure – Predictive Analytics - encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events. – Prescriptive Analytics - automatically synthesizes big data, multiple disciplines of mathematical sciences and computational sciences, and business rules, to make predictions and then suggests decision options to take advantage of the predictions. • Data Science – an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields, such as statistics, data mining, and predictive analytics. • https://en.wikipedia.org/wiki/Data_science • https://en.wikipedia.org/wiki/Data_analysis
  • 29. Who are Data Scientists?
  • 30. Who are Data Scientists?
  • 31. Who are Data Scientists?
  • 32. What Do Data Scientists Do?
  • 33. What Do Data Scientists Do? In a nutshell a data scientist creates data products. This can mean a lot of things but we can generalize as having the ability to create interfaces for people and machines that use data of any kind. Responsibilities vary a lot. It can be running experiments, creating interfaces using machine learning, providing insights from complex datasets. Data scientists work with hypotheses. For instance the experiments we run at Minclip are becoming full-fledged randomised controlled trials but I think that is the most similar case. I believe the term scientist appeared when data itself became a field of study. The way machine learning treats data is highly empirical. The process of improving and validating a model, while not using the traditional statistical methods of scientific research is, nevertheless, highly empiric, skeptic and pragmatic. Sometimes more than some papers that are published. • Quora http://qr.ae/RUWYc8
  • 34. What Do Data Scientists Do? • “There are multiple communities of data scientists throughout the Amazon offices which are easily approachable” • “They mostly work on the vertical like ad space optimization or marketing. People have in depth understanding of domain and some of the best minds in the industry” • “There is a Data Science Toolkit, which contains almost every kind of tools for Data Scientists… Biggest data warehouse (Datanet) to play with, Extended internal wiki of almost every possible topic in the universe of Data; mentorship of data science wizards” – Quora, http://qr.ae/RUPSv4
  • 35. What Do Data Scientists Do? • Netflix Prize – was an open competition for the best collaborative filtering algorithm to predict user rating for films, based on previous ratings without any other information about users or films.
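As a rough illustration of what "collaborative filtering" means here, the sketch below predicts one user's rating for a film purely from other users' ratings, weighting each of them by how similar their past ratings are. It is a minimal user-based variant on made-up data, not the method used by any Netflix Prize team; the `ratings` table and the function names are hypothetical.

```python
from math import sqrt

# Toy ratings (hypothetical data): user -> {film: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "cat": {"A": 1, "B": 5, "D": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the films both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[f] * v[f] for f in shared)
    norm_u = sqrt(sum(u[f] ** 2 for f in shared))
    norm_v = sqrt(sum(v[f] ** 2 for f in shared))
    return dot / (norm_u * norm_v)

def predict(user, film):
    """Similarity-weighted average of other users' ratings for `film`.

    Returns None when nobody comparable has rated the film.
    """
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or film not in theirs:
            continue
        weight = cosine(ratings[user], theirs)
        num += weight * theirs[film]
        den += weight
    return num / den if den else None

print(predict("ann", "D"))  # leans toward bob's rating, since ann rates more like bob
```

No information about the users or the films themselves is used, only the table of past ratings, which is the constraint the competition imposed.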
  • 36. What Do Data Scientists Do? On 9/21/2009, $1 million was awarded to the BellKor's Pragmatic Chaos team, which improved prediction accuracy by 10.06%
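The 10.06% figure is a relative improvement in root-mean-square error (RMSE) over Netflix's existing recommender. The sketch below shows how such a percentage is computed; the ratings and predictions are made-up toy values, and the function names are illustrative.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length rating lists."""
    if len(actual) != len(predicted):
        raise ValueError("lists must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which the new model lowers the baseline's RMSE."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Toy data (hypothetical): true ratings and two sets of predictions.
actual = [4, 3, 5, 2]
baseline = [3, 3, 3, 3]   # e.g. always predict the same rating
improved = [4, 3, 4, 3]   # a model that tracks the true ratings more closely

base_err = rmse(actual, baseline)
new_err = rmse(actual, improved)
print(f"baseline RMSE {base_err:.4f}, new RMSE {new_err:.4f}")
print(f"improvement {relative_improvement(base_err, new_err):.2%}")
```

Because the improvement is measured relative to the baseline error, a 10% gain does not mean predictions are within 10% of the true rating; it means the typical error shrank by a tenth.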
  • 37. What Do Data Scientists Do? • We work on core ML, on computer vision, on computational photography and on language technologies. • In computer vision we have a system that processes every single image and video uploaded to Facebook, totaling well over 1B items per day. We predict the content of an image for example in order to generate captions for the blind, or to automatically detect and take down offensive content, improve media search results, automate visual captcha among many other use cases. • In language technology, one thing we are trying to do is eliminate language barriers on Facebook. In order to do this we translate over 2B posts every single day, with over 1800 language directions representing more than 40 unique languages. • In core ML, we focus on researching and shipping large scale and realtime ML/AI algorithms for some of the biggest ML applications in the world. Whenever a user logs into Facebook, these models are used to rank news feed stories (1B users every day, 1.5K stories per user per day on average), ads, search results (1B+ queries a day), trending news, friend recommendations and even rank notifications that a user receives, or rank the comments on a post. – Quora (http://qr.ae/RZ3JBx)
  • 38. What Do Data Scientists Do? • There are multiple analytics teams at Facebook • A team of Data Scientists works on Ads and is probably the largest and most centralized analytics team at Facebook • Our goal is to come up with data backed insights which will result in informing the product road-map or move key metrics that our product teams track. We sometimes also build infrastructure (less common in my world) that is used by other Data Scientists and engineers. We work in close concert with Engineering and Product and we often wear Engineering or Product management hats in addition to our Data Scientist responsibilities. We spend our time in: – Analyzing and designing experiments to optimize product features or move key metrics – Data mining/analysis to come up with business opportunities to pursue or product feature suggestions or sometimes to understand metric movements. – Building production ML models (though this is mostly done by SW Engineering) • The multidisciplinary nature of the role, access to one of the largest troves of data, brilliant colleagues and ability to create a huge impact in a very short time period make this an exciting job. – Quora (http://qr.ae/RUPJbx)
  • 39. What Do Data Scientists Do? • Predicting the past – let's say you want to determine the gender of Jason Lemkin. If you are a human, that's easy (hint: he's a man). If you are a computer, it is more difficult. But you might have a large dataset of genders and first names and see that 99% of Jasons are men so your algorithm says he is a man. This would be much more difficult with me ("Auren" is a more gender neutral name) and so you might not be confident enough to make a gender pronouncement and thus might need more data (like doing natural language processing on articles about me that refer to me as "he" and "him"). • Predicting the future – figuring out what posts should be shown to the right person. – Quora: http://qr.ae/RUgn33
  • 40. What Do Data Scientists Do? • Airbnb wrangles a lot of data—roughly 11 petabytes. Much of it, such as a guest’s lodging preferences and whether a host likes to be continuously booked or prefers having a few days free between visitations, helps the online marketplace’s search algorithm determine the most likely match between guest and host. • Preferences of this sort fall into one of four data categories: – Behavioral, which describes user behavior as they interact with the Airbnb website; – Dimensional, which covers user attributes including access device used, language and location; – Sentiment, which reflects lodging reviews, ratings and survey results; – Imputed, which infers user behaviors, such as “this guest always travels to big cities, whereas this other guest always travels to small coastal towns.” • To collect, process and analyze all this data, Airbnb relies on a team of about 100 people. These include around 20 engineers who support the computing infrastructure and Newman's 80-person data science team. – http://www.information-management.com/news/big-data-analytics/how-airbnb-uses-big-data-to-better-match-guests-rooms-10028582-1.html
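The "predicting the past" example amounts to a lookup with a confidence threshold: answer when the name data is lopsided enough, and ask for more evidence when it is not. A minimal sketch follows; the counts in `name_counts` are invented for illustration (the 99% share for "Jason" mirrors the slide, the split for "Auren" is a pure assumption), and the 0.95 threshold is arbitrary.

```python
# Hypothetical counts per first name; the numbers are illustrative only.
name_counts = {
    "jason": {"M": 990, "F": 10},
    "auren": {"M": 55, "F": 45},
}

def guess_gender(name, threshold=0.95):
    """Return (label, confidence); label is None when confidence is too low."""
    counts = name_counts.get(name.lower())
    total = sum(counts.values()) if counts else 0
    if total == 0:
        return None, 0.0  # unseen name: nothing to base a guess on
    label, n = max(counts.items(), key=lambda kv: kv[1])
    confidence = n / total
    # Only commit to an answer when the data is lopsided enough.
    return (label if confidence >= threshold else None), confidence

print(guess_gender("Jason"))  # confident enough to answer
print(guess_gender("Auren"))  # not confident: more data would be needed
```

A `None` label is the cue to gather additional signals, such as the pronouns used in articles about the person, as the quote suggests.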
  • 41. What Do Data Scientists Do? • Data captured through all its channels – text message, Twitter, Pebble, Android, Amazon Echo – to name just a fraction – is fed into the Domino’s Information Management Framework. There it’s combined with enrichment data from a large number of third party sources such as the United States Postal Service as well as geocode information, demographic and competitor data, to allow in depth customer segmentation. • “We have the ability to not only look at a consumer as an individual and assess their buying patterns, but also look at the multiple consumers residing within a household, understand who is the dominant buyer, who reacts to our coupons, and, foremost, understand how they react to the channel that they’re coming to us on.” – http://www.forbes.com/sites/bernardmarr/2016/04/06/big-data-driven-decision-making-at-dominos-pizza/#5c668fd4647f
  • 42. What Do Data Scientists Do? (Finance) Source: Hortonworks
  • 43. What Do Data Scientists Do? (Government) • Fraud, Waste and Abuse (FWA) – Fraud and Abuse occur when there are loopholes created by complex interactions between business controls, regulatory requirements and day-to-day process. Recognizing these control point loopholes is hard, and manual review is difficult. Source: KPMG
  • 44. What Do Data Scientists Do? (Government) • Fraud, Waste and Abuse (FWA) – Fraud and Abuse occur when there are loopholes created by complex interactions between business controls, regulatory requirements and day-to-day process. Recognizing these control point loopholes is hard, and manual review is difficult. Source: KPMG
  • 45. What Do Data Scientists Do? (Government) • FWA in Other Sectors Source: KPMG
  • 46. What Do Data Scientists Do? (Game industry) • Data Analysts/Scientists in Games are concerned with how to: – Engage the gamer – Monetize the gamer
  • 47. What Do Data Scientists Do? (Game industry) • Pre-launch data simulation – Simulating loot drop rules and preference in Call of Duty before launching the game Source: Activision
  • 48. What Do Data Scientists Do? (Game industry) • In-Game analytics: – Why are people leaving? – Investigating churn, building a churn prediction model and influencing behavior before players quit Source: Activision
  • 49. What Do Data Scientists Do? (Game industry) • Game Feature Research Source: Activision
  • 50. What Do Data Scientists Do? (Non profit) Use-case: DataKind.org Source: DataKind
  • 51. What are the job prospects? [By 2018] “The United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.” • http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation
  • 52. What are the job prospects? • http://www.indeed.com/salary?q1=%22Data+Scientist%22&l1=
  • 53. What are the job prospects? • https://www.glassdoor.com/Best-Jobs-in-America-LST_KQ0,20.htm
  • 54. What are the job prospects? • https://www.dezyre.com/article/data-scientist-salary-report-of-100-top-tech-companies-/218
  • 55. How Happy Are Data Scientists? Machine Learning Developers are Happy! StackOverflow survey
  • 56. Bachelor of Science in Data Science • Building Foundations • 120 credits • Foundations in: – Math – Statistics and Multivariate Statistics – Machine Learning – Computer Programming – Practicum