Emerging Technologies
Data Science and Artificial Intelligence
Data Scientist: the sexiest job of the 21st century
(c) Harvard Business Review
What is Data Science and Who is a Data Scientist
About me
• Presenter:
• Robert Williams (Bobby)
• Microsoft Certified Trainer (Bytes People Solutions)
• Experience:
• Business Intelligence (SSAS, SSIS and SSRS)
• Data Science (Machine Learning in R)
• Microsoft SQL Server (Transact-SQL)
• Microsoft Azure (Azure ML, HDInsight, SQL Database, Data Warehouse and VMs)
Topics
• What is Data Science
• Who is a Data Scientist
• Discovering Data Science
Data Science is currently popular with employers
• There is high demand for students trained in Data Science and related fields
• Databases, warehousing, data architectures
• Data analytics – statistics, machine learning
• Big data (Hadoop, Spark)
• Supports “Business Intelligence”
• Quantitative decision-making and control
• Finance, inventory, pricing/marketing, advertising
• Need data for identifying risks, opportunities, conducting “what-if” analyses
Data Science and related fields
• Business Intelligence
• Statistics
• Data Engineering
• Data Visualization
• Machine Learning
• Data Mining
• Artificial Intelligence
• Big Data
Regular Data Science tasks
• Data Analysis/Statistics
• Discover and clean data
• Visualize trends
• Find hidden correlations between parameters
• Modeling/Machine Learning
• How many cars are we going to sell next year?
• Which city is better for opening a new store?
• Which products are usually bought together?
• Engineering/Prototyping
• Prototype a working algorithm
• Deploy a prediction model for daily use
Cleaning data: Garbage-In-Garbage-Out (GIGO)
• Data Cleansing
• Filling in missing data (imputing values)
• Detecting and removing outliers
• Smoothing
• Removing noise by averaging values together
• Filtering, sampling
• Keeping only selected representative values
• Feature extraction
• e.g. in a photo database, which people are wearing glasses? which have more than one person? which are outdoors?
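The cleaning steps above (imputing, outlier removal, smoothing) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a recommended recipe: the `readings` list is made-up data, median imputation and the median-absolute-deviation cutoff `k=5.0` are assumptions chosen for the example, and all function names are hypothetical.

```python
import statistics

def impute_missing(values):
    """Fill None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def remove_outliers(values, k=5.0):
    """Drop values further than k median-absolute-deviations from the median."""
    center = statistics.median(values)
    mad = statistics.median(abs(v - center) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against, keep everything
    return [v for v in values if abs(v - center) <= k * mad]

def moving_average(values, window=3):
    """Smooth by averaging each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Made-up sensor readings: one missing value and one obvious outlier.
readings = [10, 12, None, 11, 250, 13, 12]
filled = impute_missing(readings)
cleaned = remove_outliers(filled)
smoothed = moving_average(cleaned)
print(filled, cleaned, smoothed, sep="\n")
```

The order matters here: imputing first keeps the missing slot from breaking the later steps, and removing the outlier before smoothing keeps it from being averaged into its neighbours.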
Statistical analysis methods
• Numerical data
• Correlations
• Multivariate regression
• Fitting “models”
• Predictive equations that fit the data
e.g. from a real estate database of home sales, we get
housing price = 100*SqFt - 6*DistanceToSchools + 0.1*AverageOfNeighborhood
• ANOVA for testing differences between groups
• R is one of the most commonly used software packages for doing statistical analysis
• Can load a data table, calculate means and correlations, fit distributions, estimate parameters, test hypotheses, generate graphs and histograms
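To make "fitting a predictive equation" concrete, here is a minimal sketch of a one-variable least-squares fit together with a Pearson correlation, written in plain Python. The `sqft` and `price` lists are invented and perfectly linear so the result is easy to check by hand; the slide's multi-variable housing equation would need multivariate regression, which this sketch does not attempt.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical home sales: price grows by 100 per square foot.
sqft = [1000, 1500, 2000, 2500]
price = [100000, 150000, 200000, 250000]

slope, intercept = fit_line(sqft, price)
r = pearson_r(sqft, price)
print(f"price = {intercept:.1f} + {slope:.1f} * SqFt, r = {r:.3f}")
```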
Machine Learning methods
• Unsupervised learning methods
• Supervised learning methods
Unsupervised learning methods
• Clustering (Hierarchical, K-means)
• Similar photos, documents, cases
• Discovery of “structure” in the data
• Example: accident database
• Some clusters might be identified with “accidents involving a truck and trailer” or “accidents at night”
• Top-down vs. bottom-up clustering methods
• Granularity: how many clusters?
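A small K-means sketch shows how clusters are "discovered" without labels: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The points and the two starting centroids below are made up, and passing the starting centroids in explicitly (rather than choosing them at random) is an assumption made to keep the example deterministic.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain K-means: returns the final centroids and the clusters of points."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Two visually obvious groups of 2-D points (hypothetical data).
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

The "how many clusters?" question from the slide shows up here as the length of the `centroids` argument: the algorithm does not choose it for you.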
Supervised learning methods
• Classifiers (Decision Trees)
• What factors, decisions, or treatments led to different outcomes?
• Recursive partitioning algorithms
• Related methods
• “Discriminant” analysis
• What factors lead to return of product?
• Extract “association rules”
• Boxer dogs tend to have congenital defects
• Covers 5% of patients with 80% confidence
Veterinary database - dogs treated for disease
breed     gender  age  drug          sibsp  outcome
terrier   F       10   methotrexate  4.0    died
spaniel   M       5    cytarabine    2.3    survived
doberman  F       7    doxorubicin   0.1    died
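The "covers 5% of patients with 80% confidence" statement can be read as two numbers computed from the records: how often the rule's condition applies (coverage) and how often the conclusion holds when it does (confidence). The sketch below assumes that reading. The 100 records are invented so the rule "boxer implies congenital defect" reproduces the slide's 5% and 80%; the field names `breed` and `defect` are hypothetical.

```python
def rule_stats(records, antecedent, consequent):
    """Return (coverage, confidence) for the rule antecedent -> consequent."""
    matching = [r for r in records if antecedent(r)]
    coverage = len(matching) / len(records)
    if not matching:
        return coverage, 0.0
    confidence = sum(1 for r in matching if consequent(r)) / len(matching)
    return coverage, confidence

# Invented veterinary records: 5 boxers (4 with a defect) among 100 dogs.
records = (
    [{"breed": "boxer", "defect": True}] * 4
    + [{"breed": "boxer", "defect": False}] * 1
    + [{"breed": "terrier", "defect": False}] * 90
    + [{"breed": "terrier", "defect": True}] * 5
)

coverage, confidence = rule_stats(
    records,
    antecedent=lambda r: r["breed"] == "boxer",
    consequent=lambda r: r["defect"],
)
print(f"coverage = {coverage:.0%}, confidence = {confidence:.0%}")
```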
Miscellaneous methods
• Other types of data
• Time series and forecasting:
• Model the price of fuel using autoregression
• A function of recent prices, demand, geopolitics...
• De-trend: factor out seasonal trends
• GIS (geographic information systems)
• Longitude/latitude coordinates in the database
• Objects: city/state boundaries, river locations, roads
• Find regions in CS/B with an excess of coffee shops
[Figure: Toy Sales. From: Basic Statistics for Business and Economics, Lind et al. (2009), Ch. 16.]
[Figure credit: Frank Curriero]
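Autoregression means predicting the next value from recent values of the same series. A minimal sketch of the simplest case, AR(1), is shown below: fit next = const + phi * previous by least squares on lagged pairs. The price series is synthetic and noise-free (generated with const = 10 and phi = 0.9) so the fit can be checked by eye; demand and geopolitics, which the slide also mentions, are left out of this sketch.

```python
from statistics import mean

def fit_ar1(series):
    """Least-squares fit of series[t] = const + phi * series[t-1]."""
    prev, curr = series[:-1], series[1:]
    p_bar, c_bar = mean(prev), mean(curr)
    phi = (sum((p - p_bar) * (c - c_bar) for p, c in zip(prev, curr))
           / sum((p - p_bar) ** 2 for p in prev))
    const = c_bar - phi * p_bar
    return const, phi

# Synthetic fuel prices following price[t] = 10 + 0.9 * price[t-1].
prices = [50.0]
for _ in range(9):
    prices.append(10.0 + 0.9 * prices[-1])

const, phi = fit_ar1(prices)
forecast = const + phi * prices[-1]  # one-step-ahead forecast
print(f"const = {const:.3f}, phi = {phi:.3f}, next price = {forecast:.2f}")
```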
What is a Data Scientist?
What IS / IS NOT Data Science
This | Not that
Data Science Skillset
• Machine Learning/Statistics
• Collecting data-storage
• Business Intelligence
• Industry Knowledge
• Software Engineering
• Automation (Applications)
Who is a Data Scientist?
• Scientist
• Someone who makes new discoveries
• Makes a hypothesis
• Investigates that hypothesis
• Data Scientist
• Does the same with data
• Looks for meaning and knowledge in the data
• Answers questions by relying on data
“It doesn't matter how beautiful your theory is, it doesn't
matter how smart you are. If it doesn't agree with
experiment, it's wrong. In that simple statement is the key
to science” – Richard Feynman: twitter.com/ProfFeynman
What’s in the Data Science toolkit?
Tools
User Experience
Research
Statistical Methods
• Data Modeling
• Time series analysis
• Survival analysis
• Missing data imputations
• Logistic, multinomial and multiple linear regression techniques
• Classification and clustering
• Forecasting
• Pattern recognition
• Principal component and factor analysis
• Machine learning
• Propensity score matching
• Data mining
• A/B testing
• Sentiment analysis
• Network analysis
• Data Visualization
• Regression
What’s in the Data Science toolkit?
Tools
User Experience
Research
Statistical Methods
• Languages: Python, R, SQL, SAS, JavaScript, NodeJS
• Libraries: NumPy, Pandas, Scikit-Learn, Tidyverse, RevoScaleR, Mahout, + many others
• Data Engineering: Profiling, ETL, Job notices, APIs, Optimized data pipelines, Optimized data storage/access, RDBMS, Hadoop/Spark
• Visualization: D3.js, Base R, Leaflet, Power BI, Matplotlib, ggplot2, shiny
Let’s do some Data Science
What is doing Data Science?
• Data Science: Apply Machine Learning and Statistics
• Data Engineering: Managing data for creating insights
Data Science problems?
Smarter Work: More efficient and effective organization
• Finding a needle in the haystack
• Prioritizing a backlog
• Flagging “stuff” early
• A/B test something
• Optimize a resource
• Some combination
• Something else…
Statistical Inference: A/B testing
Which form? / Data Science / Service Change
• Service Issue: Costly changes which are not tested before implementation
• Data Science Process: Statistical testing to identify which is better
• Service Change: Use the best statistically validated option
• Result: Increases customer satisfaction
• 62% respond vs. 78% respond
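One common way to check whether a 62% vs. 78% response gap is more than chance is a two-proportion z-test. The sketch below is an assumption about which test could be used, not a statement of what the presenter ran, and the sample size of 200 people per form is invented because the slide does not give one.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 62% of 200 responded to form A, 78% of 200 to form B.
z, p_value = two_proportion_z_test(124, 200, 156, 200)
print(f"z = {z:.2f}, p = {p_value:.5f}")
```

With these assumed sample sizes the difference is very unlikely to be noise; with much smaller samples the same percentages could easily fail the test, which is why the counts matter as much as the rates.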
Data Science Process: Machine Learning
Question? / Data Science / Production
• Find Samples: Identify targets within a sample population
• Data Science Process: Use existing data and predictive modeling to identify targets
• Deploy: Implement data science solution into a production environment
• Result: A successful data science process
• Target categories
• Target individuals
• Target areas
Where to Learn?
• University
• Online Resources
• Coursera
• edX
• etc.
• Books
How to start?
• Your own company
• Open competitions (Kaggle.com)
Module review and takeaways
• Review Question(s)
Artificial Intelligence
Introduction to Machine Learning
Human vs. Machine
Human vs. Machine
• Unfortunately AI has often been negatively
portrayed in the popular media, for example:
• AI is going to take away our jobs
Or even worse
• Machines are going to kill us all!
• What we actually want is a Human/Machine
partnership going forward…
Oops – Wrong slide!
This is what I meant ;-)
Human vs. Machine
• Human
• Can naturally work with small amounts of data
• Have domain knowledge
• Good at image recognition
• Machines
• Can perform intensive computations
• Know only numbers and strings (well, actually only numbers)
AI is actually Machine Learning
• What is machine learning?
• Introduction to machine learning algorithms
• Introduction to machine learning languages
What is Machine Learning?
• Machine learning overview
• How machine learning fits into data science
• Machine learning concepts and methodologies
• Models
Machine Learning overview
Machine learning:
• Detecting patterns and trends
• Statistical analysis
• Creating software models
Examples:
• Predicting success of medical intervention
• Identifying airplane maintenance needs
• Identifying fraudulent financial transactions
• Recommending books or movies
How Machine Learning fits into Data Science
Key questions:
• Is something X or Y?
• What is likely to be the numerical value of X or Y?
• Is something out of the ordinary or unexpected?
• How is this data structured?
Machine Learning concepts and methodologies
• Key steps:
1. Obtain raw data
2. Preprocess the data
3. Prepare the data
4. Apply one or more machine learning algorithms
to the data
5. Determine the best model to use
6. Deploy the model
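The six key steps can be walked through end to end on a toy problem. Everything in this sketch is an assumption made for illustration: the `raw` data is invented, "preprocess" means dropping incomplete rows, the two candidate algorithms are a mean baseline and a straight-line fit, and "deploy" simply means keeping the winning function around as `predict`.

```python
from statistics import mean

# 1. Obtain raw data: (x, y) pairs, None marks a missing measurement.
raw = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2),
       (5, 9.8), (6, 12.1), (7, 13.9), (8, 16.2)]

# 2. Preprocess: drop incomplete rows.
rows = [(x, y) for x, y in raw if x is not None and y is not None]

# 3. Prepare: hold out the last two rows as a test set.
train, test = rows[:5], rows[5:]

# 4. Apply one or more algorithms to the training data.
def train_mean_baseline(data):
    avg = mean(y for _, y in data)
    return lambda x: avg  # always predicts the training average

def train_linear(data):
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

candidates = {
    "mean_baseline": train_mean_baseline(train),
    "linear": train_linear(train),
}

# 5. Determine the best model: lowest mean absolute error on the test set.
def mae(model, data):
    return mean(abs(model(x) - y) for x, y in data)

best_name = min(candidates, key=lambda name: mae(candidates[name], test))

# 6. Deploy: expose the chosen model as the function applications call.
predict = candidates[best_name]
print(best_name, round(predict(10), 2))
```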
Models
Machine learning model: the artifact (code and learned parameters) generated after
an algorithm has been run on training data
Training models:
• Experiments
• Evaluation
Deploying models:
• Applications
• Retraining
Introduction to Machine Learning algorithms
• Algorithms overview
• Classification algorithms
• Regression algorithms
• Clustering
• Supervised and unsupervised learning
• Anomaly detection
Algorithms overview
• Algorithm: set of steps, methods, or actions
• Classification algorithms: yes/no questions, or
identify most likely outcome from multiclass list
• Regression algorithms: make predictions of
outcomes, based on historical patterns
• Clustering algorithms: identify groupings within
dataset
Classification algorithms
Classification algorithms:
• Logistic Regression
• Naïve Bayes
• Decision Tree
• Decision Forest
• Boosted Decision Tree
• Neural Network
• Support Vector Machine
Regression algorithms
Regression algorithms:
• Linear Regression
• Decision Tree
• Decision Forest
• Boosted Decision Tree
Clustering
Clustering:
• Often used during the initial stages of model
development
• Detects patterns and anomalies
Example:
• K-Means Clustering
Supervised and unsupervised learning
Supervised learning
• Target values known
• Classification
• Regression
Unsupervised learning
• Target values unknown
• Clustering
Reinforcement learning
• Self-learning through feedback
Anomaly detection
Anomaly detection:
• Rare events
• Imbalanced data
Anomaly detection methods:
• One-Class Support Vector Machine (SVM)
• PCA-Based Anomaly Detection
Introduction to Machine Learning languages
• Languages overview
• Using R in machine learning
• Using Python in machine learning
Languages overview
Machine learning requires computer code:
• Most popular programming languages:
• R and
• Python
Use SQL for queries:
• Select data to use
• Join/filter data
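Selecting, joining and filtering data with SQL can be tried directly from Python using the built-in `sqlite3` module. This is a minimal sketch: the `customers` and `orders` tables, their columns and their rows are all hypothetical, and an in-memory database stands in for a real server.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Johannesburg'), (2, 'Cape Town');
INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 40.0);
""")

# Select the columns to use, join the tables, and filter out small orders.
rows = conn.execute("""
SELECT c.city, SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE o.amount >= 50
GROUP BY c.city
ORDER BY c.city
""").fetchall()
conn.close()

print(rows)
```

Doing the join and filter in SQL means only the rows the model needs ever reach R or Python.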
Using R in Machine Learning
• R is open-source
• R is specifically designed to support statistics and
data analysis
R packages:
• Collections of functions, data, and code
• Available from CRAN
• Includes 10 000+ R packages
Using Python in Machine Learning
Python:
• Not a specialist data science or statistical tool
• Widely used within scientific computing
• Lots of resources available
Python machine learning-related libraries:
• numpy
• pandas
• matplotlib
• scikit-learn
Artificial Intelligence – Cognitive Services
• Cognitive Services overview
• Processing image and video
• Processing language
What is a cognitive service?
Cognitive Services:
• Vision. Analyze photos and videos
• Speech. Convert speech to text and text to speech
• Language. Understand intent from language
• Search. Find information on the web using Bing
Customer scenarios
• Uber driver identification
• Starship Commander voice control
Processing image and video
• Face
• Emotion
• Content moderator
• Video
• Computer Vision
Face
• Person and person groups
• Face detection
• Face verification
• Face identification
• Similar face searching
• Face grouping
Emotion
{
  "faceRectangle": {
    "left": 488,
    "top": 263,
    "width": 148,
    "height": 148
  },
  "scores": {
    "anger": 9.075572e-13,
    "contempt": 7.048959e-9,
    "disgust": 1.02152783e-11,
    "fear": 1.778957e-14,
    "happiness": 0.9999999,
    "neutral": 1.31694478e-7,
    "sadness": 6.04054263e-12,
    "surprise": 3.92249462e-11
  }
}
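A response like the one above is easy to consume in code: parse the JSON and pick the emotion with the highest score. The sketch below assumes the fragment is wrapped in a single JSON object, as a stand-in for one detected face; it does not call any service, and the variable names are illustrative.

```python
import json

# The scores from the slide, wrapped in braces to form one JSON object.
response_text = """
{
  "faceRectangle": {"left": 488, "top": 263, "width": 148, "height": 148},
  "scores": {
    "anger": 9.075572e-13,
    "contempt": 7.048959e-9,
    "disgust": 1.02152783e-11,
    "fear": 1.778957e-14,
    "happiness": 0.9999999,
    "neutral": 1.31694478e-7,
    "sadness": 6.04054263e-12,
    "surprise": 3.92249462e-11
  }
}
"""

face = json.loads(response_text)
scores = face["scores"]

# The dominant emotion is the key with the largest score.
top_emotion = max(scores, key=scores.get)
print(top_emotion, scores[top_emotion])
```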
Content moderator
• Automated
• Human
• Hybrid
• Content Moderator UI
• Image moderation
• Text Moderation
Video
• Face detection and tracking
• Motion detection
• Stabilization
• Video thumbnail
Computer Vision
• Computer Vision
• Tagging images
• Categorizing images
• Generating descriptions
Uber driver identification
Processing language - LUIS
• Language
• Learning to talk
• Using language to make decisions
Language
• Natural Language Processing
• Part of speech
• Nouns
• Adjectives
• Verbs
• Tokens
• The yellow fox can’t jump = The – yellow – fox – can – ’t – jump
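The token split shown above can be reproduced with a small regular expression. This is only a sketch that mirrors the slide's example (a word, then an apostrophe-led suffix as its own token); real NLP tokenizers use richer rules, and the `tokenize` function name is hypothetical.

```python
import re

# A token is either a run of letters, or an apostrophe followed by letters.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|['’][A-Za-z]+")

def tokenize(text):
    """Split text into word tokens, keeping contraction suffixes separate."""
    return TOKEN_PATTERN.findall(text)

tokens = tokenize("The yellow fox can’t jump")
print(" – ".join(tokens))
```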
Learning to talk
• Bing Spell Check
• Linguistic analysis
• Text analysis
• Translator
Using language to make decisions
• Utterances are translated to intents
• Intents drive app decisions
• Entities describe information about the intent
• Features help identify intents and entities
STARSHIP COMMANDER – Virtual Reality Game
Resources
Microsoft Artificial Intelligence (AI) Professional Certificate
https://www.edx.org/professional-certificate/microsoft-artificial-intelligence
Module Review and Takeaways
• Review Question(s)

Altron presentation on Emerging Technologies: Data Science and Artificial Intelligence

  • 1. Emerging Technologies Data Science and Artificial Intelligence
  • 2. Data Scientist is the sexiest job of 21st century (c) Harvard Business Review What is Data Science and Who is a Data Scientist
  • 3. About me 3 | • Presenter: • Robert Williams, (Bobby) • Microsoft Certified Trainer, (Bytes People Solutions) • Experience: • Business Intelligence, (SSAS, SSIS and SSRS) • Data Science, (Machine Learning in R) • Microsoft SQL Server, (Transact-SQL) • Microsoft Azure, (Azure ML, HDInsight, SQL Database, Data Warehouse and VMs)
  • 4. Topics • What is Data Science • Who is a Data Scientist • Discover about Data Science
  • 5. Data Science is currently popular with employers • There is high demand for students trained in Data Science and related fields • Databases, warehousing, data architectures • Data analytics – statistics, machine learning • Big data (Hadoop, Spark) • Supports “Business Intelligence” • Quantitative decision-making and control • Finance, inventory, pricing/marketing, advertising • Need data for identifying risks, opportunities, conducting “what-if” analyses
  • 6. Data Science and related fields • Business Intelligence • Statistics • Data Engineering • Data Visualization • Machine Learning • Data Mining • Artificial Intelligence • Big Data
  • 7. Regular Data Science tasks • Data Analysis/Statistics • Discover and clean data • Visualize trends • Find hidden correlations between parameters • Modeling/Machine Learning • How many cars are we going to sell next year • Which city is better for opening a new store • Which products are usually bought together • Engineering/Prototyping • Prototype of a working algorithm • Deploy prediction model to use on a daily basis
  • 8. Cleaning data: Garbage-In-Garbage-Out (GIGO) • Data Cleansing • Filling in missing data (imputing values) • Detecting and removing outliers • Smoothing • Removing noise by averaging values together • Filtering, sampling • Keeping only selected representative values • Feature extraction • e.g. in a photo database, which people are wearing glasses? which have more than one person? which are outdoors?
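The cleansing steps on slide 8 (imputing, outlier removal, smoothing) can be sketched in a few lines of Python. This is only an illustration: the readings, the median fill rule, and the outlier threshold of 10 are assumptions made up for this example and are not from the slides.

```python
from statistics import mean, median

# Hypothetical sensor readings; None marks a missing value, 95.0 is an obvious outlier.
readings = [10.0, 11.0, None, 10.5, 95.0, 11.5, 10.0]

# 1. Impute: fill the missing reading with the median of the observed ones.
observed = [r for r in readings if r is not None]
fill = median(observed)
imputed = [fill if r is None else r for r in readings]

# 2. Remove outliers: drop values far from the median (threshold picked by hand).
cleaned = [r for r in imputed if abs(r - fill) <= 10]

# 3. Smooth: average each value with its two successors (moving window of 3).
smoothed = [mean(cleaned[i:i + 3]) for i in range(len(cleaned) - 2)]
print(smoothed)
```

The median is used for imputation here because, unlike the mean, it is barely affected by the outlier that has not yet been removed.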
  • 9. Statistical analysis methods • Numerical data • Correlations • Multivariate regression • Fitting “models” • Predictive equations that fit the data, e.g. from a real estate database of home sales, we get housing price = 100*SqFt - 6*DistanceToSchools + 0.1*AverageOfNeighborhood • ANOVA for testing differences between groups • R is one of the most commonly used software packages for doing statistical analysis • Can load a data table, calculate means and correlations, fit distributions, estimate parameters, test hypotheses, generate graphs and histograms
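To make "fitting a model" on slide 9 concrete, the sketch below generates prices from the slide's housing equation for a few made-up homes and then recovers the three coefficients by least squares. The home data and the helper names are hypothetical. Exact fractions are used only so that this tiny system solves without rounding noise; in practice one would typically reach for R's lm or a numerical library instead.

```python
from fractions import Fraction

# Hypothetical home sales: (SqFt, DistanceToSchools, AverageOfNeighborhood).
homes = [
    (1500, 200, 200000),
    (2000, 1000, 250000),
    (1200, 50, 180000),
    (3000, 3000, 400000),
    (1800, 500, 220000),
]

def slide_price(sqft, dist, avg):
    """Predictive equation quoted on the slide."""
    return 100 * sqft - 6 * dist + Fraction(1, 10) * avg

prices = [slide_price(*h) for h in homes]

def fit_least_squares(rows, targets):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y], kept as exact fractions.
    a = [
        [Fraction(sum(r[i] * r[j] for r in rows)) for j in range(k)]
        + [Fraction(sum(r[i] * t for r, t in zip(rows, targets)))]
        for i in range(k)
    ]
    for col in range(k):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, k) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]

coefficients = fit_least_squares(homes, prices)
print([float(c) for c in coefficients])
```

Because the prices were generated without noise, the fit returns the slide's coefficients; with real sales data the recovered values would only approximate the underlying relationship.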
  • 10. Machine Learning methods • Unsupervised learning methods • Supervised learning methods
  • 11. Unsupervised learning methods • Clustering (Hierarchical, K-means) • Similar photos, documents, cases • Discovery of “structure” in the data • Example: accident database • Some clusters might be identified with “accidents involving a truck and trailer” or “accidents at night” • Top-down vs. bottom-up clustering methods • Granularity: how many clusters?
  • 12. Supervised learning methods • Classifiers (Decision Trees) • What factors, decisions, or treatments led to different outcomes? • Recursive partitioning algorithms • Related methods • “Discriminant” analysis • What factors lead to return of product? • Extract “association rules” • Boxer dogs tend to have congenital defects • Covers 5% of patients with 80% confidence • Veterinary database - dogs treated for disease:
        breed     gender  age  drug          sibsp  outcome
        terrier   F       10   methotrexate  4.0    died
        spaniel   M       5    cytarabine    2.3    survived
        doberman  F       7    doxorubicin   0.1    died
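The association-rule figures on slide 12 ("covers 5% of patients with 80% confidence") can be reproduced on a toy dataset. The records below are invented so that the numbers match the slide. Reading "covers" as the share of patients the rule applies to is an assumption about the slide's wording.

```python
# Hypothetical patient records as (breed, has_congenital_defect) pairs.
records = (
    [("boxer", True)] * 4
    + [("boxer", False)] * 1
    + [("terrier", True)] * 10
    + [("terrier", False)] * 85
)

def rule_stats(rows, breed):
    """Coverage and confidence of the rule 'breed -> congenital defect'."""
    matching = [defect for b, defect in rows if b == breed]
    coverage = len(matching) / len(rows)        # share of patients the rule applies to
    confidence = sum(matching) / len(matching)  # share of those that have the defect
    return coverage, confidence

coverage, confidence = rule_stats(records, "boxer")
print(f"coverage={coverage:.0%}, confidence={confidence:.0%}")
```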
  • 13. Miscellaneous methods • Other types of data • Time series and forecasting: • Model the price of fuel using autoregression • A function of recent prices, demand, geopolitics... • De-trend: factor out seasonal trends • GIS (geographic information systems) • Longitude/latitude coordinates in the database • Objects: city/state boundaries, river locations, roads • Find regions in CS/B with an excess of coffee shops • from: Basic Statistics for Business and Economics, Lind et al (2009), Ch 16. Toy Sales • credit: Frank Curriero
  • 14. What is a Data Scientist?
  • 15. What IS-IS NOT Data Science ✓ This ✗ Not that Machine Learning/Statistics Collecting data-storage Business Intelligence Industry Knowledge Software Engineering Automation (Applications)
  • 17. Who is a Data Scientist? • Scientist • Someone who makes new discoveries • Makes a hypothesis • Investigates that hypothesis • Data Scientist • Does the same with data • Looks for meaning, knowledge in the data • Answers questions by relying on data “It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong. In that simple statement is the key to science” – Richard Feynman: twitter.com/ProfFeynman
  • 18. What’s in the Data Science toolkit? Tools User Experience Research Statistical Methods Data Modeling Time series analysis Survival analysis Missing data imputations Logistic, multinomial and multiple linear regression techniques Classification and clustering Forecasting Pattern recognition Principal component and factor analysis Machine learning Propensity score matching Data mining A/B testing Sentiment analysis Network analysis Data Visualization Regression
  • 19. What’s in the Data Science toolkit? Tools User Experience Research Statistical Methods Languages Python R SQL SAS Javascript NodeJS Libraries NumPy Pandas Scikit-Learn Tidyverse Revo ScaleR Mahout +many others Data Engineering Profiling ETL Job notices APIs Optimized data pipelines Optimized data storage/access RDBMS Hadoop/Spark Visualization D3.js Base R Leaflet Power BI Matplotlib ggplot2 shiny
  • 20. Let’s do some Data Science
  • 21. What is doing Data Science? Data Science Apply Machine Learning and Statistics Data Engineering Managing data for creating insights Smarter Work More efficient and effective organization
  • 22. Data Science problems? • Finding a needle in the haystack • Prioritizing a backlog • Flagging “stuff” early • A/B test something • Optimize a resource • Some combination • Something else…
  • 23. Service Issue: Costly changes which are not tested before implementation Which form? Data Science Service Change Data Science Process: Statistical testing to identify which is better Service Change: Use the best statistically validated option Result: Increases customer satisfaction 62% respond 78% respond Statistical Inference: A/B testing
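The A/B comparison on slide 23 (62% versus 78% responding) can be checked with a two-proportion z-test. The slide gives only the response rates, so the sample size of 200 customers per form is an assumption for this sketch. The function name is likewise hypothetical.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both forms have the same response rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Assumed sample sizes: 200 customers per form (124/200 = 62%, 156/200 = 78%).
z, p_value = two_proportion_z_test(124, 200, 156, 200)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With samples of this assumed size the difference is well beyond what chance would explain; with much smaller samples the same percentages could fail to reach significance, which is why the test matters before a costly service change.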
  • 24. Find Samples: Identify targets within a sample population Question? Data Science Production Data Science Process: Use existing data and predictive modeling to identify targets Deploy: Implement data science solution into a production environment Result: A successful data science process Target categories Target individuals Target areas Data Science Process: Machine Learning
  • 25. Where to Learn? • University • Online Resources • Coursera • edX • etc. • Books
  • 26. How to start? • Your own company • Open competitions (Kaggle.com)
  • 27. Module review and takeaways • Review Question(s)
  • 30. Human vs. Machine • Unfortunately AI has often been negatively portrayed in the popular media, for example: • AI is going to take away our jobs Or even worse • Machines are going to kill us all! • What we actually want is a Human/Machine partnership going forward…
  • 31. Oops – Wrong slide!
  • 32. This is what I meant ;-)
  • 33. Human vs. Machine • Human • Naturally works well with small amounts of data • Has knowledge of the domain • Good at image recognition • Machines • Can perform intensive computations • Know only numbers and strings (well, actually only numbers)
  • 34. AI is actually Machine Learning • What is machine learning? • Introduction to machine learning algorithms • Introduction to machine learning languages
  • 35. What is Machine Learning? • Machine learning overview • How machine learning fits into data science • Machine learning concepts and methodologies • Models
  • 36. Machine Learning overview Machine learning: • Detecting patterns and trends • Statistical analysis • Creating software models Examples: • Predicting success of medical intervention • Identifying airplane maintenance needs • Identifying fraudulent financial transactions • Recommending books or movies
  • 37. How Machine Learning fits into Data Science Key questions: • Is something X or Y? • What is likely to be the numerical value of X or Y? • Is something out of the ordinary or unexpected? • How is this data structured?
  • 38. Machine Learning concepts and methodologies • Key steps: 1. Obtain raw data 2. Preprocess the data 3. Prepare the data 4. Apply one or more machine learning algorithms to the data 5. Determine the best model to use 6. Deploy the model
  • 39. Models Machine learning model: the code generated after an algorithm has been run Training models: • Experiments • Evaluation Deploying models: • Applications • Retraining
  • 40. Introduction to Machine Learning algorithms • Algorithms overview • Classification algorithms • Regression algorithms • Clustering • Supervised and unsupervised learning • Anomaly detection
  • 41. Algorithms overview • Algorithm: set of steps, methods, or actions • Classification algorithms: yes/no questions, or identify most likely outcome from multiclass list • Regression algorithms: make predictions of outcomes, based on historical patterns • Clustering algorithms: identify groupings within dataset
  • 42. Classification algorithms Classification algorithms: • Logistic Regression • Naïve Bayes • Decision Tree • Decision Forest • Boosted Decision Tree • Neural Network • Support Vector Machine
  • 43. Regression algorithms Regression algorithms: • Linear Regression • Decision Tree • Decision Forest • Boosted Decision Tree
  • 44. Clustering Clustering: • Often used during the initial stages of model development • Detects patterns and anomalies Example: • K-Means Clustering
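As a rough illustration of the K-Means example on slide 44, here is a minimal pure-Python version of the standard assign-then-average loop (Lloyd's algorithm). The points and the hand-picked starting centroids are made up for this sketch. A library such as scikit-learn would normally be used instead.

```python
import math

def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move centroids to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
# Starting centroids are picked by hand here; real implementations usually randomize them.
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
```

The number of clusters is fixed by the number of starting centroids, which is the "how many clusters?" granularity question raised earlier in the deck.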
  • 45. Supervised and unsupervised learning Supervised learning • Target values known • Classification • Regression Unsupervised learning • Target values unknown • Clustering Reinforcement learning • Self-learning through feedback
  • 46. Anomaly detection Anomaly detection: • Rare events • Imbalanced data Anomaly detection methods: • Support Vector Machine (SVM) • PCA-Based Anomaly Detection
  • 47. Introduction to Machine Learning languages • Languages overview • Using R in machine learning • Using Python in machine learning
  • 48. Languages overview Machine learning requires computer code: • Most popular programming languages: • R and • Python Use SQL for queries: • Select data to use • Join/filter data
  • 49. Using R in Machine Learning • R is open-source • R is specifically designed to support statistics and data analysis R packages: • Collections of functions, data, and code • Available from CRAN • Includes 10 000+ R packages
  • 50. Using Python in Machine Learning Python: • Not a specialist data science or statistical tool • Widely used within scientific computing • Lots of resources available Python machine learning-related libraries: • numpy • pandas • matplotlib • scikit-learn
  • 51. Artificial Intelligence – Cognitive Services • Cognitive Services overview • Processing image and video • Processing language
  • 52. What is a cognitive service? Cognitive Services: • Vision. Analyze photos and videos • Speech. Convert speech to text and text to speech • Language. Understand intent from language • Search. Find information on the web using Bing
  • 53. Customer scenarios • Uber driver identification • Starship Commander voice control
  • 54. Processing image and video • Face • Emotion • Content moderator • Video • Computer Vision
  • 55. Face • Person and person groups • Face detection • Face verification • Face identification • Similar face searching • Face grouping
  • 56. Emotion "faceRectangle": { "left": 488, "top": 263, "width": 148, "height": 148 }, "scores": { "anger": 9.075572e-13, "contempt": 7.048959e-9, "disgust": 1.02152783e-11, "fear": 1.778957e-14, "happiness": 0.9999999, "neutral": 1.31694478e-7, "sadness": 6.04054263e-12, "surprise": 3.92249462e-11 }
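A response like the one on slide 56 is typically reduced to a single label by taking the highest score. The sketch below wraps the slide's fragment in braces so that it parses as one JSON object. The wrapping and the variable names are assumptions, and no call to the actual service is made.

```python
import json

# The fragment shown on the slide, wrapped in braces so it parses as one JSON object.
response = json.loads("""
{
  "faceRectangle": {"left": 488, "top": 263, "width": 148, "height": 148},
  "scores": {
    "anger": 9.075572e-13, "contempt": 7.048959e-9, "disgust": 1.02152783e-11,
    "fear": 1.778957e-14, "happiness": 0.9999999, "neutral": 1.31694478e-7,
    "sadness": 6.04054263e-12, "surprise": 3.92249462e-11
  }
}
""")

scores = response["scores"]
top_emotion = max(scores, key=scores.get)  # emotion with the largest score
print(top_emotion, scores[top_emotion])
```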
  • 57. Content moderator • Automated • Human • Hybrid • Content Moderator UI • Image moderation • Text Moderation
  • 58. Video • Face detection and tracking • Motion detection • Stabilization • Video thumbnail
  • 59. Computer Vision • Computer Vision • Tagging images • Categorizing images • Generating descriptions
  • 61. Processing language - LUIS • Language • Learning to talk • Using language to make decisions
  • 62. Language • Natural Language Processing • Part of speech • Nouns • Adjectives • Verbs • Tokens • The yellow fox can’t jump = The – yellow – fox – can – ’t – jump
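The token split shown on slide 62 can be reproduced with a small regular expression. This is a hypothetical tokenizer written for this example, not the one used by any particular service. Other tokenization conventions split contractions differently.

```python
import re

def tokenize(text):
    """Split into word tokens, keeping an apostrophe suffix such as ’t as its own token."""
    # First alternative: an apostrophe (curly or straight) followed by letters.
    # Second alternative: a plain run of word characters.
    return re.findall(r"[’']\w+|\w+", text)

tokens = tokenize("The yellow fox can’t jump")
print(tokens)
```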
  • 63. Learning to talk • Bing Spell Check • Linguistic analysis • Text analysis • Translator
  • 64. Using language to make decisions • Utterances are translated to intents • Intents drive app decisions • Entities describe information about the intent • Features help identify intents and entities
  • 65. STARSHIP COMMANDER – Virtual Reality Game
  • 66. Resources Microsoft Artificial Intelligence (AI) Professional Certificate https://www.edx.org/professional-certificate/microsoft-artificial-intelligence
  • 67. Module Review and Takeaways • Review Question(s)

Editor's Notes

  1. Review Question(s) Question:
  2. For more information on Cognitive Services, see: Cognitive Services https://aka.ms/xst2si
  3. YouTube Video: https://www.youtube.com/watch?time_continue=7&v=nRZh7dkB_hs
  4. Review Question(s) Question: