BÂLE BERNE BRUGG DUSSELDORF FRANCFORT S.M. FRIBOURG E.BR. GENÈVE
HAMBOURG COPENHAGUE LAUSANNE MUNICH STUTTGART VIENNE ZURICH
Data Science
… a comprehensible customer case
by Lev Kiwi
The menu
1. What is Data Science, and how does one do it?
2. Our client's context
3. Demo Time with R
Exploring data
Modeling
Evaluating
Data Science
Why Data Science?
[Figure: stages Raw Data, Operational Reporting, Descriptive Analytics, Predictive Analytics, Prescriptive Analytics. Axes: Analytic maturity, Value.]
What is Data Science?
[Diagram: Domain Expertise, Math & Stats, Computer Science]
CRISP-DM (CRoss-Industry Standard Process for Data Mining)
Shearer, C. (2000). The CRISP-DM model: The new blueprint for data mining.
Journal of Data Warehousing, 5(4), 13–22.
Phases (arranged around the central DATA): Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
Client's Context
University of Fribourg CH
What are ECTS
(European Credit Transfer and Accumulation System)?
Follow courses → Take exams → Pass exams → Obtain ECTS credits
How do studies at the university work?
Bachelor in Mathematics: Mathematics (120 ECTS), Computer Science (60 ECTS)
Other 60-ECTS blocks shown: Philosophy (60 ECTS), Biology (60 ECTS)
Master in Mathematics: Mathematics (90 ECTS)
How do studies at the university work?
Bachelor: 3 years, 180 ECTS
Master: 1.5 years, 90 ECTS
30 ECTS are equivalent to a full-time study load for one semester
What is the study intensity?
The study intensity of a student for a given semester is the number of ECTS
credits for which the student is evaluated (takes exams) in that semester.

Example. Dominique Duay follows three courses:
1. Introduction to Machine Learning (4 ECTS)
2. Macroeconomics (6 ECTS)
3. Data Analysis and Statistics with R (8 ECTS)
Dominique takes the exams for the first two only, so his study intensity for the first
semester is 10 ECTS.
The next semester he decides to take the exam for the third course, which adds 8 ECTS
to his study intensity for that semester.
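
The example above can be reproduced with a few lines of code. This is a minimal sketch in Python rather than the R used in the demo; the record layout (semester, course, ECTS, exam taken) and the function name are assumptions made for illustration, not the client's data model.

```python
from collections import defaultdict

# Hypothetical records: (semester, course, ects, exam_taken).
# The third course is followed in semester 1 but examined in semester 2.
enrolments = [
    (1, "Introduction to Machine Learning", 4, True),
    (1, "Macroeconomics", 6, True),
    (1, "Data Analysis and Statistics with R", 8, False),
    (2, "Data Analysis and Statistics with R", 8, True),
]


def study_intensity(records):
    """Sum the ECTS of the courses that were examined, per semester."""
    totals = defaultdict(int)
    for semester, _course, ects, exam_taken in records:
        if exam_taken:  # only evaluated credits count
            totals[semester] += ects
    return dict(totals)


print(study_intensity(enrolments))  # {1: 10, 2: 8}
```

Credits count in the semester of the exam, not in the semester in which the course was followed.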
What is the big deal about the study intensity?
• The average study intensity varies significantly across study paths, programs,
faculties, and levels. It is not clear why…
• It is felt that the study intensity is somehow linked to the reputation of the
studies.
• The number of ECTS evaluated per year is somehow correlated with the
budget the university receives from the Confederation.
• The Swiss Confederation started a few years ago to monitor the study
intensity of universities more closely.
Strategy
1. Identify variables correlated with low/high study intensity
2. Predict which students will have a low study intensity
3. Increase study intensity with concrete actions
Demo
What is R?
• Programming language for statistical computing and graphics
• Interpreted language (accessed through the console)
• Open source and used by researchers, statisticians and data miners all
around the world
• More than 9,000 packages on the CRAN repository
• Runs in memory (mostly…)
The Data
• Data warehouse (DWH) fact table and related dimensions
• 1 line = number of ECTS evaluated per student, per course, per professor…
• We aggregate this data to have one line per student, per semester
• We take data from 2012 to 2015
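
The aggregation step described above (from one line per student, course and professor down to one line per student and semester) can be sketched as follows. This is an illustration in Python, not the R code of the demo; the column layout, the semester labels and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (student_id, semester, course, professor, ects_evaluated)
fact_rows = [
    ("s1", "2012-autumn", "Macroeconomics", "prof_a", 6),
    ("s1", "2012-autumn", "Statistics", "prof_b", 4),
    ("s1", "2013-spring", "Algebra", "prof_c", 8),
    ("s2", "2012-autumn", "Macroeconomics", "prof_a", 6),
]


def aggregate_per_student_semester(rows):
    """Collapse course-level rows into one total per (student, semester)."""
    totals = defaultdict(int)
    for student_id, semester, _course, _professor, ects in rows:
        totals[(student_id, semester)] += ects
    return dict(totals)


per_student_semester = aggregate_per_student_semester(fact_rows)
for key in sorted(per_student_semester):
    print(key, per_student_semester[key])
```

Each resulting value is the study intensity of one student in one semester, which is the quantity the models try to predict.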
The Data
OFS (Swiss Federal Statistical Office) report on study intensity
Significant independent variables we already know:
• Age
• Level (Bachelor, Master)
• Major (Economics, Law, Medicine, …)
Linear Regression Model
Regression Tree Model
Comparison between models
• Root Mean Square Error: 0 < RMSE < σ, where σ is the standard deviation of the observed values
(the smaller, the better…)
• R Squared: R² < 1
(the closer to 1, the better…)
Linear Regression: RMSE = 12.2, R² = 0.37
Regression Tree: RMSE = 11.6, R² = 0.43
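
To make the two comparison metrics concrete, here is a minimal sketch of how RMSE and R² can be computed. It is written in Python rather than the R of the demo, the function names are illustrative, and the numbers are made up (they are not the client's data). When both metrics are computed on the same data and σ is the population standard deviation of the observed values, they are linked by R² = 1 − (RMSE/σ)².

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up study intensities (ECTS) and predictions, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

# Reference model: always predict the mean of the observed values.
mean_only = [sum(observed) / len(observed)] * len(observed)
sigma = rmse(observed, mean_only)  # equals the population standard deviation

print(round(rmse(observed, predicted), 3))       # 2.55
print(round(r_squared(observed, predicted), 3))  # 0.948
print(round(r_squared(observed, mean_only), 3))  # 0.0
```

Predicting the mean for everyone gives RMSE = σ and R² = 0, which is why a useful model should land below σ and above 0. Under the identity above, the reported RMSE and R² pairs would each imply a σ of roughly 15 ECTS, although the slides do not state σ.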
Variable importance of the advanced model
Variable                     Overall
AVG_PRESTATION_POPULARITE    9.721355
PRC_EXAM_IN_SEMESTRE         6.282732
PRC_PERIODICITE0             6.004846
AVG_TIME_TO_EXAM             5.810226
AVG_PRESTATION_DIFFICULTE    5.762537
UNI_VP_FAC                   5.646884
PRC_PERIODICITE1             5.473713
UNI_VP_VE_NIVEAU             5.182513
PRC_PERIODICITE4             4.442017
PRC_TYPE_PRESTATION2         4.342148
PRC_TYPE_PRESTATION13        3.675386
ETU_DOMAVET_CANTON           2.938208
ETU_AGE                      2.775811
ETU_ETABLISSEMENT            1.585795
PRC_TYPE_PRESTATION1         1.556703
PRC_PERIODICITE23            1.413239
PRC_TYPE_PRESTATION3         0.286059
SEMESTRE                     0.16307
ETU_SEXE                     0.133126
PRC_TYPE_PRESTATION7         0.07456
Predictive capabilities of the advanced model
RMSE = 9.3, R² = 0.63
Cumulative Response Curve
[Figure panels: Baseline model, Advanced model]
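
For readers unfamiliar with cumulative response curves, the sketch below shows one common way to build one: rank the cases by model score, then track which share of the cases of interest has been captured as more of the ranked list is targeted. It is written in Python rather than the R of the demo. The scores, the labels and the reading of a "positive" as a student with low study intensity are assumptions for illustration; the slides do not show how the curves were produced.

```python
def cumulative_response(scores, is_positive):
    """Return (fraction targeted, fraction of positives captured) points.

    Cases are ranked from highest to lowest score, so a good model
    captures most positives early and the curve rises steeply.
    """
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(is_positive)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, (_score, positive) in enumerate(ranked, start=1):
        captured += int(positive)
        points.append((rank / len(ranked), captured / total_positives))
    return points


# Made-up example: a higher score means the model considers a low
# study intensity more likely for that student.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
low_intensity = [True, True, False, True, False]

curve = cumulative_response(scores, low_intensity)
for targeted, captured in curve:
    print(f"{targeted:.1f} -> {captured:.2f}")
```

A model that ranks at random follows the diagonal on average, so the further a curve lies above the diagonal, the fewer students need to be contacted to reach a given share of the low-intensity group.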
Feedback from the client
Dr. Lev Kiwi
Consultant BI
Tel. +58 459 53 75
Lev.Kiwi@trivadis.com

Practical data science case in the university domain - Data Science
