A Pipeline for Modeling Automated
Scoring Using Python, R and
Jupyter Notebooks
Copyright © 2015 by Educational Testing Service. All rights reserved. ETS and the ETS logo are registered trademarks of Educational Testing Service (ETS). MEASURING THE POWER OF LEARNING is a trademark of ETS. 30141
Nitin Madnani, Anastassia Loukina & Lei Chen
Machine Learning &
Educational Assessment
A Pythonic Love Story
Educational Testing Service
• A non-profit educational organization founded in 1947,
headquartered in Princeton, New Jersey (N≊3500).
• Designs and administers global as well as domestic educational
assessments (GRE®, TOEFL®, PRAXIS® etc.)
• Conducts and publishes extensive research on psychometrics,
statistics, cognitive science, and computer science.[1]
• Mission: To advance quality and equity in education by providing
fair and valid assessments, research and related services.
[1] http://search.ets.org/researcher/
Two Parts
• Part 1: What makes educational assessment a
challenging application for machine learning?



• Part 2: How does Python help us address some of these
challenges at ETS?
Part 1: Educational Assessment & Machine Learning
Educational Assessments
• Classroom Quiz
• GRE
• MOOC Assignments
• TOEFL/IELTS
• GED
• Teacher Certification
• GMAT
• K-12 Standardized Tests
• Homework Assignment
• Practice Tests
“High Stakes”
• A test with results that have important, direct
consequences for the test-takers.



• A test-taker would want to understand what their score
means and how it maps to what they did on the test.
The GRE
• Graduate Record Examination, designed and administered by ETS.
• Used by at least 3000 colleges and universities across the world for
graduate school applications to MS, MBA & PhD programs.[1]
• ~575,000 test-takers from ~200 countries between July 2013 and June
2014 (50% women, 45% men). [2]
• Three sections:
• Verbal Reasoning
• Quantitative Reasoning
• Analytical Writing
[1] https://www.ets.org/s/gre/pdf/gre_aidi_fellowships.pdf
[2] http://www.ets.org/s/gre/pdf/snapshot_test_taker_data_2014.pdf
GRE Analytical Writing
“As people rely more and more on technology to solve problems,
the ability of humans to think for themselves will surely deteriorate.”

Directions: Write a response in which you discuss the extent to
which you agree or disagree with the statement and explain your
reasoning for the position you take.

Scoring guide: https://www.ets.org/gre/revised_general/prepare/analytical_writing/issue/scoring_guide
Score 6. Outstanding
• articulates a clear and insightful position
• develops the position fully
• well-focused, well-organized analysis
• conveys ideas fluently and precisely
• demonstrates superior facility with English
Score 1. Fundamentally Deficient
• provides little/no evidence of understanding
• disorganized or extremely brief
• severe problems with sentence structure
• pervasive errors in grammar
• incoherent and meaning not clear
…
Scoring essays
Given the stakes, our scoring methodology must
maximize:
• Accuracy: how accurately does the assigned score measure
the analytical skills of the test-taker?
• Interpretability: how easily can test-takers understand why
they were assigned a particular score and what that score
means?
It would also be nice to minimize:
• Cost: how efficiently can we score each test (how much money
can we save the test-taker in fees)?
Scoring essays: Option 1
Essay + Scoring Guide → Trained Human Readers
• High Accuracy
• Medium Interpretability
• High Cost
Scoring essays: Option 2
Essay + Scoring Guide → Features → Automated Scoring System (Machine Learning)
• Medium Accuracy
• (Choice of) High Interpretability
• Low Cost
Scoring essays: combining both
Essay + Scoring Guide → One Trained Human Reader → Human Score
Essay + Scoring Guide → Features → Automated Scoring System (Machine Learning) → System Score
Human Score + System Score → Final score
As good as using two human readers[1].
[1] http://www.ets.org/Media/Research/pdf/RD_Connections2.pdf
E-rater
“Essay Rater”

Linear regression trained on older essays
written to the same topic and scored by
human readers.

Features
• errors in grammar (e.g., subject-verb agreement)
• usage errors (incorrect prepositions/articles)
• mechanics errors (capitalization, spelling)
• errors in style (repetitious word use)
• discourse structure (presence of a thesis statement, main points)
• vocabulary sophistication
• essay organization

Automated Essay Scoring With e-rater® V.2, The Journal of Technology, Learning, and Assessment, Volume 4(3), 2006
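To make the modeling idea concrete, here is a minimal sketch of fitting a linear regression from essay features to human scores. The two feature names, the toy data, and the helper functions are illustrative assumptions, not e-rater's actual features, data, or code. The fit is ordinary least squares solved through the normal equations in plain Python.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(rows, scores):
    """Return [intercept, w1, w2, ...] minimizing the squared error."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    """Score one essay from its feature values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Hypothetical features per essay: (grammar errors per 100 words, vocabulary sophistication)
train_features = [(5.0, 1.0), (4.0, 2.0), (3.0, 2.5), (2.0, 3.5), (1.0, 4.0), (0.5, 5.0)]
# Toy "human" scores generated as 3 - 0.4 * grammar + 0.6 * vocabulary
human_scores = [3.0 - 0.4 * g + 0.6 * v for g, v in train_features]

weights = fit_linear_regression(train_features, human_scores)
```

Each fitted weight reads directly as the change in predicted score per unit of one feature, which is one reason a linear model is easy to explain to a test-taker.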
E-rater & Research
E-rater is still an active area of research at ETS.
• Design new features; examine their effect on performance, and whether they overlap with existing features.
• Try more sophisticated machine learning models (is higher accuracy worth lower interpretability?)
• Last year, 10 new e-rater features were proposed just for the GRE!
• The GRE is one of a dozen assessments, and e-rater is one of many automated scoring engines.
• Research is untenable for a large group (>15 scientists) without a standardized pipeline.
Ideal research pipeline
Need an end-to-end machine learning pipeline that can:
• Work on (almost) all platforms,
• Read features in any tabular format and clean them up,
• Efficiently apply filtering, scaling and transformations,
• Train any specified model with those features, and
• Generate a standardized, detailed report of performance on
unseen essays.
Part 2: Educational Assessment, Machine Learning & Python
Python Pipeline
Input → Preprocess → Model → Evaluate → Report
Input → final self-contained report
1. Input (pandas)
Inputs:
• Model Name (str)
• Training Features (csv/tsv/xls)
• Unseen Test Features (csv/tsv/xls)
• Feature Definitions (json)
Steps:
• Read files into data frames
• Check for missing feature columns, exclude others
• Filter out non-numeric and blank values
• Standardize essay ID and essay score column names
Outputs:
• Training Data Frame (Raw)
• Test Data Frame (Raw)
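The input checks can be sketched as follows. The pipeline on the slide uses pandas data frames; to stay dependency-free, this sketch uses only the standard-library `csv` module and plain dictionaries. The standardized column names (`essay_id`, `human_score`), the `read_features` helper, and the sample file contents are all assumptions made for illustration.

```python
import csv
import io
import math

STANDARD_ID = "essay_id"        # hypothetical standardized column names
STANDARD_SCORE = "human_score"


def is_number(text):
    """True if text parses as a finite float; blank or missing values do not."""
    try:
        return math.isfinite(float(text))
    except (TypeError, ValueError):
        return False


def read_features(handle, feature_names, id_column, score_column, delimiter=","):
    """Read a delimited feature file into a list of cleaned row dictionaries."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    present = reader.fieldnames or []
    required = list(feature_names) + [id_column, score_column]
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for raw in reader:
        numeric_cells = [raw[name] for name in feature_names] + [raw[score_column]]
        if not all(is_number(cell) for cell in numeric_cells):
            continue  # drop rows with blank or non-numeric values
        # Only the ID, the score and the listed features are kept;
        # every other column in the file is excluded.
        row = {STANDARD_ID: raw[id_column], STANDARD_SCORE: float(raw[score_column])}
        row.update({name: float(raw[name]) for name in feature_names})
        rows.append(row)
    return rows


# Toy stand-in for a training feature file.
sample = io.StringIO(
    "ID,score,grammar,vocab,notes\n"
    "e1,4,0.5,3.2,ok\n"
    "e2,3,,2.9,blank grammar\n"
    "e3,5,0.1,n/a,bad vocab\n"
    "e4,2,1.4,1.8,ok\n"
)
train_rows = read_features(sample, ["grammar", "vocab"], id_column="ID", score_column="score")
```

Passing `delimiter="\t"` would cover tab-separated files; spreadsheet formats would need a separate reader.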
2. Preprocess (numpy + pandas)
Inputs:
• Model Name (str)
• Feature Definitions (json)
• Training Data Frame (Raw)
• Test Data Frame (Raw)
Steps:
• Filter out user-flagged rows, if so specified
• Remove feature outliers & “intelligently” apply feature
transformations (log, inv, sqrt, etc.), if available
• Standardize all features (center and scale)
Outputs:
• Training Data Frame (Processed)
• Test Data Frame (Processed)
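A minimal sketch of the preprocessing steps for a single feature, under stated assumptions: outliers are clamped to the training mean plus or minus 4 standard deviations (the slide does not say how outliers are handled), and the transformation name is passed in by the caller rather than chosen "intelligently". The slide's pipeline uses numpy and pandas; this sketch uses only the standard library.

```python
import math
import statistics

TRANSFORMS = {
    "raw": lambda x: x,
    "log": math.log,            # requires x > 0
    "inv": lambda x: 1.0 / x,   # requires x != 0
    "sqrt": math.sqrt,          # requires x >= 0
}


def outlier_bounds(values, n_sd=4.0):
    """Bounds at mean +/- n_sd population standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return mean - n_sd * sd, mean + n_sd * sd


def clamp(values, low, high):
    """Pull values outside [low, high] back to the nearest bound."""
    return [min(max(v, low), high) for v in values]


def preprocess(train_values, test_values, transform="raw", n_sd=4.0):
    """Clamp, transform and standardize one feature.

    All statistics (bounds, mean, SD) come from the training data only
    and are then applied unchanged to the test data.
    """
    low, high = outlier_bounds(train_values, n_sd)
    func = TRANSFORMS[transform]
    train = [func(v) for v in clamp(train_values, low, high)]
    test = [func(v) for v in clamp(test_values, low, high)]
    mean = statistics.fmean(train)
    sd = statistics.pstdev(train)
    return ([(v - mean) / sd for v in train], [(v - mean) / sd for v in test])


# Toy feature: essay length in words.
train_lengths = [100.0, 150.0, 200.0, 250.0, 300.0]
test_lengths = [200.0, 100000.0]  # the second value is an extreme outlier
train_z, test_z = preprocess(train_lengths, test_lengths, transform="log")
```

Computing the bounds and scaling constants on the training data alone keeps information about the unseen essays out of the model.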
3. Model (R + skll)
Inputs:
• Model Name (str)
• Feature Definitions (json)
• Training Data Frame (Processed)
• Test Data Frame (Processed)
Steps:
• Train regression/classification model via SKLL API or R
• Grid-search using a task-appropriate objective
• Serialize model to disk (using joblib)
SKLL (pronounced “skull”) provides an API and command-line
utilities to make it much simpler to run common scikit-learn
experiments with pre-generated features.
(Presented by @dsblanch at PyData 2013 & 2014)
https://github.com/EducationalTestingService/skll
Outputs:
• Serialized model
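The train, grid-search and serialize steps can be illustrated without SKLL or R. The sketch below is not the SKLL API: it fits a one-feature ridge regression in closed form, picks the penalty with the lowest mean squared error on a single held-out split, and saves the result with the standard-library `pickle` module (the slide names joblib). The model, the objective, the data and all function names are assumptions chosen to keep the example self-contained.

```python
import os
import pickle
import statistics
import tempfile


def fit_ridge(xs, ys, alpha):
    """One-feature ridge regression; returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return y_bar - slope * x_bar, slope


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and targets."""
    intercept, slope = model
    return statistics.fmean([(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)])


def grid_search(train, dev, alphas):
    """Return (error, alpha, model) for the alpha with the lowest dev-set MSE."""
    best = None
    for alpha in alphas:
        model = fit_ridge(train[0], train[1], alpha)
        error = mean_squared_error(model, dev[0], dev[1])
        if best is None or error < best[0]:
            best = (error, alpha, model)
    return best


# Toy data following score = 3.5 + feature exactly.
train = ([-1.5, -0.5, 0.5, 1.5], [2.0, 3.0, 4.0, 5.0])
dev = ([-1.0, 1.0], [2.5, 4.5])

best_error, best_alpha, best_model = grid_search(train, dev, alphas=[0.0, 1.0, 10.0])

# Serialize the chosen model to disk and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as out:
        pickle.dump({"alpha": best_alpha, "model": best_model}, out)
    with open(path, "rb") as src:
        restored = pickle.load(src)
```

A real experiment would more likely cross-validate over several folds than rely on one held-out split; the single split here only keeps the example short.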
4. Evaluate (skll + pandas)
Inputs:
• Training Data Frame (Processed)
• Test Data Frame (Processed)
• Serialized model
Steps:
• Use serialized model to compute test set predictions
• Trim and re-scale predictions to match training data
• Compute a set of standard evaluation metrics by
comparing predictions to test set human scores
Outputs:
• Test Data Predictions
• Evaluation Statistics
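One possible reading of "trim and re-scale" is sketched below: predictions are mapped linearly so that, on the training data, their mean and standard deviation match those of the human scores, and are then trimmed to the valid score range. This interpretation, the two metrics shown (Pearson correlation and exact agreement after rounding), and the toy numbers are assumptions; the slide does not name the metrics it computes.

```python
import statistics


def rescale(predictions, train_predictions, train_human_scores):
    """Map predictions so their training mean and SD match the human scores."""
    pred_mean = statistics.fmean(train_predictions)
    pred_sd = statistics.pstdev(train_predictions)
    human_mean = statistics.fmean(train_human_scores)
    human_sd = statistics.pstdev(train_human_scores)
    return [(p - pred_mean) / pred_sd * human_sd + human_mean for p in predictions]


def trim(predictions, low, high):
    """Force predictions into the valid score range [low, high]."""
    return [min(max(p, low), high) for p in predictions]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def exact_agreement(predictions, human_scores):
    """Fraction of essays where the rounded prediction equals the human score."""
    matches = sum(round(p) == h for p, h in zip(predictions, human_scores))
    return matches / len(human_scores)


# Toy values on a 1-6 score scale.
train_predictions = [2.0, 3.0, 4.0]
train_human_scores = [1, 3, 5]
test_predictions = [2.5, 3.5, 5.0]
test_human_scores = [2, 4, 5]

scaled = rescale(test_predictions, train_predictions, train_human_scores)
final = trim(scaled, low=1, high=6)

correlation = pearson(final, test_human_scores)
agreement = exact_agreement(final, test_human_scores)
```

In this toy case the third prediction rescales to about 7 and is trimmed back to the top of the scale, 6.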
5. Report (jupyter + seaborn + pandas)
Inputs:
• Test Data Predictions
• Evaluation Statistics
• Training Data Frame (Processed)
• Test Data Frame (Processed)
• Serialized model
Steps:
• Determine what report sections should be included
• Merge pre-existing section templates (.ipynb files)
• Dynamically run final .ipynb file (via
ExecutePreprocessor and environment variables)
• Convert report to HTML using HTMLExporter
Outputs:
• Final Report (.ipynb)
• Final Report (html)
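The section-merging step can be sketched with the standard library alone, because an .ipynb file is a JSON document whose top-level `cells` list holds the notebook content. The section names, the `MODEL_NAME` environment variable, and the helper functions below are hypothetical. Executing the merged notebook and converting it to HTML (the ExecutePreprocessor and HTMLExporter steps on the slide) are not shown.

```python
import copy
import json
import os


def make_notebook(cells):
    """Minimal notebook structure in the version 4 JSON format."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def markdown_cell(text):
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(text):
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": text,
    }


def merge_sections(templates, wanted):
    """Concatenate the cells of the selected section templates, in order."""
    merged = make_notebook([])
    for name in wanted:
        merged["cells"].extend(copy.deepcopy(templates[name]["cells"]))
    return merged


# Hypothetical section templates; on disk each would be its own .ipynb file.
templates = {
    "header": make_notebook([
        markdown_cell("# Evaluation report"),
        code_cell("import os\nmodel_name = os.environ['MODEL_NAME']"),
    ]),
    "metrics": make_notebook([
        markdown_cell("## Metrics"),
        code_cell("print(model_name)"),
    ]),
    "features": make_notebook([markdown_cell("## Features")]),
}

# Settings reach the notebook code through environment variables,
# so the same templates can be reused for every experiment.
os.environ["MODEL_NAME"] = "toy_linear_model"

report = merge_sections(templates, wanted=["header", "metrics"])
report_json = json.dumps(report, indent=1)  # text that would be written to a final .ipynb file
```

Keeping each section in its own template means a report can include or skip sections without editing any notebook by hand.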
Demo
Summary
• Machine learning in high-stakes educational assessment requires
additional number crunching to verify accuracy and interpretability.
• Need a pipeline to compare a large number of research experiments
using a standardized, easy-to-read report.
• The scientific Python stack makes it super easy to implement all
stages of the pipeline!
• In progress
• Release under open-source license (2016 release)
• A CherryPy/JS web-app to allow wider reach
Questions?
https://github.com/EducationalTestingService	
https://github.com/desilinguist	
@haikuman

Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectErbil Polytechnic University
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
BSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxBSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxNiranjanYadav41
 

Dernier (20)

Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptx
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Autonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptAutonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.ppt
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction Project
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
BSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxBSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptx
 

Pipeline for Modeling Automated Scoring (PyData NYC 2015)

  • 1. A Pipeline for Modeling Automated Scoring Using Python, R and Jupyter Notebooks Copyright © 2015 by Educational Testing Service. All rights reserved. ETS and the ETS logo are registered trademarks of Educational Testing Service (ETS). MEASURING THE POWER OF LEARNING is a trademark of ETS. 30141 Nitin Madnani, Anastassia Loukina & Lei Chen
  • 2. Machine Learning & Educational Assessment: A Pythonic Love Story. Nitin Madnani, Anastassia Loukina & Lei Chen
  • 3. Educational Testing Service • A non-profit educational organization founded in 1947, headquartered in Princeton, New Jersey (N≊3500). • Designs and administers global as well as domestic educational assessments (GRE®, TOEFL®, PRAXIS®, etc.). • Conducts and publishes extensive research on psychometrics, statistics, cognitive science, and computer science.[1] • Mission: To advance quality and equity in education by providing fair and valid assessments, research and related services. [1] http://search.ets.org/researcher/
  • 4. Two Parts • Part 1: What makes educational assessment a challenging application for machine learning? • Part 2: How does Python help us address some of these challenges at ETS?
  • 5. Part 1: Educational Assessment, Machine Learning
  • 6-17. Educational Assessments: Classroom Quiz, Homework Assignment, MOOC Assignments, K-12 Standardized Tests, Teacher Certification, Practice Tests, TOEFL/IELTS, GRE, GMAT, GED
  • 18. Educational Assessments, with a "High Stakes" label: Classroom Quiz, GRE, MOOC Assignments, TOEFL/IELTS, GED, Teacher Certification, GMAT, K-12 Standardized Tests, Homework Assignment, Practice Tests
  • 19. “High Stakes” • A test with results that have important, direct consequences for the test-takers. • A test-taker would want to understand what their score means and how it maps to what they did on the test.
  • 22. The GRE • Graduate Record Examination, designed and administered by ETS. • Used by at least 3000 colleges and universities across the world for graduate school applications to MS, MBA & PhD programs.[1] • ~575,000 test-takers from ~200 countries between July 2013 and June 2014 (50% women, 45% men).[2] • Three sections: Verbal Reasoning, Quantitative Reasoning, Analytical Writing. [1] https://www.ets.org/s/gre/pdf/gre_aidi_fellowships.pdf [2] http://www.ets.org/s/gre/pdf/snapshot_test_taker_data_2014.pdf
  • 24. GRE Analytical Writing. Prompt: “As people rely more and more on technology to solve problems, the ability of humans to think for themselves will surely deteriorate.” Directions: Write a response in which you discuss the extent to which you agree or disagree with the statement and explain your reasoning for the position you take. Scoring guide (https://www.ets.org/gre/revised_general/prepare/analytical_writing/issue/scoring_guide): Score 6, Outstanding: articulates a clear and insightful position; develops the position fully; well-focused, well-organized analysis; conveys ideas fluently and precisely; demonstrates superior facility with English. Score 1, Fundamentally Deficient: provides little/no evidence of understanding; disorganized or extremely brief; severe problems with sentence structure; pervasive errors in grammar; incoherent and meaning not clear. …
  • 25-26. Scoring essays. Given the stakes, our scoring methodology must maximize: • Accuracy: how accurately does the assigned score measure the analytical skills of the test-taker? • Interpretability: how easily can test-takers understand why they were assigned a particular score and what that score means? It would also be nice to minimize: • Cost: how efficiently can we score each test (how much money can we save the test-taker in fees)?
  • 27. Scoring essays
  • 28. Scoring essays, Option 1: Essay, Scoring Guide, Trained Human Readers. High Accuracy, Medium Interpretability, High Cost.
  • 29. Scoring essays, Option 2: Essay, Scoring Guide, Features, Automated Scoring System (Machine Learning). Medium Accuracy, (Choice of) High Interpretability, Low Cost.
  • 30. Scoring essays: Essay, Scoring Guide, One Trained Human Reader (Human Score); Scoring Guide, Features, Automated Scoring System (Machine Learning) (System Score); Final score. As good as using two human readers.[1] [1] http://www.ets.org/Media/Research/pdf/RD_Connections2.pdf
  • 31-32. E-rater (“Essay Rater”). Linear regression trained on older essays written to the same topic and scored by human readers. Features: errors in grammar (e.g., subject-verb agreement); usage errors (incorrect prepositions/articles); mechanics errors (capitalization, spelling); errors in style (repetitious word use); discourse structure (presence of a thesis statement, main points); vocabulary sophistication; essay organization. Source: Automated Essay Scoring With e-rater® V.2, The Journal of Technology, Learning, and Assessment, Volume 4(3), 2006.
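To make the modeling idea on this slide concrete, here is a minimal sketch of fitting a linear regression from essay features to human scores. It uses only the Python standard library. The two feature names, the toy values, and all helper names are hypothetical; they are not e-rater's actual features, weights, or code.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] as a list of rows.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(features, scores):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    # Prepend a constant 1.0 so the first weight is the intercept.
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    """Intercept plus the weighted sum of the feature values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Hypothetical toy data: (grammar_errors, vocab_sophistication) per essay.
train_features = [(8, 1.0), (5, 2.0), (3, 2.5), (1, 4.0), (0, 3.5)]
# Toy "human" scores, generated here as 3 - 0.25 * errors + 0.5 * vocab.
train_scores = [3 - 0.25 * g + 0.5 * v for g, v in train_features]

weights = fit_linear_regression(train_features, train_scores)
new_essay_score = predict(weights, (2, 3.0))
```

Because every weight attaches to one named feature, a score can be traced back to what the essay did on each feature, which is the interpretability argument the talk makes for a linear model.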
  • 33. E-rater & Research. E-rater is still an active area of research at ETS. • Design new features; examine their effect on performance and whether they overlap with existing features. • Try more sophisticated machine learning models (is higher accuracy worth lower interpretability?). • Last year, 10 new e-rater features were proposed just for GRE! • GRE is one of a dozen assessments; e-rater is one of many automated scoring engines. • Research is untenable for a large group (>15 scientists) without a standardized pipeline.
  • 34. Ideal research pipeline. Need an end-to-end machine learning pipeline that can: • Work on (almost) all platforms, • Read features in any tabular format and clean them up, • Efficiently apply filtering, scaling and transformations, • Train any specified model with those features, and • Generate a standardized, detailed report of performance on unseen essays.
  • 35-36. Part 2: Educational Assessment, Machine Learning, Python
  • 37. Python Pipeline: Input → Preprocess → Model → Evaluate → Report, from input to a final self-contained report
  • 38-41. 1. Input (pandas). Inputs: Model Name (str), Training Features (csv/tsv/xls), Unseen Test Features (csv/tsv/xls), Feature Definitions (json). Steps: • Read files into data frames • Check for missing feature columns, exclude others • Filter out non-numeric and blank values • Standardize essay ID and essay score column names. Outputs: Training Data Frame (Raw), Test Data Frame (Raw).
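The input stage can be sketched roughly as follows. This is an illustration only: it uses the standard library's `csv` and `json` modules as a stand-in for pandas so that it runs anywhere, and the JSON layout, the column names (`essay_id`, `human_score`), and the helper names are all assumptions, since the talk does not show the actual file schemas.

```python
import csv
import io
import json

# Hypothetical feature definitions; the real JSON schema is not shown in the talk.
feature_json = '{"features": ["grammar", "vocab"]}'

# Hypothetical training file; "essay_id" and "human_score" are assumed names.
train_csv = """essay_id,human_score,grammar,vocab,unused
e1,4,0.5,2.1,x
e2,3,,1.7,y
e3,5,0.1,n/a,z
e4,2,0.9,1.2,w
"""


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def read_features(handle, feature_names, id_col, score_col):
    reader = csv.DictReader(handle)
    # Fail early if a feature named in the definitions is absent from the file.
    missing = [f for f in feature_names if f not in reader.fieldnames]
    if missing:
        raise KeyError(f"missing feature columns: {missing}")
    rows = []
    for raw in reader:
        values = [raw[f] for f in feature_names] + [raw[score_col]]
        # Drop rows with blank or non-numeric feature or score values.
        if not all(is_number(v) for v in values):
            continue
        # Keep only the requested features, under standardized ID/score names.
        row = {"id": raw[id_col], "score": float(raw[score_col])}
        row.update({f: float(raw[f]) for f in feature_names})
        rows.append(row)
    return rows


feature_names = json.loads(feature_json)["features"]
train_rows = read_features(io.StringIO(train_csv), feature_names,
                           id_col="essay_id", score_col="human_score")
```

In this toy file, essays `e2` and `e3` are dropped (one blank value, one non-numeric value) and the `unused` column is excluded, which mirrors the four bullet points of the stage.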
  • 42-44. 2. Preprocess (numpy + pandas). Inputs: Model Name (str), Feature Definitions (json), Training Data Frame (Raw), Test Data Frame (Raw). Steps: • Filter out user-flagged rows, if so specified • Remove feature outliers & “intelligently” apply feature transformations (log, inv, sqrt, etc.), if available • Standardize all features (center and scale). Outputs: Training Data Frame (Processed), Test Data Frame (Processed).
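A rough sketch of the preprocessing steps for a single feature is given below, again in plain Python rather than numpy and pandas. The outlier rule (clamping to the training mean plus or minus four standard deviations), the transformation table, and the function names are assumptions made for illustration; the slide does not specify how outliers are handled or how a transformation is chosen.

```python
import math
import statistics

# Hypothetical transformation table; which transform suits which feature is
# decided elsewhere ("intelligently" in the slide), not by this sketch.
TRANSFORMS = {
    "raw": lambda v: v,
    "log": math.log,
    "sqrt": math.sqrt,
    "inv": lambda v: 1.0 / v,
}


def clamp_outliers(values, mean, sd, n_sd=4.0):
    """Pull values back to mean +/- n_sd * sd (n_sd=4 is an assumption)."""
    low, high = mean - n_sd * sd, mean + n_sd * sd
    return [min(max(v, low), high) for v in values]


def fit_preprocessor(train_values, transform="raw"):
    """Learn clamping and scaling parameters from the training data only."""
    mean, sd = statistics.mean(train_values), statistics.pstdev(train_values)
    func = TRANSFORMS[transform]
    transformed = [func(v) for v in clamp_outliers(train_values, mean, sd)]
    return {
        "raw_mean": mean, "raw_sd": sd, "transform": transform,
        "mean": statistics.mean(transformed),
        "sd": statistics.pstdev(transformed),
    }


def apply_preprocessor(values, params):
    """Clamp, transform, then center and scale with the training parameters."""
    func = TRANSFORMS[params["transform"]]
    clamped = clamp_outliers(values, params["raw_mean"], params["raw_sd"])
    return [(func(v) - params["mean"]) / params["sd"] for v in clamped]


train_lengths = [120.0, 250.0, 310.0, 400.0, 520.0, 150.0]  # toy feature values
params = fit_preprocessor(train_lengths, transform="log")
train_z = apply_preprocessor(train_lengths, params)
test_z = apply_preprocessor([5000.0, 300.0], params)  # 5000 is clamped first
```

The design point worth noting is that all parameters are learned from the training frame and then reused unchanged on the test frame, so the unseen essays never influence the scaling.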
  • 45-49. 3. Model (R + skll). Inputs: Model Name (str), Feature Definitions (json), Training Data Frame (Processed), Test Data Frame (Processed). Steps: • Train regression/classification model via SKLL API or R • Grid-search using a task-appropriate objective • Serialize model to disk (using joblib). Output: Serialized model. SKLL (pronounced “skull”) provides an API and command-line utilities to make it much simpler to run common scikit-learn experiments with pre-generated features. (Presented by @dsblanch at PyData 2013 & 2014) https://github.com/EducationalTestingService/skll
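The grid-search-then-serialize pattern in this stage can be illustrated with a small standard-library sketch. It deliberately does not use SKLL, scikit-learn, R, or joblib: a hand-written one-feature ridge regression stands in for the learner, mean squared error on a held-out set stands in for the task-appropriate objective, and `pickle` stands in for joblib. All names and data are hypothetical.

```python
import pickle
import statistics


def fit_ridge(xs, ys, alpha):
    """One-feature ridge regression on centered data; returns (intercept, slope)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return my - slope * mx, slope


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and targets."""
    intercept, slope = model
    return statistics.mean([(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)])


def grid_search(train, dev, alphas):
    """Fit one model per alpha and keep the one with the lowest dev-set MSE."""
    scored = []
    for alpha in alphas:
        model = fit_ridge(train[0], train[1], alpha)
        scored.append((mean_squared_error(model, dev[0], dev[1]), alpha, model))
    best_error, best_alpha, best_model = min(scored)
    return best_alpha, best_model


# Toy data following y = 1 + 2x, so the unregularized model fits exactly.
train = ([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
dev = ([5.0, 6.0], [11.0, 13.0])
best_alpha, best_model = grid_search(train, dev, alphas=[0.0, 1.0, 10.0])

# Round-trip the chosen model through bytes, as a stand-in for writing it to disk.
restored_model = pickle.loads(pickle.dumps(best_model))
```

Serializing the fitted model is what lets the evaluation stage run separately from training, using exactly the parameters that were selected here.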
  • 50-52. 4. Evaluate (skll + pandas). Inputs: Training Data Frame (Processed), Test Data Frame (Processed), Serialized model. Steps: • Use serialized model to compute test set predictions • Trim and re-scale predictions to match training data • Compute a set of standard evaluation metrics by comparing predictions to test set human scores. Outputs: Test Data Predictions, Evaluation Statistics.
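The trimming, re-scaling, and metric steps might look like the sketch below. It is a standard-library illustration under stated assumptions: predictions are re-scaled to the mean and standard deviation of the training human scores and then trimmed to a 1-6 score range, and the two metrics shown (Pearson correlation and exact agreement after rounding) are only examples; the talk does not list which metrics or which order of operations the pipeline uses.

```python
import statistics


def rescale(predictions, train_predictions, train_human_scores):
    """Linearly map predictions onto the mean and SD of the training human scores."""
    pred_mean = statistics.mean(train_predictions)
    pred_sd = statistics.pstdev(train_predictions)
    human_mean = statistics.mean(train_human_scores)
    human_sd = statistics.pstdev(train_human_scores)
    return [(p - pred_mean) / pred_sd * human_sd + human_mean for p in predictions]


def trim(predictions, low, high):
    """Clip predictions to the valid score range."""
    return [min(max(p, low), high) for p in predictions]


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / (sxx * syy) ** 0.5


def exact_agreement(predictions, human_scores):
    """Share of essays where the rounded prediction equals the human score."""
    hits = sum(round(p) == h for p, h in zip(predictions, human_scores))
    return hits / len(human_scores)


# Toy values: raw model predictions are less spread out than the human scores.
train_predictions = [2.0, 3.0, 4.0]
train_human_scores = [1, 3, 5]
test_predictions = [2.5, 3.0, 4.5, 1.5]
test_human_scores = [2, 3, 5, 1]

scaled = rescale(test_predictions, train_predictions, train_human_scores)
final = trim(scaled, low=1, high=6)
correlation = pearson(final, test_human_scores)
agreement = exact_agreement(final, test_human_scores)
```

Re-scaling matters because regression predictions tend to cluster near the mean; stretching them to the spread of the training human scores makes the two score distributions comparable before agreement is measured.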
  • 53-55. 5. Report (jupyter + seaborn + pandas). Inputs: Test Data Predictions, Evaluation Statistics, Training Data Frame (Processed), Test Data Frame (Processed), Serialized model. Steps: • Determine what report sections should be included • Merge pre-existing section templates (.ipynb files) • Dynamically run final .ipynb file (via ExecutePreprocessor and environment variables) • Convert report to HTML using HTMLExporter. Outputs: Final Report (.ipynb), Final Report (html).
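The section-merging step can be sketched with plain JSON, since an .ipynb file is a JSON document whose top-level `cells` list holds the notebook's cells. The section names, the `make_section` helper, and the cell contents below are hypothetical. Executing the merged notebook and exporting it to HTML would be done with nbconvert's `ExecutePreprocessor` and `HTMLExporter`, as the slide states; that part needs a running Jupyter kernel and is not shown here.

```python
import json


def make_section(title, code):
    """Build a tiny notebook dict standing in for a section template on disk."""
    return {
        "nbformat": 4,
        "nbformat_minor": 0,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": [f"## {title}"]},
            {"cell_type": "code", "metadata": {}, "execution_count": None,
             "outputs": [], "source": [code]},
        ],
    }


def merge_sections(templates, wanted):
    """Concatenate the cells of the wanted sections, in order, into one notebook."""
    merged = {"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": []}
    for name in wanted:
        merged["cells"].extend(templates[name]["cells"])
    return merged


# Hypothetical section templates; real ones would be loaded from .ipynb files.
templates = {
    "data_description": make_section("Data description", "print('rows')"),
    "model": make_section("Model", "print('weights')"),
    "evaluation": make_section("Evaluation", "print('metrics')"),
}
report = merge_sections(templates, wanted=["data_description", "evaluation"])
report_json = json.dumps(report, indent=1)  # what would be written to a .ipynb file
```

Keeping each section as its own template means a report can include or omit sections per experiment without editing any notebook by hand.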
  • 56. Demo
  • 57. Summary • Machine learning in high-stakes educational assessment requires additional number crunching to verify accuracy and interpretability. • Need a pipeline to compare a large number of research experiments using a standardized, easy-to-read report. • The scientific Python stack makes it super easy to implement all stages of the pipeline! • In progress: release under open-source license (2016 release); a CherryPy/JS web-app to allow wider reach.
  • 58. Questions? https://github.com/EducationalTestingService https://github.com/desilinguist @haikuman