[243] Deep Learning to help student’s Deep Learning

  1. Deep Learning to help student’s Deep Learning. Hak Kim | AI Team
  2. Background: The return of the MOOC ● Education is no longer a one-time event but a lifelong experience ● Working lives are now longer and faster-changing ● People must be able to acquire new skills throughout their careers ● MOOCs are broadening access to cutting-edge vocational subjects
  3. In the World of MOOCs: Challenges ● Significant increase in student numbers ● Impractical for human instructors to assess each student’s level of engagement ● Increasingly fast development and update cycles of course content ● Diverse demographics of students in each online classroom
  4. Student Performance Prediction ● Forecasting students’ future performance as they interact with online coursework is a challenging problem ● Reliable early-stage prediction of a student’s future performance is critical for timely (educational) interventions during a course. Only a few prior works have explored the problem from a deep learning perspective!
  5. GritNet
  6. Student Event Sequence. Graduate (user 1122) ● First views four lecture videos, then gets a quiz correct ● In the subsequent 13 events, watches a series of videos, answers quizzes, submits the first project three times, and finally gets it passed. Drop-out (user 2214) ● Views four lectures sparsely and fails the first project three times in a row. From this raw student activity data, GritNet can predict whether or not a student will graduate (or any other student outcome)
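To make the input concrete, here is a minimal sketch (in Python, which the slides don't specify) of how such raw event streams could be tokenized; the hourly time bucketing and the build_vocab/encode helpers are illustrative assumptions, not Udacity's actual pipeline:

```python
from collections import defaultdict

def build_vocab(events):
    """Map each (action, time-bucket) tuple to an integer token ID.
    Hourly bucketing of epoch-second timestamps is an assumed granularity."""
    vocab = defaultdict(lambda: len(vocab))
    vocab[("<pad>", 0)]  # reserve ID 0 for padding
    for action, ts in events:
        vocab[(action, ts // 3600)]
    return dict(vocab)

def encode(events, vocab):
    """Turn one student's time-ordered event list into a token-ID sequence,
    preserving ordering so the model can see learning speed."""
    return [vocab[(a, ts // 3600)] for a, ts in events if (a, ts // 3600) in vocab]
```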
  7. GritNet. Input ● Feed students’ time-stamped raw event sequences (w/o engineered features) ● Encode them by one-hot encoding each possible event tuple {action, timestamp}, to preserve ordering information and to capture a student’s learning speed. Output ● Probability of a student outcome ● E.g., graduation or any other student outcome. Model ● The GMP (global max pooling) layer captures the most relevant parts of the input event sequence and deals with the imbalanced nature of the data!
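A minimal sketch of the architecture this slide describes (embedding into bidirectional LSTM, global max pooling over time, then a fully connected sigmoid output). PyTorch is an assumption; the 128-cell BLSTM follows the next slide, while the embedding size and dropout rate shown here are illustrative:

```python
import torch
import torch.nn as nn

class GritNet(nn.Module):
    """Sketch: embedding -> bidirectional LSTM -> global max pooling (GMP)
    -> fully connected layer -> probability of the student outcome."""
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=128, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.blstm = nn.LSTM(embed_dim, hidden_dim,
                             batch_first=True, bidirectional=True)
        self.drop = nn.Dropout(dropout)           # applied to the BLSTM output
        self.fc = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.blstm(self.embed(token_ids))  # (batch, seq_len, 2*hidden_dim)
        pooled, _ = self.drop(h).max(dim=1)       # GMP: max over the time axis
        return torch.sigmoid(self.fc(pooled)).squeeze(-1)
```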
  8. GritNet Training. The training objective is the negative log likelihood of the observed event sequences of student activities ● Binary cross entropy is minimized using stochastic gradient descent on mini-batches (size 32). Model optimization ● BLSTM layers with 128 cell dimensions per direction ● Embedding layer dimension grid-searched ● Dropout applied to the BLSTM output ● Note: the baseline model (logistic regression) was also optimized extensively. A separate model was trained for each week, based on students’ week-by-week event records
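A matching training-loop sketch: binary cross entropy minimized with SGD on mini-batches of 32, per this slide. The learning rate, epoch count, and the assumption that sequences are pre-padded to equal length are mine:

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=10, lr=0.01):
    """Minimize binary cross entropy with SGD on mini-batches of size 32.
    dataset is assumed to yield (padded_token_ids, label) pairs."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCELoss()  # negative log likelihood of binary outcomes
    model.train()
    for _ in range(epochs):
        for token_ids, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(token_ids), labels.float())
            loss.backward()
            opt.step()
```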
  9. Prediction Performance. On Udacity Data ● ND-A program: 777 graduated out of 1,853 students (41.9%) ● ND-B program: 1,005 graduated out of 8,301 students (12.1%). Results ● Compared student graduation prediction accuracy in terms of mean area under the ROC curve (AUC) over 5-fold cross validation ● GritNet consistently outperforms the baseline, by more than 5% absolute on both ND-{A,B} datasets ● On the ND-B dataset, the baseline model needs two months of data to reach the performance that GritNet achieves within three weeks. I.e., GritNet’s improvements are especially pronounced in the first few weeks, when accurate predictions are most challenging! [Chart annotations: 5.3% abs (7.7% rel); 5% abs]
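The evaluation protocol on this slide (mean ROC AUC over 5-fold cross validation) could be computed along these lines; train_fn and predict_fn are placeholders for GritNet training and inference:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def mean_cv_auc(X, y, train_fn, predict_fn, folds=5):
    """Mean area under the ROC curve over stratified k-fold cross validation."""
    aucs = []
    for train_idx, test_idx in StratifiedKFold(n_splits=folds, shuffle=True).split(X, y):
        model = train_fn(X[train_idx], y[train_idx])
        aucs.append(roc_auc_score(y[test_idx], predict_fn(model, X[test_idx])))
    return float(np.mean(aucs))
```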
  10. Intermission. GritNet is the current state of the art for the student performance prediction problem ● P1: Needs no feature engineering (learns from raw input) ● P2: Can operate on any student event data associated with a timestamp (even when highly imbalanced)
  11. In the World of MOOCs: Remaining challenges ● Significant increase in student numbers ● Impractical for human instructors to assess each student’s level of engagement ● Increasingly fast development and update cycles of course content ● Diverse demographics of students in each online classroom
  12. What’s Next ● So far, GritNet has been trained and tested on data from the same course ● Goal: real-time prediction during an ongoing course (before the course finishes)
  13. Real-Time Prediction
  14. Intuition. Observation ● The GMP layer output captures a generalizable sequential representation of the input event sequence ● The FC layer learns course-specific features. GritNet can thus be viewed as a sequence-level feature (or embedding) extractor plus a classifier
  15. Domain Adaptation. Unsupervised DA framework ● Trained on a previous (source) course but deployed on another (target) course ● E.g., v1 → v2, or an existing → newly-launched ND program. Ordinal Input Encoding ● Group (raw) content IDs into subcategories ● Normalize them using the natural ordering implicit in the content paths within each category (see the sketch below)
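One plausible reading of the ordinal input encoding, as a sketch; the 'category/item' path format and the category-plus-rank output are assumptions:

```python
def ordinal_encode(content_paths):
    """Group content IDs by their parent category and replace each ID with its
    rank within that category, so e.g. the third lesson of a course maps the
    same way in a v1 (source) and a v2 (target) course."""
    by_category = {}
    for path in sorted(set(content_paths)):
        category = path.rsplit("/", 1)[0]   # assumed 'category/item' path format
        by_category.setdefault(category, []).append(path)
    return {path: (category, rank + 1)      # 1-based ordinal within the category
            for category, items in by_category.items()
            for rank, path in enumerate(items)}
```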
  17. Domain Adaptation (cont’d). Pseudo-label generation ● No labels are available from the target course ● Use a trained GritNet to assign hard one-hot labels (Step 4)
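Step 4 could look like the following sketch: run the source-trained GritNet over unlabeled target-course sequences and threshold its probabilities into hard labels (the 0.5 threshold is an assumption):

```python
import torch

@torch.no_grad()
def pseudo_label(model, target_loader, threshold=0.5):
    """Assign hard 0/1 pseudo-labels to unlabeled target-course sequences
    using a GritNet trained on the source course."""
    model.eval()
    labels = []
    for token_ids in target_loader:
        labels.append((model(token_ids) >= threshold).long())
    return torch.cat(labels)
```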
  18. Domain Adaptation (cont’d). Transfer Learning ● Continue training the last FC layer on the target course while keeping the GritNet core fixed (Steps 6 and 7) ● Retraining only the last FC layer limits the number of parameters to learn. A very practical solution: faster adaptation and applicable even to smaller target courses!
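A sketch of Steps 6 and 7 under the same assumed PyTorch model: freeze the GritNet core (embedding and BLSTM) and continue training only the final FC layer on pseudo-labeled target data; the optimizer settings are illustrative:

```python
import torch

def adapt(model, target_dataset, epochs=3, lr=0.01):
    """Fine-tune only the last FC layer on pseudo-labeled target-course data,
    keeping the embedding and BLSTM weights fixed."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():    # only the classifier head stays trainable
        p.requires_grad = True
    loader = torch.utils.data.DataLoader(target_dataset, batch_size=32, shuffle=True)
    opt = torch.optim.SGD(model.fc.parameters(), lr=lr)
    loss_fn = torch.nn.BCELoss()
    model.train()
    for _ in range(epochs):
        for token_ids, pseudo_y in loader:
            opt.zero_grad()
            loss_fn(model(token_ids), pseudo_y.float()).backward()
            opt.step()
```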
  19. Experimental Setup. Udacity Data Setup ● Vanilla baseline: LR trained on the source course, evaluated on the target course ● GritNet baseline: GritNet trained on the source course, evaluated on the target course ○ BLSTM layers with 256 cell dimensions per direction and a 512-dimension embedding layer ● Domain adaptation: as described above ● Oracle: trained with true target labels instead of pseudo-labels
  20. Generalization Performance. Results ● AUC Recovery Rate (ARR): the AUC improvement that unsupervised domain adaptation provides, divided by the absolute improvement that oracle training produces ○ Clear wins for the GritNet baseline vs. the vanilla baseline ○ Domain adaptation enhances real-time predictions, especially in the first few weeks! [Chart annotations: 70.60% ARR; 58.06% ARR; up to 84.88% ARR at week 3; up to 75.07% ARR at week 3]
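Reading the ARR definition above as the fraction of the baseline-to-oracle AUC gap that adaptation recovers, it reduces to a one-liner; the numbers in the usage example are made up:

```python
def auc_recovery_rate(auc_baseline, auc_adapted, auc_oracle):
    """Fraction of the oracle's AUC gain over the baseline that unsupervised
    domain adaptation recovers."""
    return (auc_adapted - auc_baseline) / (auc_oracle - auc_baseline)

# Illustrative numbers only: recovering 0.10 of a 0.14 possible gain ~= 71.4% ARR
print(auc_recovery_rate(0.70, 0.80, 0.84))  # 0.714...
```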
  21. Summary. GritNet is the current state of the art for the student performance prediction problem ● P1: Needs no feature engineering (learns from raw input) ● P2: Can operate on any student event data associated with a timestamp (even when highly imbalanced) ● P3: Can be transferred to new courses without any (student outcome) labels! Full papers available on arXiv (GritNet, GritNet2)
  22. Looking ahead. Is a student likely to benefit from (sequential) interventions? [Diagram: student X, intervention T, learning outcomes Y]
  23. Wrap Up. Questions? hak@udacity.com
  24. Thank You!
