Big Data in Learning Analytics - Analytics for Everyday Learning

Keynote at LearnTec 2017

Published in: Technology

  1. Big Data in Learning Analytics – Analytics for Everyday Learning
     Stefan Dietze, L3S Research Center, Hannover
     24.01.2017, LearnTec 2017, Karlsruhe
     23/02/17 Stefan Dietze
  2. L3S Research Center (http://l3s.de/, http://stefandietze.net/)
     Research areas
     • Web science, Information Retrieval, Semantic Web, Social Web Analytics, Knowledge Discovery, Human Computation
     • Interdisciplinary application areas: digital humanities, TEL/education, Web archiving, mobility
     Some projects
  3. Big Data in Learning Analytics? A simplistic perspective
     Technology-enhanced Learning / Web-based Learning
     Learning Analytics & Educational Data Mining
     • Application of data mining techniques to understand learning activities and performance
     • Traditionally confined to dedicated learning environments and platforms (e.g., Moodle)
     • Examples: JLA special issue on LA datasets, with data ranging from a few MB to at most 15 GB
     • Near-complete research corpus: LAK Dataset (http://lak.linkededucation.org)
  4. Learning Analytics & Knowledge Dataset (http://lak.linkededucation.org/)
     • Cooperation of
     • Near-complete Linked Data corpus of Learning Analytics research publications (~800, since 2009)
     Dietze, S., Taibi, D., D’Aquin, M., Facilitating Scientometrics in Learning Analytics and Educational Data Mining - the LAK Dataset, Semantic Web Journal, 2017.
  5. Big Data in Learning Analytics? A simplistic perspective
     Technology-enhanced Learning / Web-based Learning
     Learning Analytics & Educational Data Mining
     • Application of data mining techniques to understand learning activities and performance
     • Traditionally confined to dedicated learning environments and platforms (e.g., Moodle)
     • Examples: JLA special issue on LA datasets, with data ranging from a few MB to at most 15 GB
     • Near-complete research corpus: LAK Dataset (http://lak.linkededucation.org)
     • Broader understanding: informal learning, micro-learning
     • Research often focused on resources: sharing, reusing, recommendation
     • Data examples:
       o “LinkedUp Catalog”: > 50 M resources, 300 M statements
       o “LRMI/schema.org”: > 45 M quads (Common Crawl 2015)
     Big Data? – Depends, but mostly not! (Volume?)
  6. LinkedUp Catalog of learning resources
     Dataset Catalog/Registry: http://data.linkededucation.org/linkedup/catalog/
     • “LinkedUp” (FP7 project): L3S, OU, OKFN, Elsevier, Exact Learning Solutions
     • Publishing and curation of educational/learning resources according to Linked Data principles
     • Largest collection of Linked Data about learning resources (approx. 50 datasets, 50 M resources)
  7. Learning resource annotations on the Web?
     • “Learning Resources Metadata Initiative (LRMI)”: schema.org vocabulary for annotating learning resources in Web documents
     • Approx. 5000 PLDs (pay-level domains) in “Common Crawl” (2 bn Web documents)
     • LRMI adoption on the Web (WDC) [LILE16]:
       o 2015: 44,108,511 quads, 6,243,721 resources
       o 2014: 30,599,024 quads, 4,182,541 resources
       o 2013: 10,636,873 quads, 1,461,093 resources
     [Figure: number of entities and statements per PLD, count (log scale) against PLD rank. Power-law distribution across 4805 providers/PLDs.]
     [LILE16] Taibi, D., Dietze, S., Towards embedded markup of learning resources on the Web: a quantitative Analysis of LRMI Terms Usage, in Companion Publication of the IW3C2 WWW 2016 Conference, IW3C2 2016, Montreal, Canada, April 11, 2016.
     http://lrmi.itd.cnr.it/
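The year-over-year growth implied by the WDC figures on this slide can be checked with a few lines of Python. This is only a sketch: the dictionary restates the quad and resource counts from the slide (reading the 2013 quad count "10.636873" as 10,636,873, which is an assumption), and all variable and function names are illustrative.

```python
# Quad and resource counts per year, restated from the slide (WDC / Common Crawl).
# Assumption: the 2013 quad count "10.636873" is read as 10,636,873.
lrmi = {
    2013: {"quads": 10_636_873, "resources": 1_461_093},
    2014: {"quads": 30_599_024, "resources": 4_182_541},
    2015: {"quads": 44_108_511, "resources": 6_243_721},
}


def growth(metric):
    """Return {year: factor relative to the previous year} for one metric."""
    years = sorted(lrmi)
    return {
        year: lrmi[year][metric] / lrmi[prev][metric]
        for prev, year in zip(years, years[1:])
    }


quad_growth = growth("quads")
resource_growth = growth("resources")
# Average number of quads describing one resource, per year.
quads_per_resource = {y: v["quads"] / v["resources"] for y, v in lrmi.items()}

for year in sorted(quad_growth):
    print(
        f"{year}: quads x{quad_growth[year]:.2f}, "
        f"resources x{resource_growth[year]:.2f}, "
        f"quads/resource {quads_per_resource[year]:.2f}"
    )
```

Under that reading, quads grow by a factor of roughly 2.9 from 2013 to 2014 and roughly 1.4 from 2014 to 2015, while the number of quads per resource stays close to 7.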
  8. Big Data in Learning Analytics? A simplistic perspective
     Technology-enhanced Learning / Web-based Learning
     Learning Analytics & Educational Data Mining
     • Application of data mining techniques to understand learning activities and performance
     • Traditionally confined to dedicated learning environments and platforms (e.g., Moodle)
     • Near-complete research corpus: LAK Dataset (http://lak.linkededucation.org)
     • Data examples: JLA special issue on LA datasets, with data ranging from a few MB to at most 15 GB
     • Broader understanding: informal learning, micro-learning
     • Research focused on resources: sharing, reusing, recommendation
     • Data examples:
       o “LinkedUp Catalog”: > 50 M resources, 300 M statements
       o “LRMI/schema.org”: > 45 M quads (Common Crawl 2015)
     Big Data? – Depends, but mostly not! (Volume?)
     Big Data? – Depends, but mostly not! (Velocity?)
  9. (Informal) Learning on the Web?
     • Anything can be a learning resource
     • The activity makes the difference (not the resource), i.e. how a resource is being used
     • Learning Analytics in online/non-learning environments?
       o Activity streams
       o Social graphs (and their evolution)
       o Behavioural traces (mouse movements, keystrokes)
       o ...
     • Research challenges:
       o How to detect “learning”?
       o How to detect learning-specific notions such as “competences”, “learning performance” etc.?
  10. “AFEL – Analytics for Everyday (Online) Learning”
      • H2020 project (since 12/2015) aimed at understanding/supporting learning in social Web environments
      Examples of AFEL data sources:
      • Activity streams and behavioral traces
      • L3S Twitter Crawl: 6 bn tweets
      • Common Crawl (2015): 2 bn documents
      • Web Data Commons (2015): 1 TB = 24 bn quads
      • “German Academic Web”: 6 TB Web crawl (recrawled quarterly)
      • Wikipedia edit history: 3 M edits/month (English)
      • ....
  11. Big Data Challenges/Tasks in AFEL & beyond: some examples
      I. Efficient data capture
      • Crawling & extracting activity data
      • Crawling, extracting and indexing learning resources (e.g. Common Crawl)
      II. Efficient data analysis
      • Understanding learning resources: entity extraction & clustering on large Web crawls of resources
      • “Search as learning”: detecting learning in heterogeneous search query logs & click streams
      • Detecting learning activities: detection of learning patterns (e.g. competent behavior) in the absence of learning objectives & assessments (!)
        o Obtaining performance indicators from behavioral traces?
        o Quasi-experiments on crowdsourcing platforms to obtain training data
      Gadiraju, U., Demartini, G., Kawase, R., Dietze, S. Human beyond the Machine: Challenges and Opportunities of Microtask Crowdsourcing. In: IEEE Intelligent Systems, Volume 30 Issue 4 – Jul/Aug 2015.
      [CHI15] Gadiraju, U., Kawase, R., Dietze, S., Demartini, G., Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys. ACM CHI Conference on Human Factors in Computing Systems (CHI2015), April 18-23, Seoul, Korea.
  12. Detecting competence in online users?
      Capturing assessment data: microtasks in Crowdflower
      • “Content Creation (CC)”: transcription of captchas
      • “Information Finding (IF)”: finding the middle name of famous persons
      • 1800 assessments: 2 tasks * 3 durations * 3 difficulty levels * 100 users (per assessment)
      Example IF task, “Find the middle name of:”
      o Level 1: “Daniel Craig”
      o Level 2: “George Lucas” (profession: Archbishop)
      o Level 3: “Brian Smith” (profession: Ice Hockey, born: 1972)
      Behavioral traces: keystrokes and mouse movements
      • timeBeforeInput, timeBeforeClick
      • tabSwitchFreq
      • windowToggleFreq
      • openNewTabFreq
      • WindowFocusFrequency
      • totalMouseMovements
      • scrollUpFreq, scrollDownFreq
      • ….
      • Total number of events: 893,285 (CC tasks), 736,664 (IF tasks)
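How features such as tabSwitchFreq or timeBeforeClick might be derived from a raw event log can be sketched as follows. The slides do not describe the logging format, so the event tuples, the event type names, and the reading of each "Freq" feature as a plain count per assessment are all assumptions made for illustration. The first line only restates the study design from the slide.

```python
from collections import Counter

# Study design restated from the slide: tasks * durations * difficulty levels * users.
n_assessments = 2 * 3 * 3 * 100

# Hypothetical event log for one assessment: (seconds since task start, event type).
events = [
    (0.8, "mousemove"), (1.5, "mousemove"), (2.1, "scroll_down"),
    (3.0, "tab_switch"), (6.4, "tab_switch"), (7.2, "click"),
    (8.0, "scroll_up"), (9.5, "keydown"), (9.7, "keydown"),
]


def extract_features(events):
    """Aggregate one assessment's raw events into per-assessment features."""
    counts = Counter(kind for _, kind in events)

    def first_time(kind):
        # Timestamp of the first event of this type, or None if it never occurs.
        return next((t for t, k in events if k == kind), None)

    return {
        "timeBeforeInput": first_time("keydown"),
        "timeBeforeClick": first_time("click"),
        "tabSwitchFreq": counts["tab_switch"],      # assumed: count per assessment
        "scrollUpFreq": counts["scroll_up"],
        "scrollDownFreq": counts["scroll_down"],
        "totalMouseMovements": counts["mousemove"],
    }


features = extract_features(events)
print(n_assessments, features)
```

One such feature dictionary per assessment would then form a row of the training table for a classifier.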
  13. Predicting competence from behavioural traces?
      Training data
      • Manual annotation of 1800 assessments
      • Performance types [CHI15]:
        o “Competent Worker”
        o “Diligent Worker”
        o “Fast Deceiver”
        o “Incompetent Worker”
        o “Rule Breaker”
        o “Smart Deceiver”
        o “Sloppy Worker”
      • Prediction of performance types from behavioral traces?
      Predicting learner types from behavioral traces
      • “Random Forest Classifier” (per task)
      • 10-fold cross-validation
      • Prediction performance: Accuracy, F-Measure
      Results
      • Longer assessments → more signals
      • Simpler assessments → more conclusive signals
      • “Competent Workers” (CW, DW): accuracy of 91% and 87%, respectively
      • Most significant features: “TotalTime”, “TippingPoint”, “MouseMovementFrequency”, “WindowFocusFrequency”
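The evaluation protocol named on this slide (10-fold cross-validation, reported as accuracy and F-measure) can be sketched in plain Python. To keep the example dependency-free, a one-feature threshold rule stands in for the Random Forest classifier, so this is not the authors' model. The data are synthetic, and the single "total time" feature with its class distributions is a hypothetical choice.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def f_measure(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def fit_threshold(xs, ys):
    """Stand-in model: midpoint between the two class means of one feature."""
    def class_mean(label):
        return sum(x for x, y in zip(xs, ys) if y == label) / ys.count(label)

    m0, m1 = class_mean(0), class_mean(1)
    return (m0 + m1) / 2, m1 > m0


# Synthetic data: (total time in seconds, label); 1 = "competent" (hypothetical).
rng = random.Random(42)
data = [(rng.gauss(60, 10), 1) for _ in range(100)]
data += [(rng.gauss(30, 10), 0) for _ in range(100)]

folds = k_fold_indices(len(data), k=10)
accuracies, f_scores = [], []
for test_idx in folds:
    held_out = set(test_idx)
    train = [data[i] for i in range(len(data)) if i not in held_out]
    test = [data[i] for i in test_idx]
    threshold, higher_is_positive = fit_threshold(
        [x for x, _ in train], [y for _, y in train]
    )
    preds = [int((x > threshold) == higher_is_positive) for x, _ in test]
    truth = [y for _, y in test]
    accuracies.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
    f_scores.append(f_measure(truth, preds))

mean_acc = sum(accuracies) / len(accuracies)
mean_f1 = sum(f_scores) / len(f_scores)
print(f"accuracy {mean_acc:.2f}, F-measure {mean_f1:.2f}")
```

Each sample is used for testing exactly once, and the reported scores are averages over the ten held-out folds. Swapping in a real classifier only changes the fit and predict steps inside the loop.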
  14. Other features to predict competence in learning/assessments?
      “Dunning-Kruger Effect”
      • Incompetence in a task/domain reduces the capacity to recognize/assess one's own incompetence
      Research question
      • Self-assessment as indicator for competence?
      Results
      • Self-assessment as reliable indicator of competence (94% accuracy), superior to mere performance measurement
      • Tendency to overestimate one's own competence increases with increasing difficulty level
      [Figure: performance (“accuracy”) of users classified as “competent”]
      David Dunning. 2011. The Dunning-Kruger Effect: On Being Ignorant of One’s Own Ignorance. Advances in experimental social psychology 44 (2011), 247.
  15. Summary & outlook
      • Learning analytics in online & Web-based settings
        o Detection of learning & learning-related notions in the absence of assessment/performance indicators?
        o Analysis of a range of data, including behavioral traces, activity streams, self-assessment etc.
        o Actual big data
      • Positive results from initial models and classifiers
      • Application of developed models and classifiers in online (learning) environments (e.g. AFEL project)
        o GNOSS/Didactalia (200,000 users)
        o LearnWeb
        o Deutsche Welle online
        o …
  16. Acknowledgements: Team
      • Pavlos Fafalios (L3S)
      • Besnik Fetahu (L3S)
      • Ujwal Gadiraju (L3S)
      • Eelco Herder (L3S)
      • Ivana Marenzi (L3S)
      • Ran Yu (L3S)
      • Pracheta Sahoo (L3S, IIT India)
      • Bernardo Pereira Nunes (L3S, PUC Rio de Janeiro)
      • Mathieu d’Aquin (The Open University, UK)
      • Davide Taibi (CNR, Italy)
      • ...
