1. Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
Tore Hoel
Oslo and Akershus University College of Applied Sciences
Norway
ISO/IEC JTC 1/SC36 WG8 meeting, 29 November 2015
Hangzhou, China
3. Characteristics of Educational Big Data
• Grain size of recordable and analysable data has become smaller
– every pen stroke, every keystroke is recorded
• Sources of evidence are (more) varied
– tests, essay scoring, learning games, social interactions, affects, body sensors, intelligent tutors, simulations, semantic mapping, LMS data…
– Unstructured (e.g., log files, clicks, timestamps)
– When structured, different schemas are used
• How do we bring these data together to form an overall view of an individual learner or a cohort of learners?
(Cope, B., & Kalantzis, M., 2015)
4. What data practices are emerging?
• Multi-scalar Data Collection
– Embedded, simultaneous collection of data that can be used for different purposes at different scales
– Semantically legible datapoint (learner-actionable feedback): «teachable moment»
• Self-describing, structured data ➔ meanings immediately evident to learners, teachers, others
• Sample size n = all
• Data and interventions are not separate: Recursive micro intervention ➔ result ➔ redesign cycles
• More widely distributed data collection roles
(Cope, B., & Kalantzis, M., 2015)
5. Need for new Education Data Standards supporting Learning Analytics
• Harmonization of Activity Stream Specifications (ADL xAPI, IMS Caliper, W3C Activity Streams)
• Building Vocabularies – Profiles – Recipes – Communities of Practice
• Storage designs – centralised data warehouses or distributed Learning Record/Event Stores
• Extract, Transform and Load (ETL) tools for data storage
• Privacy and Data Protection – how to do Privacy-by-Design in this field?
• Sharing of Algorithms and Predictive Models
7. Activity Streams
• Work started around 2009 by a group from IBM, Google, Microsoft, MySpace, Facebook, VMware, among others
• First version published in 2011
• 2014 W3C Social Web Working Group took over the specification
• Working draft version 2.0 published October 2015
In its simplest form, an activity consists of an actor, a verb, an object, and a target. It tells the story of a person performing an action on or with an object – "Geraldine posted a photo to her album" or "John shared a video". In most cases these components will be explicit, but they may also be implied.
(Activity Streams Working Group, 2011)
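To make the model concrete, here is a minimal sketch of the spec's own "Geraldine" example in the JSON serialisation of Activity Streams 1.0, built as a Python dict; the property names follow the 2011 spec, while the identifiers and values are invented.

```python
import json

# Illustrative Activity Streams 1.0 activity: "Geraldine posted a photo
# to her album". Property names follow the 2011 spec; values are invented.
activity = {
    "published": "2015-11-29T08:00:00Z",
    "actor": {"objectType": "person", "displayName": "Geraldine"},
    "verb": "post",
    "object": {"objectType": "photo", "url": "http://example.org/photos/1"},
    "target": {"objectType": "photo-album", "displayName": "Geraldine's album"},
}
print(json.dumps(activity, indent=2))
```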
8. Experience API (xAPI)
• 1st version 2013 (a component of the ADL Training and Learning Architecture)
• A Statement consists of an <actor (learner)>, a <verb>, and an <object>, with a <result>, in a <context>. There is no constraint on what these objects should be (see the sketch after this list).
• Learning Record Store (LRS): a system that stores learning information
• xAPI depends on the presence of an LRS to function
• Offered for standardisation in IEEE August 2014 – “it wasn’t the slam dunk [they were] naively hoping it would be” (Silvers, 2014)
• End of 2015: a new Data Interoperability Standards Consortium (a not-for-profit organization in the State of Pennsylvania, USA) is to be the steward of the Experience API
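To make the statement structure concrete, a minimal sketch of a Statement as a Python dict; the verb IRI follows ADL's published vocabulary, while the actor, activity, and result values are invented for illustration.

```python
import json

# Illustrative xAPI Statement: <actor> <verb> <object> with <result> in <context>.
# Structure follows xAPI 1.0; the actor, activity ID, and values are invented.
statement = {
    "actor": {"name": "Example Learner", "mbox": "mailto:learner@example.edu"},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed",
             "display": {"en-US": "completed"}},
    "object": {"id": "http://example.edu/activities/quiz-3",
               "definition": {"name": {"en-US": "Quiz 3"}}},
    "result": {"score": {"scaled": 0.85}, "success": True},
    "context": {"platform": "Example LMS"},
}
# In production, this statement would be POSTed to an LRS's statements endpoint.
print(json.dumps(statement, indent=2))
```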
9. IMS Caliper Analytics
• White paper 2013
• Public release v 1.0 October 2015
• Information model buried in Sensor APIs
• Metric Profiles
• Base Metric Profile, Session, Annotation, Assignable, Assessment, Outcome, Reading, Media
• IMS Learning Sensor API: defines basic learning events gathered as learning metrics across learning environments (sketched below)
• Leveraging of IMS LTI/LIS/QTI
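For comparison with xAPI, a minimal sketch of a Caliper-style event as a Sensor might emit it; the JSON-LD layout is modelled on the Caliper 1.0 public release, and all identifiers here are invented.

```python
import json

# Illustrative Caliper-style session event, as a Sensor might emit it.
# Modelled on the Caliper 1.0 release; the IRIs and IDs are invented.
event = {
    "@context": "http://purl.imsglobal.org/ctx/caliper/v1/Context",
    "@type": "SessionEvent",
    "action": "LoggedIn",
    "actor": {"@id": "https://example.edu/users/554433", "@type": "Person"},
    "object": {"@id": "https://example.edu/lms", "@type": "SoftwareApplication"},
    "eventTime": "2015-11-29T08:00:00.000Z",
}
print(json.dumps(event, indent=2))
```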
13. Talking about learning activities
• More loosely coupled systems and diverse Communities of Practice lead to more diverse schemas and data models
• Interoperability could be promoted by more efficient sharing of vocabularies (see the sketch below)
• Encourage smaller vocabularies / ontologies
(Diagram: IMS Caliper and xAPI Communities of Practice)
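To show what sharing vocabularies could mean operationally, a minimal sketch of a community-maintained verb registry; the IRIs, labels, and lookup function are all invented for this example.

```python
# Sketch: a community-maintained verb vocabulary mapping stable IRIs to
# human-readable labels. The IRI and labels are invented for illustration.
VERB_REGISTRY = {
    "http://example.org/vocab/verbs/annotated": {
        "en-US": "annotated",
        "nb-NO": "kommenterte",
    },
}

def display_verb(iri, lang="en-US"):
    """Resolve a verb IRI to a label, falling back to the IRI itself."""
    return VERB_REGISTRY.get(iri, {}).get(lang, iri)

# Systems interoperate on the IRI; display strings are only for humans.
print(display_verb("http://example.org/vocab/verbs/annotated", "nb-NO"))
```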
14. How to promote more interoperable vocabularies for education?
• "Document standards" for vocabularies have severe limitations!
• Communities of Practice (cf. xAPI) are part of the solution…
• … but there are serious stewardship issues
• What could ISO offer in terms of dynamic vocabulary management?
18. MIT Open Personal Data Store / SafeAnswers
• openPDS allows users to collect, store, and give fine-grained access to their data in the cloud.
• openPDS also protects users’ privacy by only sharing anonymous answers, not raw data.
• openPDS can also engage in privacy-preserving group computations to aggregate data across users without the need to share sensitive data with an intermediate entity.
http://openpds.media.mit.edu/#architecture
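A minimal sketch of the SafeAnswers idea, assuming the query runs inside the personal data store and only an aggregated answer leaves it; the data, threshold, and function names are invented.

```python
# Sketch of the openPDS/SafeAnswers pattern: the question is answered
# inside the user's personal data store, and only a coarse, anonymous
# answer leaves it. Data, threshold, and names are invented.

raw_study_minutes = [42, 35, 58, 61, 47]  # per-day raw data, stays local

def safe_answer(minutes):
    """Answer 'is this an active learner?' without exposing the raw data."""
    average = sum(minutes) / len(minutes)
    return "active" if average >= 45 else "less active"

# Only this aggregated label is shared with the learning analytics service:
print(safe_answer(raw_study_minutes))  # -> "active"
```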
19. Extract - Transform - Load tools
• When data are coming from different sources in different structures, one needs tools to extract, transform and load the data into data stores
• There are open source tools (e.g., Pentaho Kettle and Talend), but most are commercial software
• Are ETL tools a possible hot spot for standards efforts? (A minimal sketch of the pattern follows this list.)
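A minimal sketch of the extract-transform-load pattern for learning records, assuming an illustrative CSV export and target schema; the file name, column names, and schema are all invented.

```python
import csv
import sqlite3

# Minimal ETL sketch for learning records: extract events from a CSV log,
# transform them into one shared schema, load them into a local store.
# The file name, column names, and target schema are all illustrative.

def extract(path):
    """Extract: read raw rows from one source system's CSV export."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(row):
    """Transform: normalise one source schema to (learner, verb, object, ts)."""
    return (row["student_id"], row["action"].lower(), row["resource"], row["time"])

def load(rows, db_path="records.db"):
    """Load: write the normalised rows into the target store."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS events"
                " (learner TEXT, verb TEXT, object TEXT, ts TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    con.commit()
    con.close()

load(transform(r) for r in extract("lms_log.csv"))
```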
21. Challenges for standardisation
• Privacy and Data Ownership issues – how to turn these «soft» requirements into «hard» ones?
• The role of Personal Data Stores in Learning Analytics
• Harmonization of data schemas prior to analysis (see the sketch after this list)
• Import / export facilities with ontology building (and automatic reasoning technologies) as part of the storage solutions
• Publishing and Sharing of data for research, and for comparison and testing of predictive models, student models, etc.
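To make harmonization prior to analysis concrete, a minimal sketch mapping xAPI-style and Caliper-style events (as in the earlier examples) onto one common analysis record; the target schema (who/did/what/when) is invented for this illustration.

```python
# Sketch: harmonising two event schemas into one analysis-ready record.
# The target fields (who/did/what/when) are invented for illustration.

def from_xapi(stmt):
    return {"who": stmt["actor"].get("mbox", ""),
            "did": stmt["verb"]["id"].rsplit("/", 1)[-1],
            "what": stmt["object"]["id"],
            "when": stmt.get("timestamp", "")}

def from_caliper(event):
    return {"who": event["actor"]["@id"],
            "did": event["action"].rsplit("#", 1)[-1],
            "what": event["object"]["@id"],
            "when": event.get("eventTime", "")}
```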
23. Implications for designs when Surveillance turns into Sousveillance?
Image credit: http://commons.wikimedia.org/wiki/File:SurSousVeillanceByStephanieMannAge6.png
24. When Privacy affects all LA processes
• Privacy-by-Design is the overall design principle. What does it mean for the LA processes?
• Data Sharing
• Search
• Storing
• Analysing
• Visualising
26. How to support sharing?
• Exemplar predictive models are needed to advance learning analytics
• Besides a culture for sharing data, algorithms and predictive models, what else is needed?
• Parallel data streams from production systems to support development and research
• How to deal with anonymization? (One partial technique is sketched after this list.)
• How to get data for R&D from cloud-based systems?
• How do we talk about these algorithms and models (i.e., how do we create a vocabulary for tagging them)?
• Where to host the resources (stewardship, openness policies, open repositories)?
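One partial answer to the anonymization question, sketched below: pseudonymising learner identifiers with a keyed hash before data leave the production system. This is one common technique, not full anonymisation (quasi-identifiers can still re-identify learners), and the key handling and record layout are invented.

```python
import hashlib
import hmac
import os

# Sketch: pseudonymise learner IDs with a keyed hash before sharing data
# for R&D. The key stays with the data controller; names are illustrative.
# Note: pseudonymisation alone is not full anonymisation.
SECRET_KEY = os.environ.get("PSEUDONYM_KEY", "change-me").encode()

def pseudonymise(learner_id):
    return hmac.new(SECRET_KEY, learner_id.encode(), hashlib.sha256).hexdigest()[:16]

record = {"learner": pseudonymise("student-554433"), "verb": "completed",
          "object": "quiz-3", "score": 0.85}
print(record)
```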
27. References
• Cho, Yong-Sang (2015). Quick review xAPI and IMS Caliper – Principle of both data capturing technologies. Online at http://www.slideshare.net/zzosang/quick-review-xapi-and-ims-caliper-principle-of-both-data-capturing-technologies
• Cope, B., & Kalantzis, M. (2015). Sources of Evidence-of-Learning: Learning and assessment in the era of big data. Open Review of Educational Research, 2(1).
• Hoel, T., & Chen, W. (2015). Privacy in Learning Analytics – Implications for System Architecture. In Watanabe, T., & Seta, K. (Eds.), Proceedings of the 11th International Conference on Knowledge Management. Online at http://hoel.nu/publications/Hoel_Chen_ICKM15_final_preprint.pdf
Editor's notes
ETL tools combine three important functions (extract, transform, load) required to get data from one big data environment and put it into another data environment. Traditionally, ETL has been used with batch processing in data warehouse environments. Data warehouses provide business users with a way to consolidate information to analyze and report on data relevant to their business focus. ETL tools are used to transform data into the format required by data warehouses.
The transformation is actually done in an intermediate location before the data is loaded into the data warehouse. Many software vendors, including IBM, Informatica, Pervasive, Talend, and Pentaho, provide ETL software tools.
ETL provides the underlying infrastructure for integration by performing three important functions:
Extract: Read data from the source database.
Transform: Convert the format of the extracted data so that it conforms to the requirements of the target database. Transformation is done by using rules or merging data with other data.
Load: Write data to the target database.
However, ETL is evolving to support integration across much more than traditional data warehouses. ETL can support integration across transactional systems, operational data stores, BI platforms, MDM hubs, the cloud, and Hadoop platforms. ETL software vendors are extending their solutions to provide big data extraction, transformation, and loading between Hadoop and traditional data management platforms.
ETL and software tools for other data integration processes like data cleansing, profiling, and auditing all work on different aspects of the data to ensure that the data will be deemed trustworthy. ETL tools integrate with data quality tools, and many incorporate tools for data cleansing, data mapping, and identifying data lineage. With ETL, you only extract the data you will need for the integration.
ETL tools are needed for the loading and conversion of structured and unstructured data into Hadoop. Advanced ETL tools can read and write multiple files in parallel from and to Hadoop to simplify how data is merged into a common transformation process. Some solutions incorporate libraries of prebuilt ETL transformations for both the transaction and interaction data that run on Hadoop or a traditional grid infrastructure.
Data transformation is the process of changing the format of data so that it can be used by different applications. This may mean a change from the format the data is stored in into the format needed by the application that will use the data. This process also includes mapping instructions so that applications are told how to get the data they need to process.
The process of data transformation is made far more complex because of the staggering growth in the amount of unstructured data. A business application such as a customer relationship management (CRM) system has specific requirements for how data should be stored. The data is likely to be structured in the organized rows and columns of a relational database. Data is semi-structured or unstructured if it does not follow rigid format requirements.
The information contained in an e-mail message is considered unstructured, for example. Some of a company's most important information is in unstructured and semi-structured forms such as documents, e-mail messages, complex messaging formats, customer support interactions, transactions, and information coming from packaged applications like ERP and CRM.
Data transformation tools are not designed to work well with unstructured data. As a result, companies needing to incorporate unstructured information into their business process decision making have been faced with a significant amount of manual coding to accomplish the required data integration.
Given the growth and importance of unstructured data to decision making, ETL solutions from major vendors are beginning to offer standardized approaches to transforming unstructured data so that it can be more easily integrated with operational structured data.
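As a small illustration of transforming semi-structured input into an application-ready structure, the sketch below parses free-form log lines into named fields; the log format and field names are invented.

```python
import re

# Sketch: turning semi-structured log lines into structured records.
# The log format and field names are invented for this illustration.
LINE = re.compile(r"(?P<ts>\S+) user=(?P<user>\w+) action=(?P<action>\w+)")

def parse(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = LINE.match(line)
    return m.groupdict() if m else None

print(parse("2015-11-29T08:00:00Z user=554433 action=login"))
# -> {'ts': '2015-11-29T08:00:00Z', 'user': '554433', 'action': 'login'}
```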