Keynote by Chris Ballard, Data Scientist, Tribal, given at the LACE SoLAR Flare event held at The Open University, Milton Keynes, UK on 9 October 2015. #LACEflare
Research into Practice: Building and implementing learning analytics at Tribal
1.
Research into Practice
Building and implementing learning analytics at Tribal
Chris Ballard, Data Scientist
2.
Building and implementing learning analytics
1. Start at the very beginning
2. How research and practice differ
3. Building a learning analytics platform
4. Implementing learning analytics
5. Summary
3.
About Tribal
A leading provider of technology-enabled management solutions for the international education, learning and training markets
Higher Education: research universities, employment-focused universities, government agencies
Vocational Learning: further education colleges, training providers and employers, government agencies
Schools / K-12: schools, school groups, state and district government agencies
4.
Objectives
• Predict student academic performance to optimise success
• Predict students at risk of non-continuation
• Build on research into the link between VLE activity and academic success
• Scale data processing
• Understand risk factors and compare to cohorts
R&D project overview
3 years of matched student and activity data used to build predictive models
Staff can use student, engagement and academic data to understand how they affect student outcomes.
Information accessible in one place on easy-to-understand dashboards.
Integrated with Tribal SITS:Vision and staff e:vision portal
Consultation with academic staff on presentation and design
Accuracy of module academic performance predictions*: 79%
*Using module academic history and demographic factors
6.
Current projects
Providing learning analytics for 160,000 students across a state-wide vocational and further education provider in Australia
Student Insight being implemented as part of the JISC UK Effective Learning Analytics programme
8.
From research to practice
Research
• Domain knowledge
• Interpretation
• In depth understanding
• Testing an approach
Practice
• Integrated into everyday life
• Interpret easily
• Take action
• Implementing an approach
9.
Domain and people
• What is the problem?
• Identify the users and stakeholders
• Data owners
• Are research results sufficient?
• Design
• Project cost
10.
Technical
• Research limitations
• Architecture
• Data munging
• Automating manual processes
• Data suitability
• Robustness of technical platform
11.
Building a learning analytics platform
Key design decisions
1. Transparency – knowing why a student is at risk
2. Flexibility – viewing learning analytics which relates to an institution's curriculum and organisation
3. Efficiency – ease of use, implementation and interpretation
19.
Reflecting differences between courses
Student activity data is not consistent across all courses/modules
1. Standardise data so it is comparative across all courses and modules
2. Build different models for each course or module
Need to be careful that you have sufficient data for the model to generalise to new data.
20.
Provide opportunity for intervention
Workflow diagram – Learning analytics: identify student at risk, log intervention details, assess intervention effectiveness, allocate intervention to support team. Student support teams: assign SLA, receive SLA-based alerts, monitor intervention progress.
21.
Embed into business process
Need to consider how learning analytics becomes embedded into the day-to-day working life of academic and support staff.
Notifications – analytics becomes proactive; support different types of notification
Integration – accessible from existing tools and services through single sign-on
22.
Implementing learning analytics
CRISP-DM – Cross Industry Standard Process for Data Mining
https://the-modeling-agency.com/crisp-dm.pdf
24.
Data understanding
Understanding which features are important
Example: end month of unit for successful and failed units
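As a hedged sketch of this kind of exploratory check (the data and the column names unit_end_month and outcome are illustrative assumptions, not the actual Student Insight schema), a quick aggregation shows whether failure rates vary with the month a unit ends:

```python
import pandas as pd

# Hypothetical unit-level records; column names are illustrative only.
units = pd.DataFrame({
    "unit_end_month": [6, 6, 12, 12, 12, 3, 6, 12, 3, 6],
    "outcome":        ["pass", "fail", "pass", "pass", "fail",
                       "pass", "pass", "fail", "fail", "pass"],
})

# Proportion of failed units by end month: a quick way to judge whether
# the end month of a unit looks like a useful predictive feature.
fail_rate_by_month = (
    units.assign(failed=units["outcome"].eq("fail"))
         .groupby("unit_end_month")["failed"]
         .mean()
)
print(fail_rate_by_month)
```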
25.
Data preparation
Creating comparative features
Example: Total proportion of hours worked on failed units
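A minimal sketch of building such a comparative feature, assuming a hypothetical activity table with student_id, unit_id, hours and unit_failed columns (illustrative names only, not the product's data model):

```python
import pandas as pd

# Hypothetical activity records per student and unit; names are illustrative.
activity = pd.DataFrame({
    "student_id":  [1, 1, 1, 2, 2],
    "unit_id":     ["A", "B", "C", "A", "B"],
    "hours":       [10.0, 5.0, 5.0, 8.0, 2.0],
    "unit_failed": [True, False, False, False, True],
})

# Comparative feature: of all hours a student logged, what proportion
# was spent on units they went on to fail?
per_student = (
    activity.assign(failed_hours=activity["hours"].where(activity["unit_failed"], 0.0))
            .groupby("student_id")[["failed_hours", "hours"]]
            .sum()
)
per_student["prop_hours_on_failed_units"] = (
    per_student["failed_hours"] / per_student["hours"]
)
print(per_student)  # student 1 -> 0.5, student 2 -> 0.2
```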
26.
Modelling and evaluation
Understanding whether the model is under- or over-fitting
Example: Learning curve for Random Forest model
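A hedged example of producing this kind of learning curve with scikit-learn, using synthetic data in place of the real student dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the real student dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Learning curve: training and cross-validation scores at increasing
# training-set sizes. A large, persistent gap between the two curves
# suggests overfitting; two low, converged curves suggest underfitting.
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5), scoring="roc_auc",
)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train AUC={tr:.3f}  cv AUC={va:.3f}")
```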
27.
Evaluation
Define business focused success criteria
Define model focused success criteria
Define what baseline performance is acceptable
Consider a model cost-benefit analysis that takes into account intervention cost
Cost-benefit matrix (predicted vs actual):
Predicted withdrawn, actual withdrawn: benefit
Predicted withdrawn, actual enrolled: cost
Predicted enrolled, actual withdrawn: cost
Predicted enrolled, actual enrolled: benefit
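A minimal sketch of such a cost-benefit calculation; the intervention cost, retention benefit and confusion-matrix counts below are placeholder figures, not numbers from the project:

```python
# Placeholder figures for illustration only.
INTERVENTION_COST = 200      # cost of intervening with one flagged student
RETENTION_BENEFIT = 3000     # value of retaining a student who would have withdrawn

def expected_value(tp, fp, fn, tn):
    """Net value of acting on the model's 'withdrawn' predictions.

    tp: predicted withdrawn, actually withdrawn -> intervention may retain them
    fp: predicted withdrawn, actually enrolled  -> intervention cost wasted
    fn: predicted enrolled, actually withdrawn  -> missed opportunity (not costed here)
    tn: predicted enrolled, actually enrolled   -> no action, no cost
    """
    return tp * (RETENTION_BENEFIT - INTERVENTION_COST) - fp * INTERVENTION_COST

print(expected_value(tp=120, fp=80, fn=40, tn=760))
```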
28.
Summary
Design
Embed learning analytics into business process
Ensure that analytics can be interpreted easily by staff
Clearly articulate intervention processes
Measure intervention effectiveness
Implementation
Use a standard project approach such as CRISP-DM
Evaluate data in the context of the business problem and process
Define what success means, including acceptable accuracy and how it needs to be measured
29.
Thank you
Chris Ballard
@chrisaballard
chris.ballard@tribalgroup.com
www.tribalgroup.com
Editor's notes
Tribal provides management solutions to the international education, training and learning markets.
Our tools allow education providers such as universities, colleges and local government to manage student admin and learning processes such as recruitment, admissions, finance, timetabling and course portfolios.
Student management systems – for example 70% of UK universities use our Higher Education Student Management system called SITS:Vision. Involved in large scale education technology implementations – e.g. we have recently implemented one of our student management systems across all schools and campuses in New South Wales, Australia.
Student Insight has been developed through a close working partnership between the University of Wolverhampton and Tribal. The original objectives of this partnership were to identify how we could build on initial research carried out by the university and build a solution that could enable Wolverhampton to benefit from improved use of student data. The outcome of this partnership is the Student Insight product that has been developed as a configurable solution that can now be adopted by other institutions. Consultation with academic staff at the university has enabled us to design the system in such a way so that it is flexible and can be tailored to meet the unique needs of each institution.
Example of violin practice from my childhood…
What do we mean by practice?
Applying something that research has proven to be effective; moving from a research technique to the application of that technique. Research is focused on testing a technique or proving a theory; practice is focused on applying the results of that research so that we can benefit from it.
To put it into perspective, we can compare research to a spreadsheet and practice to mobile apps. The latter "just works" and we can interpret what it is saying easily, and integrate it within our lives. The former requires in depth understanding, evaluation, interpretation and domain knowledge. We can't just stop and use it at a bus stop. But to go from one to the other takes effort and time to translate it from one area to the other.
Knowing what problem we are using the results of the research to solve. This has to be a real problem for which we have real data (or could collect data), and something that people are willing to use (and, in the case of a product, pay for!). Identify the users and what their problems are - we might have a model that can identify students at risk, but how do staff want to see this presented? Do they want to see individuals, or are they more interested in overall patterns to help future planning? Who are the stakeholders? They are different from the users: data owners, and the senior leadership team, whose input shapes the university's strategy - it is important that their needs are represented. Ask ourselves whether the research results are sufficient to be applied in the real world, whether further refinement is necessary, or whether what we have done needs to be put on the shelf. Very often we will identify further improvements, but these may not necessarily stop us from moving to application; they may come as gradual refinements to something that is already being used. Design decisions - how does our technique need to change to be used in the real world? User interface; interpreting the results of analysis; understanding the objective. We might want to pilot different approaches to see what works - A/B testing, as commonly employed by Facebook and others.
Technical decisions - often something written for research is not suitable for use in the real world, e.g. scalability, limitations accepted in the original research to keep its focus narrow, hard coding, handling flexibility. Data munging - often data used in research will have been collected from multiple sources and manipulated to resolve data quality issues, sampled, etc. We will need to identify how to automate the manual approaches that were sufficient during research. Integration may present a real practical problem - often data owners want to hold onto their data! Data monitoring - a model is only as good as the data you use, therefore we need to be sure that data is loaded into the system correctly and there are no data quality issues.
How did we move from research to a platform that could be used by different institutions? Key design decisions were:
Transparency - ensuring you know why a student is at risk. Flexibility - allowing an institution to see analytics in a way that relates to their curriculum and organisation. Efficiency - allowing the product to be implemented quickly, reducing implementation cost.
Here are some examples of key decisions that we made during the development of the product.
Worked with the University of Wolverhampton to identify the main users. Identified three main classes - Course Director, Module Director, Personal Tutor. Wanted to be able to monitor the students they are responsible for quickly and easily, from an aggregated perspective and down to individuals. Realised that we needed to do this generically as other institutions may have different requirements - different user roles and different institutional structures. Ultimately this changed how we approached the technical design of the system but also what features the system provides. We therefore designed an institution structure that could represent these different structures allowing the platform to adapt to different situations. Security can then be applied to different student groups within that structure. Discuss prediction aggregation.
Seeing an aggregated view of student risk as well as individuals - staff said they wanted to see how much groups of students may be at risk, not just individuals. Predictions are aggregated across the institution structure automatically to provide this information.
We didn't want our models to be a "black box" where no-one could understand what they do, or why they identify a student at risk. An important consideration when intervening with a student is understanding what the data is saying and why the system has flagged a student. The initial research suggested a way forward and we evaluated a number of ways to solve this problem as part of the early "second phase" of R&D of the product. There are two examples of this:
Ensemble learning - multiple predictions, single overall decision. Influence chart allowing comparisons across different data sources. Helps at an individual student level.
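As an illustration of the "multiple predictions, single overall decision" idea only - this is a generic soft-voting ensemble in scikit-learn, not Tribal's actual implementation, and it does not cover the influence chart:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Soft voting: each model produces its own risk probability and the
# ensemble averages them into a single overall decision per student.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=4)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
ensemble.fit(X, y)
print(ensemble.predict_proba(X[:3])[:, 1])  # combined risk scores
```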
Group influence chart - ability to see what is going on from a more strategic level. What factors influence student outcomes? Helps at a more strategic level when designing intervention measures.
Flexible data - institutions have different characteristics and students with different needs and backgrounds. Therefore it is important that the data you bring into the system reflects the needs of both the institution and your students. Your data requirements may change over time - new data may become available and you may want to test its efficacy in student early warning prediction. During research, data is necessarily hard coded and fixed, and, as a result, any models built from that data are more static and relate to the structure of the data that has been used. We built a flexible modelling approach that allows you to bring any data into the system, and map it and view what that data looks like. You can then test models to verify the usefulness of the data to student early warning prediction.
Configuration by an institution - an institution can configure the application once it has been setup.
Institutions are complex because rather than one overall consistent business process, often different faculties, departments, courses or even modules have different approaches to the delivery of their curriculum. The best example of this is in the use of the VLE for the delivery of course materials - some modules may not make use of the VLE at all, or may have a different approach to how the content is delivered. This will change the VLE usage patterns in the data and this needs to be taken into account when using student activity data sources such as those from the VLE. This represents one of the key areas that needs to be considered when implementing learning analytics - designing an approach that takes into account the different needs of students taking different courses and how that is reflected in the data. A conventional approach is to normalise the raw data to reflect this - ensuring that the predictive features can be compared across different course or modules. An alternative approach is to build separate models for different courses. However an impact of this may be that your training data for an individual course/module becomes too limited to generalise well to new data. Both approaches need to be compared in your context to identify which one works best for you. We are planning on building a tool in Student Insight that allows you to perform this comparison and automatically segment models by course using the institution structure hierarchy.
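A minimal sketch of the two options under assumed column names (module, student_id, vle_hits), not the Student Insight implementation:

```python
import pandas as pd

# Hypothetical VLE activity; column names are illustrative only.
vle = pd.DataFrame({
    "module":     ["M1", "M1", "M1", "M2", "M2", "M2"],
    "student_id": [1, 2, 3, 4, 5, 6],
    "vle_hits":   [5, 20, 35, 200, 350, 500],
})

# Option 1: standardise activity within each module (z-score), so that
# "high engagement" means high relative to peers on the same module,
# regardless of how heavily that module uses the VLE.
vle["vle_hits_z"] = (
    vle.groupby("module")["vle_hits"]
       .transform(lambda s: (s - s.mean()) / s.std())
)
print(vle)

# Option 2 (alternative): fit a separate model per module, e.g. by
# iterating over vle.groupby("module") - but only where each module
# has enough training examples for the model to generalise.
```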
Intervention - unless the research is focused on evaluating the impact of interventions taken, one of the key areas which separates practice from research is in the action which is taken as a result of an early warning of risk. Staff should be able to determine whether an intervention is necessary based on the analytics and information they can see about the student and their context. They need a process to determine what types of intervention exist, record the intervention and then track the progress the student is making following the measures that have been taken. A key starting point is having clear institutional guidance about the intervention measures that are available, and in what circumstances they should be applied. With Student Insight, we have built in integration with our student support product that allows an intervention to be manually applied and assigned to the correct student support team for action and communication with the student. Institutions differ in their approach to how this is handled and therefore it is important that the workflows can be configured to reflect that.
Notifications - you need to consider how learning analytics can become embedded into the day-to-day working life of staff. There may be barriers to widespread adoption if staff need to go somewhere to look up information. Although predictive analytics is by nature proactive, if staff need to go and look for early warning indicators, then it will start to be used retrospectively. Notifications are one method to overcome this, where staff perhaps receive a communication via email highlighting students who are at risk. They can then take action and arrange meetings directly from this communication or choose to view more information about the student. In addition, notifications may need to raise the profile of students at risk who have not yet been dealt with, perhaps highlighting those who have not been reviewed to ensure that they do not slip through the net.
When implementing learning analytics, institutions need to follow a project approach that ensures that key decisions have been taken and the project has been fully evaluated at different steps. This will increase the likelihood of project success and the embedding of learning analytics into the day to day activities of the institution.
When implementing Student Insight with an institution, we use an established data mining project methodology called "CRISP-DM" - Cross Industry Standard Process for Data Mining. CRISP-DM was first conceived in 1996 and a review in 2009 called CRISP-DM the "de facto standard for developing data mining and knowledge discovery projects."
The process model provides an overview of the lifecycle of a data mining project. Here we use the terminology "data mining", but it equally applies well to any analytics project. It breaks the analytics process down into six discrete phases. It is important to note that the order that the phases are approached is not constrained, as the result of each phase will dictate which phase needs to be performed next.
The diagram illustrates analytics as a cyclical process. Lessons learned during deployment and use of a model provide inputs to further iterations of the process and provide inputs to more in depth understanding of the business problem to be solved.
Business understanding
Understand what you are trying to accomplish, what is the problem that we are trying to use analytics to solve? This is a key step because although there are commonalities between how institutions are currently looking to deploy analytics, they are sufficiently different to have different slants on the same problem. For example, reducing student attrition may be a key issue, but there may be differences between the causes, effects and students for whom it is a particular issue. For example, although our objective may be to reduce attrition, it may be better to focus on student success for some cohorts of students, rather than specifically identifying students at risk of dropping out.
If we don't do this step we might identify the wrong objectives, or possibly spend a lot of time analysing data to get the right answers to the wrong problems in the first place.
In addition to identifying the objectives for the project, you need to agree what will be considered success, and therefore establish some success criteria. Such criteria may cover different areas, and relate to tangible improvements against which we want to target the use of analytics. For example in the case of retention, we may want to increase retention by a specified amount. If we decide on criteria such as this, we will need to carefully plan whether we wish to try and attribute any improvement to the implementation of analytics, taking into account other factors.
JISC Discovery Phase accomplishes most of the tasks required in the Business Understanding phase of a project.
Data understanding
Having good quality data available is at the heart of successful implementation of learning analytics. Although there are a large variety of algorithms, having good data with well thought out predictive attributes is the most important step in learning analytics implementation. Indeed, it has been said that 80% of the total time spent on an analytics project is not building models; it is working with the data. So this is an important phase. The data understanding phase starts with the identification of potential data sources, collection of that data, assessment of data quality and initial analysis to help with further understanding. It is important that this is carried out in cooperation with data owners across the institution, as well as those who understand the link between a business process and how that process is reflected in the data itself. So, during a Tribal project, we identify the relevant people across the institution who need to be involved in this process.
Data preparation
Once the data understanding phase has been completed, the raw data collected in the data understanding phase will need to be processed and prepared ready for modelling. This may involve resolving data quality issues, creating aggregated summaries or merging multiple data sources together. Both business and data understanding feed into this stage - it may involve transforming the data to reflect the business process we are modelling. A common issue that you will encounter when working with student data is dealing with time dependencies, where data updates made at the time an event occurs affect data that acts as inputs to our model. For example, current modules a student is studying may be automatically recorded as a fail once the student has withdrawn. If we're not careful then failure can become a very good predictor of likelihood of dropout. It may be, but we need to control for these quirks in the way that the data is recorded otherwise the model that we build will not reflect the real world.
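A hedged sketch of one way to control for this kind of leakage, assuming hypothetical results and withdrawals tables with the column names shown:

```python
import pandas as pd

# Hypothetical module results and withdrawals; names are illustrative.
results = pd.DataFrame({
    "student_id": [1, 1, 2],
    "module":     ["M1", "M2", "M1"],
    "grade":      ["fail", "fail", "pass"],
    "grade_date": pd.to_datetime(["2015-03-01", "2015-06-01", "2015-06-01"]),
})
withdrawals = pd.DataFrame({
    "student_id":      [1],
    "withdrawal_date": pd.to_datetime(["2015-04-15"]),
})

# Drop grades recorded on or after the student's withdrawal date: these
# automatic "fail" marks are a consequence of withdrawing, not a predictor
# of it, and would leak the outcome into the model's inputs.
merged = results.merge(withdrawals, on="student_id", how="left")
clean = merged[
    merged["withdrawal_date"].isna()
    | (merged["grade_date"] < merged["withdrawal_date"])
].drop(columns="withdrawal_date")
print(clean)
```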
Modelling
Once data has been prepared, we're ready to build a model against the data. This will involve selecting appropriate algorithms and comparing the performance of models built using each algorithm. Parameters for each algorithm will need to be chosen so as to optimise the performance of a particular model. One of the most important aspects of this process is identifying whether our model is underfitting or overfitting the data. There can be a number of reasons for this, but one of the most important factors is that more complex models will tend to overfit the data and the converse is true in the case of underfitting. Complexity can come in different forms, for example models with a large number of predictive features in comparison to the number of training examples will tend to be more complex, and thus overfit. In each case, it will mean that our model does not generalise well to instances it has not seen before, and may not give us optimal performance.
When implementing a model in Student Insight at Tribal, one of the main techniques we use is learning and fitting curves, which can be used to diagnose whether a model is under- or over-fitting.
[Illustrate our modelling process]
We have also built in functionality into Student Insight to allow a model to be optimised automatically whilst it is being trained.
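As a hedged illustration of automatic optimisation during training - a generic randomised hyperparameter search in scikit-learn, not necessarily how Student Insight implements it:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Randomised search over hyperparameters with cross-validation: one common
# way to tune a model automatically while it is being trained.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [100, 200, 400],
        "max_depth": [None, 5, 10, 20],
        "min_samples_leaf": [1, 5, 10],
    },
    n_iter=10, cv=5, scoring="roc_auc", random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```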
Evaluation
In the case of predictive analytics, another type of success criterion relates to the accuracy of the models which have been built. In this situation, it is not sufficient to arbitrarily choose a baseline accuracy figure; we need to decide what is going to be a sufficient baseline accuracy for our model. Accuracy of a predictive model is measured in different ways, according to our goal and the nature of the data we have available. We may decide that having a model which makes as few false positive predictions as possible is our goal, at the expense of the overall number of positive predictions made. Conversely, we may decide that we wish to make a large number of positive predictions in order to capture as many students at risk as possible. Choosing the appropriate balance for these figures can be tricky and needs to be based around an assessment of business objectives and intervention cost. Often, a cost-benefit analysis can be helpful.
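A minimal sketch of exploring this trade-off by sweeping a model's decision threshold, again on synthetic data rather than real student records:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]

# Sweep the decision threshold: a high threshold keeps false positives
# (and wasted interventions) down; a low threshold captures more students
# at risk at the cost of more false alarms.
precision, recall, thresholds = precision_recall_curve(y_te, scores)
for t, p, r in list(zip(thresholds, precision, recall))[::20]:
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

The chosen operating point can then be fed into a cost-benefit calculation like the one sketched earlier, so that the threshold reflects intervention cost as well as model accuracy.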