With student success as a primary institutional goal, NC State’s DELTA organization has taken initial steps in building a scalable analytics infrastructure. This webinar provides insights into the open source frameworks, analytics technology, and strategy required to deploy analytics infrastructure with efficient IT delivery. We will discuss planning an architecture to accommodate large amounts of data, while still providing predictions in short order. We will also touch on some work we are doing building cohorts of sub-populations to improve scalability and accuracy. In addition, we will discuss ongoing and future work to improve the infrastructure even further.
Apereo Webinar: Learning What Works When Scaling Analytics Infrastructure (January 24, 2018)
1. Apereo Webinar: Learning What Works When
Scaling Analytics Infrastructure
LOU HARRISON
DIRECTOR OF EDUCATIONAL TECHNOLOGY SERVICES
DELTA
NORTH CAROLINA STATE UNIVERSITY
LOU@NCSU.EDU
GARY GILBERT
SOFTWARE ARCHITECT
UNICON
GGILBERT@UNICON.NET
2. ● Brief History: Open Academic Analytics Initiative (OAAI)
● The research
● Flashback to last year
● From Pilot to Enterprise efforts
● Slice and dice, including examples of ways to segment
the population
● Results
● Infrastructure overview
● Next steps / Q&A
If you’d like to follow along: https://goo.gl/g2MTCa
INTRODUCTION/OVERVIEW
3. ● Open Academic Analytics Initiative (OAAI)
○ EDUCAUSE Next Gen Learning Challenge (NGLC)
○ Funded by Bill & Melinda Gates Foundation
● Leverage SIS and LMS data to create an open source
academic early alert system (and interventions)
● OAAI led to the Learning Analytics Processor (LAP)
project, which is part of the Apereo Learning Analytics
Initiative
● Exciting results; however, all LMS data was based on
Sakai Models
● NC State partnered with Unicon and
Marist College to bring LAP to NC State, applying it to
their Moodle LMS
BRIEF HISTORY
4. Predictive Model worked well and was quite portable to other schools (with
some tuning).
For more info, see JAYAPRAKASH, S. M., MOODY, E. W., LAURÍA, E. J.,
REGAN, J. R., & BARON, J. D. (2014). EARLY ALERT OF
ACADEMICALLY AT-RISK STUDENTS: AN OPEN SOURCE ANALYTICS
INITIATIVE.
JOURNAL OF LEARNING ANALYTICS, 1(1), 6-47.
THE RESEARCH
5. ● Our Phase 1 Proof of Concept showed a 75% accuracy in
predicting at-risk students.* Recall rates were 88-90%, but
with high false positives (25%)
● Phase 2 (FY 15-16)
○ Make the LAP more automated, bigger, and badder
○ More Enterprise, more nimble
○ Similar results with much larger datasets
*in a small dataset, of incomplete historical data
FLASHBACK TO FY 16-17
6. Phase 3 work
● Cohorts (different models for different types of classes)
○ Maybe, if incremental improvement outweighs cost
○ Tested ways to slice & dice into smaller cohorts to
improve accuracy
■ By LMS usage (no, light, med, heavy)
■ By Enrollment size (small, med, large)
■ By Student Level (FR, SO, JR, SR, GR)
● We learned splitting by courses is better than by people
● Splitting by LMS usage shows real promise
SLICE AND DICE, SEGMENT POPULATION
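One way to picture splitting by LMS usage is to bucket courses by activity volume and then train one model per bucket. The sketch below is a minimal illustration, not the LAP implementation: the thresholds, the events-per-student measure, and the course IDs are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical cut-offs, in average LMS events per enrolled student.
LIGHT_MAX = 50
MEDIUM_MAX = 200


def usage_cohort(events_per_student: float) -> str:
    """Return the LMS usage cohort (no, light, medium, heavy) for a course."""
    if events_per_student <= 0:
        return "no"
    if events_per_student <= LIGHT_MAX:
        return "light"
    if events_per_student <= MEDIUM_MAX:
        return "medium"
    return "heavy"


def group_courses(courses: Dict[str, float]) -> Dict[str, List[str]]:
    """Group course IDs by cohort so each group can feed its own model."""
    cohorts = defaultdict(list)
    for course_id, events in courses.items():
        cohorts[usage_cohort(events)].append(course_id)
    return dict(cohorts)


# Made-up courses and activity levels, for illustration only.
courses = {"BIO-101": 0.0, "CHEM-201": 35.5, "MATH-141": 120.0, "ENG-331": 480.2}
cohorts = group_courses(courses)
print(cohorts)
```

Grouping by course rather than by student matches the finding above that splitting by courses worked better than splitting by people.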
7. SOME PRELIMINARY RESULTS
Model             Precision             Recall (“Accuracy for At-Risk Students”)   Accuracy              Testing Error
Single Model      0.180994092 (18.1%)   0.639668826 (64.0%)                        0.808493064 (80.8%)   0.191506936 (19.2%)
Low LMS Usage     0.168674699 (16.9%)   0.612326044 (61.2%)                        0.810299003 (81.0%)   0.189700997 (19.0%)
Medium LMS Usage  0.184461986 (18.4%)   0.674772036 (67.5%)                        0.758821249 (75.9%)   0.241178751 (24.1%)
High LMS Usage    0.20375 (20.4%)       0.75990676 (76.0%)                         0.772434308 (77.2%)   0.227565692 (22.8%)
No LMS Usage      0.12540366 (12.5%)    0.405217391 (40.5%)                        0.863060429 (86.3%)   0.136939571 (13.7%)
● Numbers guy added to the team
● Learning how to set up cohorts and run the models
● There is a steep learning curve
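For readers less familiar with the four measures reported above, the following sketch computes them from a confusion matrix, assuming the standard definitions. The counts are invented for illustration; the slides report only the resulting ratios. Note that testing error is simply 1 minus accuracy, which is consistent with the figures above (for the single model, 0.808493064 + 0.191506936 = 1).

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, accuracy, and testing error.

    tp: at-risk students correctly flagged
    fp: students flagged who were not at risk
    fn: at-risk students the model missed
    tn: students correctly left unflagged
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "precision": tp / (tp + fp),   # share of flagged students truly at risk
        "recall": tp / (tp + fn),      # share of at-risk students that were flagged
        "accuracy": accuracy,          # share of all predictions that were correct
        "testing_error": 1 - accuracy,
    }


# Hypothetical counts, not taken from the NC State data.
m = classification_metrics(tp=60, fp=240, fn=40, tn=660)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes like these (100 at-risk students out of 1,000), accuracy can stay high while precision stays low, which is the same general shape as the results above.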
9. ● Phase 3 - Learning Record Warehouse (LRW)
○ Currently only using Moodle logs (+ demo data)
○ Plans to incorporate data from other tools
■ BB Collaborate, Mediasite, etc.
○ All data input streams feed into LRW
○ Pull from LRW into predictive modeler
■ It's important to note that if we think we may have a need to
use certain data, it’s beneficial to have 3-5 years of historical
data to train from. So, if we think we might use it, we should
save it in the LRW.
● Implement OpenDashboard
○ To expose activity heatmap and possibly predictions
ENTERPRISE EFFORTS
11. Open Analytics Infrastructure
An Open Analytics Infrastructure
should support:
● Collection and Storage of a variety
of data
● Usage of data for analytics,
reporting and visualization
● Interoperability through Open
Standards
● Use of Open Software, Models
and Processes where appropriate
15. OpenLRW
● Supports xAPI, IMS Caliper, and IMS OneRoster
● Java / Spring Boot
○ Heavy use of streams, MapReduce features of Java 8
○ Follows Spring-Boot conventions and best practices
○ LRW is packaged as an executable JAR file
■ Tomcat embedded
● MongoDB
18. OpenLRW: Security
● API Security
○ JWT
● Authorization
○ Tenancy
○ Organization
● Data at Rest
○ Follow MongoDB best practices
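To make the JWT bullet concrete, here is a minimal sketch of HS256 token signing and verification using only the Python standard library. It illustrates the general mechanism, not OpenLRW's actual token format: the claim names `tenantId` and `orgId` and the secret are hypothetical, and a real deployment would normally use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build a compact JWT: header.payload.signature (HMAC-SHA256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(digest)


def verify_token(token: str, secret: bytes) -> dict:
    """Check the signature and expiry, then return the claims."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(signing_input.split(".")[1]))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims


# Hypothetical claims scoping the caller to one tenant and organization.
secret = b"demo-secret"
token = sign_token(
    {"tenantId": "t-1", "orgId": "o-1", "exp": int(time.time()) + 300}, secret
)
claims = verify_token(token, secret)
print(claims["tenantId"], claims["orgId"])
```

Carrying the tenant and organization in signed claims lets each API call be authorized without a server-side session lookup.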
19. IMS Caliper / xAPI in OpenLRW
● Caliper Messages are stored essentially as is
● xAPI Messages are converted to Caliper prior to storage
○ Current transformation is based on work done by the
Korean Ministry of Ed
○ More transformation options coming
■ IMS / ADL (this will be the default when available)
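The general shape of an xAPI-to-Caliper conversion can be sketched as below. This is an illustration only: it is neither the Korean Ministry of Ed transformation nor the planned IMS / ADL one, and the verb-to-action table and output fields are assumptions chosen to show the idea of mapping actor, verb, and object onto a Caliper-style event.

```python
# Hypothetical mapping from xAPI verb IRIs to Caliper-style action names.
VERB_TO_ACTION = {
    "http://adlnet.gov/expapi/verbs/completed": "Completed",
    "http://adlnet.gov/expapi/verbs/experienced": "Viewed",
}


def xapi_to_caliper_like(statement: dict) -> dict:
    """Convert a simple xAPI statement into a Caliper-style event dict.

    Assumes the actor is identified by mbox and the object by id;
    unknown verbs are passed through unchanged.
    """
    verb_id = statement["verb"]["id"]
    return {
        "type": "Event",
        "actor": {"id": statement["actor"]["mbox"], "type": "Person"},
        "action": VERB_TO_ACTION.get(verb_id, verb_id),
        "object": {"id": statement["object"]["id"]},
        "eventTime": statement["timestamp"],
    }


# A made-up statement for illustration.
statement = {
    "actor": {"mbox": "mailto:student@example.edu"},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed"},
    "object": {"id": "https://lms.example.edu/quiz/42"},
    "timestamp": "2018-01-24T15:00:00Z",
}
event = xapi_to_caliper_like(statement)
print(event)
```

Normalizing to one event shape before storage means downstream consumers such as the dashboard and the predictive modeler only need to understand a single format.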
20. Other Entities in OpenLRW
● Tenants
● Organizations
● Events
○ xAPI & Caliper
● Supporting Data (OneRoster)
○ Users
○ Classes
○ Enrollments
○ Line Items
○ UserMapping & ClassMapping
21. OpenDashboard
● Originally developed to provide a widget-based framework for visualizations
● Evolved into a faculty / staff facing tool for monitoring student activity
● Java 8 / Spring-Boot
○ Heavy use of streams, MapReduce features of Java 8
○ Follows Spring-Boot conventions and best practices
○ Dashboard is packaged as an executable JAR file
■ Tomcat embedded
23. High Level View
● Ultimately the Dashboard may split
into two separate deployable
components: client and server
24. OpenDashboard: Session Storage
● Sessions stored in MongoDB
● Allows for horizontal scalability
● Essentially stateless client side
25. OpenDashboard For Students
● Dashboard is currently only intended for
faculty/staff
● To allow student access:
○ APIs would need to apply finer-grained authorization
controls
○ UI would need to be adapted for a single user view
26. Data Loader
● How do we get supporting (and maybe event) data into the LRW?
● Java application
● Run as cron job (or similar) daily or even more often
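The loading step can be pictured roughly as follows. The actual Data Loader is a Java application; this Python sketch only illustrates the flow of reading supporting data and handing each record to the LRW. The CSV columns are a small subset of OneRoster-style user fields, and `send` is a hypothetical stand-in for whatever call posts a record to the LRW, since the slides do not describe that API.

```python
import csv
import io
from typing import Callable, Iterable, List

# Made-up sample data; the third row has no sourcedId and is skipped.
SAMPLE = """sourcedId,givenName,familyName,role
u1,Ada,Lovelace,student
u2,Alan,Turing,teacher
,No,Identifier,student
"""


def read_users(csv_file: Iterable[str]) -> List[dict]:
    """Parse user rows, skipping any row without a sourcedId."""
    users = []
    for row in csv.DictReader(csv_file):
        if not row["sourcedId"]:
            continue
        users.append(
            {
                "sourcedId": row["sourcedId"],
                "givenName": row["givenName"],
                "familyName": row["familyName"],
                "role": row["role"],
            }
        )
    return users


def load(users: List[dict], send: Callable[[dict], None]) -> int:
    """Pass each user to `send` (hypothetical LRW client) and count them."""
    count = 0
    for user in users:
        send(user)
        count += 1
    return count


users = read_users(io.StringIO(SAMPLE))
sent = []
loaded = load(users, sent.append)  # sent.append stands in for a real client
print(f"loaded {loaded} users")
```

A script like this could be scheduled from cron (for example, a nightly entry) so that rosters in the LRW stay close to the SIS.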
28. ● Phase 4 Needs: FY17-18
● Plan for integrating dashboard
● Start incorporating data from other tools into
LRW
● Possibly add other tool data to predictions
● Start running the modeler regularly
(if we work out a way to share data)
WHAT’S HAPPENING NOW
29. ● Disenfranchised by big, outrageously expensive, commercial black
box analytics systems?
● Can’t afford big, outrageously expensive, commercial black box
analytics systems?
● Overwhelmed by all this analytics talk and complicated math?
● Want to get your feet wet without betting the farm?
● Want to join a group of like-minded schools where every new
development benefits us all?
● This is not free, but your $$$ goes farther, and you benefit from
others’ work
● If you’re interested, contact us
Lou Harrison: lou@ncsu.edu
Gary Gilbert: ggilbert@unicon.net
WHERE DO YOU FIT IN?
30. About Unicon
TECHNOLOGY CONSULTING, SERVICES, & SUPPORT FOR THE EDUCATION
INDUSTRY
● Services, strategy, and support focused on the education industry
● Deep domain-specific expertise
● Open source software foundations
● Learn more at www.unicon.net
UNICON CONTRIBUTES TO THE APEREO LEARNING ANALYTICS INITIATIVE
● Unicon has been involved since 2015
● Developed standards-based integrations for open analytics technologies
● Provides services for open analytics technologies (OpenLRW,
OpenDashboard, SSP)
● Learn more at www.apereo.org/communities/learning-analytics-initiative