Leveraging “Flat Files”
from the Canvas LMS
Data Portal (at K-State)
SIDLIT 2017 | LMS Preconference
Colleague 2 Colleague
August 2, 2017
Presentation
 A lot of data are created in an LMS instance, and much of this can be
analyzed for insight. In 2016, Instructure, the makers of Canvas, made their
LMS data available to their customers through a data portal (updated
monthly). This portal enables access to a number of flat files related to that
particular instance. This presentation showcases how this big data was
analyzed on a regular laptop with basic office software, to summarize
Kansas State University’s use of the LMS. Methods for analysis include the
following: basic descriptive statistics, survival analysis, computational
linguistic analysis, and others.
2
Presentation (cont.)
 The results are reported out with both numbers and data visualizations,
including classic pie charts, line graphs, bar charts, mixed-charts, word
clouds, and others. The findings provide some insights about how to
approach the data, how to use a data dictionary, and other methods for
extracting the data for awareness and practical decision-making. This
work also is suggestive of next steps for more advanced analysis (using the
flat files in a SQL database).
 More information about this experience may be accessed on SlideShare
through an article download titled “Wrangling Big Data in a Small Tech
Ecosystem” at
http://www.slideshare.net/ShalinHaiJew/wrangling-big-data-in-a-small-tech-ecosystem
(orig. from Oct. 2016). The original article “Wrangling Big Data in a Small
Tech Ecosystem” is from C2C Digital Magazine.
3
Presentation Order
 Canvas LMS at Kansas State University (K-State)
 Canvas LMS Data Portal and Flat Files
 The Summary Data
 Some Practical Applications
 Moving Forward with the Data
4
General Approach
Framework
 Approaches
 An instructional design approach
 What can enhance teaching and
learning?
 A researcher approach
 What can enhance accurate
data collection, usage, researcher
awareness, and decision-making?
 Using all data (every part!)
 Using all basic software tools
available on a regular machine
Data Clients on a
Campus
 Faculty
 Staff
 System Administrators
 Leaders
 Students
 Analysts
5
Canvas LMS at
Kansas State University (K-State)
6
LMS History at K-State
 Homegrown Learning Management System (LMS) (Axio Learning)
 Informed by faculty, admin, and staff needs (IT Help Desk tickets, focus groups
with faculty and staff)
 Software updates rolled out annually with some patches in-between
 Built mostly by K-State graduates and professional developers (often hired from
student ranks)
 Instructure’s Canvas LMS at K-State (2013 – present)
 Availability of the data portal in 2016
 Monthly updates of select data from the particular instance
 Accessed at K-State in October 2016
7
An Early Brainstorm
 Brainstorm beneficial questions (data queries) before exploring the data, so
you’re not limited by the found data, and keep these in mind even after
the initial data exploration. It is important to conceptualize what may be
practically helpful through the informed imagination first.
 It would be helpful to continue with the brainstorming as the data are
explored.
8
Initial Brainstormed Questions
 What can be reported out at various levels: university, college,
department, course, and individual?
 Is it possible to make observations about course design? Learner
engagement (Discussions? Conversations?)? Advising? Technology usage
(such as external tools)? Uses of the LMS site for non-course applications?
 What sorts of manually created courses exist, and how are these used?
What percentage of the courses are of these manual types?
9
Initial Brainstormed Questions (cont.)
 How closely is it possible to map the data of a learner’s trajectory? A
group’s trajectory?
 What are some attributes to use to identify various groups? Which attributes
would be helpful? What sorts of group-specific questions may be asked?
 For example, is it possible to identify high-performing groups vs. low-performing
groups in order to run analytics to see what differences there may be between
the two?
 What may be understood about the learning going on in a particular
course? A learning sequence?
 Are there ways to understand effective support for learners and support for
learning from this data?
10
Required Preliminary Understandings
 Need to understand the front-end view of the LMS and its general uses on
campus; otherwise, viewing the back-end data will be like looking through a
glass darkly
 Need to understand what terms are applied to the various types of data
(because you want to be on the same page with the creators and users of the
LMS)
 Need to have experiences with the various analytical technologies applied to
the particular data because various queries require different data processing
and data structures
 Will be applying the following: descriptive statistics, inferential statistics, direct
data queries, linguistic analysis, survival analysis, sentiment analysis, topic
modeling, and others
 Will ultimately be applying more complex machine learning as well
11
Required Preliminary Understandings
(cont.)
 Need understandings of “states” of being for various objects in an LMS
 Need ability to identify anomalies and the skills to interpret what these
might mean
 Need to know what data mean and where to dig deeper for more relevant
information
 Need to know where noise might enter a particular dataset or an analytical
process…and to head off the introduction of or inclusion of noise
12
Canvas LMS Data Portal and “Flat
Files”
13
Canvas Data Portal
 Data updated once a month (then); now updated daily
 Live dynamic data may be accessed via a higher level of service
 Flat files (in compressed .gz format, extractable after download with 7-Zip)
downloaded from SQL servers
 Also known as table data (albeit without defined structural relationships between
records and therefore “flat”)
 May contain labeled data like numbers
 May contain unstructured or semi-structured data like texts, names, messages, and
others
 Contain content data (messaging), trace data (interaction data), and some
metadata (data about data, often riding on imagery and multimedia)
 Data described in a formal data dictionary
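As a rough illustration of working with one of these compressed flat files outside of office software, the following Python sketch reads a .gz file into rows. It assumes the file is tab-delimited with no header row and that column names are supplied from the data dictionary; the file name, column names, and sample values are hypothetical stand-ins, not actual portal content.

```python
import csv
import gzip
import os
import tempfile


def read_flat_file(path, column_names):
    """Read a gzip-compressed, tab-delimited flat file into a list of dicts.

    Assumption: no header row; names come from the data dictionary.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [dict(zip(column_names, row)) for row in reader]


# Build a tiny stand-in file so the sketch runs without portal access.
sample_path = os.path.join(tempfile.mkdtemp(), "course_dim_sample.gz")
with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as handle:
    handle.write("1\tIntro Course\tavailable\n2\tOld Course\tdeleted\n")

# Hypothetical column names for illustration only.
rows = read_flat_file(sample_path, ["id", "name", "workflow_state"])
print(len(rows), rows[0]["name"])
```

Reading row by row like this keeps memory use modest, which matters on a regular laptop.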
14
“Flat Files” Strengths and Weaknesses
Strengths
 Manageable on a small-scale
laptop
 Can ask questions across several
flat files
Weaknesses
 Lack relational data between the
various flat files
 Cannot query data effectively
across the various data tables
(because the relationships are not
defined)
 Lack access to identifier column
 Lack access to the foreign key
15
Data Dictionary
 A reference resource that describes particular data
 Documentation of data captured in the Canvas Data warehouse
 Helpful for understanding naming protocols of the various data types
 The following is a verbatim example:
16
Name            Type                   Description
assignment_id   bigint (big integer)   Foreign key to the assignment the override
                                       is associated with. May be empty.
Data Dictionary 1.15.0
Facts
 assignment_fact, assignment_group_fact, assignment_override_fact,
assignment_override_user_fact, assignment_override_user_rollup_fact,
communication_channel_fact, conversation_message_participant_fact,
course_ui_navigation_item_fact, discussion_entry_fact,
discussion_topic_fact, enrollment_fact, external_tool_activation_fact,
 file_fact, grading_period_fact, group_fact, group_membership_fact,
module_completion_requirement_fact, module_fact, module_item_fact,
module_prerequisite_fact, module_progression_completion_requirement_fact,
module_progression_fact, pseudonym_fact, quiz_fact,
17
Data Dictionary 1.15.0
Facts
 quiz_question_answer_fact, quiz_question_fact, quiz_question_group_fact,
quiz_submission_fact, quiz_submission_historical_fact, score_fact,
submission_comment_fact, submission_comment_participant_fact,
submission_fact, wiki_fact, wiki_page_fact
18
Data Dictionary 1.15.0
Dimension
 account_dim, assignment_dim, assignment_group_dim,
assignment_group_rule_dim, assignment_override_dim,
assignment_override_user_dim, assignment_rule_dim,
communication_channel_dim, conversation_dim, conversation_message_dim,
course_dim, course_section_dim, course_ui_canvas_navigation_dim,
course_ui_navigation_item_dim,
 discussion_entry_dim, discussion_topic_dim, enrollment_dim,
enrollment_rollup_dim, enrollment_term_dim, external_tool_activation_dim,
file_dim, grading_period_dim, grading_period_group_dim, group_dim,
group_membership_dim, module_completion_requirement_dim, module_dim,
19
Data Dictionary 1.15.0
Dimension
 module_item_dim, module_prerequisite_dim,
module_progression_completion_requirement_dim, module_progression_dim,
pseudonym_dim, quiz_dim, quiz_question_answer_dim, quiz_question_dim,
quiz_question_group_dim, quiz_submission_dim,
quiz_submission_historical_dim,
 role_dim, score_dim, submission_comment_dim,
submission_comment_participant_dim, submission_dim, user_dim, wiki_dim,
wiki_page_dim
20
Data Dictionary 1.15.0
Both
 requests
21
The Summary Data
at instance level
22
Order: First Data Visualizations and
Then Light Text Commentary
 The data visualizations come first…so that the audience may analyze the
data to see what it says
 The summary analyses come directly after the visualization, so there is a
kind of debriefing
23
24
Purposeful Blur and Block
 Need to know how to protect against data leakage
 Never share the underlying dataset
 Never share unique identifiers
 Always double check screen grabs against accidental inclusion of personally
identifiable data (PII); use effective redaction if PII is viewable
 When redacting, make sure that the redaction cannot be reversed (backwards iterated
or some other strategy) and a person re-identified
 Check that no metadata is riding with multimedia being released
 Any personally identifiable information (PII) is obfuscated here
 No granular level of data was captured in the article
25
26
Workflow
1. Conceptualizing questions and applications of the data
2. Review of the dataset information
3. Data download
4. Data extraction
5. Data processing (cleaning) and analytics
6. Validating / invalidating the findings
7. Additional data analytics
8. Write-up for presentation
9. Data and informational materials archival
27
1. About Courses
28
29
Course Visibility
 A majority of courses are not visible
30
31
Course Workflow States
 Claimed
 Available
 Deleted
 Completed
 Created
32
2. About Course Sections
33
34
Life Cycle State for Course Section
 A majority active
 A minority deleted
35
36
Date Restriction Accesses for Course
Sections
 Non-defined (default) as the majority
 Restricted section access (by learner name) to defined dates
 Non-restricted (all participants in the course welcome) section access to
defined dates
37
38
Ability to Self-Enroll in a Section or Not
 Undefined (default)
 Can manually self-enroll
 Must be assigned to a section
39
3a. About Assignments
40
41
Types of Assignments
 None
 Online_quiz
 Online_upload
 Assignment
 On_paper, and others
42
43
Time Features for Assignments
 Half of assignments with no time allotment
 Other half with time features
 Due_at, no unlock_at, no lock_at
 Due_at, lock_at, unlock_at (all three)
44
45
Main Themes Auto-Identified in
Assignment Names
 Assignment
 Discussion
 Work
 Quiz
 Participation
 Final
 Class
 Presentation
 Exam
 Attendance
 Homework
 Chapter
 Questions
 Activity
46
47
Some Linguistic Features of the
Assignment Titles and Descriptions
 Analytic: 91.69
 “Formal, logical, and hierarchical thinking” vs. “more informal, personal,
here-and-now, and narrative thinking”
 Clout: 73.25
 “perspective of high expertise” and confidence vs. “more tentative, humble, even
anxious style”
 Authentic: 11.83
 “more honest, personal, and disclosing text” vs. “a more guarded, distanced form of
discourse”
 Tone: 64.98
 “a more positive, upbeat style” vs. “greater anxiety, sadness, or hostility” (emotional
tone) (“Linguistic Inquiry and Word Count: LIWC2015 Operator’s Manual,” 2015, p. 22)
48
49
Delving into Topics of Interest
 Identifying words of interest (names, formulas, dates, symbols, etc.)
 Using NVivo 11 Plus to create word trees with the target term as the seeding
topic
 Ability to double-click on the respective branches to link back to the
original source data files
50
51
Unmuted or Muted Assignments
 A majority unmuted assignments
 A minority muted assignments
52
53
Assignment Workflow States
 A majority published (77%)
 A smaller amount deleted (15%)
 The smallest amount unpublished (8%)
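Category breakdowns like the one above come down to frequency counts converted to percentages. A minimal Python sketch follows, assuming the workflow states have already been pulled out of the flat file as a list of strings; the values are illustrative only and are not K-State data.

```python
from collections import Counter


def share_by_category(values):
    """Return {category: (count, percent)} ordered by descending count."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (count, round(100.0 * count / total, 1))
        for category, count in counts.most_common()
    }


# Illustrative values only, not K-State data.
states = ["published"] * 7 + ["deleted"] * 2 + ["unpublished"]
summary = share_by_category(states)
for state, (count, percent) in summary.items():
    print(f"{state}: {count} ({percent}%)")
```

Reporting both the raw count and the percentage keeps small categories from being over- or under-read.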
54
55
Survival Function of Assignments to
Update
 How long does it take before an assignment is updated?
 At what point does an assignment seem to be “safe” against update?
 What are some ways to understand assignments that are updated some
1,000 days after the date of creation?
 Is it possible that some assignments were transferred over from a prior LMS
through an LTI-enabled process that might have captured the very first moment
of creation for that assignment? (“LTI” refers to the Learning Tools Interoperability
standard created by the IMS Global Learning Consortium.)
56
3b. About Submitted Assignments
57
58
Grades Submittal Counts for
Completed Assignments
 A slice-in-time view
 Roughly two-thirds graded
 Roughly one-third not graded
59
4. About Quizzes
60
61
A Survey of Quiz Types
 Assignment
 Practice quiz
 Graded survey
 Survey
 Affordances of the various quiz types change over time, so it is important to
update on the various functions and capabilities even as one is looking at
the data.
62
63
Quiz Question Types in the LMS
Instance
 multiple_choice_question
 true_false_question
 essay_question
 multiple_answers_question
 short_answer_question, and others (in descending order)
64
65
Quiz Question Workflow States
 unpublished (default)
 published
 deleted
 So a majority of quiz questions are created / drafted but held in reserve
and not published.
 What are some possible inferences that can be made from the instance-
scale statistics and numbers?
66
67
An Inclusive Scatterplot of Quiz Point
Values
 min-max range: 0 – 23,700 points per quiz
 average quiz value: 33 points (without zeroes averaged in) and 28 points
(with zeroes averaged in)
 The 23,700 value occurred twice, which suggests that it might be purposeful.
That huge number, though, pulls the curve; in a normal research context,
such an outlier would likely be omitted so that it does not skew the
results. A zoom-in would require going to the particular instructor and
course. That might require a different approach to the data than described
in this work, such as re-animating all the flat files in a SQL database and
using unique identifiers to connect related data.
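The two averages and the outlier issue described above can be sketched in a few lines of Python. The point values below are illustrative only, and the outlier rule (more than ten times the median) is an assumed, crude heuristic rather than the method used in this work.

```python
from statistics import mean, median


def summarize_points(points):
    """Summarize quiz point values with and without zero-point quizzes."""
    nonzero = [p for p in points if p != 0]
    return {
        "min": min(points),
        "max": max(points),
        "mean_with_zeroes": mean(points),
        "mean_without_zeroes": mean(nonzero),
    }


def flag_outliers(points, multiple=10):
    """Flag values above `multiple` times the median (assumed heuristic)."""
    cutoff = multiple * median(points)
    return [p for p in points if p > cutoff]


# Illustrative values only, not K-State data.
quiz_points = [0, 0, 10, 20, 25, 30, 50, 100, 23700, 23700]
stats = summarize_points(quiz_points)
outliers = flag_outliers(quiz_points)
print(stats, outliers)
```

Flagging rather than silently dropping the extreme values leaves the decision about their purposefulness to the analyst.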
68
69
Histogram of Quiz Point Values in LMS
Instance (with a normal curve)
 Frequency of point values for quizzes
 Tendencies
 Most at the lower number values
70
71
Survival Curve of Deleted Quizzes in
LMS Instance
 Based on timestamp data, how long does it take for a deleted quiz to
achieve “event” or be deleted (from its moment of creation)?
 In this dataset, 22% of quizzes were deleted (14,769/66,366).
 The min-max day range for the quiz deletions ranged from 0 - 813 days.
 A survival analysis showed that the estimated survival time of quizzes that
were deleted was 23.6 days, with a lower bound of 22.7 and an upper
bound of 24.4 in the 95% confidence interval; the standard error was .419.
 The median survival time of the deleted quizzes was a low 2 days, which
means that if a quiz is to be deleted, it usually happens fairly early.
 The drop-off in the curve below is steep but tapers off after several
months.
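The figures above can be approximated with a small Python sketch. It assumes every quiz in the set was deleted (no censoring), so the survival share at a given day is simply the fraction of quizzes that lasted longer; it also uses a normal-approximation interval, which is roughly what 23.6 ± 1.96 × 0.419 gives. The durations are illustrative only, and the statistics package actually used may compute these values differently.

```python
from math import sqrt
from statistics import mean, median, stdev


def survival_curve(durations):
    """Return [(day, share of items surviving past that day)].

    Assumes every item experienced the event (no censoring).
    """
    total = len(durations)
    return [
        (day, sum(1 for d in durations if d > day) / total)
        for day in sorted(set(durations))
    ]


def mean_with_interval(durations, z=1.96):
    """Mean duration with a normal-approximation 95% confidence interval."""
    estimate = mean(durations)
    standard_error = stdev(durations) / sqrt(len(durations))
    return estimate, estimate - z * standard_error, estimate + z * standard_error


# Illustrative days-to-deletion values only, not K-State data.
days_to_deletion = [0, 0, 1, 1, 2, 2, 3, 10, 45, 200]
curve = survival_curve(days_to_deletion)
center, lower, upper = mean_with_interval(days_to_deletion)
middle = median(days_to_deletion)
print(curve[:3], middle, round(center, 1))
```

A mean far above the median, as in the toy data, is the same pattern reported on this slide: most deletions happen early, with a few long-lived quizzes stretching the average.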
72
73
One Minus Survival Function Curve for
Deleted Quizzes in the LMS Instance
 Shows the cumulative share of quizzes deleted over time (the complement of
the survival curve), from a set of quizzes that were ultimately deleted
74
75
Hazard Function for Deleted Quizzes in
the LMS Instance
 All quizzes in the set were ultimately deleted.
 This line graph shows the time-to-event from each quiz’s creation date to
its deletion in the LMS instance.
 A hazard function curve sometimes shows particular time patterns of
when a quiz is most at risk of deletion, but this curve shows only a
steep rise initially and then a gradual progression toward the event.
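For readers who want to see what sits behind a hazard plot, here is a hedged Python sketch of a discrete hazard (deletions on a given day divided by quizzes still at risk that day) and its running total. It assumes no censoring, and reading the plotted curve as a cumulative hazard is an assumption on my part; the durations are illustrative only.

```python
from collections import Counter


def discrete_hazard(durations):
    """Return [(day, events that day / items still at risk that day)].

    Assumes every item is eventually deleted (no censoring).
    """
    events = Counter(durations)
    at_risk = len(durations)
    hazards = []
    for day in sorted(events):
        hazards.append((day, events[day] / at_risk))
        at_risk -= events[day]
    return hazards


def cumulative_hazard(hazards):
    """Running sum of the per-day hazards."""
    running = 0.0
    totals = []
    for day, hazard in hazards:
        running += hazard
        totals.append((day, running))
    return totals


# Illustrative days-to-deletion values only, not K-State data.
hazards = discrete_hazard([0, 0, 1, 1, 2, 2, 3, 10, 45, 200])
cumulative = cumulative_hazard(hazards)
print(hazards[:2], cumulative[-1])
```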
76
5. About Discussion Boards
77
78
Types of Discussion Boards:
Announcement vs. Default
 default (66%)
 announcement (34%)
79
80
Workflow States of Discussion Boards
 Undefined
 Active
 Deleted
 Unpublished
81
82
Active vs. Deleted Discussion Board
Entries (Replies)
 Active discussion board entries
 Deleted discussion board entries
83
6. About Learner Submitted Files
84
85
Handling of Learner Submissions in the
LMS Instance
 human_graded
 not_graded
 auto_graded
86
87
Some Common Words from Comments
Made on Submissions
 Lots of encouraging words in comments made on submissions
88
89
Submission Comment Participation
Type
 Admin
 Submitter
 Author
 So administrators all comment on learner submissions, but not all authors or
submitters comment. In other words, the creator of contents may submit
the file without comment.
90
7. About Uploaded Files
91
92
Uploads and Revisions of Files to the
LMS Instance by Year
 A sense of the university’s transition to the LMS, over multiple years (so
interpret with caution)
93
94
Observed Uploaded File Types
 .docx
 .pdf
 .jpg
 .png
 .pptx
 .xlsx
 .ppt
 .zip
 .dat
 .xl
 .mp4
 .html
 .txt
 .mp3
 .sdl
 .csv
 .rtf
 .css
 .m4v, and others
95
96
Word Cloud of File Contents (from the
Descriptions of File Contents)
 What do the words say about what people have uploaded to the LMS
system?
97
98
High Frequency Word Counts in the File
Names Set (as onegrams)
 Final
 Paper
 Lab
 2015
 Chapter
 2016
 Lesson
 2014
 Exam
 Reflection
 Project
 Syllabus
 Report
 Review
 Lecture
 Profile
 Study
 Week
 Analysis
 Essay, and others
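A unigram count over file names can be sketched as follows in Python. The tokenizing rule (runs of letters or runs of digits) and the short stopword list are assumptions for illustration; the file names are invented and are not from the K-State dataset.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "a", "for", "docx", "pdf", "pptx"}


def unigram_counts(names):
    """Count single-word tokens across file names, skipping stopwords."""
    tokens = []
    for name in names:
        # Split on anything that is not a run of letters or a run of digits.
        tokens.extend(re.findall(r"[a-z]+|\d+", name.lower()))
    return Counter(token for token in tokens if token not in STOPWORDS)


# Invented file names for illustration only.
file_names = [
    "Final_Paper_2016.docx",
    "Lab Report 3.pdf",
    "final exam review.pdf",
    "Chapter2_Lab.pptx",
]
counts = unigram_counts(file_names)
print(counts.most_common(3))
```

Treating file extensions as stopwords keeps them from crowding out the content words.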
99
8. About the Wikis and Wiki Pages
100
Wikis and Wiki Pages
 A “wiki” in Canvas is a page with its history captured and able to be
restored (enabled by wiki software)
 Pages may be interconnected
 A page may be set as the home page
 A page may be embedded in a modular sequence
 A page may contain a Mediasite video
 A page may contain any number of contents: imagery, iframes, videos,
and other contents
101
102
Parent Types for Wiki Pages in the LMS
Instance
 Course
 Group
 In other words, the administrators (instructors) of courses are the ones who
create a majority of the pages. The learners in groups create fewer of the
wiki pages.
 Note that the sense of a “wiki” page is different here.
103
104
Wiki Page Workflow
 Null (default)
 Active
 Unpublished
 Deleted
 This needs more insight, but the data dictionary does not explain the
different states and what they mean. For example, is a “null” wiki page
published? Is an “active” wiki page something that is included in a
sequence? Is a “deleted” wiki page recoverable or not?
105
106
Word Frequency Word Cloud from Wiki
Page Titles
 Focuses on introductions, projects, research, teams, and others
107
9. About Enrollment Role Types
108
About Enrollment Role Types
Role Name             Basic Role Type
Librarian             TAEnrollment
StudentEnrollment     StudentEnrollment
TeacherEnrollment     TeacherEnrollment
TAEnrollment          TAEnrollment
DesignerEnrollment    DesignerEnrollment
ObserverEnrollment    ObserverEnrollment
Grader                TAEnrollment
GradeObserver         TAEnrollment
109
University-Defined Roles and
Capabilities
 Some unique roles
 Some shared roles
110
111
Frequencies of Enrollment Roles
 StudentEnrollment
 TeacherEnrollment
 StudentViewEnrollment
 TAEnrollment
 ObserverEnrollment
 DesignerEnrollment (in descending order)
112
113
Top Dozen Computer System Configurations
for Accessing LMS Instance
 …and others
114
115
Request Types in the LMS Instance
 GET (Read)
 POST (Create)
 PUT (Update / Replace)
 HEAD (Retrieve headers only)
 DELETE (Remove)
 PATCH (Update, Modify)
116
10. About Groups
117
118
Group Names Frequency Word Cloud
 Clone
 Teaching
 Design
 Clinical
 Plan
 Final
 Class
 Learning
 Ventilation, and others
119
120
Moderator Status of Learners in Groups
 not_moderator
 is_moderator
121
122
Learner Membership Status in Groups
 Accepted
 Deleted
 No invited
 No requested
123
11. About Users and Workflow
States
124
125
User “Workflow” States in the LMS
Instance
 registered
 pre_registered
 deleted
 creation_pending
 The “creation_pending” may well refer to a process of approval for people
to have access—for a level of security.
126
127
Years of Origination of User Accounts
 Initial exploration in 2013
 Big push in 2014
 New accounts in 2015 and 2016 indicating not only students but also
employment churn and stragglers slow to change to a new LMS
128
129
Retired Accounts = Registered False
 2013 – early May 2017
 Word frequency count from unigrams (so no full names represented as
such)
 First names more common and so better represented
 One number removed in the “stopwords” list
130
131
Pseudonyms
 Pseudonyms = “logins associated with users”
 Seems to be the connection between the LMS and various university
information systems
 Seems like partial data (extracted in May 2017)
132
133
Current “States” of Pseudonyms
 A majority of pseudonyms “active” vs. “deleted”
134
12. About Course Level Grades
(based on Enrollments)
135
136
Numbers of Attempts for Latest
Submitted Assignments
 Null (no scores)
 One
 Two
 Three
 Four, etc. (in descending order)
137
13. About Conversations (Emails)
138
139
Conversations with Media Objects
Included
 False
 True
 So when people use the email system inside Canvas, they do not generally
attach media objects (like digital imagery, slideshows, audio, video, or
other digital files).
140
141
Conversations w/ or without
Attachments
 A majority of conversations are without attachments
 A minority of conversations are with attachments
142
143
Origins of Conversations / Messages
 Human-generated conversations (the overwhelming majority)
 System-generated messages
144
145
Conversation Messages Word
Frequency Count
 482,339 conversation messages
 Texts with 60,509,894 words
 2/3 analyzed for textual contents (because of data size)
146
147
Mass Conversation Message Contents
 Analytic: 82.33
 “Formal, logical, and hierarchical thinking” vs. “more informal, personal,
here-and-now, and narrative thinking”
 Clout: 80.21
 “perspective of high expertise” and confidence vs. “more tentative, humble, even
anxious style”
 Authentic: 26.41
 “more honest, personal, and disclosing text” vs. “a more guarded, distanced form of
discourse”
 Tone: 66.24
 “a more positive, upbeat style” vs. “greater anxiety, sadness, or hostility” (emotional
tone) (“Linguistic Inquiry and Word Count: LIWC2015 Operator’s Manual,” 2015, p. 22)
148
149
Messaging about “Human Drives” in
the Mass Conversation Messages
 Affiliation (2.35)
 Power (2.19)
 Achievement (1.46)
 Reward (1.3)
 Risk (0.37)
 “The focus on affiliation and social identity seems reasonable, given the
typical college age of learners. The "power" language may come from
faculty speaking from positions of authority. The low level of focus on risk is
intriguing here (maybe young learners are not thought to have developed
the efficacy and confidence to take on uncontrolled risks?). Clearly, there
is a role for theorizing and interpretation, even with computation-based
analytics.”
150
151
Sentiment Analysis of Sample of
Conversation Messaging
 A smaller sample of the conversation messages was analyzed for
sentiment. This set consisted of 72,377 messages.
 The automated observations of sentiment showed that there were two
tendencies...either very positive or moderately negative (in terms of text
categories).
 In this software tool, it is possible to explore which texts were categorized to
which categories of sentiment (very negative, moderately negative,
moderately positive, or very positive) in the comparisons between the
target text and the built-in sentiment dictionary.
 In other words, the actual exploration of the content is possible through both
machine reading and human close reading.
152
153
Auto-Extracted Theme Based Hierarchy
Chart of Conversation Messaging Sample
(as a Treemap)
 Class
 Assignment
 Time
 Paper
 Questions
 Exam
 Online
 Group, etc.
154
155
Auto-extracted Themes from
Conversation Messaging Sample
 These are in alphabetical order
 The themes are listed in a human-readable way going clockwise around
the pie (in a pie chart)
156
157
Auto-Coded Theme-Based Hierarchy Chart of
Topics and Subtopics from Conversation
Messaging Sample (as a Sunburst Diagram)
 This sunburst diagram—in the software—is somewhat interactive
 This enables digging down into a Topic by double-clicking on it and seeing
the subtopic contents there
 If the sliver is too thin, a mouse hovering will result in the actual subtopic
and the statistics and quant data available for viewing
158
159
Contexts of “Help” in a Word Tree
 It is possible to analyze the various contexts in which “help” was used in the
conversation messaging in the prior word tree
 In the software (NVivo 11 Plus), the word tree is interactive and is linked to
the original sources where the word appears, so it is possible to achieve
close reading of every use of “help” from the underlying dataset
 The challenge is engaging a full dataset of millions of words
160
14. About Third-Party External Tool
Activations on the LMS Instance
161
162
Numbers of External Tool Activations on
the LMS Instance
 External tool activations
 Unique tool activations
163
164
Named External Tool Activations in the
K-State Canvas Instance
 YouTube
 Ted Ed
 DropBox (with name variations)
 Vimeo
 Quizlet (with name variations)
 MyOMLab
 Khan Academy
 Twitter
 Flat World Knowledge
 SlideShare
 Yellowdig
 SoftChalk Cloud (with name
variations)
 MyLab and Mastering
 Educreations
 Funbrain
 Wikipedia, and others…
165
166
External Tool Activations in 2013
(in alphabetical order)
 Attendance Tool
 Chat
 CodeAcademy
 Dropbox
 Flat World Knowledge
 Flickr Search
 Graph Builder
 Khan Academy
 Learn LTI
 McGraw-Hill Campus
 Public Collections
 SlideShare
 SoftChalk Cloud
 SoftChalk Cloud App
 Ted Ed
 Twitter
 Vimeo
 YouTube
 Zoom
167
168
External Tool Activations in 2014
 There is an increase in both variety and number of external tool activations
 No deeper analysis was applied, but it could be…as to the external tool types
and the changing senses of needs
169
170
External Tool Activations in 2015
 There is an increase in both variety and number of external tool activations
 No deeper analysis was applied, but it could be…as to the external tool types
and the changing senses of needs
171
172
External Tool Activations in 2016
 There is an increase in both variety and number of external tool activations
 No deeper analysis was applied, but it could be…as to the external tool types
and the changing senses of needs
173
15. About Course User Interface
(UI) Navigation Item States
174
175
Course User Interface Navigation Item
State
 Visible
 Hidden
 This refers to users’ ability to let the pre-set functions in the left
navigation of a course shell remain active or to place them in “hidden.”
 There are “hidden” navigation element presets as well, which users may
choose to activate.
176
Enablements and Limits
re: the LMS Data Portal Data
177
178
Delimiting the Analytics from the LMS
Data Portal Data
 The concept behind delimiting is to make conclusions more accurate by
representing how confident one may be about the results.
 As noted, there may be challenges and noise in the data from any step in
the workflow…but there are inherent limits also to the various data analytics
types—as shown in the visualization in the prior slide.
179
Some Practical Applications
180
Some Practical Applications
 Self awareness (holding up a mirror to the campus for its use of its LMS)
 Analytics
 To improve usage of the LMS
 To know what functions and features are desirable
 To support learner usage
 To support teaching and learning
 To support non-teaching and learning approaches to the data
 Decision-making
 Instructional design
 Administrative awareness, decision-making, funding, and others
181
Moving Forward with the Data
182
What are Ways to Go Beyond?
Other Analytical Methods
 Reconnecting the flat files as
relational files in SQL server
 Design of specific cross-file queries
for data analytics
 Applying more and varied
computational text analysis
 Engaging machine learning for
patterns (such as decision trees
for predictivity of classifications
based on available information)
Bringing in More Data
 Comparing macro-level data with
other instances of the Canvas LMS
(such as with comparable
institutions of higher education)
 Using additional data to enable
close-in reads (but without
compromising people’s privacy)
 Keep confidential information
confidential
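To give a sense of what reconnecting flat files as relational tables enables, the Python sketch below loads two tiny tables into an in-memory SQLite database (a stand-in here for SQL Server) and runs one cross-table query. The table names echo the data dictionary, but the columns and rows are simplified and hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE course_dim (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE assignment_dim (
        id INTEGER PRIMARY KEY,
        course_id INTEGER REFERENCES course_dim (id),
        workflow_state TEXT
    );
    """
)
# Invented rows for illustration only.
conn.executemany(
    "INSERT INTO course_dim VALUES (?, ?)",
    [(1, "Course A"), (2, "Course B")],
)
conn.executemany(
    "INSERT INTO assignment_dim VALUES (?, ?, ?)",
    [(10, 1, "published"), (11, 1, "deleted"), (12, 2, "published")],
)

# Cross-file query: published assignments per course, joined on the key.
published_per_course = conn.execute(
    """
    SELECT c.name, COUNT(a.id)
    FROM course_dim AS c
    JOIN assignment_dim AS a ON a.course_id = c.id
    WHERE a.workflow_state = 'published'
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(published_per_course)
```

Once the key relationships are declared, questions that span several flat files become single queries.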
183
Some Early Lessons Learned
184
Assessing the Initial Haul of Biggish Data
 Formulating askable questions
 Analyzing the columnar data (and variables)
 Understanding where the data comes from and how it is processed by Instructure
 Analyzing the date data
 Analyzing the textual data
 Understanding ways to mix data in various datasets for enriched querying
 Conceptualizing mixes of questions and potential findings based on the
available data
185
Assessing the Initial Haul of Biggish Data
(cont.)
 Understanding the types of software that may be used to engage the data
 Software enables cross-sectional base rate counts from flat files
 Software enables cross-tabulation analysis and assessments of statistical
significance (rarity of patterns)
 Software enables finding patterns through machine learning (like applying
decision trees to see what variables help determine classifications)
 Software enables the identification of text-based patterns
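As one small example of the cross-tabulation mentioned above, this Python sketch counts records for each pair of values from two columns. The field names and records are hypothetical, and no significance testing is attempted here.

```python
from collections import Counter


def cross_tabulate(records, row_key, column_key):
    """Count records for each (row value, column value) pair."""
    return Counter((r[row_key], r[column_key]) for r in records)


# Invented records for illustration only; field names are hypothetical.
enrollments = [
    {"role": "StudentEnrollment", "state": "active"},
    {"role": "StudentEnrollment", "state": "active"},
    {"role": "StudentEnrollment", "state": "deleted"},
    {"role": "TeacherEnrollment", "state": "active"},
]
table = cross_tabulate(enrollments, "role", "state")
for (role, state), count in sorted(table.items()):
    print(role, state, count)
```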
186
Some Early Lessons Learned
 Data visualizations are only summary data, and it’s important to get to the
actual underlying data to understand some dynamics.
 It helps to theorize or hypothesize broadly to understand what may be
going on with the observed empirical data.
 It is always wise to “sanity check” data extractions and data processing to
see what is going on.
 It is important to understand the LMS data portal’s default settings and the
rationales behind those defaults to make sure that they make sense for the
particular context.
187
Some Early Lessons Learned (cont.)
 Avoid double-counting for complex data with similar lead-in terms.
 Watch out for typing errors.
 Do not ignore error messages; figure out why they’re happening and deal
with the issues.
 Slow down the process, so you’re certain of what is happening at every
step. Be careful not to lose data.
 Be careful about going to Excel, which has a limit of about 1.05 million
rows per worksheet. Be careful also of OS clipboards, which have limits of
about 65,000 records. Do not let such limits stall the work and result in
lost data. Go to MS Access or SQL Server first.
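One way to avoid hitting the spreadsheet limit by surprise is to count rows before opening a file. The Python sketch below counts lines in a gzip-compressed file and compares the total with Excel's worksheet limit of 1,048,576 rows. It assumes one record per line (embedded line breaks in text fields would inflate the count), and the sample file is a stand-in.

```python
import gzip
import os
import tempfile

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit in current Excel versions


def count_rows(path):
    """Count lines in a gzip-compressed text file without loading it all."""
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def fits_in_excel(row_count, header_rows=1):
    """True if the data rows plus header rows fit in one worksheet."""
    return row_count + header_rows <= EXCEL_MAX_ROWS


# Tiny stand-in file so the sketch runs without portal access.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(sample_path, mode="wt", encoding="utf-8") as handle:
    handle.write("a\t1\nb\t2\nc\t3\n")

row_total = count_rows(sample_path)
print(row_total, fits_in_excel(row_total))
```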
188
Some Early Lessons Learned (cont.)
 Use the LMS data portal “data dictionary” for the LMS data, but realize that
it may be dated or incomplete or inaccurate. A particular instance of an
LMS will be particular, so a general dictionary offers a general view, not a
specific one. Use the data dictionary in an attentive way.
 Realize that there are nuances in the data that may not be apparent initially.
 With computational text analysis, oftentimes, foreign languages will get
short shrift. There may be effective ways to address this.
 With any sort of automation, there will be trade-offs. It is important to check
findings against the data and conduct data queries on multiple software
tools.
189
Some Early Lessons Learned (cont.)
 Data is messy. It is entirely possible (even probable) for a process to appear
to be going smoothly when something has glitched with a data download.
 Sometimes, no matter what is tried, it is not possible to import the data for
processing into either Microsoft Access or SQL Server. In that case, there may
need to be a data “substitution” by extracting the “same-ish” set from the LMS
data portal (days after the first set was extracted).
 The assumption is that new data is appended to the end of the existing data,
so if the file is the proper one, a “later” version still should be accurate.
Depending on the data handling, though, that assumption may not be true. It
will be important to check.
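One way to check the append-only assumption is to compare the earlier extract with the later one row by row. The following is a minimal sketch; the function name `is_append_only` and the sample rows are hypothetical, and real extracts would be read from the two downloaded files rather than from lists.

```python
_MISSING = object()  # sentinel for "the later extract ran out of rows"


def is_append_only(earlier_rows, later_rows):
    """Check that later_rows starts with every row of earlier_rows, in order.

    Returns (True, None) if so, otherwise (False, position), where position
    is the 1-based row number of the first mismatch.
    """
    later_iter = iter(later_rows)
    for position, old_row in enumerate(earlier_rows, start=1):
        new_row = next(later_iter, _MISSING)
        if new_row is _MISSING or new_row != old_row:
            return False, position
    return True, None


# Hypothetical sample rows standing in for two extracts of the same file.
earlier = ["1,quiz", "2,module", "3,wiki"]
later = earlier + ["4,file"]
changed = ["1,quiz", "2,MODULE", "3,wiki", "4,file"]

print(is_append_only(earlier, later))
print(is_append_only(earlier, changed))
```

Because both arguments can be any iterables, two open file handles can be passed in directly, which keeps memory use low for large extracts.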
190
Some Early Lessons Learned (cont.)
 Don’t just go with how software is designed. For example, with a word
frequency count, don’t just go with the high counts, but analyze the “long
tail” of the low counts.
 The “power law” does often apply to word counts in language. The long tail
shows something of outlier data in terms of single mentions (but you have to slog
through misspellings, strange alphanumeric strings, and other noise first).
 There are certain data visualizations that work better for certain types of
data.
 All data visualizations should be sufficiently labeled.
 It helps to calculate not only raw numbers but percentages, where possible.
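The points about word counts, the long tail, and percentages can be sketched as follows. This is a simplified illustration, not the tool used in the presentation; the function names and the sample text are hypothetical.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words; the pattern keeps Unicode letters, not digits."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


def with_percentages(counts):
    """Return (word, raw count, percent of all words), highest count first."""
    total = sum(counts.values())
    return [(word, n, round(100 * n / total, 1))
            for word, n in counts.most_common()]


def long_tail(counts):
    """Return the words mentioned exactly once, in alphabetical order."""
    return sorted(word for word, n in counts.items() if n == 1)


sample = "quiz quiz quiz module module syllabus rubric"
counts = word_counts(sample)
for row in with_percentages(counts):
    print(row)
print(long_tail(counts))
```

On real message text, the single-mention list would also contain the misspellings and stray strings mentioned above, so it still needs a manual read-through.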
191
Some Early Lessons Learned (cont.)
 Data portals contain personally identifiable information (PII), so extra care
has to be taken to ensure that people’s private information is neither misused
nor leaked.
 What is knowable depends on what other datasets one has access to and
how one sets up the analyses…
 It helps to know what is possible to know from the data (full universe)
 It helps to know what is politically viable to ask and capture (subset) (people
may ask for the moon)
 It helps to use resources wisely to pursue asks that create constructive awareness
and good decision-making (sub-subset)
 Recording steps is important (in notes and in macros)…so everything can
be repeated as needed.
192
To a Relational Database
 Flat files are downloaded as compressed .gz files and extracted with 7-Zip to
.csv files.
 Microsoft offers SQL Server Express as a free tool but limits it to one CPU (up
to 4 cores), 1 GB of RAM, and a maximum database size of 10 GB (“Limitations of
SQL Server Express”).
 Set this up on a dedicated machine, so the setup does not disrupt other work.
 In shifting to SQL Server Express, the flat files have to be properly processed
for the data to move without lossiness or other problems.
 It may help to process the data first in MS Access (as long as the flat file data is
not too large to handle in Access). Treat text columns as “Long Text,” not “Short
Text.” Label Date fields not as text but “Date with Time.” The idea is to have the
proper settings for appropriate receipt in SQL.
193
To a Relational Database (cont.)
 Then, export the object from Access to Excel 2016 with the formatting and proper
data structure.
 If the table has more than 65,000 records, then MS Access is unable to export the
data table.
194
To a Relational Database (cont.)
 One option is to split the dataset in Access (highlight the table -> go to the
Database Tools tab -> click Access Database -> Split database). The problem
with this is that a dataset will have to be split quite a few times to get under
the 65,000-record limit, and then after ingestion into SQL, any repeat data will
have to be deleted. This path is too onerous to be helpful, especially with LMS
data portal data, which can easily run to millions of rows.
 A more direct option follows on the next
slide.
195
To a Relational Database (cont.)
 When files are too large (anything over the 65,000 records that will fit in a
clipboard), it makes better sense to clean the data on import into SQL Server.
The sequence goes like this: .gz -> .csv (using 7-Zip) -> open SQL Server
Management Studio -> import data, changing “DT_STR” (string) columns to
“DT_TEXT” (text stream) so that there is no 50-character constraint on the
columns. The data import then generally goes well. (This solution takes up more
computer memory and is inelegant, but it avoids the many issues that would
otherwise crop up with a straight import without the data type adjustments.)
 There is no import of column names in the first row.
 In SQL Server Management Studio 17, go to Databases -> System
Databases -> “master” database (right-click) -> Tasks -> Import Data … and
specify that the original source is from Microsoft Excel. The flat files are now
tables in the master database, under the default “dbo” (database owner)
schema. Do keep the original file names, for ease of reference.
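The same idea (load a compressed flat file with unconstrained text columns and supply the column names separately) can be sketched in a script. This sketch swaps in Python's built-in SQLite for SQL Server, so it illustrates the approach rather than the SSMS import wizard itself. The table name, column names, and the `load_flat_file` helper are hypothetical; real column names would come from the data dictionary.

```python
import csv
import gzip
import os
import sqlite3
import tempfile


def load_flat_file(conn, gz_path, table, columns, delimiter=","):
    """Load a gzip-compressed, header-less delimited file into a new table.

    Every column is created as TEXT, so long values are not truncated.
    Returns the number of rows now in the table.
    """
    col_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    with gzip.open(gz_path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # A row with the wrong number of fields raises an error here,
        # which is a prompt to inspect the file rather than ignore it.
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader
        )
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Throwaway demonstration file with one value longer than 50 characters.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_table.csv.gz")
long_text = "x" * 200
with gzip.open(demo_path, "wt", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["1", "short message", "2017-08-02 09:00:00"])
    writer.writerow(["2", long_text, "2017-08-02 09:05:00"])

conn = sqlite3.connect(":memory:")
loaded = load_flat_file(conn, demo_path, "demo_table",
                        ["id", "body", "created_at"])
print(loaded)
```

The `delimiter` parameter is there because flat files are not always comma-separated; it can be set to a tab character if that is how a given extract is delimited.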
196
To a Relational Database (cont.)
 Re-indexing needed?
 If so, the foreign keys may have to be reconnected to the correct primary
keys for the relating in a relational database to make sense and for SQL
queries across the files to make sense.
 Foreign keys point to primary keys in another table; they hold the identifier
values that connect related data between tables.
 Primary keys are unique identifiers (and “reserved” against reuse in that sense),
and they indicate unique records in data tables (and databases).
 If not, it may be possible to run SQL queries by loading the tables with
primary keys first and those with referring foreign keys second…but I am not
there yet. Working on it.
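A minimal sketch of how primary and foreign keys allow a query across two tables, and how to spot foreign keys that point at no existing record. It uses Python's built-in SQLite rather than SQL Server. The table names `course_dim` and `enrollment_dim` appear in the data dictionary, but the columns and rows here are simplified, hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless asked to, so the orphan row
# (course_id 99) below loads without error, much like an unchecked import.
conn.executescript("""
CREATE TABLE course_dim (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollment_dim (
    id INTEGER PRIMARY KEY,
    course_id INTEGER REFERENCES course_dim (id),
    type TEXT
);
INSERT INTO course_dim VALUES (1, 'Biology'), (2, 'History');
INSERT INTO enrollment_dim VALUES
    (10, 1, 'StudentEnrollment'),
    (11, 1, 'TeacherEnrollment'),
    (12, 2, 'StudentEnrollment'),
    (13, 99, 'StudentEnrollment');
""")

# Query across the two tables: enrollments per course.
per_course = conn.execute("""
    SELECT c.name, COUNT(*)
    FROM course_dim AS c
    JOIN enrollment_dim AS e ON e.course_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Sanity check: enrollments whose foreign key matches no primary key.
orphans = conn.execute("""
    SELECT e.id
    FROM enrollment_dim AS e
    LEFT JOIN course_dim AS c ON e.course_id = c.id
    WHERE c.id IS NULL
""").fetchall()

print(per_course)
print(orphans)
```

Running the orphan check after each load is one way to tell whether keys still line up across the imported files before relying on joined results.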
197
To a Relational Database (cont.)
 Proceed with a good basic text on SQL Server. Give it a good read-through
before actually going too far into a project. (Experimentation is always
good, but wasted time is not.)
 If local support with a database administrator (DBA) is available, that would
be optimal.
198
References
 Pennebaker, J.W., Booth, R.J., Boyd, R.L., & Francis, M.E. (2015). Linguistic
Inquiry and Word Count: LIWC2015. Operator’s Manual. Retrieved at
https://s3-us-west-2.amazonaws.com/downloads.liwc.net/LIWC2015_OperatorManual.pdf.
199
Contact and Conclusion
 Dr. Shalin Hai-Jew
 iTAC
 Kansas State University
 212 Hale / Farrell Library
 shalin@k-state.edu
 785-532-5262
200

  • 25. Purposeful Blur and Block  Need to know how to protect against data leakage  Never share the underlying dataset  Never share unique identifiers  Always double-check screen grabs against accidental inclusion of personally identifiable information (PII); use effective redaction if PII is viewable  When redacting, make sure that the redaction cannot be reversed (by reverse engineering or some other strategy) and a person re-identified  Check that no metadata is riding with multimedia being released  Any PII is obfuscated here  No granular level of data was captured in the article 25
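One possible way to keep unique identifiers out of working copies is to replace them with keyed hashes before any analysis output is produced. This is a sketch of a general technique, not a method described in the presentation. The function name `pseudonymize` and the key value are hypothetical, and the underlying dataset and the key itself would still never be shared.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Return a stable token for an identifier; the key is needed to reproduce it."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical key: in practice, generate a random key and store it privately.
key = b"replace-with-a-private-random-key"

token_a = pseudonymize("12345", key)
token_b = pseudonymize("12346", key)
print(token_a, token_b)
```

The same identifier always maps to the same token, so counts and joins still work, while the token alone does not reveal the original value.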
  • 27. Workflow 1. Conceptualizing questions and applications of the data 2. Review of the dataset information 3. Data download 4. Data extraction 5. Data processing (cleaning) and analytics 6. Validating / invalidating the findings 7. Additional data analytics 8. Write-up for presentation 9. Data and informational materials archival 27
  • 30. Course Visibility  A majority of courses are not visible 30
  • 32. Course Workflow States  Claimed  Available  Deleted  Completed  Created 32
  • 33. 2. About Course Sections 33
  • 35. Life Cycle State for Course Section  A majority active  A minority deleted 35
  • 37. Date Restriction Accesses for Course Sections  Non-defined (default) as the majority  Restricted section access (by learner name) to defined dates  Non-restricted (all participants in the course welcome) section access to defined dates 37
  • 39. Ability to Self-Enroll in a Section or Not  Undefined (default)  Can manually self-enroll  Must be assigned to a section 39
  • 42. Types of Assignments  None  Online_quiz  Online_upload  Assignment  On_paper, and others 42
  • 44. Time Features for Assignments  Half of assignments with no time allotment  Other half with time features  Due_at, no unlock_at, no lock_at  Due_at, lock_at, unlock_at (all three) 44
  • 46. Main Themes Auto-Identified in Assignment Names  Assignment  Discussion  Work  Quiz  Participation  Final  Class  Presentation  Exam  Attendance  Homework  Chapter  Questions  Activity 46
  • 48. Some Linguistic Features of the Assignment Titles and Descriptions  Analytic: 91.69  “Formal, logical, and hierarchical thinking” vs. “more informal, personal, here-and-now, and narrative thinking”  Clout: 73.25  “perspective of high expertise” and confidence vs. “more tentative, humble, even anxious style”  Authentic: 11.83  “more honest, personal, and disclosing text” vs. “a more guarded, distanced form of discourse”  Tone: 64.98  “a more positive, upbeat style” vs. “greater anxiety, sadness, or hostility” (emotional tone) (“Linguistic Inquiry and Word Count: LIWC2015 Operator’s Manual,” 2015, p. 22) 48
  • 50. Delving into Topics of Interest  Identifying words (names, formulas, dates, symbols, etc.)-of-interest  Using NVivo 11 Plus to create word trees with the target term as the seeding topic  Ability to double-click on the respective branches to link back to the original source data files 50
  • 52. Unmuted or Muted Assignments  A majority unmuted assignments  A minority muted assignments 52
  • 54. Assignment Workflow States  A majority published (77%)  A smaller amount deleted (15%)  The smallest amount unpublished (8%) 54
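Percentages like those above come from a simple base-rate count over one column of a flat file. A minimal sketch follows, with made-up sample values chosen only so that the shares match the slide; the function name `state_percentages` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def state_percentages(states: Iterable[str]) -> Dict[str, float]:
    """Return each distinct value's share of the total, largest first."""
    counts = Counter(states)
    total = sum(counts.values())
    return {state: round(100 * n / total, 1) for state, n in counts.most_common()}


# Illustrative values only, not the K-State data.
sample_states = ["published"] * 77 + ["deleted"] * 15 + ["unpublished"] * 8
shares = state_percentages(sample_states)
print(shares)  # {'published': 77.0, 'deleted': 15.0, 'unpublished': 8.0}
```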
  • 56. Survival Function of Assignments to Update  How long does it take before an assignment is updated?  At what point does an assignment seem to be “safe” against update?  What are some ways to understand assignments that are updated some 1,000 days after the date of creation?  Is it possible that some assignments were transferred over from a prior LMS through an LTI-enabled process that might have captured the very first moment of creation for that assignment? (“LTI” refers to the Learning Tools Interoperability standard created by the IMS Global Learning Consortium.) 56
  • 57. 3b. About Submitted Assignments 57
  • 59. Grades Submittal Counts for Completed Assignments  A slice-in-time view  Roughly two-thirds graded  Roughly one-third not graded 59
  • 62. A Survey of Quiz Types  Assignment  Practice quiz  Graded survey  Survey  Affordances of the various quiz types change over time, so it is important to stay current on the various functions and capabilities even as one is looking at the data. 62
  • 64. Quiz Question Types in the LMS Instance  multiple_choice_question  true_false_question  essay_question  multiple_answers_question  short_answer_question, and others (in descending order) 64
  • 66. Quiz Question Workflow States  unpublished (default)  published  deleted  So a majority of quiz questions are created / drafted but held in reserve and not published.  What are some possible inferences that can be made from the instance-scale statistics and numbers? 66
  • 68. An Inclusive Scatterplot of Quiz Point Values  min-max range: 0 – 23,700 points per quiz  average quiz value: 33 points (without zeroes averaged in) and 28 points (with zeroes averaged in)  The 23,700 value occurred twice, which suggests that it might be purposeful. That huge number, though, pulls the curve and skews it; in a normal research context, such an outlier would likely be omitted to remove that pull. A zoom-in would require going to the particular instructor and course. That might require a different approach to the data than described in this work, such as re-animating all the flat files in a SQL database and using unique identifiers to connect related data. 68
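The effect of zeroes and of one extreme value on an average can be checked directly. The sketch below uses a small made-up list of point values (not the K-State data); `summarize_points` and its `outlier_cutoff` parameter are hypothetical names.

```python
from statistics import mean, median
from typing import Dict, Optional, Sequence


def summarize_points(points: Sequence[float], outlier_cutoff: Optional[float] = None) -> Dict[str, Optional[float]]:
    """Summarize point values, optionally dropping values above a cutoff first."""
    kept = [p for p in points if outlier_cutoff is None or p <= outlier_cutoff]
    nonzero = [p for p in kept if p > 0]
    return {
        "min": min(kept),
        "max": max(kept),
        "mean_with_zeroes": mean(kept),
        "mean_without_zeroes": mean(nonzero) if nonzero else None,
        "median": median(kept),
    }


# Illustrative values only: mostly small quizzes plus one very large one.
sample_points = [0, 0, 10, 20, 25, 30, 50, 100, 23700]

full = summarize_points(sample_points)
trimmed = summarize_points(sample_points, outlier_cutoff=1000)
print(full)
print(trimmed)
```

The median barely moves when the extreme value is dropped, while the mean changes by orders of magnitude, which is one reason to report both.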
  • 70. Histogram of Quiz Point Values in LMS Instance (with a normal curve)  Frequency of point values for quizzes  Tendencies  Most at the lower number values 70
  • 72. Survival Curve of Deleted Quizzes in LMS Instance  Based on timestamp data, how long does it take for a deleted quiz to reach the “event” of deletion (from its moment of creation)?  In this dataset, 22% of quizzes were deleted (14,769/66,366).  The quiz deletions ranged from 0 to 813 days after creation.  A survival analysis showed that the estimated mean survival time of the quizzes that were deleted was 23.6 days, with a 95% confidence interval from 22.7 (lower bound) to 24.4 (upper bound); the standard error was .419.  The median survival time of the deleted quizzes was a low 2 days, which means that if a quiz is to be deleted, it usually happens fairly early.  The drop-off in the survival curve is steep but tapers off after several months. 72
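As a quick check on the reported interval, 23.6 ± 1.96 × 0.419 is roughly 23.6 ± 0.82, or about 22.8 to 24.4, which approximately matches the bounds above. The sketch below shows how such summary figures can be computed when every item in the set has reached the event (no censoring), in which case the survival function is simply the share of items still surviving after a given time. The presentation does not name the software or exact method used, so this is an illustration under that assumption. The timestamp format and the sample durations are assumptions, and the function names are hypothetical.

```python
from datetime import datetime
from math import sqrt
from statistics import mean, median, stdev
from typing import Dict, Sequence


def days_between(created_at: str, deleted_at: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> float:
    """Elapsed days between two timestamp strings (format is an assumption)."""
    delta = datetime.strptime(deleted_at, fmt) - datetime.strptime(created_at, fmt)
    return delta.total_seconds() / 86400


def survival_summary(durations: Sequence[float]) -> Dict[str, object]:
    """Mean, standard error, normal-approximation 95% CI, and median of durations."""
    avg = mean(durations)
    std_error = stdev(durations) / sqrt(len(durations))
    return {
        "n": len(durations),
        "mean": avg,
        "std_error": std_error,
        "ci95": (avg - 1.96 * std_error, avg + 1.96 * std_error),
        "median": median(durations),
    }


def survival_at(durations: Sequence[float], t: float) -> float:
    """Share of items not yet deleted after t days (all items eventually deleted)."""
    return sum(1 for d in durations if d > t) / len(durations)


# Illustrative durations in days, not the K-State data.
sample_durations = [0, 1, 1, 2, 2, 3, 10, 40, 120]
summary = survival_summary(sample_durations)
print(summary)
print(survival_at(sample_durations, 2))
```

A long right tail pulls the mean well above the median, the same pattern as the 23.6-day mean against the 2-day median reported on the slide.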
  • 74. One Minus Survival Function Curve for Deleted Quizzes in the LMS Instance  Shows how long a quiz survives before it is deleted from a set of quizzes that were ultimately deleted 74
  • 76. Hazard Function for Deleted Quizzes in the LMS Instance  All quizzes in the set were ultimately deleted.  This line graph shows the time-to-event from each quiz’s creation date to its deletion in the LMS instance.  The hazard function curve sometimes shows particular time patterns of when a quiz is most at risk of deletion, but this curve shows only a steep initial rise and then a gradual approach to the event. 76
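One simple way to look for time patterns in deletion risk is a discrete-time hazard: within each time bin, the number of deletions divided by the number of quizzes still at risk at the start of the bin. This is a sketch of that general idea, not necessarily how the chart in the presentation was produced. The 30-day bin width, the sample durations, and the name `discrete_hazard` are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def discrete_hazard(durations_in_days: Sequence[float], bin_days: int = 30) -> List[Tuple[int, int, int, float]]:
    """Return (bin_start_day, at_risk, events, hazard) for each time bin."""
    if not durations_in_days:
        return []
    events_per_bin = Counter(int(d // bin_days) for d in durations_in_days)
    at_risk = len(durations_in_days)
    table = []
    for bin_index in range(max(events_per_bin) + 1):
        events = events_per_bin.get(bin_index, 0)
        hazard = events / at_risk if at_risk else 0.0
        table.append((bin_index * bin_days, at_risk, events, hazard))
        at_risk -= events  # items deleted in this bin are no longer at risk
    return table


# Illustrative durations in days, not the K-State data.
sample_durations = [0, 1, 2, 5, 20, 35, 45, 70, 100, 400]
hazard_table = discrete_hazard(sample_durations)
for row in hazard_table[:4]:
    print(row)
```

Because every quiz in the set is eventually deleted, the hazard in the final bin is always 1.0, so the early bins are the informative ones.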
  • 79. Types of Discussion Boards: Announcement vs. Default  default (66%)  announcement (34%) 79
  • 81. Workflow States of Discussion Boards  Undefined  Active  Deleted  Unpublished 81
  • 83. Active vs. Deleted Discussion Board Entries (Replies)  Active discussion board entries  Deleted discussion board entries 83
  • 84. 6. About Learner Submitted Files 84
  • 86. Handling of Learner Submissions in the LMS Instance  human_graded  not_graded  auto_graded 86
  • 88. Some Common Words from Comments Made on Submissions  Lots of encouraging words in comments made on submissions 88
  • 90. Submission Comment Participation Type  Admin  Submitter  Author  So administrators all comment on learner submissions, but not all authors or submitters comment. In other words, the creator of contents may submit the file without comment. 90
  • 91. 7. About Uploaded Files 91
  • 93. Uploads and Revisions of Files to the LMS Instance by Year  A sense of the university’s transition to the LMS, over multiple years (so caution) 93
  • 95. Observed Uploaded File Types  .docx  .pdf  .jpg  .png  .pptx  .xlsx  .ppt  .zip  .dat  .xl  .mp4  .html  .txt  .mp3  .sdl  .csv  .rtf  .css  .m4v, and others 95
  • 97. Word Cloud of File Contents (from the Descriptions of File Contents)  What do the words say about what people have uploaded to the LMS system? 97
  • 99. High Frequency Word Counts in the File Names Set (as onegrams)  Final  Paper  Lab  2015  Chapter  2016  Lesson  2014  Exam  Reflection  Project  Syllabus  Report  Review  Lecture  Profile  Study  Week  Analysis  Essay, and others 99
  • 100. 8. About the Wikis and Wiki Pages 100
  • 101. Wikis and Wiki Pages  A “wiki” in Canvas is a page with its history captured and able to be reinstituted (enabled by wiki software)  Pages may be interconnected  A page may be set as the home page  A page may be embedded in a modular sequence  A page may contain the MediaSite video  A page may contain any number of contents: imagery, iframes, videos, and other contents 101
  • 103. Parent Types for Wiki Pages in the LMS Instance  Course  Group  In other words, the administrators (instructors) of courses are the ones who create a majority of the pages. The learners in groups create fewer of the wiki pages.  Note that the sense of a “wiki” page is different here. 103
  • 105. Wiki Page Workflow  Null (default)  Active  Unpublished  Deleted  This needs more insight, but the data dictionary does not explain the different states and what they mean. For example, is a “null” wiki page published? Is an “active” wiki page something that is included in a sequence? Is a “deleted” wiki page recoverable or not? 105
  • 107. Word Frequency Word Cloud from Wiki Page Titles  Focuses on introductions, projects, research, teams, and others 107
  • 108. 9. About Enrollment Role Types 108
  • 109. About Enrollment Role Types Role Name Basic Role Type Librarian TAEnrollment StudentEnrollment StudentEnrollment TeacherEnrollment TeacherEnrollment TAEnrollment TAEnrollment DesignerEnrollment DesignerEnrollment ObserverEnrollment ObserverEnrollment Grader TAEnrollment GradeObserver TAEnrollment 109
  • 110. University-Defined Roles and Capabilities  Some unique roles  Some shared roles 110
  • 112. Frequencies of Enrollment Roles  StudentEnrollment  TeacherEnrollment  StudentViewEnrollment  TAEnrollment  ObserverEnrollment  DesignerEnrollment (in descending order) 112
  • 114. Top Dozen Computer System Configurations for Accessing LMS Instance  …and others 114
  • 116. Request Types in the LMS Instance  GET (Read)  POST (Create)  PUT (Update / Replace)  HEAD (Retrieve headers only)  DELETE (Remove)  PATCH (Partial update) 116
  • 119. Group Names Frequency Word Cloud  Clone  Teaching  Design  Clinical  Plan  Final  Class  Learning  Ventilation, and others 119
  • 121. Moderator Status of Learners in Groups  not_moderator  is_moderator 121
  • 123. Learner Membership Status in Groups  Accepted  Deleted  No invited  No requested 123
  • 124. 11. About Users and Workflow States 124
  • 126. User “Workflow” States in the LMS Instance  registered  pre_registered  deleted  creation_pending  The “creation_pending” may well refer to a process of approval for people to have access—for a level of security. 126
  • 128. Years of Origination of User Accounts  Initial exploration in 2013  Big push in 2014  New accounts in 2015 and 2016 indicating not only students but also employment churn and stragglers slow to change to a new LMS 128
  • 130. Retired Accounts = Registered False  2013 – early May 2017  Word frequency count from unigrams (so no full names represented as such)  First names more common and so better represented  One number removed in the “stopwords” list 130
  • 132. Pseudonyms  Pseudonyms = “logins associated with users”  Seems to be the connection between the LMS and various university information systems  Seems like partial data (extracted in May 2017) 132
  • 134. Current “States” of Pseudonyms  A majority of pseudonyms “active” vs. “deleted” 134
  • 135. 12. About Course Level Grades (based on Enrollments) 135
  • 137. Numbers of Attempts for Latest Submitted Assignments  Null (no scores)  One  Two  Three  Four, etc. (in descending order) 137
  • 138. 13. About Conversations (Emails) 138
  • 140. Conversations with Media Objects Included  False  True  So when people use the email system inside Canvas, they do not generally attach media objects (like digital imagery, slideshows, audio, video, or other digital files). 140
  • 142. Conversations w/ or without Attachments  A majority of conversations are without attachments  A minority of conversations are with attachments 142
  • 144. Origins of Conversations / Messages  Human-generated conversations (the overwhelming majority)  System-generated messages 144
  • 146. Conversation Messages Word Frequency Count  482,339 conversation messages  Texts with 60,509,894 words  2/3 analyzed for textual contents (because of data size) 146
  • 148. Mass Conversation Message Contents  Analytic: 82.33  “Formal, logical, and hierarchical thinking” vs. “more informal, personal, here-and-now, and narrative thinking”  Clout: 80.21  “perspective of high expertise” and confidence vs. “more tentative, humble, even anxious style”  Authentic: 26.41  “more honest, personal, and disclosing text” vs. “a more guarded, distanced form of discourse”  Tone: 66.24  “a more positive, upbeat style” vs. “greater anxiety, sadness, or hostility” (emotional tone) (“Linguistic Inquiry and Word Count: LIWC2015 Operator’s Manual,” 2015, p. 22) 148
  • 150. Messaging about “Human Drives” in the Mass Conversation Messages  Affiliation (2.35)  Power (2.19)  Achievement (1.46)  Reward (1.3)  Risk (0.37)  “The focus on affiliation and social identity seems reasonable, given the typical college age of learners. The "power" language may come from faculty speaking from positions of authority. The low level of focus on risk is intriguing here (maybe young learners are not thought to have developed the efficacy and confidence to take on uncontrolled risks?). Clearly, there is a role for theorizing and interpretation, even with computation-based analytics.” 150
  • 152. Sentiment Analysis of Sample of Conversation Messaging  A smaller sample of the conversation messages was analyzed for sentiment. This set consisted of 72,377 messages.  The automated observations of sentiment showed that there were two tendencies: either very positive or moderately negative (in terms of text categories).  In this software tool, it is possible to explore which texts were assigned to which categories of sentiment (very negative, moderately negative, moderately positive, or very positive) in the comparisons between the target text and the built-in sentiment dictionary.  In other words, the actual exploration of the content is possible through both machine reading and human close reading. 152
  • 154. Auto-Extracted Theme Based Hierarchy Chart of Conversation Messaging Sample (as a Treemap)  Class  Assignment  Time  Paper  Questions  Exam  Online  Group, etc. 154
  • 156. Auto-extracted Themes from Conversation Messaging Sample  These are in alphabetical order  The themes are listed in a human-readable way going clockwise around the pie (in a pie chart) 156
  • 158. Auto-Coded Theme-Based Hierarchy Chart of Topics and Subtopics from Conversation Messaging Sample (as a Sunburst Diagram)  This sunburst diagram, in the software, is somewhat interactive  Double-clicking on a topic enables digging down into it and seeing the subtopic contents there  If a sliver is too thin to read, hovering the mouse over it shows the actual subtopic and its statistics and quantitative data 158
  • 159. 159
  • 160. Contexts of “Help” in a Word Tree  It is possible to analyze the various contexts in which “help” was used in the conversation messaging in the prior word tree  In the software (NVivo 11 Plus), the word tree is interactive and is linked to the original sources where the word appears, so it is possible to achieve close reading of every use of “help” from the underlying dataset  The challenge is engaging a full dataset of millions of words 160
  • 161. 14. About Third-Party External Tool Activations on the LMS Instance 161
  • 163. Numbers of External Tool Activations on the LMS Instance  External tool activations  Unique tool activations 163
  • 165. Named External Tool Activations in the K-State Canvas Instance  YouTube  Ted Ed  DropBox (with name variations)  Vimeo  Quizlet (with name variations)  MyOMLab  Khan Academy  Twitter  Flat World Knowledge  SlideShare  Yellowdig  SoftChalk Cloud (with name variations)  MyLab and Mastering  Educreations  Funbrain  Wikipedia, and others… 165
  • 167. External Tool Activations in 2013 (in alphabetical order)  Attendance Tool  Chat  CodeAcademy  Dropbox  Flat World Knowledge  Flickr Search  Graph Builder  Khan Academy  Learn LTI  McGraw-Hill Campus  Public Collections  SlideShare  SoftChalk Cloud  SoftChalk Cloud App  Ted Ed  Twitter  Vimeo  YouTube  Zoom 167
  • 169. External Tool Activations in 2014  There is an increase in both variety and number of external tool activations  No deeper analysis was applied, but it could be…as to the external tool types and the changing senses of needs 169
  • 171. External Tool Activations in 2015  There is an increase in both variety and number of external tool activations  No deeper analysis was applied, but it could be…as to the external tool types and the changing senses of needs 171
  • 173. External Tool Activations in 2016  There is an increase in both variety and number of external tool activations  No deeper analysis was applied, but it could be…as to the external tool types and the changing senses of needs 173
  • 174. 15. About Course User Interface (UI) Navigation Item States 174
  • 176. Course User Interface Navigation Item State  Visible  Hidden  This refers to users’ ability to let the pre-set functions in the left navigation of a course shell remain active or to place them in “hidden.”  There are “hidden” navigation element presets as well, which users may choose to activate. 176
  • 177. Enablements and Limits re: the LMS Data Portal Data 177
  • 178. 178
  • 179. Delimiting the Analytics from the LMS Data Portal Data  The concept behind delimiting is to make conclusions more accurate by representing how confident one may be about the results.  As noted, there may be challenges and noise in the data from any step in the workflow…but there are inherent limits also to the various data analytics types—as shown in the visualization in the prior slide. 179
  • 181. Some Practical Applications  Self awareness (holding up a mirror to the campus for its use of its LMS)  Analytics  To improve usage of the LMS  To know what functions and features are desirable  To support learner usage  To support teaching and learning  To support non-teaching and learning approaches to the data  Decision-making  Instructional design  Administrative awareness, decision-making, funding, and others 181
  • 182. Moving Forward with the Data 182
  • 183. What are Ways to Go Beyond? Other Analytical Methods  Reconnecting the flat files as relational files in SQL server  Design of specific cross-file queries for data analytics  Applying more and varied computational text analysis  Engaging machine learning for patterns (such as decision trees for predictivity of classifications based on available information) Bringing in More Data  Comparing macro-level data with other instances of the Canvas LMS (such as with comparable institutions of higher education)  Using additional data to enable close-in reads (but without compromising people’s privacy)  Keep confidential information confidential 183
  • 184. Some Early Lessons Learned 184
  • 185. Assessing the Initial Haul of Biggish Data  Formulating askable questions  Analyzing the columnar data (and variables)  Understanding where the data comes from and how it is processed by Instructure  Analyzing the date data  Analyzing the textual data  Understanding ways to mix data in various datasets for enriched querying  Conceptualizing mixes of questions and potential findings based on the available data 185
  • 186. Assessing the Initial Haul of Biggish Data (cont.)  Understanding the types of software that may be used to engage the data  Software enables cross-sectional base rate counts from flat files  Software enables cross-tabulation analysis and assessments of statistical significance (rarity of patterns)  Software enables finding patterns through machine learning (like applying decision trees to see what variables help determine classifications)  Software enables the identification of text-based patterns 186
  • 187. Some Early Lessons Learned  Data visualizations are only summary data, and it’s important to get to the actual underlying data to understand some dynamics.  It helps to theorize or hypothesize broadly to understand what may be going on with the observed empirical data.  It is always wise to “sanity check” data extractions and data processing to see what is going on.  It is important to understand the LMS data portal’s default settings and the rationales behind those defaults to make sure that they make sense for the particular context. 187
  • 188. Some Early Lessons Learned (cont.)  Avoid double-counting for complex data with similar lead-in terms.  Watch out for typing errors.  Do not ignore error messages; figure out why they’re happening and deal with the issues.  Slow down the process, so you’re certain of what is happening at every step. Be careful not to lose data.  Be careful about going to Excel, which is limited to about 1.05 million rows (1,048,576) per worksheet. Be careful also of OS clipboards, which have 65,000-record limits. Do not let such limits stall the work and result in lost data. Go to MS Access or SQL Server first. 188
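Before opening a flat file in Excel, it can help to count its rows and compare the count against the worksheet limit. A minimal sketch follows; it assumes one record per line (no line breaks inside text fields), and the file name and function names are hypothetical.

```python
import gzip
import tempfile
from pathlib import Path

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit in Excel 2007 and later


def count_rows(path: Path) -> int:
    """Count lines in a gzip-compressed text file without loading it into memory."""
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def fits_in_excel(row_count: int, header_rows: int = 1) -> bool:
    """True if the records plus any header rows fit on one worksheet."""
    return row_count + header_rows <= EXCEL_MAX_ROWS


# Tiny stand-in file so the sketch runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample_flat_file.gz"
    with gzip.open(sample_path, mode="wt", encoding="utf-8") as out:
        out.write("1\ta\n2\tb\n3\tc\n")
    sample_rows = count_rows(sample_path)

print(sample_rows, fits_in_excel(sample_rows))  # 3 True
```

If the count exceeds the limit, the file is a candidate for a database import rather than a spreadsheet.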
  • 189. Some Early Lessons Learned (cont.)  Use the LMS data portal “data dictionary” for the LMS data, but realize that it may be dated or incomplete or inaccurate. A particular instance of an LMS will be particular, so a general dictionary offers a general view, not a specific one. Use the data dictionary in an attentive way.  Realize that there are nuances in the data that may not be apparent initially.  With computational text analysis, oftentimes, foreign languages will get short shrift. There may be effective ways to address this.  With any sort of automation, there will be trade-offs. It is important to check findings against the data and conduct data queries on multiple software tools. 189
  • 190. Some Early Lessons Learned (cont.)  Data is messy. It is entirely possible (even probable) for a process to seem to be going smoothly when something has glitched with a data download.  Sometimes, no matter what is tried, it is not possible to import the data for processing into either Microsoft Access or SQL Server. In that case, there may need to be a data “substitution” by extracting the “same-ish” set from the LMS data portal (days after the first set was extracted).  The assumption is that new data is appended to the end of the existing data, so if the file is the proper one, a “later” version still should be accurate. Depending on the data handling, though, that assumption may not be true. It will be important to check. 190
  • 191. Some Early Lessons Learned (cont.)  Don’t just go with how software is designed. For example, with a word frequency count, don’t just go with the high counts, but analyze the “long tail” of the low counts.  The “power law” does often apply to word counts in language. The long tail shows something of outlier data in terms of single mentions (but you have to slog through misspellings, strange alphanumeric strings, and other noise first).  There are certain data visualizations that work better for certain types of data.  All data visualizations should be sufficiently labeled.  It helps to calculate not only raw numbers but percentages, where possible. 191
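A word frequency count and its long tail can both be pulled from the same tally. The sketch below uses a handful of made-up file names (not the K-State data) and a deliberately simple tokenizer; `word_frequencies` and `long_tail` are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable, List


def word_frequencies(texts: Iterable[str]) -> Counter:
    """Count lowercase word tokens (letters, digits, apostrophes) across texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts


def long_tail(counts: Counter, max_count: int = 1) -> List[str]:
    """Return the words that occur at most max_count times, alphabetically."""
    return sorted(word for word, n in counts.items() if n <= max_count)


# Illustrative names only.
sample_names = ["Final Paper", "Final Exam Review", "Lab Report 3", "Final Project", "Lab 4 ventilation"]
counts = word_frequencies(sample_names)
top_words = counts.most_common(2)
tail_words = long_tail(counts)

print(top_words)
print(tail_words)
```

Reporting `count / total` alongside each raw count gives the percentages recommended above.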
  • 192. Some Early Lessons Learned (cont.)  Data portals contain personally identifiable information (PII), so extra care has to be taken to ensure that people’s private information is not misused nor leaked.  What is knowable depends on what other datasets one has access to and how one sets up the analyses…  It helps to know what is possible to know from the data (full universe)  It helps to know what is politically viable to ask and capture (subset) (people may ask for the moon)  It helps to use resources wisely to pursue asks that create constructive awareness and good decision-making (sub-subset)  Recording steps is important (in notes and in macros)…so everything can be repeated as needed. 192
  • 193. To a Relational Database  Flat files are downloaded as compressed .gz files and extracted with 7-Zip to .csv files.  Microsoft offers SQL Server Express as a free tool but limits it to one CPU (up to 4 cores), 1 GB of RAM, and a database size of 10 GB (“Limitations of SQL Server Express”).  Set this up on a dedicated machine, so the setup does not disrupt other work.  In shifting to SQL Server Express, the flat files have to be properly processed for the data to move without loss or other problems.  It may help to process the data first in MS Access (as long as the flat file data is not too large to handle in Access). Treat text columns as “Long Text,” not “Short Text.” Set date fields not as text but as “Date with Time.” The idea is to have the proper settings for appropriate receipt in SQL Server. 193
  • 194. To a Relational Database (cont.)  Then, export the object from Access to Excel 2016 with the formatting and proper data structure.  If the table has more than 65,000 records, then MS Access is unable to export the data table. 194
  • 195. To a Relational Database (cont.)  One option is to split the dataset in Access (Highlight the table -> go to Database Tools tab -> click Access Database -> Split database.) The problem with this is that a dataset will have to be split quite a few times to get each piece under the 65,000-record limit, and then after ingestion into SQL Server, any repeat data will have to be deleted. This path is too onerous to be helpful, especially with LMS data portal data, which can easily go into the millions and millions of rows.  A more direct option follows on the next slide. 195
  • 196. To a Relational Database (cont.)  When files are too large (anything over the 65,000 records that will fit in a clipboard), it makes better sense to clean the data on import into SQL Server. The sequence goes like this: .gz -> .csv (using 7-Zip) -> open SQL Server Management Studio -> import data, changing the “DT_STR” (string) columns to “DT_TEXT” (text stream) so that there is no 50-character constraint on the columns; the data import then generally goes well. (This solution takes up more computer memory and is inelegant, but it solves the many issues that would otherwise crop up with a straight import without the data type adjustments.)  Column names are not imported from the first row.  In SQL Server Management Studio 17, go to Databases -> System Databases -> “master” database (right-click) -> Tasks -> Import Data … and specify that the original source is Microsoft Excel. The flat files are now tables in the master database (under the default “dbo” schema). Do keep the original file names, for ease of reference. 196
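The same idea (import every column as unbounded text so that nothing is truncated, and stop on malformed rows rather than lose data) can be sketched in a few lines. This example uses SQLite from the Python standard library as a stand-in, not SQL Server or its import wizard. It assumes a tab-delimited file with no header row, and the table name, column names, and function names are hypothetical.

```python
import csv
import gzip
import sqlite3
import tempfile
from pathlib import Path
from typing import Iterator, List, Sequence


def checked_rows(path: Path, width: int, delimiter: str) -> Iterator[List[str]]:
    """Yield rows, stopping loudly on any row with an unexpected column count."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for line_number, row in enumerate(reader, start=1):
            if len(row) != width:
                raise ValueError(f"row {line_number}: expected {width} columns, got {len(row)}")
            yield row


def load_flat_file(conn: sqlite3.Connection, table: str, columns: Sequence[str],
                   path: Path, delimiter: str = "\t") -> int:
    """Create a table of TEXT columns, load the file into it, return the row count."""
    column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_defs})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})',
                     checked_rows(path, len(columns), delimiter))
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Stand-in data, including one text value far longer than 50 characters.
long_description = "x" * 200
conn = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "assignment_dim_sample.gz"
    with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as out:
        out.write(f"10\tFinal Paper\t{long_description}\n")
        out.write("11\tQuiz 1\tShort description\n")
    loaded = load_flat_file(conn, "assignment_dim", ["id", "title", "description"], sample_path)

stored_length = conn.execute(
    "SELECT LENGTH(description) FROM assignment_dim WHERE id = ?", ("10",)
).fetchone()[0]
print(loaded, stored_length)  # 2 200
```

Comparing the loaded row count against a separate count of the source file is a cheap sanity check after any import.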
  • 197. To a Relational Database (cont.)  Is re-indexing needed?  If so, the foreign keys may have to be reconnected to the correct primary keys for the relationships in a relational database to make sense and for SQL queries across the files to make sense.  Foreign keys point to primary keys in another table; they hold the identifier of the related record and so connect related data between tables.  Primary keys are unique identifiers (and “reserved” against reuse in that sense), and they indicate unique records in data tables (and databases).  If not, it may be possible to run SQL queries by loading the tables with primary keys first and those with referring foreign keys second…but I am not there yet. Working on it. 197
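The relationship between primary and foreign keys, and the "parents first, children second" loading order, can be tried out on a toy scale. The sketch below again uses SQLite from the Python standard library as a stand-in for SQL Server. The two tables are simplified, illustrative versions of `course_dim` and `assignment_dim`, and their columns and sample rows are assumptions rather than the full Canvas schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE course_dim (
        id INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE assignment_dim (
        id INTEGER PRIMARY KEY,
        course_id INTEGER REFERENCES course_dim (id),
        title TEXT
    );
    """
)

# Load the table holding the primary keys first, then the table that refers to it.
conn.executemany("INSERT INTO course_dim VALUES (?, ?)",
                 [(1, "Intro to Statistics"), (2, "Sandbox")])
conn.executemany("INSERT INTO assignment_dim VALUES (?, ?, ?)",
                 [(10, 1, "Final Paper"), (11, 1, "Quiz 1"), (12, 2, "Test upload")])
conn.commit()

# A row whose foreign key matches no primary key is rejected.
try:
    conn.execute("INSERT INTO assignment_dim VALUES (?, ?, ?)", (13, 99, "Orphan"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# With the relationship defined, one query can ask a question across both tables.
rows = conn.execute(
    """
    SELECT c.name, COUNT(a.id)
    FROM course_dim AS c
    LEFT JOIN assignment_dim AS a ON a.course_id = c.id
    GROUP BY c.id
    ORDER BY c.id
    """
).fetchall()
print(orphan_rejected, rows)
```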
  • 198. To a Relational Database (cont.)  Proceed with a good basic text on SQL server. Give it a good read-through before actually going too far into a project. (Experimentation is always good, but time wastage—not so much.)  If local support with a database administrator (DBA) is available, that would be optimal. 198
  • 199. References  Pennebaker, J.W., Booth, R.J., Boyd, R.L., & Francis, M.E. (2015). Linguistic Inquiry and Word Count: LIWC2015. Operator’s Manual. Retrieved at https://s3-us-west-2.amazonaws.com/downloads.liwc.net/LIWC2015_OperatorManual.pdf. 199
  • 200. Contact and Conclusion  Dr. Shalin Hai-Jew  iTAC  Kansas State University  212 Hale / Farrell Library  shalin@k-state.edu  785-532-5262 200