1. Richard Nurse The Open University Library Gaining Business Intelligence from User Activity Data London 14 July 2010 Too difficult? Content Management Perspective
2. "Every day I wake up and ask, 'how can I flow data better, manage data better, analyse data better?'" Rollin Ford, the CIO of Wal-Mart A special report on managing information: Data, data everywhere Economist, The (London, England) - February 27, 2010 Page: 71
3. “Look, if you really want to transform health care, you basically build a sort of health-care economy around the data that relate to people” Eric Schmidt, Google "You would not just think of data as the ‘exhaust’ of providing health services, but rather they become a central asset in trying to figure out how you would improve every aspect of health care. It’s a bit of an inversion." Craig Mundie, Microsoft A special report on managing information: Data, data everywhere Economist, The (London, England) - February 27, 2010 Page: 71
12. What sort of solution? User activity data portal APIs Web Services Data standards Processes Visualizations Recommendations LMS Link resolver ERM VLE Search systems E-portfolio Student registry Finance websites E-resources IR Dashboard
14. Is anything less than a comprehensive view of the world inadequate? “A single version of the truth” One representation - reliable - authoritative Consistent Assets Complete Products Unique Customers
30. Image Credits Clevercupcakes http://www.flickr.com/photos/clevercupcakes/2475149762/ IRRI Images http://www.flickr.com/photos/ricephotos/367696112/ Paolo Margari http://www.flickr.com/photos/paolomargari/786017449/ Scorpions and Centaurs http://www.flickr.com/photos/sshb/3264845610/ Blprnt_van http://www.flickr.com/photos/blprnt/4176305484/
31. Image Credits Patrick Hoesley http://www.flickr.com/photos/zooboing/4649039510/ Juliette Culver http://www.flickr.com/photos/julietteculver/4731004168/in/photostream/ MOSAIC final report http://www.sero.co.uk/jisc-mosaic-documents.html Cushing Memorial Library and Archives http://www.flickr.com/photos/cushinglibrary/3875300483/ CompoundEye http://www.flickr.com/photos/paopix/3328841370/ Ian S’ photostream http://www.flickr.com/photos/ian-s/2152798588/
Editor's notes
I thought I’d start with a couple of quotes to illustrate how other sectors view the importance of data and data analysis. In retail, data analysis has long been embedded as a critical business tool. Take Wal-Mart, one of the largest US retailers (and owner of Asda in the UK): 2m staff, 1m transactions per hour, revenue of $400 billion. Their CIO has this to say…
Craig Mundie of Microsoft and Eric Schmidt, the boss of Google, sit on a presidential task force to reform American health care. They see data as absolutely central to improving health care. I think that second quote holds a key lesson for libraries and the HE sector: we often seem to view user activity data as a by-product of our activities, something that tells us what we have produced and sometimes what impact it has had, but we don’t use the data to continually improve the design, delivery and customisation of our services.
I thought I’d do a quick run through of what might constitute success from the perspective of students, researchers and institutions, then look at the sort of solution we might need, consider some of the barriers and challenges, and finally give some thoughts from an OU perspective.
I’ve broken down the stages for students using our Student Journey model. Step one takes the student through to making the decision about where and what to study. This is quite an obvious area for recommendations to help with finding the right course. I thought it was interesting that the two OU entries to the MOSAIC competition last year (from Tony Hirst and Owen Stephens) were in this area, either matching you up with a suitable course based on what you like to read or showing you what students on a course are reading.
Moving on to the time when students are studying, you’ve got recommendations based on what other students are doing, and maybe comparisons across institutions.
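As a rough illustration of a recommendation based on what other students on a course are doing, here is a minimal Python sketch. It simply ranks resources by how many distinct students on the course have used them. The event tuples, course codes, resource names and the function name `recommend_for_course` are all hypothetical, and a real service would need far more care over data volume and privacy.

```python
from collections import defaultdict


def recommend_for_course(events, course, already_seen=(), top_n=3):
    """Rank resources by the number of distinct students on `course` who used them.

    `events` is an iterable of (student, course, resource) tuples (assumed shape).
    """
    users_per_resource = defaultdict(set)
    for student, event_course, resource in events:
        if event_course == course:
            # A set means repeat use by the same student counts only once.
            users_per_resource[resource].add(student)

    counts = {
        resource: len(users)
        for resource, users in users_per_resource.items()
        if resource not in already_seen
    }
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [resource for resource, _ in ranked[:top_n]]


# Hypothetical sample data, for illustration only.
EVENTS = [
    ("s1", "A100", "ebook-1"),
    ("s2", "A100", "ebook-1"),
    ("s2", "A100", "journal-7"),
    ("s3", "A100", "journal-7"),
    ("s3", "A100", "ebook-1"),
    ("s1", "A100", "ebook-1"),  # repeat use by the same student
    ("s4", "B200", "ebook-9"),
]

if __name__ == "__main__":
    print(recommend_for_course(EVENTS, "A100"))
```

Passing `already_seen` lets a front-end system leave out resources the current student has already used.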
And then the final step of the user journey: looking at next steps and building longer-term affiliation as students follow a career path. That’s particularly important to the OU, where students sign up a module at a time. There’s the old retail adage that it’s cheaper to keep your existing customers than to find new ones.
Looking at how it can help researchers, you could include resources in their field that are newly published, or data about what is being cited.
For institutions there is a big element around decision making: being able to target your marketing effectively, to measure the impact of your work, and to use resources that are likely to be increasingly scarce to the best effect.
OK, so that may be what success looks like, but what sort of solution do we want? Are we looking for a solution that links everything together? A solution that pulls data from all relevant sources and links it together to make connections between course codes, students and their activity? Should that be sector-wide rather than institution-based? One that then feeds that data out to where the student is (in the VLE, library search systems) or to staff via dashboards or portals? And what tools, standards and applications will we need to do this? I’ve flagged up EBSM, which is Evidence-Based Stock Management. It is something that has been developed and adopted in some public libraries to take library loans data and produce a suite of reports to help decision-making on what stock to buy or how to rotate stock around libraries.
So, as a crude model, what you end up with as a diagrammatic representation is something like this: a great pool of institutional systems feeding data into some form of business intelligence platform, using appropriate data standards and processes, and then feeding the results out as recommender systems and visualizations onto front-end systems. And that leads to a question: how comprehensive does this system need to be? Is anything less than a comprehensive view of the world inadequate?
So, let’s start with the view that ‘only a comprehensive view of the world of data is acceptable’. Why might you say that? No one system has sufficient data in it to uncover new insights. For example, until Huddersfield matched up loan data with student achievement data, no one had the evidence of the impact of library loans on student grades. If there are gaps in the data, then how do we know we have an accurate picture? Some students may prefer print, others electronic versions; if you only have one dataset you have a distorted view. And I think there’s a parallel with CRM systems: organisations have increasingly invested in Customer Relationship Management systems to pull together all their customer contacts, and data transactions need to come into the same picture. So I’d suggest that we want a comprehensive view.
This is what the Business Intelligence world often calls “a single version of the truth”: one representation of critical data that is unique, complete and consistent, the most reliable and authoritative information for the entire organisation. It has become something akin to a search for the holy grail. But it is a view that has been challenged, not least because different users will have different perspectives.
Turning to some of the challenges and barriers, and looking first at cultural and institutional barriers: can key decision makers be convinced that this should be a high priority? Can you demonstrate the benefits clearly, in terms of improving efficiency, saving money, improving services, or providing unique value? Can you show some exemplars of how the data can be used and what the impact is? The second barrier is the data itself. Do you know which system has the data? Can you get access to it? Can you convince data owners and your institution to share it? The third area is the view that universities are competing against each other. One answer to that is to point to the retail sector, which sees data analysis as a tool to increase competitive advantage. I would also say that as a sector we need to make the best use of diminishing resources, and sharing this data across the sector would be more cost-effective than every institution doing its own thing. Finally, cost, which is probably the biggest barrier: we have to be clear about the benefits and articulate them clearly.
Looking at the technical challenges: can you get the data out of the systems? Do you have people with the right technical skills? Often you will need programmers and developers rather than library systems administrators. How far back does the data go? Often log files aren’t kept, or data isn’t migrated when people change systems. Can you feed recommendations back into your systems, and can you customise them? And finally, there’s the sheer number of different systems, with different data, built for different purposes. You may want not only to extract data from those systems but also to feed it back in to OPACs, VLEs and eportfolio systems, maybe using gadgets, widgets or RSS feeds.
Once you have the data there are still more challenges. You’ve got to make sure that individual students can’t be identified, so the data needs to be anonymous. You can relate it to courses, but you need to be careful that it isn’t possible to deduce information about individuals. You need to make sure that you aren’t recording the same data in different systems and duplicating it. You may have to rethink what data you store where, and which fields in a record you use to match data from one source against another. And then how do you cope with potentially huge volumes of data?
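To make the anonymity point a little more concrete, here is a minimal Python sketch of two common precautions: replacing user logins with keyed hashes before analysis, and suppressing course-level counts that are small enough to point at individuals. The function names, the `min_students` threshold and the sample key are assumptions for illustration. Note that a keyed hash gives pseudonymous rather than truly anonymous data, so this is a starting point and not a guarantee.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Replace a user login with a keyed hash (HMAC-SHA256, truncated).

    The same login always maps to the same token, so activity can still be
    linked across records, but the login cannot be read back from the token
    without the secret key.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def course_usage(events, secret_key: bytes, min_students: int = 5):
    """Count distinct students per course, hiding courses with few students.

    `events` is an iterable of (user_id, course) pairs (assumed shape).
    Courses with fewer than `min_students` distinct users are left out so
    that small groups cannot be used to deduce facts about individuals.
    """
    students = defaultdict(set)
    for user_id, course in events:
        # Using a set also removes duplicate records for the same student.
        students[course].add(pseudonymise(user_id, secret_key))
    return {
        course: len(tokens)
        for course, tokens in students.items()
        if len(tokens) >= min_students
    }


if __name__ == "__main__":
    key = b"example-key-keep-the-real-one-secret"  # hypothetical key
    sample = [("u1", "A100"), ("u2", "A100"), ("u1", "A100"), ("u3", "B200")]
    print(course_usage(sample, key, min_students=2))
```

The threshold is a policy decision for each institution; the value used here is arbitrary.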
Are there suitable standards to help with extracting and processing this data? Are there suitable IMS standards? IMS now have a Libraries Project Group under formation, which is looking at how library systems and VLEs are integrated. Currently their priority has been the adoption of standards such as Basic Learning Tools Interoperability, but there’s the opportunity for other UK HE libraries to get involved. The MOSAIC documentation helps, but more will need to be done. And finally, there is the whole issue of data protection, data ownership and permissions, which we will hear about later.
So, that’s a quick run through some of the potential benefits, solutions and challenges. In the last few minutes I want to go through some of the OU approaches.
As a distance-learning university there are some big differences. The LMS isn’t central to the student experience. Students rarely visit the library or borrow books, and only register in the LMS if they want a SCONUL card to use another library.
And we are accelerating a move to eresources. OU students sign up to individual modules rather than a full degree course in its entirety. So we don’t have much LMS data.
But we have a vast array of data from students’ online visits, as students engage with the OU through course websites in the VLE, through the library website and through Student Home.
But different systems are run by different departments. The VLE is run by a large Learning and Teaching Solutions department that creates new course websites in the VLE for Faculties. Student systems are run by Student Services, who are responsible for contact with students. Library, Online Services and AACS are also involved in providing systems for students.
And all the systems have their own reporting tools, which don’t make it easy to connect up the data. So, instead of being able to get a single view of all the data, you rely on programmatically extracting data from the systems into another database and matching the data together.
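The matching step might look something like the following minimal Python sketch, which joins rows extracted from two systems on a shared identifier and reports the rows that could not be matched. The field names (`student_id`, `course`, `resource`), the normalisation rule and the function names are assumptions for illustration, not a description of any OU system.

```python
def normalise_id(raw: str) -> str:
    """Tidy an identifier so the same student matches across systems.

    Assumed rule: ignore surrounding whitespace and letter case.
    """
    return raw.strip().upper()


def match_records(library_rows, registry_rows, key="student_id"):
    """Join library activity rows to registry rows on a shared key.

    Returns (matched, unmatched): merged dicts for rows found in both
    extracts, and the library rows with no registry counterpart.
    """
    registry = {normalise_id(row[key]): row for row in registry_rows}
    matched, unmatched = [], []
    for row in library_rows:
        other = registry.get(normalise_id(row[key]))
        if other is None:
            # Keep the failures: they show where the data has gaps.
            unmatched.append(row)
        else:
            merged = {**other, **row, key: normalise_id(row[key])}
            matched.append(merged)
    return matched, unmatched


if __name__ == "__main__":
    # Hypothetical extracts from two systems.
    library = [
        {"student_id": " a1 ", "resource": "ebook-1"},
        {"student_id": "Z9", "resource": "journal-7"},
    ]
    registry = [{"student_id": "A1", "course": "A100"}]
    print(match_records(library, registry))
```

Reporting the unmatched rows alongside the matches gives an early warning when identifiers are recorded differently in different systems.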
Some of the early work we’ve been doing includes trialling the bX recommender service from ExLibris. This is collecting SFX search data from users across the world. But it’s buried quite deep within systems. There’s an API available, which we need to investigate.
We’ve also been collecting records of searches via our federated search system. We are collecting a time-stamp, the type of search, the words used in the search and the user login. The image is a sample of 1,000 searches carried out in the system. When we first looked at the data, it surprised us how often students just typed in their course code and expected relevant search results. But thinking about it, why wouldn’t it be logical for a student to expect that typing in a course code would bring back results relevant to that course, maybe organised according to what they need to read each week? We’ve branded it One-Stop search, and put ‘One-Stop search’ in the search box to explain what it is, so the most common search is ‘One-stop search’. So always beware of the quality of your data.
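A data-quality check of this kind can be sketched in a few lines of Python: filter out the prompt text left in the search box before counting terms, and flag queries that look like course codes. The course-code pattern here is an assumed format (one to four letters followed by three digits), and the function and constant names are hypothetical.

```python
import re
from collections import Counter

# Prompt text shown in the search box, which users submit unchanged.
PLACEHOLDER = "one-stop search"
# Assumed course-code format for illustration, e.g. "A100" or "TU100".
COURSE_CODE = re.compile(r"^[A-Z]{1,4}\d{3}$")


def summarise_searches(queries):
    """Summarise a list of search strings, separating noise from real terms."""
    cleaned = [query.strip() for query in queries]
    placeholder_hits = sum(1 for q in cleaned if q.lower() == PLACEHOLDER)
    # Real searches: non-empty and not just the search-box prompt.
    real = [q for q in cleaned if q and q.lower() != PLACEHOLDER]
    course_codes = [q.upper() for q in real if COURSE_CODE.match(q.upper())]
    return {
        "total": len(cleaned),
        "placeholder": placeholder_hits,
        "course_code": len(course_codes),
        "top_terms": Counter(q.lower() for q in real).most_common(3),
    }


if __name__ == "__main__":
    # Hypothetical sample of logged search strings.
    sample = ["One-stop search", "a100", "climate change", "Climate Change", ""]
    print(summarise_searches(sample))
```

Counting the placeholder separately, rather than silently dropping it, keeps the scale of the problem visible in the report.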
A final couple of examples. As part of the TELSTAR project we’ve built a reference management tool into the VLE that we’ve called MyReferences. MyReferences uses SFX to link to resources, and by using this tool for course resources we can track which resources are being used in which courses.
The last example is a brand new project that we’ve just started, working with the Knowledge Media Institute at the OU. Project Lucero is about exposing OU data as Linked Data, and we are planning to build a couple of prototypes that will include recommender systems. As a future direction, Linked Data may uncover some interesting possibilities for user activity data.
So, to conclude: we want a comprehensive view of the world … if we can get it. The main challenges to be overcome are: a comprehensive set of tools, standards and case studies; acquiring the necessary skills; a commitment that this is the way forward; and a change in culture towards ‘open’ data. The slides will be up on slideshare shortly. Thank you.