This presentation was provided by Stuart Maxwell of Scholarly iQ during the NFAIS Forethought event "Artificial Intelligence #2 – Processes for Media Analysis and Extraction." The webinar was held on May 20, 2020.
2. 1. About SiQ
2. Smarter Topic Analysis
Objective
Data
Findings
Actions
3. Lessons Learned
3. Leaders in usage reporting since 2002
Authorized COUNTER R5 vendor for ALL Platform, Database, Title and Item reports
Trusted third party maintaining COUNTER compliance every year since 2003
Fully independent, flexible and client-focused solutions including SQL, Hadoop, Hive, Redshift, IBM, Tableau, Cognos, QlikView and others
Delivering platform-independent reporting on HighWire, Silverchair, IDM, Safari, ingenta, PubFactory etc., as well as publisher-specific custom platforms
Innovating new uses and benefits from usage data such as the integration of SiQ’s PSI Metrics for
OA Usage Reporting and Denials Reporting
5. Topic Health Monitor
Objective – Aid editorial decisions and customer communications
Why – Usage data is provided for Titles, Databases, Books, Articles etc., but could this be made more intelligible in terms of subjects, topics and concepts?
How – Integrate trusted, industry standard quantitative performance metrics over
time with descriptive taxonomic data by DOI/URI
Outcome – THM will flag and predict which topics are significantly increasing or decreasing in usage
Future Objectives – Segment by further available dimensions (Institution, Geo, Title, Funder etc.)
Feed performance data into predictive models
Integrate with SiQ’s PSI Metrics OA usage reporting to qualify open access topic usage models
6. Data
COUNTER Compliant Usage – Downloads per article per month over time
Publisher Taxonomic data – Topic identifiers by DOI
Calculation –
\[
\text{relative change} = \frac{\text{normalized downloads for month} - \text{normalized monthly average for year}}{\text{normalized monthly average for year}}
\]
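As a minimal sketch of how the two data sources and the calculation above might fit together: the code below joins per-article monthly downloads to topic identifiers by DOI, sums downloads per topic per month, and computes the relative change against the monthly average. Everything specific here is an assumption for illustration: the DOIs, topic names, counts, function names and the 0.2 flagging threshold are hypothetical, the slides do not define how downloads are normalized (the inputs are treated as already normalized), and an article tagged with several topics is counted toward each of them.

```python
from collections import defaultdict

# Hypothetical inputs: (DOI, month, normalized downloads) and topics by DOI.
usage_rows = [
    ("10.1234/a", "2020-01", 10),
    ("10.1234/a", "2020-02", 30),
    ("10.1234/b", "2020-01", 20),
    ("10.1234/b", "2020-02", 20),
]
topics_by_doi = {
    "10.1234/a": ["genomics"],
    "10.1234/b": ["genomics", "ethics"],
}


def topic_monthly_downloads(rows, topics):
    """Join usage to topics on DOI and sum downloads per topic per month."""
    totals = defaultdict(lambda: defaultdict(float))
    for doi, month, downloads in rows:
        for topic in topics.get(doi, []):  # DOIs without topics are skipped
            totals[topic][month] += downloads
    return totals


def relative_change(monthly, month):
    """(value for month - monthly average) / monthly average."""
    average = sum(monthly.values()) / len(monthly)
    if average == 0:
        return None  # undefined when the topic has no usage at all
    return (monthly[month] - average) / average


def direction(change, threshold=0.2):
    """Label a relative change; the threshold is an assumed cut-off."""
    if change > threshold:
        return "increasing"
    if change < -threshold:
        return "decreasing"
    return "stable"


totals = topic_monthly_downloads(usage_rows, topics_by_doi)
for topic in sorted(totals):
    change = relative_change(totals[topic], "2020-02")
    print(topic, change, direction(change))
```

The example averages over only the two months supplied; with a full year of data the same function averages over twelve months, matching the "monthly average for year" term.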
12. Topic Health Monitor - outputs
Showed how topic usage is changing, including speed and direction of travel, more intelligibly than reporting by Titles, Articles, Books etc.
Showed quantitative performance metrics fueling ongoing reporting and analytics, feeding into trajectory and predictive models
Showed segmentation and custom analysis for targeted questions, such as topic usage over time for particular clients
Identified opportunities for gap analysis between usage, search terms, content acquisition etc.
Identified opportunities for topic clustering for recommendations as well as behavioural
customer segmentation
13. Topic Health Monitor - outputs
BUT – identified significant and meaningful skew in the taxonomic data
Missing/erroneous taxonomic data across the content with usage meant that reporting,
analysis and data models were NOT reliable
So explored solutions to achieve more complete subject, topic, concept data to integrate into
THM
UNSILO partnership with SiQ to mine concepts associated with DOI/URI directly from content
Return to test phase with joint customer data
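One way to quantify the kind of gap described above is to measure how much usage belongs to content with no topic data at all. The sketch below is an illustration only, not the check SiQ ran: the function name, DOIs and counts are hypothetical, and it detects missing topic data but not erroneous tags.

```python
def taxonomy_coverage(rows, topics):
    """Share of total downloads whose DOI has at least one topic assigned."""
    total = 0
    covered = 0
    for doi, _month, downloads in rows:
        total += downloads
        if topics.get(doi):  # a missing DOI or an empty topic list counts as uncovered
            covered += downloads
    return covered / total if total else None


# Hypothetical data: one tagged article, one with an empty topic list,
# and one that is absent from the taxonomy altogether.
sample_usage = [
    ("10.1234/a", "2020-01", 60),
    ("10.1234/b", "2020-01", 30),
    ("10.1234/c", "2020-01", 10),
]
sample_topics = {"10.1234/a": ["genomics"], "10.1234/b": []}

coverage = taxonomy_coverage(sample_usage, sample_topics)
print(f"{coverage:.0%} of usage has topic data")
```

Weighting by downloads rather than counting articles reflects the concern on the slide: gaps matter most where they fall on content that is actually being used.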
14. Lessons Learned – know your
Objectives – Set clear, contextual reasons for doing
Questions – What answers do I need to find and what are the boundaries of what these answers might tell me?
Data – What data do I need to answer these questions? Where does underlying data come from? Can different sources work together? What are the metrics that are showing value for us? Where does this data come from? Do I have comprehensive enough data or are there gaps/skews? Can it be replicated and standardised?
Compliance – Can this data be used this way and what governance should be in place?
Actions – Understand the underlying reasons for results before taking action
15. Topic Health Monitor – next steps
Improved harvesting of content/concept data
Integration with SiQ PSI Metrics OA Usage Reporting
Enhanced, Integrated Topic Health Monitor
Predictive modelling, recommendations, ???