NCompass Live - 7/8/2020
http://nlc.nebraska.gov/ncompasslive/
Programming with Python can alleviate the burden of routine, time-consuming tasks for library workers. In this session, attendees will learn how Python is being used at North Carolina State University Libraries to query GOBI and produce automated monthly reports for the Collections & Research Strategy department. GOBI, our print and ebook ordering vendor platform, does not offer an API, so reports used to be compiled through manual title-by-title searching. What used to take up to 15 hours per month (and was the cause of much frustration) now takes just 30 minutes and one press of a "run" button, all thanks to Python’s diverse set of libraries and abilities. Following a presentation of this script and how it was developed, attendees will learn methods for identifying the right Python packages and methodologies for their unique needs and project ideas, even if they are new to programming.
Presenter: Katharine Frazier, University Library Technician, North Carolina State University Libraries.
“Python is good for repetitive work, and working with large sets of data programmatically instead of by hand.”
I want to do this task today... There’s a Python library for that!
Data manipulation: pandas
Web automation: Selenium
Spreadsheet writing: openpyxl or pandas
String matching: FuzzyWuzzy
1. Takes a list of items (holdings data) and searches GOBI for matches
2. Grabs the price, binding, and year of publication for each match
3. Selects the newest edition and matches it back to the original list of holdings data
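The three steps above could be organized roughly as follows. This is a minimal sketch, not the presenter's script. GOBI offers no API, and the slides do not show how results are scraped. For that reason, `search_gobi` is a hypothetical stand-in that returns canned data instead of automating the site with Selenium. The names `GobiResult`, `newest_edition`, and `run_report`, the "Print"/"Ebook" labels, and the sample titles are also illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GobiResult:
    """One search hit with the fields the report needs (hypothetical structure)."""
    title: str
    price: float
    binding: str  # assumed labels: "Print" or "Ebook"
    year: int


def search_gobi(title: str) -> list[GobiResult]:
    """Hypothetical stand-in for the GOBI search (steps 1 and 2).

    The real workflow automates the GOBI website; this version looks the
    title up in a small canned catalog so the pipeline runs offline.
    """
    fake_catalog = {
        "Python for Librarians": [
            GobiResult("Python for Librarians", 39.99, "Print", 2016),
            GobiResult("Python for Librarians", 45.00, "Print", 2019),
            GobiResult("Python for Librarians", 60.00, "Ebook", 2019),
        ],
    }
    return fake_catalog.get(title, [])


def newest_edition(results: list[GobiResult]) -> GobiResult | None:
    """Step 3: pick the match with the latest publication year (None if no match)."""
    return max(results, key=lambda r: r.year, default=None)


def run_report(holdings: list[str]) -> dict[str, GobiResult | None]:
    """Search every holding and map it back to its newest matching edition."""
    return {title: newest_edition(search_gobi(title)) for title in holdings}


if __name__ == "__main__":
    for title, match in run_report(["Python for Librarians", "Unknown Title"]).items():
        print(title, "->", match)
```

Swapping `search_gobi` for real browser automation would leave the other two steps unchanged. That is one reason to keep the scraping isolated in its own function.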
● Use regular expressions to find desired values
● Use fuzzy string matching to identify title matches
● Send matching title information to lists
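A minimal sketch of these three bullets follows. It assumes a made-up plain-text layout for each search-result row, because the real GOBI page structure is not shown in the slides. The sketch swaps in the standard library's `difflib.SequenceMatcher` for FuzzyWuzzy so that it runs without extra packages. Its 0 to 100 score is comparable in spirit to FuzzyWuzzy's `fuzz.ratio`, but it is not guaranteed to give identical numbers. The names `extract_fields`, `title_similarity`, and `collect_matches`, the regular expressions, and the threshold of 90 are all hypothetical choices.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher

# Hypothetical text of one search-result row; the real GOBI layout is not shown.
SAMPLE_ROW = "Python for librarians. / Jane Doe. Print. 2019. USD 45.00"

# Regular expressions for the values the report needs (assumed formats).
PRICE_RE = re.compile(r"USD\s+(\d+(?:\.\d{2})?)")
YEAR_RE = re.compile(r"\b((?:19|20)\d{2})\b")
BINDING_RE = re.compile(r"\b(Print|Ebook)\b")


def extract_fields(row: str) -> dict[str, object]:
    """Pull price, binding, and year out of a row; missing values become None."""
    price = PRICE_RE.search(row)
    binding = BINDING_RE.search(row)
    year = YEAR_RE.search(row)
    return {
        "price": float(price.group(1)) if price else None,
        "binding": binding.group(1) if binding else None,
        "year": int(year.group(1)) if year else None,
    }


def title_similarity(a: str, b: str) -> int:
    """Case-insensitive similarity from 0 to 100 (difflib stand-in for FuzzyWuzzy)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def collect_matches(holding: str, rows: list[tuple[str, str]], threshold: int = 90) -> dict[str, list]:
    """Append the fields of every sufficiently similar row to per-column lists."""
    columns: dict[str, list] = {"title": [], "price": [], "binding": [], "year": []}
    for candidate_title, row_text in rows:
        if title_similarity(holding, candidate_title) >= threshold:
            columns["title"].append(holding)
            for key, value in extract_fields(row_text).items():
                columns[key].append(value)
    return columns


if __name__ == "__main__":
    rows = [
        ("Python for librarians.", SAMPLE_ROW),
        ("Cooking with Rust", "Cooking with Rust. Ebook. 2021. USD 30.00"),
    ]
    print(collect_matches("Python for Librarians", rows))
```

Collecting values into one list per column keeps the data in a shape that converts directly into a pandas DataFrame.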
● Create a DataFrame from a dictionary
● Create sub-frames for the print and ebook options
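Suppose the matching step produced a dictionary of equal-length lists, with one list per column. Building the DataFrame and splitting it by binding might then look like this in pandas. The column names and sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical matching results: one list per column, all the same length.
results = {
    "title": [
        "Python for Librarians",
        "Python for Librarians",
        "Python for Librarians",
        "Data Wrangling",
    ],
    "binding": ["Print", "Print", "Ebook", "Ebook"],
    "year": [2016, 2019, 2019, 2020],
    "price": [39.99, 45.00, 60.00, 52.50],
}

# Each dictionary key becomes a column.
df = pd.DataFrame(results)

# Boolean masks split the frame by format; .copy() makes each sub-frame independent.
print_df = df[df["binding"] == "Print"].copy()
ebook_df = df[df["binding"] == "Ebook"].copy()

print(print_df)
print(ebook_df)
```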
● Sort each DataFrame by descending date
● Drop duplicates: keep only the newest copy available
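The sort-then-deduplicate step can be sketched with pandas `sort_values` and `drop_duplicates`. In this sketch, publication year stands in for the "date" column, and the sample print sub-frame is invented for illustration. Because the rows are sorted newest first, `keep="first"` retains the newest copy of each title.

```python
import pandas as pd

# Hypothetical print sub-frame with two editions of one title.
print_df = pd.DataFrame({
    "title": ["Python for Librarians", "Python for Librarians", "Data Wrangling"],
    "year": [2016, 2019, 2018],
    "price": [39.99, 45.00, 30.00],
})

newest_print = (
    print_df.sort_values("year", ascending=False)  # newest rows first
    .drop_duplicates(subset="title", keep="first")  # first row per title is the newest
    .reset_index(drop=True)
)

print(newest_print)
```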
● Merge the print and ebook DataFrames
● The result is one DataFrame that holds both print and ebook purchase information, with only one option for each title
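One way to combine the deduplicated sub-frames is `DataFrame.merge` on the title column. This sketch assumes an outer join, so a title available in only one format still appears, with missing values for the other format. It uses `suffixes` to tell the print and ebook columns apart. It also uses `validate="one_to_one"` to raise an error if either side still has duplicate titles. The slides do not state the presenter's actual join type or column names.

```python
import pandas as pd

# Hypothetical deduplicated sub-frames: one row per title in each.
newest_print = pd.DataFrame({
    "title": ["Python for Librarians", "Data Wrangling"],
    "year": [2019, 2018],
    "price": [45.00, 30.00],
})
newest_ebook = pd.DataFrame({
    "title": ["Python for Librarians", "Cataloging Basics"],
    "year": [2019, 2021],
    "price": [60.00, 25.00],
})

# Outer join keeps titles offered in only one format; suffixes label the overlapping columns.
combined = newest_print.merge(
    newest_ebook,
    on="title",
    how="outer",
    suffixes=("_print", "_ebook"),
    validate="one_to_one",  # fails loudly if deduplication missed a title
)

print(combined)
```

A left join against the original holdings list would instead keep exactly the titles that were searched.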