Extract and Analyze Culture/Trend Data from SJPL Digital Collection
1. Extract and Analyze Culture/Trend Data
from SJPL Digital Collection
Theme: Helping San Jose Public Library figure out how to make
the California Room Digital Collections more open, engaging, hackable,
linkable, browsable, tag-able, map-able and responsive.
Open Data Hack SJ
Saturday, February 21, 2015 from 9:30 AM to 5:00 PM (PST)
San Jose, CA
Hiroyuki Sato @sa2hi
2. OPEN DATA
• SJPL DIGITAL COLLECTIONS
• California Room
• School Yearbooks
• http://www.sjpl.org/yearbooks
• Digital data is available for the San Jose High School yearbooks (1902-1929)
• http://digitalcollections.sjlibrary.org/cdm/landingpage/collection/sjplyb
4. What I wanted to do / What I did
• Can we see cultural trends in the Digital Collection?
• Extract the data related to athletic teams
• Manually count the number of members of each sports team in the yearbooks…
• https://docs.google.com/spreadsheets/d/1GhCA-I6mRZ1rs-ORHNt1ktfrd3qn9OB4h7JeqJlaYn4/edit#gid=0
• Visualize the data
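Once the manual counts are in a spreadsheet, the grouping step before visualizing can be scripted. A minimal Python sketch, assuming the sheet is exported to CSV with hypothetical columns `year`, `sport`, and `members` (the real sheet's layout may differ): it builds one time series per sport and lists the years in a range that have no yearbook data, which show up as gaps in the chart.

```python
import csv
import io
from collections import defaultdict


def load_team_counts(csv_text):
    """Group manually counted team sizes into {sport: {year: members}}.

    Assumes hypothetical CSV columns: year, sport, members.
    If the same sport appears more than once in a year, the counts are summed.
    """
    series = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["year"])
        sport = row["sport"].strip()
        members = int(row["members"])
        series[sport][year] = series[sport].get(year, 0) + members
    return dict(series)


def missing_years(series, first_year, last_year):
    """Return the years in [first_year, last_year] with no count for any sport."""
    covered = {year for counts in series.values() for year in counts}
    return [y for y in range(first_year, last_year + 1) if y not in covered]
```

For example, `missing_years(load_team_counts(text), 1902, 1929)` would list the gaps across the digitized 1902-1929 range, where `text` is the exported CSV contents.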
7. Chart: Change in the number of team members for each sport (x-axis: year, 1905-1925)
8. Issues
• Many years are missing
• Need more metadata
• Need technology to automatically extract detailed metadata from images and text
• Need total school enrollment for each year, so that counts from different years can be compared
• Need digital data from other schools
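The enrollment issue matters because a larger team in one year may simply reflect a larger school. A minimal sketch of the normalization, assuming per-year enrollment totals become available (the `enrollment` mapping is a hypothetical input, not data from the collection): each count is divided by that year's enrollment, and years without a usable enrollment figure are skipped rather than guessed.

```python
def participation_rate(team_counts, enrollment):
    """Convert {year: members} into {year: members / enrollment[year]}.

    `enrollment` is a hypothetical {year: total students} mapping.
    Years missing from it, or with a total of zero or less, are left out.
    """
    rates = {}
    for year, members in sorted(team_counts.items()):
        total = enrollment.get(year)
        if total is not None and total > 0:
            rates[year] = members / total
    return rates
```

The result is a share of the student body, so a team of 15 in a school of 300 and a team of 30 in a school of 600 both come out at the same rate.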