Ebook Availability Revisited: A quantitative analysis of the 2012 ebook aggregator marketplace
2. Ebook Availability Revisited
A quantitative analysis of the 2012
ebook aggregator marketplace
John McDonald & Jason Price, PhD
CIO & AVP Interim Library Director
Claremont Colleges Library
4. Today’s Outline
1) Availability of print book content in the eBook aggregator marketplace: 2008 v 2012
2) eBook aggregator market share, depth & breadth
   1) Full collections (EBL, Ebrary, EBSCO, MyILibrary)
   2) Subscribed collections (Ebrary & EBSCO)
   3) Publisher coverage
   4) University Press collections (UPCC & UPSO)
3) Hathi Trust (& Google Books) - for perspective
5. 2008 Study
Research Question (from a campus administrator):
“If you were to stop buying print books today (i.e. in
2008), what proportion of your purchases could be
replaced with ebooks?”
Print book purchases (Cat date 2006-2007)
• Data from 5 Libraries
• 78,000 books total
• Publication year ranged from 1912 – 2007, 82% were from 2005-07
Ebook Marketplace Data from 4 eBook vendors
• EBL, Ebrary, MyILibrary & NetLibrary source files
• 220,000 books total
• Publication year ranged from 1901 – 2009, 24% were from 2005-07
6. 2006/7 pBook purchases available as eBooks by Vendor
5% 13% 13% 10% 23% 26%
Ebrary
Library EBL Ebrary MyILibrary NetLibrary At least one
Sub
C 4.9% 11.9% 13.7% 11.4% 23.3% 27.2%
A 5.4% 10.3% 10.3% 10.8% 18.3% 21.3%
D 4.7% 15.4% 15.4% 11.8% 25.0% 29.4%
L 4.7% 14.6% 14.2% 9.7% 23.2% 27.3%
S 7.1% 13.9% 13.5% 8.0% 23.0% 26.9%
7. 2008 ≈ 30% pBook purchases available as eBooks from major aggregators
[Bar chart: 2006-2007 print book purchases available as ebooks in 2008, by library (C, A, D, L, S); y-axis 0% to 100%. Bar values shown: 30.5%, 32.2%, 31.2%, 31.7% and 24.5%.]
8. 2012 “repeat” of this study
Datasets
PRINT BOOKS
• 2006 and 2011 pub date
• Data from 4 libraries, combined w/ISBNs
• 34,000 books total
  • 21,000 from 2006
  • 13,000 from 2011
• Unique IDs
  • OCLC number & WorkID (instead of ISBN matching)
ELECTRONIC BOOKS
• Aggregator Marketplace from ebook KnowledgeBase*
• ISBNs run against xISBN API to get OCLCnum & then against xOCLCnum API to get Work ID
• Numbers to be announced…
Thanks to Sam Kome for untold hours of data wrangling
9. Audience Poll
What proportion of 2011 pub date
purchases are available as ebooks? (vs
2008 availability)
a) Fewer (<20%)
b) About the same (20-35%)
c) More (35% - 50%)
d) Way more ( > 50%)
10. pBook purchases available as eBooks by Vendor
Ebrary MyI- At least
Year EBL Ebrary EBSCO
Sub Library one
2008 7% 15% 19% 17% 20% 24%
2012 6% 26% 33% 25% 28% 37%
So… 37% of 2011 Pub date purchases were available
in the 2012 ebook aggregator marketplace
11. YBP Select category comparison
2006 & 2011 Print Books    Zero aggregators    At least one
Basic-Essential            75%                 25%
Basic-Recommended          77%                 23%
Research-Essential         60%                 40%
Research-Recommended       61%                 39%
Specialized                61%                 39%
Supplementary              75%                 25%
12. Part 2: Switching Gears…
to a Quantitative analysis
of the ebook aggregator
marketplace
13. Audience poll
How much has the size of the ebook
aggregator marketplace increased in the
past four years?
a) 10%
b) 25%
c) 50%
d) 100%
23. State University of New York Press 3304
1EbraryAC 60
2Common 135
3EbscoES 3109
National Academies Press 3301
1EbraryAC 3301
ABC-CLIO 2903
3EbscoES 2903
Routledge 2837
1EbraryAC 954
2Common 1883
Cambridge University Press 2698
1EbraryAC 464
2Common 1379
3EbscoES 855
Brill Academic Publishers 2642
1EbraryAC 17
2Common 1847
3EbscoES 778
ABC-Clio - Greenwood Publishing 2450
3EbscoES 2450
Oxford University Press 2395
1EbraryAC 2395
Nova Science Publishers, Inc. 2329
3EbscoES 2329
Wiley 2194
1EbraryAC 1495
2Common 699
MIT Press 2186
1EbraryAC 130
2Common 638
3EbscoES 1418
University of California Press 2039
1EbraryAC 211
2Common 769
3EbscoES 1059
Ashgate Publishing Group 1716
1EbraryAC 30
2Common 1686
Continuum International Publishing Group Ltd / Books 1565
3EbscoES 1565
Palgrave Macmillan 1486
1EbraryAC 1486
Continuum International Publishing 1437
1EbraryAC 469
2Common 968
John Benjamins Publishing Co. 1368
3EbscoES 1368
World Bank Publications 1295
1EbraryAC 1295
Sage Publications, Ltd. 1275
3EbscoES 1275
24. University of Chicago Press 1273
1EbraryAC 1273
University of Minnesota Press 1252
1EbraryAC 1252
Emerald Group Publishing Ltd 1180
1EbraryAC 1058
2Common 122
John Benjamins Publishing Company 1090
1EbraryAC 3
2Common 1087
Elsevier Science 1060
3EbscoES 1060
Indiana University Press 1038
1EbraryAC 120
2Common 464
3EbscoES 454
Kluwer Academic Publishers 1030
1EbraryAC 1030
Greenwood Press 1024
1EbraryAC 276
2Common 748
Jessica Kingsley Publishers 1014
1EbraryAC 83
2Common 796
3EbscoES 135
26. Subscribed Collections Summary
• 33% Ebrary : 27% in Common : 40% EBSCO
• Ebsco’s advantage comes from older ebooks
• 1980-1999
• Many publishers’ subscription ebooks are
only available through one aggregator
• Ebsco – ABC-CLIO, Nova Science, J. Benjamins, SAGE, etc.
• Ebrary – Nat. Acad., Oxford, Palgrave, World Bank, etc.
• Subject coverage is relatively similar
27. Subscription cost less than a penny on
the dollar per year!
Subscribable Ebrary Ebooks = 77,482
purchase price $5,670,776 (single-user price)
≤ $3.75/FTE… so for 5,000 FTE = $18,750/year
% of list price per year = 0.33% (multi-user price)
Years to buy = 300+ years!
Ebsco subscription pricing is similar…
Q.E.D. If you are investing in aggregator ebooks,
you should seriously consider both subscription
packages, and avoid buying individual books that
are (or will be!) available by subscription…
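The arithmetic on this slide can be checked with a short sketch. It uses only the figures stated above (the list price, the $3.75/FTE ceiling and the 5,000 FTE example); the variable names are illustrative, and because $3.75 is an upper bound the result is a worst case.

```python
# Figures taken from the slide; variable names are illustrative.
list_price = 5_670_776   # purchase price of the 77,482 subscribable Ebrary ebooks
cost_per_fte = 3.75      # upper bound on annual subscription cost per FTE
fte = 5_000              # example institution size used on the slide

annual_cost = cost_per_fte * fte
pct_of_list_per_year = annual_cost / list_price * 100
years_to_match_purchase = list_price / annual_cost

print(f"Annual subscription cost: ${annual_cost:,.0f}")
print(f"Share of list price per year: {pct_of_list_per_year:.2f}%")
print(f"Years of subscription to equal purchase price: {years_to_match_purchase:.0f}")
```

Swapping in a different FTE count or per-FTE rate shows how sensitive the "years to buy" figure is to local pricing.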
This presentation repeats a study we presented at the 2008 Charleston Conference. Available: http://www.slideshare.net/john_mcdonald/charleston2008-ebook3
UPCC = University Press Content Consortium
UPSO = University Press Scholarship Online
Details on the 2008 study.
The 2012 study used different methods to ask the same question. Four of the five libraries from the first study were included. We used the OCLC xISBN & WorkID crosswalks from the start. We used ebook knowledgebase data instead of vendor source files.*
*One aggregator expressed concern that the KB-based numbers looked quite low. We are validating our results by acquiring all of the source files direct from the aggregators to determine whether the 2012 marketplace (and match rate) is significantly underrepresented by the knowledgebase data.
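The work-level matching described in this note can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: the two crosswalks are represented as plain dictionaries standing in for the xISBN and WorkID lookups, and every identifier in the toy data is hypothetical.

```python
def to_work_ids(isbns, isbn_to_oclc, oclc_to_work):
    """Map ISBNs to work IDs, skipping any ISBN missing from either crosswalk."""
    works = set()
    for isbn in isbns:
        oclc = isbn_to_oclc.get(isbn)
        work = oclc_to_work.get(oclc)
        if work is not None:
            works.add(work)
    return works


def availability_rate(print_isbns, ebook_isbns, isbn_to_oclc, oclc_to_work):
    """Share of print works that also appear, at the work level, among the ebooks."""
    print_works = to_work_ids(print_isbns, isbn_to_oclc, oclc_to_work)
    ebook_works = to_work_ids(ebook_isbns, isbn_to_oclc, oclc_to_work)
    return len(print_works & ebook_works) / len(print_works)


# Hypothetical toy crosswalks: print ISBN p1 and ebook ISBN e1 share work w1.
isbn_to_oclc = {"p1": "o1", "p2": "o2", "p3": "o3", "p4": "o4", "e1": "o5", "e2": "o6"}
oclc_to_work = {"o1": "w1", "o2": "w2", "o3": "w3", "o4": "w4", "o5": "w1", "o6": "w9"}

rate = availability_rate(["p1", "p2", "p3", "p4"], ["e1", "e2"], isbn_to_oclc, oclc_to_work)
print(f"{rate:.0%} of print works have an ebook edition")
```

Matching on work IDs rather than raw ISBNs is what lets a print edition count as available when the ebook carries a different ISBN for the same work.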
C was correct: More (35-50%). Using our updated protocol, 24% of 2006 pub year print books (the 2008 study) were available electronically, while 37% of 2011 pub year print purchases (the 2012 study) were available.* Not status quo, but not a quantum leap either! The increase from 24% to 37% does represent more than a 50% increase…
*(subject to validation using ebook aggregator source files)
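The "more than a 50% increase" remark refers to relative change, not percentage points. A quick sketch of the distinction, using the two availability figures from this note (function and variable names are illustrative):

```python
def relative_increase(old, new):
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100


availability_2008_study = 24  # % of 2006 pub year print books available as ebooks
availability_2012_study = 37  # % of 2011 pub year print purchases available as ebooks

point_gain = availability_2012_study - availability_2008_study
pct_gain = relative_increase(availability_2008_study, availability_2012_study)
print(f"{point_gain} percentage points, a {pct_gain:.0f}% relative increase")
```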
We were interested in the perceived ‘quality’ of the available ebooks (some have speculated that ebook vendors are publishing lots of books but of lower quality). Availability does appear to vary by YBP Select rating: ebook versions were more likely to be available for Research and Specialized books than for Basic or Supplementary. There were no obvious differences between the two year samples in this comparison (data not shown). We speculated that a large portion of the books in the two Basic categories are University Press publications that are less likely to be in aggregator lists.
D (100%) was the correct answer (see next slide). Follow-up work based on aggregator-direct lists may show it to be even greater.
2008 in green vs 2012 in brown. Percentages are the portion of the total marketplace in that year (for green bars, the proportion of the tallest green bar, and likewise with brown). The greatest market share increase was 22% for ebrary, while Ebsco/NetLibrary decreased by 17%, perhaps not surprisingly given that their 77% share of the market in 2008 was based on their unique collection of older books rather than on that great a difference in acquisition of the current year’s monographic output. Average market share has increased slightly from 47% to 52%, but the range has dropped by two thirds (i.e. from 46% to 16%). If you were making a decision in the past on an aggregator based on the relative size of their collections, those differences are decreasing over time. So if each aggregator covers 50% of the marketplace, it follows that two should cover most books, right?...
Unfortunately, that’s not the case: nearly 50% of aggregator-hosted ebooks are still offered by only ONE vendor. Availability across aggregators has increased significantly, however: in 2008 only 1 in 25 books was available from all four, whereas in 2012 1 in 5 is available from all four.
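The overlap figures in this note come from counting, for each title, how many aggregators offer it. A minimal sketch of that tally, assuming each vendor's holdings are available as a set of work IDs (the vendor names and IDs below are made-up toy data, not the study's):

```python
from collections import Counter


def vendor_overlap(holdings):
    """Return {number_of_vendors: share_of_titles} for a vendor -> set-of-work-IDs mapping."""
    per_title = Counter()
    for works in holdings.values():
        per_title.update(works)  # each vendor adds 1 to every title it offers
    total = len(per_title)
    by_count = Counter(per_title.values())
    return {n: by_count[n] / total for n in sorted(by_count)}


# Hypothetical holdings for four vendors.
holdings = {
    "A": {"w1", "w2", "w3", "w4", "w5"},
    "B": {"w1", "w2", "w3", "w6"},
    "C": {"w1", "w2", "w7"},
    "D": {"w1", "w8"},
}

shares = vendor_overlap(holdings)
for n, share in shares.items():
    print(f"offered by {n} vendor(s): {share:.1%}")
```

The share at key 1 corresponds to "offered by only one vendor", and the share at the highest key to "available from all four".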
The vast majority of ebooks in aggregated products are those published in the past 15 years, with Ebsco/NetLibrary showing clear dominance in terms of number of books offered each year for 1990-2004. The past 8 years (2005-2012) are worth looking at in more detail.
In 2008, MyILibrary (in blue) moved from the middle to the bottom in terms of coverage of the most recent years’ books. Starting in 2010, Ebrary (in green) outranked its competitors, and Ebsco has fallen in line with EBL over the past two years.
In 2011 Ebsco launched its own large academic subscription option, the first competitor for ebrary’s Academic Complete subscription collection.
Publication dates of subscribed collections mirror the full collections. Ebsco has many more books from the 80s & 90s. When we take a closer look at the most recent decade…
Ebrary has a small percentage more from the first half of the decade, and things seem to be evening out for the most recent five years. There appear to be signs of a bit more differentiation for 2010 and 2011, as the number of books in common is a lower proportion of the whole. And as expected, there is about a year and a half lag before ebooks are added to these collections.
The key point here is that there are a number of cases where particular publisher imprints are only listed for one aggregator. Though one might expect that this is the result of “exclusive” subscription deals, it is more likely caused by the newness of the Ebsco subscribed collection AND in some cases different publisher naming conventions. For example, ebrary does have books from Elsevier Science but uses the many subdivisions rather than grouping them. This same data is presented on the following two slides with a focus more on the publishers and less on the pattern.
Keeping that caveat in mind, the blue and green bars here show cases where a publisher is listed in only one of the two collections, and the bar charts show the relative numbers for the publishers that appear in both collections.
Subject coverage was examined using LC class distribution. Overall the distribution is quite similar. EbraryAC may lean slightly toward science (see Medicine, Science and Technology), while EbscoES has more books in the humanities & social sciences.
The data presented on this slide argue STRONGLY for subscribing to BOTH collections, given the differentiation presented in the previous slides. Instead of a 10 or 20 year period of subscription matching the purchase price, it would take 300(!) years of subscription costs to own the same content. Furthermore, and perhaps equally important(!), books in the subscribed collections have unlimited simultaneous use, while purchased books from these two vendors are limited to a single simultaneous user (unless a premium is paid for each book).
70-85% of the ebooks available in these University press collections are available from the major aggregators (albeit with more restrictive DRM)
But of course they represent only a tiny portion of what is available in the overall marketplace. JSTOR offerings are just becoming available as we speak.
Each color slice shows the proportion of the collection by decade or century.*
*From the hathitrust.org website [http://www.hathitrust.org/visualizations_dates]
Caveat: this only reflects relative size. It’s not book-to-book matching as in prior comparisons… that’s a next step.
Not a fan of 3D graphs, except in this case, where the EBAM is effectively carpet on the floor that wouldn’t be visible if you looked at it end on. Although some currently question the value of our owned print collections, potential future subscription models from Hathi or Google (should copyright and royalty issues prove to be solvable) would definitely call into question the value of ownership of aggregator-hosted ebooks that are subsumed within them.