5. What I’ll talk about
• Introduction
• The Long Tail
• The Short Head
• The Race to the Bottom
• The Academic Tail (Tale)
• Data vs. Information vs. Knowledge
6. Where is the Life we have lost in living?
Where is the wisdom we have lost in
knowledge?
Where is the knowledge we have lost in
information?
--T.S. Eliot, The Rock, 1934
Where is the information we have lost
in data?
8. Wikipedia Information
Information, in its most restricted technical sense, is a sequence of symbols that
can be interpreted as a message. Information can be recorded as signs, or
transmitted as signals. Information is any kind of event that affects the state of
a dynamic system. Conceptually, information is the message (utterance or
expression) being conveyed. The meaning of this concept varies in different
contexts.[1] Moreover, the concept of information is closely related to notions
of constraint, communication, control, data, form, instruction, knowledge,
meaning, understanding, mental stimuli, pattern, perception, representation, and
entropy.
10. Wikipedia Data
Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DA-tə, or /ˈdɑːtə/ DAH-tə)
are values of qualitative or quantitative variables, belonging to a set of
items. Data in computing (or data processing) are represented in a
structure, often tabular (represented by rows and columns), a tree (a set
of nodes with parent-children relationship) or a graph structure (a set of
interconnected nodes). Data are typically the results of measurements and
can be visualised using graphs or images. Data as an abstract concept can
be viewed as the lowest level of abstraction from which information and
then knowledge are derived. Raw data, i.e., unprocessed data, refers to a
collection of numbers, characters and is a relative term; data processing
commonly occurs by stages, and the "processed data" from one stage may
be considered the "raw data" of the next. Field data refers to raw data
collected in an uncontrolled in situ environment. Experimental data refers
to data generated within the context of a scientific investigation by
observation and recording.
11. These definitions are long and
complicated.
That’s because data and information
are complex concepts.
13. The Economics of Information
Changed in the Digital World
• Cheaper production costs
• More people working for less money
• Single-copy problem disappears
• But not fungible
17. The Book That Made the Long Tail
Famous:
The Long Tail: Why the Future of Business Is Selling
Less of More
by
Chris Anderson
(2006)
18. Beyond the Long Tail
[Figure: supply and demand curves; axes are Quantity and Price]
A smaller number of blockbusters…
…And a growing number of
snowballs
…Create new value, which raises the
equilibrium price of media, and also
increase demand elasticity
21. Self-Publishing
• Amazon
– Kindle Direct Publishing
• High royalty %. Publishing is free and you can earn up to 70%
royalty while having the ability to set your own price.
• Quick publishing. Publishing takes less than five minutes and
your book usually appears on the Kindle store within a day.
• Easy. A final manuscript and an Amazon account are all that
you need to publish your book on Amazon.com.
• Sell globally. Publish books written in English, German,
French, Spanish, Portuguese and Italian and specify pricing in
US Dollars, Pounds Sterling, or Euros.
Source: http://www.amazon.com/gp/seller-account/mm-summary-page.html?topic=200260520
22. Economics: Self-Published eBooks
• Publication of eBooks allows quick access to
the marketplace.
• Often no upfront investment
• High profit is possible if sales are high.
• Many authors aren’t in it for the money; they
want to get out a message or showcase their
written art.
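As a rough illustration of the royalty arithmetic, the sketch below estimates gross author earnings at the 70% rate quoted on the Kindle Direct Publishing slide. The function name, the list price, and the unit counts are hypothetical, and the calculation assumes the full rate applies to every sale with no fees deducted.

```python
def estimated_earnings(list_price, units_sold, royalty_rate=0.70):
    """Return gross royalty earnings for a self-published eBook.

    Assumes a flat royalty rate on every sale; real programs may apply
    conditions or fees that this sketch ignores.
    """
    if not 0 <= royalty_rate <= 1:
        raise ValueError("royalty_rate must be between 0 and 1")
    if list_price < 0 or units_sold < 0:
        raise ValueError("price and units must be non-negative")
    return list_price * units_sold * royalty_rate


if __name__ == "__main__":
    # Hypothetical price and sales volumes, chosen only to show the scale.
    for units in (50, 500, 5000):
        print(units, "copies:", round(estimated_earnings(2.99, units), 2))
```

Under these assumptions, earnings scale linearly with sales, which is why high profit depends on high sales.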
23. Graph Source: Wikipedia
Many publishers want nothing to do with the long tail authors on the far
right of the graph.
But the subsidy publishers do.
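The graph itself is not reproduced here. As a stand-in, the sketch below assumes that sales follow a Zipf-like curve, where the title at rank r sells in proportion to 1/r^s. That distribution, the function name, the catalog size, and the head size are illustrative assumptions, not figures from the talk.

```python
def tail_share(num_titles, head_size, exponent=1.0):
    """Fraction of total sales that comes from titles outside the head.

    Assumes sales of the title at rank r are proportional to 1 / r**exponent
    (a Zipf-like curve), which is an illustrative model only.
    """
    if num_titles < 1:
        raise ValueError("num_titles must be at least 1")
    if not 0 <= head_size <= num_titles:
        raise ValueError("head_size must be between 0 and num_titles")
    weights = [1.0 / rank ** exponent for rank in range(1, num_titles + 1)]
    return sum(weights[head_size:]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical catalog: 100,000 titles, of which the top 1,000 are the head.
    print(f"tail share: {tail_share(100_000, 1_000):.1%}")
```

Raising the exponent concentrates sales in the head, while a larger catalog pushes more of the total into the tail.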
24. Subsidy Publishing
• Amazon
– Advantage
• Advantage is a self-service consignment program that
enables you to promote and sell media products
directly on Amazon.com.
• You supply the product. They sell it.
Source: https://www.amazon.com/gp/seller-account/mm-product-page.html?topic=200329780&ld=AZAdvanMakeM
25. Subsidy Publishing
• Lulu
– A smorgasbord of book preparation options
– Printing
– Marketing opportunities/services
• Other Subsidy Publishers Offer Range of
Services
26. Libraries as Publishers
• Library publishing is rarely in print.
• Most likely to be databases, e.g., digital
collections or information repositories (which
seem to be blending into single presence)
• Access may be free or free but restricted.
• On the public library side there is much
interest in exposing local resources via print or
digitally.
30. Making Apps for a Profit
• The race to the bottom pushes prices downward
in app stores.
• It takes an investment to make apps.
• Good apps are hard to make.
• Most apps created by individuals will not be
profitable.
• Traditional game makers are also hard hit.
http://www.nytimes.com/2012/11/18/business/as-boom-lures-app-creators-
tough-part-is-making-a-living.html?pagewanted=all
31. “On the one hand information wants to be
expensive, because it's so valuable. The right
information in the right place just changes your life.
On the other hand, information wants to be
free, because the cost of getting it out is getting lower
and lower all the time. So you have these two fighting
against each other.”
—Stewart Brand
37. Big Data
• Internet of Data
• Internet/Web of Things
– RFID tags generate data through sensors
– Sensors generate data
– Actuators accept data
– Will see much more data from the Internet of
Things from, e.g., hobbyists
• Locally-Generated Data
38. Big (and not-so-big) Data
• The web allows us to produce, consume and
aggregate more and more data
• Libraries hold lots of data in electronic and
non-electronic form
• The data has value only as it allows us to ask
and answer questions (i.e., to form
information and knowledge)
39. Big Data
• Institutional and other repositories will see
larger and larger datasets.
• Tools will be needed to manipulate and
analyze this data accurately and effectively.
40. Data Is What You Make of It
• Undifferentiated data is useless.
• It requires “analytics” to spot
trends, regularities, irregularities, connections
(and other relationships).
• Discovery tools may be useful.
• Data visualization tools may be useful.
• Specialized tools exist for subject
domains, e.g., viewing genomic data
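A minimal sketch of the kind of "analytics" meant here: flagging readings that sit unusually far from the mean. The z-score test is only one simple technique among many, and the function name, threshold, and sensor readings are hypothetical.

```python
from statistics import mean, pstdev


def find_irregularities(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    Uses the population standard deviation of the readings themselves,
    so it suits small illustrative datasets rather than production use.
    """
    if len(readings) < 2:
        return []
    mu = mean(readings)
    sigma = pstdev(readings)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [
        (index, value)
        for index, value in enumerate(readings)
        if abs(value - mu) / sigma > threshold
    ]


if __name__ == "__main__":
    # Hypothetical temperature readings with one obvious outlier.
    temperatures = [20.1, 19.8, 20.3, 20.0, 35.0, 19.9, 20.2]
    print(find_irregularities(temperatures))
```

The raw list of numbers says little by itself. The flagged reading is where data starts to become information.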
41. Much of Big Data Is Free…
• …and it’s worth every penny.
• Why are so many datasets available at no
cost when articles about them may be
quite expensive?
• It’s the difference between data and
information.
42. Free Big Data
• Much of the data will never be processed after
collection, e.g., environmental sensor data
• Collection and storage are cheap, so store it
just-in-case.
43. Government Free Data
• More data is becoming available online from
government at all levels.
– City of Chicago Data Portal
(https://data.cityofchicago.org/)
– New York City Data Sets
(http://cupop.columbia.edu/research/signature-
research-areas/new-york-city-data-sets)
– Federal Data (http://www.data.gov/)
44. Federal Data
• 373,029 raw and geospatial datasets
• 1,209 data tools
• 309 apps
• 137 mobile apps
• 171 agencies and subagencies
45. Government Free Data
• Data usually available in a single format for
each dataset.
• May be tools to export, view and manipulate
the data.
47. The Short Head
• Most big trade publishers make most of their
money from some highly-marketable books
from highly-marketable authors.
• There is a decline in the number of “mid-list”
authors that get published.
• Sometimes authors who can’t find an agent or
publisher turn to other channels.
50. Costs of Publication
“In justifying the margins earned, the publishers, Reed Elsevier (REL)
included, point to the highly skilled nature of the staff they employ
(to pre-vet submitted papers prior to the peer review process), the
support they provide to the peer review panels, including modest
stipends, the complex typesetting, printing and distribution
activities, including Web publishing and hosting. REL employs
around 7,000 people in its Science business as a whole. REL also
argues that the high margins reflect economies of scale and the
very high levels of efficiency with which they operate.
We believe the publisher adds relatively little value to the
publishing process. We are not attempting to dismiss what 7,000
people at REL do for a living. We are simply observing that if the
process really were as complex, costly and value-added as the
publishers protest that it is, 40% margins wouldn’t be available. ”
Source: Deutsche Bank AG, “Reed Elsevier: Moving the Supertanker,” Company Focus: Global
Equity Research Report. (January 11, 2005), 36.
51. Costs of Publication
• Journals that cost upward of $40,000 per year
• Do the vendors’ explanations account for this
predatory pricing?
• Are they worth it?
52. A Particularly Good Article
• “Is the Academic Publishing Industry on the
Verge of Disruption?”
• US News
• http://www.usnews.com/news/articles/2012/07/23/is-the-
academic-publishing-industry-on-the-verge-of-disruption
• Source for a number of the slides that follow
53. How Journals Became a Captive
Market
• Most journals were originally published by scientific
societies.
• In recent history they have been acquired by
profit-making entities, e.g., Elsevier.
• The scientific societies’ motives are to provide
a consistent revenue stream and eliminate the
work associated with publishing journals.
Source: http://www.usnews.com/news/articles/2012/07/23/is-the-academic-publishing-industry-on-the-
verge-of-disruption
54. The Harvard Letter to Faculty
• Harvard is paying $3.75 million annually in
journal subscriptions and they make up "10%
of all collection costs for everything the
Library acquires."
• "Major periodical subscriptions, especially to
electronic journals published by historically
key providers, cannot be sustained."
Source: http://www.usnews.com/news/articles/2012/07/23/is-the-academic-publishing-industry-on-the-
verge-of-disruption
55. More on the Process
• Faculty and staff do the research and write the
papers.
• The submitted papers are vetted (refereed) by
scholars in the same field, who are usually
unpaid.
• The articles are published.
• The academic library must pay for the journal.
56. Irony Time
• The university or the government pays for the faculty/staff
time to do the research and write the papers.
• The university pays salaries to the referees who review the
papers.
• The author of the paper often pays the publisher a fee to
cover expenses of publication.
• The university pays for subscriptions to the journals in
which the research is published.
The Quadruple Whammy
57. The Illusionary Solution
• Abolish tenure.
– Scholarly publishing would not disappear.
– There would still be pressure for faculty to
publish.
– Thus, predatory practices would not be
eliminated, although they might be ameliorated to
a minor degree.
58. A Better Solution: Open Access (OA)
Journals
• The Directory of Open Access Journals (DOAJ)
lists some 8,976 journals.
• Of these 4,573 journals are searchable to the
article level, with 1,074,850 articles.
• So, there are lots of OA journals with lots of
articles.
Source: http://www.doaj.org/
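The DOAJ counts above can be turned into two quick ratios: the share of listed journals that are searchable at the article level, and the average number of articles per searchable journal. The sketch uses only the figures on the slide; the variable names are illustrative.

```python
# Figures as quoted on the slide (DOAJ, at the time of the talk).
total_journals = 8976
searchable_journals = 4573
articles = 1074850

# Share of listed journals that are searchable to the article level.
searchable_share = searchable_journals / total_journals

# Average number of articles per searchable journal.
articles_per_journal = articles / searchable_journals

print(f"{searchable_share:.1%} of listed journals are searchable at the article level")
print(f"about {articles_per_journal:.0f} articles per searchable journal")
```

Roughly half of the listed journals are searchable at the article level, averaging about 235 articles each.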
59. Problems: Open Access Journals
• OA journals still don’t get all of the respect of
other scholarly journals.
• Many journals require payment by the
author(s) of production costs. (However, this is
also true of many non-OA journals.)
Source: http://www.usnews.com/news/articles/2012/07/23/is-the-academic-publishing-industry-on-the-
verge-of-disruption
60. Public Library of Science (PLOS)
• Began publishing in 2003, with PLOS Biology
• Peer Reviewed Journals (7 of them)
• Fully open access
• Funded through memberships and
grants/contributions
61. NIH
• Policy requires grant recipients to deposit their
articles in PubMed Central within a year of the
manuscript’s publication.
• Many medical researchers rely on PubMed for
some of their information needs.
62. Fair Access to Science and Technology
Research Act (FASTR)
• Introduced in current (113th) Congress
• Successor to Federal Research Public Access
Act (FRPAA)
• “Both bills would require open access (OA) to
peer-reviewed manuscripts of articles
reporting the results of federally-funded
research.”
Source:
http://cyber.law.harvard.edu/hoap/Notes_on_the_Fair_Access_to_Science_and_Technology_Research
_Act
63. Distribution
• Print journals: still expensive
• Aggregators (e.g., Ebsco, ProQuest)
• Publishers as “aggregators”
• Economics? Journals and databases cost a lot.
64. [Potential] Conflict of Interest
• American Chemical Society (ACS) sells a
package of journals and databases.
• ACS grants “approval” of Bachelor’s programs.
• One of the requirements of approval is
“modern chemical information resources”.
• You be the judge.
• Economics? Approval as driver of sales
65. Granularity
• Publishers and aggregators can sell objects as
small as individual papers or book chapters.
• Identification by Digital Object Identifier (DOI)
• “I want to get those like buying an iTunes
playlist,” says head of an academic press.
• Is there more money to be made that way?
Source: http://chronicle.com/blogs/profhacker/interview_dukeup_2/48527
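To show how a DOI identifies a single paper or chapter, the sketch below turns a DOI into a link through the public doi.org resolver. The function name and the example DOI are hypothetical placeholders, and the validity check is deliberately loose.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a resolver URL for a DOI such as '10.1234/example.chapter-5'.

    Only checks the general shape (a '10.' prefix and a '/' separator);
    it does not confirm that the DOI is actually registered.
    """
    doi = doi.strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the '/' separators.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    # Hypothetical DOI used purely for illustration.
    print(doi_to_url("10.1234/example.chapter-5"))
```

Because each chapter or article can carry its own DOI, a publisher can link to, price, and sell it as a separate item.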
66. Poster Session
When Free is Enough: Locating Quality Chemical
Information With and Without Subscription
Databases
Ariel Neff, UW-Madison, Chemistry Library