Ecological Society of America Workshop on Incentives for Data Sharing
1. Ecological Society of America
Workshop on Incentives for Data Sharing
Washington, DC
February 19-20 2009
“Vertical section drawing of Cavendish's torsion balance instrument including the building in which it was housed.” http://en.wikipedia.org/wiki/Cavendish_experiment
2. “Experiments to determine the density of the earth,” by Henry Cavendish, ESQ., F.R.S. AND
A.S. Read June 21, 1798 (From the Philosophical Transactions of the Royal Society of
London for the year 1798, Part II, pp. 469-526)
From: http://www.archive.org/details/lawsofgravitatio00mackrich
3. Field notes from the AMNH “Lang-Chapin” expedition to the Belgian Congo (1909-1915)
http://diglib1.amnh.org/cgi-bin/database/index.cgi
4.
5. The NCAR Research Data Archive (RDA)
“The NCAR Research Data Archive (RDA) is a comparatively small
(currently 246 TB, less than 5% of the MSS [Mass Storage System] total
size), but very important, part of the MSS stored data. The RDA has
been curated by the staff in the Computational and Information
Systems Laboratory for over 40 years, [emphasis added] and as such
contains reference datasets used by large numbers of scientists.
The RDA contents are long-term atmospheric (surface and upper
air) and oceanographic observations, grid analyses of observational
datasets, operational weather prediction model output, reanalyses,
satellite derived datasets, and ancillary datasets, such as
topography/bathymetry, vegetation, and land use. The RDA is not
a static collection; it is now over 580 datasets with about 100
routinely updated and 10-20 new ones added each year.”
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge
sharing,” from the 4th International Digital Curation Conference, December 2008, page 5.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
6. NCAR Research Data Archive (RDA)
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge
sharing,” from the 4th International Digital Curation Conference, December 2008, page 7.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
7. “Reanalyses” [or Meta-Analyses]
“Atmospheric reanalyses are a main feature within the RDA and were
intended to be, and have become, a very valuable data resource
for a wide variety of climate and weather studies. By combining
many types of atmospheric observations with advanced data
assimilation and forecast models a “best possible” 3D estimate of
the atmospheric state over extended time periods is achieved.
“Reanalyses are supported by many historical data sources that have
been curated over time. As an illustration the major sources of
atmospheric profile data include wind only soundings beginning in
1920 (Figure 2). These are augmented with soundings of
temperature, humidity, and wind beginning in 1948.”
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge
sharing,” from the 4th International Digital Curation Conference, December 2008, page 6.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
9. The $3.6 billion Large Hadron
Collider (LHC) will sample and
record the results of up to 600
million proton collisions per
second, producing roughly 15
petabytes (15 million gigabytes) of
data annually in search of new
fundamental particles. To allow
thousands of scientists from around
the globe to collaborate on the
analysis of these data over the next
15 years (the estimated lifetime of
the LHC), tens of thousands of
computers located around the world
are being harnessed in a distributed
computing network called the Grid.
Within the Grid, described as the
most powerful supercomputer
system in the world, the avalanche
of data will be analyzed, shared, re-
purposed and combined in
innovative new ways designed to
reveal the secrets of the fundamental
properties of matter.
LHC source:
http://public.web.cern.ch/public/en/LHC
10. DATA SETS: some examples with “native metadata”
2-d_soil_temps.csv
surface, and sub-surface soil temperatures (at 2cm and 8cm depths) measured at one location for a few days in order to
calibrate a model of temperature propagation. Surface temperature was measured with an infrared thermometer,
subsurface temperatures with a thermocouple.
----------------------------
5-minute_light_data_for_4_continuous_days_plus_reference.xls
PPF (photosynthetic photon flux = photosynthetically active radiation 400-700nm) measured with an array of photodiodes
calibrated to a Licor sensor, along a linear transect for a few days. used to get an idea of how much light plants along
the transect are receiving.
----------------------------
CO2_of_air_at_different_heights_July_9.xls
concentration of CO2 in the air during the evening for one day, measured with a Licor infrared gas analyzer and a series of
relays and tubes with a pump. used to examine the gradient of CO2 coming from the soil when the air is still during the
evening.
----------------------------
Fern_light_response.xls
Light response curves for bracken ferns, measured with a Licor photosynthesis system. Fronds are exposed to different light
levels and their instantaneous photosynthesis and conductance is measured. used in conjunction with the induction
data (below) for physiological characterization of the ferns.
----------------------------
La_Selva_species_photosyntheis_table.xls
incomplete data set on instantaneous photosynthesis rates for various tropical understory and epiphytic species grown in a
shade house in Costa Rica.
----------------------------
manzanita_sapflow_12-5-07_to_7-7-08.xls
instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple
branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground
measures of root growth and CO2 production.
----------------------------
moisture_release_curves.xls
percentage of water content, water potential (in MegaPascals) and temperature of soil samples, measured in the laboratory
for calibration of water content with water potential. soil is from the James Reserve in California.
----------------------------
Photosynthetic_induction.xls
[beginning of description lost in extraction] m/2/s and light level is probably 1000 micromoles. used to determine physiological characteristics of bracken ferns.
----------------------------
run_2_24-h_data_for_mesh.xls
measurements of micrometeorological parameters on a moving shuttle, going from a clearing across a forest edge and into the
forest for about 30 meters. Pyranometers facing up and down, pyrgeometer facing up and down, PAR, air temperature,
relative humidity. Also data from a station fixed in the clearing and some derived variables calculated. used for
examining edge effects in forests.
----------------------------
Segment_of_wallflower_compare_colorspaces_blur.xls
pixel counts from images of wallflowers that were segmented into flower/not-flower under different color spaces.
segmentation was made using a probability matrix of hand-segmented images. used to automatically count flowers in
images collected after this training data was collected (and used to determine the best color space for this task).
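One way to see what the free-text descriptions above leave implicit is to recast a single entry as a structured record. The sketch below does this for 2-d_soil_temps.csv. The class name, the field names, and the completeness check are all hypothetical choices made for illustration; they do not follow any particular metadata standard, and the values are only what the native description itself states.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DatasetRecord:
    """Hypothetical structured record for one data file (field names are illustrative)."""
    filename: str
    description: str
    variables: list = field(default_factory=list)
    instruments: list = field(default_factory=list)
    purpose: str = ""


def missing_fields(record):
    """Return the names of fields that are still empty in a record."""
    return [name for name, value in asdict(record).items() if not value]


# Values below are taken from the native description of the first file on the slide.
soil = DatasetRecord(
    filename="2-d_soil_temps.csv",
    description="Surface and sub-surface soil temperatures measured at one location for a few days.",
    variables=["surface temperature", "soil temperature at 2 cm", "soil temperature at 8 cm"],
    instruments=["infrared thermometer", "thermocouple"],
    purpose="calibrate a model of temperature propagation",
)

print(json.dumps(asdict(soil), indent=2))
print(missing_fields(soil))
```

A record built from only a filename and a one-line description would report `variables`, `instruments`, and `purpose` as missing, which is roughly the situation a stranger faces when handed several of the files listed above.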
12. “A mishmash of non-standardized
databases of raw results and unevenly
reported study designs is not a strong
foundation for clinical research data
sharing.”
Sim et al., “Keeping Raw Data in Context” (letter), Science, vol. 323,
6 February 2009, www.sciencemag.org
13. The “small science,” independent investigator approach traditionally has
characterized a large area of experimental laboratory sciences, such as
chemistry or biomedical research, and field work and studies, such as
biodiversity, ecology, microbiology, soil science, and anthropology. The data
or samples are collected and analyzed independently, and the resulting data
sets from such studies generally are heterogeneous and unstandardized, with
few of the individual data holdings deposited in public data repositories or
openly shared.
The data exist in various twilight states of accessibility, depending on
the extent to which they are published, discussed in papers but not revealed, or
just known about because of reputation or ongoing work, but kept under
absolute or relative secrecy. The data are thus disaggregated components of
an incipient network that is only as effective as the individual transactions
that put it together. Openness and sharing are not ignored, but they are not
necessarily dominant either. These values must compete with strategic
considerations of self-interest, secrecy, and the logic of mutually beneficial
exchange, particularly in areas of research in which commercial applications
are more readily identifiable.
The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie
M. Esanu and Paul F. Uhlir, Eds. Steering Committee on the Role of Scientific and Technical Data and Information in the
Public Domain Office of International Scientific and Technical Information Programs Board on International Scientific
Organizations Policy and Global Affairs Division, National Research Council of the National Academies, p. 8
14. By Serge Bloch in NYT: Natalie Angier, “Tracking forest creatures on the move,” NYT, Feb 2, 2009. SEE:
http://www.nytimes.com/2009/02/03/science/03angier.html?_r=1&scp=1&sq=tracking%20mammals&st=cse
http://www.jamesreserve.edu/webcams.lasso?CameraID=Cam14
15. Rheinardia ocellata, the Crested Argus. Photographed at night by an
automatic camera-trap in the Ngoc Linh foothills (Quang Nam Province).
Courtesy AMNH Center for Biodiversity and Conservation
16.
17.
18. VAQUITA STAKEHOLDERS
• AGS Alto Golfo Sustentable
• ASM American Society of Mammalogists
• CEC Commission for Environmental Cooperation
• CEDO Intercultural Center for the Study of Deserts and Oceans
• CI Conservation International
• CIRVA International Committee for the Recovery of the Vaquita
• CICESE Centro de Investigación Científica y Educación Superior de Ensenada
• CILA International Boundary and Water Commission
• CITES Convention on International Trade in Endangered Species of Wild Fauna and Flora
• Conagua National Water Commission
• Conanp National Commission for Protected Natural Areas, Semarnat (Comisión Nacional de Áreas Naturales Protegidas, Semarnat)
• Conapesca National Fisheries and Aquaculture Commission, Sagarpa (Comisión Nacional de Pesca y Acuacultura, Sagarpa)
• Profepa Federal Attorney for Environmental Protection
• Secretariat of Agriculture, Livestock, Rural Development, Fisheries, and Food (Mexico)
• Salud Secretariat of Health (Mexico)
• COSEWIC Committee on the Status of Endangered Wildlife in Canada
• Department of Fisheries and Oceans (Canada)
• United States Department of the Interior
• European Cetacean Society
• US Environmental Protection Agency
• US Food and Drug Administration
• GEF Global Environmental
• IBWC International Boundary and Water Commission
• National Institute of Ecology, Semarnat
• Inapesca National Fisheries Institute, Sagarpa
• IUCN World Conservation Union
• International Whaling Commission
• Local Economic and Employment Development program
• United States Marine Mammal Commission
19.
• Marine Stewardship Council
• NAMPAN North American Marine Protected Areas Network (CEC)
• US National Academy of Sciences
• North American Wildlife Enforcement Group (CEC)
• US National Marine Fisheries Service, NOAA, Department of Commerce
• US National Oceanic and Atmospheric Administration, Department of Commerce
• United States National Ocean Service (NOAA)
• PACE Species Conservation Action Programs, Conanp
• PGR Attorney General Office (Mexico)
• POEMGC Marine Ecological Planning of the Gulf of California Program, Semarnat
• Procer Conservation Program for Species at Risk
• Secretariat of Economy (Mexico)
• Sectur Secretariat of Tourism (Mexico)
• Sedesol Secretariat for Social Development (Mexico)
• Semar Secretariat of the Navy
• Semarnat Secretariat of the Environment and Natural Resources
• Society for Marine Mammalogy
• Solamac Latin American Society for Aquatic Mammals
• Somemma Mexican Society for Marine Mammalogy
• SWFSC Southwest Fisheries Science Center (US NMFS, NOAA)
• The Nature Conservancy
• Universidad Autónoma de Baja California Sur
• University of California
• United Nations
• United States Coast Guard
• United States Fish and Wildlife Service
• World Wildlife Fund
22. OECD Follow Up Group on Issues of Access to Publicly Funded Research Data. Promoting Access
to Public Research Data for Scientific, Economic, and Social Development: Final Report March 2003
23. “Research Commons”
The Public Domain
Knowledge
Commons
THE ROLE OF SCIENTIFIC AND TECHNICAL DATA AND INFORMATION IN THE PUBLIC DOMAIN PROCEEDINGS OF A
SYMPOSIUM Julie M. Esanu and Paul F. Uhlir, Editors Steering Committee on the Role of Scientific and Technical Data and Information
in the Public Domain Office of International Scientific and Technical Information Programs Board on International Scientific
Organizations Policy and Global Affairs Division, National Research Council of the National Academies, p. 5
24. What is the “logical structure” of incentives
for these
institutions/ organizations?
25. The Social Enterprise Spectrum
           Purely Philanthropic                                 Purely Commercial
Motives    Appeal to Goodwill      Mixed Motives                Appeal to Self-Interest
Methods    Mission Driven          Mission and Market Driven    Market Driven
Goals      Social Value            Social and Economic Value    Economic Value
JG Dees, “Enterprising Non-profits” in Harvard Business Review on Non-Profits Harvard, Cambridge, 1999, p.147
26. The Social Enterprise Spectrum: Key Stakeholders
                Purely Philanthropic            Purely Commercial
Beneficiaries   Pay Nothing                     Mixed                        Market rate prices
Capital         Donations and Grants (TAXES?)   Mixed                        Market Rate Capital
Workforce       Nonprofit Prof’s / Volunteers   Mixed                        Market Rate Compensations
Suppliers       In-Kind Donations               Mixed / Special Discounts    Market Rate Prices
JG Dees, “Enterprising Non-profits” in Harvard Business Review on Non-Profits Harvard, Cambridge, 1999, p.147
28. Stages of Digital Library Development
Stage             Date        Sponsor                           Purpose
I: Experimental   1994        NSF/ARPA/NASA                     Experiments on collections of digital materials
II: Developing    1998/1999   NSF/ARPA/NASA, DLF/CLIR           Begin to consider custodianship, sustainability, user communities
III: Mature       ?           Funded through normal channels?   Real sustainable interoperable digital libraries
Adapted from Howard Besser, “The Next Stage: Moving from Isolated Digital Collections to
Interoperable Digital Libraries,” First Monday, volume 7, number 6 (June 2002),
URL: http://firstmonday.org/issues/issue7_6/besser/index.html
29. “…government is not the solution to our problem;
government is the problem.”
Ronald Reagan
First Inaugural Address
January 20, 1981
http://www.reaganlibrary.com/reagan/speeches/first.asp
For much of the past 30 years we have worked in a
climate of increasing concern and skepticism
about public investment and public science…
31. Is scientific knowledge a “commodity” ???
???
Julian Birkinshaw and Tony Sheehan, “Managing the Knowledge Life Cycle,”
MIT Sloan Management Review, 44 (2) Fall, 2002: 77.
32. United States Patent
1,781,541
Nov. 11, 1930
ALBERT EINSTEIN, OF
BERLIN, AND LEO SZILARD,
OF BERLIN-WILMERSDORF,
GERMANY.
ASSIGNORS TO
ELECTROLUX SERVEL
CORPORATION, OF NEW
YORK, N.Y., A
CORPORATION OF
DELAWARE
REFRIGERATION
Application filed December
16,1927. Serial No.240,566,
and in Germany December 16,
1926.
http://www.bekkoame.ne.jp/~o-pat/ein-zu2.htm
33. References to “Intellectual Property”
in U.S. federal cases
“Professor Hank Greely” Cited in Lessig, L. The future of ideas: the fate of the commons in
a connected world. NY, Random House, 2001. P. 294.
34. Differing Interpretations of IPR Regulation
Current Norms Maximalists
Reductionists Expansionists
BENEFITS
Intellectual Property Rights
Brotherhood of Painters, Decorators, and Paperhangers of America.; Screen Cartoonists Local Union No. 852
(Hollywood, Calif.); Animation Guild and Affiliated Optical Electronic and Graphic Arts, Local 839 I.A.T.S.E. (North
Hollywood, Los Angeles, Calif.); Motion Pictures Screen Cartoonists Local 839, I.A.T.S.E.
36. The ethical case for sharing
scientific knowledge resources
has long been well established!
37. “The field of knowledge is the common
property of all mankind”
Thomas Jefferson 1807
38. Ethical Context for Sharing
• Knowledge Equity as a fundamental good
• Ethos of Science
• Ethos of Conservation
• Human Rights
• Governmental / Organizational Transparency
and Accountability
• Civic Responsibility and Science Literacy
39. “The substantive findings of science are a product of social
collaboration and are assigned to the community. They
constitute a common heritage in which the equity of the
individual producer is severely limited…”
“The scientist’s claim to “his” intellectual “property” is limited to
that of recognition and esteem which, if the institution
functions with a modicum of efficiency, is roughly
commensurate with the significance of the increments brought
to the common fund of knowledge.”
Robert K. Merton, “A Note on Science and Democracy,”
Journal of Law and Political Sociology 1 (1942): 121.
40. “The field of knowledge is the common
property of all mankind”
Thomas Jefferson 1807
41. ALL knowledge? Or perhaps, an Ethical Spectrum ? –
Support for Scientific Knowledge
Commons
Human Health    Agriculture    Science-Tech    [Biotechnology]
Earth Science/Conservation    Education    [Nuclear Technology]
43. RIO DECLARATION ON ENVIRONMENT AND
DEVELOPMENT (1992)
Principle 10
Environmental issues are best handled with participation of all
concerned citizens, at the relevant level. At the national
level, each individual shall have appropriate access to
information concerning the environment that is held by
public authorities, including information on hazardous
materials and activities in their communities, and the
opportunity to participate in decision-making processes.
States shall facilitate and encourage public awareness and
participation by making information widely available.
Effective access to judicial and administrative proceedings,
including redress and remedy, shall be provided.
44. Convention on Biological Diversity: Article 17
Exchange of Information
1. The Contracting Parties shall facilitate the exchange of
information, from all publicly available sources, relevant to
the conservation and sustainable use of biological
diversity, taking into account the special needs of
developing countries.
2. Such exchange of information shall include exchange
of results of technical, scientific and socio-economic
research, as well as information on training and
surveying programmes, specialized knowledge,
indigenous and traditional knowledge as such and in
combination with the technologies referred to in
Article 16, paragraph 1. It shall also, where feasible,
include repatriation of information.
http://www.biodiv.org/convention/articles.asp?lg=0&a=cbd-17
46. For hundreds of years, libraries have been the
“protected areas” of the knowledge commons.
The “public library” is a commons or zone of “fair
use” that makes knowledge freely and
equitably available to all.
47. “Between 1886 and 1919,
Carnegie’s donations of
more than $40 million paid
for 1,679 new library
buildings in communities
large and small across
America.”
http://www.nps.gov/history/NR/twhp/wwwlps/lessons/50carnegie/50visual3.htm
48. Table 1: Distribution of Carnegie Libraries, 1920
State Pop Libraries Libraries/M State Pop Libraries Libraries/M
AL 2,348,174 14 6.0 MT 548,889 17 31.0
AZ 334,162 4 12.0 NE 1,296,372 69 53.2
AR 1,752,204 4 2.3 NV 77,407 1 12.9
CA 3,426,861 142 41.4 NH 443,083 9 20.3
CO 939,629 35 37.2 NJ 3,155,900 35 11.1
CT 1,380,631 11 8.0 NM 360,350 3 8.3
DE 223,003 0 0 NY 10,385,230 106 10.2
DC 437,571 4 9.1 NC 2,559,123 10 3.9
FL 968,470 10 10.3 ND 646,872 8 12.3
GA 2,895,832 24 8.3 OH 5,759,394 105 18.2
ID 431,866 10 23.2 OK 2,028,283 24 11.8
IL 6,485,280 106 16.3 OR 783,389 31 39.6
IN 2,930,390 164 56.0 PA 8,720,017 58 6.6
IA 2,404,021 101 42.0 RI 604,397 0 0
KS 1,769,257 59 33.3 SC 1,683,724 14 8.3
KY 2,416,630 23 9.5 SD 636,547 25 39.3
LA 1,798,509 9 5.0 TN 2,337,885 13 5.5
ME 768,014 17 22.1 TX 4,663,228 32 6.9
MD 1,449,661 14 9.6 UT 449,396 23 51.2
MA 3,852,356 43 11.2 VT 352,428 4 11.3
MI 3,668,412 61 16.6 VA 2,309,187 3 1.3
MN 2,387,125 65 27.2 WA 1,356,621 43 31.7
MS 1,790,618 11 6.1 WV 1,463,701 3 2.0
MO 3,404,055 33 9.7 WI 2,632,067 63 23.9
WY 194,402 16 82.3
http://www.nps.gov/history/NR/twhp/wwwlps/lessons/50carnegie/50visual3.htm
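The “Libraries/M” column is not defined on the slide, but the figures are consistent with libraries per million residents, that is, Libraries divided by (Pop / 1,000,000). The sketch below recomputes three rows under that assumed reading; the function name is hypothetical and the inputs are copied from the table.

```python
def libraries_per_million(population, libraries):
    """Libraries per one million residents, rounded to one decimal place."""
    return round(libraries / (population / 1_000_000), 1)


# (population, number of Carnegie libraries) copied from Table 1
rows = {
    "IN": (2_930_390, 164),
    "WY": (194_402, 16),
    "VA": (2_309_187, 3),
}

for state, (population, libraries) in rows.items():
    print(state, libraries_per_million(population, libraries))
```

Normalizing by population is what makes a small state such as Wyoming, with only 16 libraries, stand out against much larger states.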
49. Irony…?
In fact, policy for sharing knowledge resources
is not a “left”/“right” (or “red”/“blue”) issue…
Robert Minor, St Louis Post-Dispatch (1908)
51. Political Power and Knowledge (Poder Politico y Conocimiento)
High
Politicians ???
Responsibility and Power
Administrators
or Managers
Analysts-
Technicians
Scientists
High
Low
Knowledge (in Western scientific terms)
(Sutton, 1999)
From: Organizaciones que aprenden, paises que aprenden: lecciones y AP en Costa Rica by Andrea
Ballestero, Director, ELAP
52. “Science Literacy” ?
“...the capacity to use scientific knowledge, to
identify questions, and to draw evidence-
based conclusions in order to understand
and help make decisions about the
natural world and the changes made to it
through human activity.”
Organization for Economic Cooperation and Development. (1999). Measuring Student
Knowledge and Skills: A New Framework for Assessment. Paris: Author.
http://www.oecd.org/dataoecd/45/32/33693997.pdf
53. An Inconvenient Truth?
“Compared with practical science literacy, the
achievement of a functional level of civic science
literacy is a more protracted endeavor. Yet, it is a
job that sooner or later must be done, for as time
goes on human events will become even more
entwined in science, and science-related public
issues in the future can only increase in number
and in importance. Civic science literacy is a
cornerstone of informed public policy.”
B. S. P. Shen, “Scientific Literacy and the Public Understanding of Science,” in Communication of
Scientific Information, ed. S. Day (Basel: Karger, 1975), 44–52 Quoted in: Jon D. Miller, “The
measurement of civic scientific literacy.” Public Understand. Sci. 7 (1998) 203–223.
http://pascal.iseg.utl.pt/~ccti/Documents/Miller1998.pdf
55. Standards?
An old quip about “standards” notes that the
good thing about them is that there are so
many to choose from…
Why are standards practically necessary?
Whether in the public or private sector, they are
efficient and cost effective.
56.
57. Consequence of a lack of standardization?
Cell Phone Dead Spots
Map of reported cell phone problems in Queens provided by the NY City Dept. of Information,
Technology and Telecommunications.
http://www.queenstribune.com/guides/insiders2004/pages/CellPhoneDeadSpots.htm [07/06/05]
61. A work is “open” if its manner of distribution
satisfies the following conditions
• Access
• Redistribution
• Reuse
• Absence of Technological Restriction
• Attribution
• Integrity
• No Discrimination Against Persons or Groups
• No Discrimination Against Fields of Endeavor
• Distribution of License
• License Must Not Be Specific to a Package
• License Must Not Restrict the Distribution of Other Works
http://opendefinition.org/1.0 [February 20, 2009]
62. 1. Access: The work shall be available as a whole and at no more than a reasonable reproduction cost,
preferably downloading via the Internet without charge. The work must also be available in a
convenient and modifiable form.
[Comment: This can be summarized as 'social' openness - not only are you allowed to get the work but
you can get it. 'As a whole' prevents the limitation of access by indirect means, for example by only
allowing access to a few items of a database at a time.]
2. Redistribution: The license shall not restrict any party from selling or giving away the work either on
its own or as part of a package made from works from many different sources. The license shall not
require a royalty or other fee for such sale or distribution.
3. Reuse: The license must allow for modifications and derivative works and must allow them to be
distributed under the terms of the original work. The license may impose some form of attribution
and integrity requirements: see principle 5 (Attribution) and principle 6 (Integrity) below.
[Comment: Note that this clause does not prevent the use of 'viral' or share-alike licenses that require
redistribution of modifications under the same terms as the original.]
4. Absence of Technological Restriction: The work must be provided in such a form that there are no
technological obstacles to the performance of the above activities. This can be achieved by the
provision of the work in an open data format, i.e. one whose specification is publicly and freely
available and which places no restrictions monetary or otherwise upon its use.
5. Attribution: The license may require as a condition for redistribution and re-use the attribution of the
contributors and creators to the work. If this condition is imposed it must not be onerous. For
example if attribution is required a list of those requiring attribution should accompany the work.
6. Integrity: The license may require as a condition for the work being distributed in modified form that
the resulting work carry a different name or version number from the original work.
http://opendefinition.org/1.0 [February 20, 2009]
63. 7. No Discrimination Against Persons or Groups: The license must not discriminate against any person or
group of persons.
[Comment: In order to get the maximum benefit from the process, the maximum diversity of persons and groups should be
equally eligible to contribute to open knowledge. Therefore we forbid any open-knowledge license from locking anybody
out of the process.]
8. No Discrimination Against Fields of Endeavor: The license must not restrict anyone from making use of the
work in a specific field of endeavor. For example, it may not restrict the work from being used in a
business, or from being used for military research.
[Comment: The major intention of this clause is to prohibit license traps that prevent open source from being used
commercially. We want commercial users to join our community, not feel excluded from it.]
9. Distribution of License: The rights attached to the work must apply to all to whom the work is redistributed
without the need for execution of an additional license by those parties.
[Comment: This clause is intended to forbid closing of the work by indirect means such as requiring a non-disclosure
agreement.]
10. License Must Not Be Specific to a Package: The rights attached to the work must not depend on the work
being part of a particular package. If the work is extracted from that package and used or distributed
within the terms of the work's license, all parties to whom the work is redistributed should have the same
rights as those that are granted in conjunction with the original package.
11. License Must Not Restrict the Distribution of Other Works: The license must not place restrictions on
other works that are distributed along with the licensed work. For example, the license must not insist
that all other works distributed on the same medium are open.
[Comment: Distributors of open knowledge have the right to make their own choices. Note that 'share-alike' licenses are
conformant since those provisions only apply if the whole forms a single work.]
http://opendefinition.org/1.0 [February 20, 2009]
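The eleven conditions above can be read as a checklist. As a rough sketch of that reading, the code below summarizes a license as a set of yes/no flags and reports which conditions it would fail. The class, its field names, and the function are hypothetical simplifications made for illustration; they are not part of the Open Definition, and a real license review involves more judgment than a set of booleans. Note that conditions 5 (Attribution) and 6 (Integrity) describe requirements a license may impose, so those flags never count as violations here.

```python
from dataclasses import dataclass


@dataclass
class LicenseTerms:
    """Hypothetical yes/no summary of a license (field names are illustrative)."""
    whole_work_accessible: bool                  # condition 1
    allows_redistribution: bool                  # condition 2
    allows_derivatives: bool                     # condition 3
    open_format: bool                            # condition 4
    excludes_persons_or_groups: bool = False     # condition 7
    excludes_fields_of_endeavor: bool = False    # condition 8
    needs_additional_license: bool = False       # condition 9
    tied_to_package: bool = False                # condition 10
    restricts_other_works: bool = False          # condition 11
    requires_attribution: bool = False           # permitted by condition 5
    requires_renaming_modified: bool = False     # permitted by condition 6


def violated_conditions(terms: LicenseTerms) -> list:
    """Return the names of the conditions that the summarized license fails."""
    checks = [
        ("Access", terms.whole_work_accessible),
        ("Redistribution", terms.allows_redistribution),
        ("Reuse", terms.allows_derivatives),
        ("Absence of Technological Restriction", terms.open_format),
        ("No Discrimination Against Persons or Groups", not terms.excludes_persons_or_groups),
        ("No Discrimination Against Fields of Endeavor", not terms.excludes_fields_of_endeavor),
        ("Distribution of License", not terms.needs_additional_license),
        ("License Must Not Be Specific to a Package", not terms.tied_to_package),
        ("License Must Not Restrict the Distribution of Other Works", not terms.restricts_other_works),
    ]
    return [name for name, satisfied in checks if not satisfied]


attribution_only = LicenseTerms(True, True, True, True, requires_attribution=True)
non_commercial = LicenseTerms(True, True, True, True, excludes_fields_of_endeavor=True)

print(violated_conditions(attribution_only))
print(violated_conditions(non_commercial))
```

Under this reading an attribution requirement leaves a work open, while a ban on business use fails condition 8.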
64. http://sciencecommons.org/projects/publishing/open-access-data-protocol/
Protocol for Implementing Open Access Data
1. Intellectual foundation for the protocol
The motivation behind this memorandum is interoperability of scientific data.
The volume of scientific data, and the interconnectedness of the systems under study, makes integration
of data a necessity. For example, life scientists must integrate data from across biology and chemistry
to comprehend disease and discover cures, and climate change scientists must integrate data from
wildly diverse disciplines to understand our current state and predict the impact of new policies.
The technical challenge of such integration is significant, although emerging technologies appear to be
helping. But the forest of terms and conditions around data make integration difficult to legally
perform in many cases. One approach might be to develop and recommend a single license: any data
with this license can be integrated with any other data under this license.
But this approach, which implicitly builds on intellectual property rights and the ideas of licensing as
understood in software and culture, is difficult to scale for scientific uses. There are too many
databases under too many terms already, and it is unlikely that any one license or suite of licenses
will have the correct mix of terms to gain critical mass and allow massive-scale machine integration of
data.
Therefore we instead lay out principles for open access data and a protocol for implementing those
principles, and we distribute an Open Access Data Mark and metadata for use on databases and data
available under a successful implementation of the protocol.
68. US NSF “DataNet” Program
“the full data preservation and access lifecycle”
• “acquisition”
• “documentation”
• “protection”
• “access”
• “analysis and dissemination”
• “migration”
• “disposition”
“Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program
Solicitation” NSF 07-601 US National Science Foundation Office of Cyberinfrastructure
Directorate for Computer & Information Science & Engineering
69. “Sustainable data curation”
“There are several main elements necessary to sustain data curation:
“Robust data storage facilities (hardware and software) that are capable of
accurately handling data migration across generations of media.
“Backup plans, that are tested, so irreplaceable data are not at risk.
Unintended data loss can occur for many reasons: some major causes are:
poor stewardship leading to the loss of metadata to understand where the
data is located and documentation to understand the content, physical
facility and equipment failure (fire, flood, irrecoverable hardware crashes),
accidental data overwrite or deletion.
“Science-educated staff with knowledge to match the data discipline is
important for checking data integrity, choosing archive organization, creating
adequate metadata, consulting with users, and designing access systems
that meet user expectations. Staff responsible for stewardship and curation
must understand the digital data content and potential scientific uses.”
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge
sharing,” from the 4th International Digital Curation Conference, December 2008, page 10.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
70. Sustainable data curation (cont.)
“Non-proprietary data formats that will ensure data access capability
for many decades and will help avoid data losses resulting from
software incompatibilities…
“Consistent staffing levels and people dedicated to best practices in
archiving, access, and stewardship…
“National and International partnerships and interactions greatly aids in
shared achievements for broad scale user benefits, e.g. reanalyses,
TIGGE…
“Stable funding not focused on specific projects, but data management
in general…”
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge
sharing ,” from the 4th International Digital Curation Conference December 2008 , page 10-11.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
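The "backup plans, that are tested" and "checking data integrity" points quoted above can be made concrete with a checksum manifest. What follows is a minimal sketch, not anything described by Jacobs and Worley: the function names (`sha256_of`, `build_manifest`, `verify_backup`) and the directory layout are assumptions made for illustration. It records a SHA-256 digest for every file in an archive, then checks a backup copy against that record.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(manifest: dict[str, str], backup_root: Path) -> dict[str, list[str]]:
    """Report files that are missing from, or differ in, the backup copy."""
    report: dict[str, list[str]] = {"missing": [], "corrupted": []}
    for rel_path, expected in manifest.items():
        candidate = backup_root / rel_path
        if not candidate.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(candidate) != expected:
            report["corrupted"].append(rel_path)
    return report


if __name__ == "__main__":
    # Hypothetical demo: a two-file archive and a backup with one altered file.
    with tempfile.TemporaryDirectory() as tmp:
        archive, backup = Path(tmp) / "archive", Path(tmp) / "backup"
        archive.mkdir()
        backup.mkdir()
        (archive / "obs.csv").write_text("station,temp\nA,12.1\n")
        (backup / "obs.csv").write_text("station,temp\nA,21.1\n")
        print(verify_backup(build_manifest(archive), backup))
```

A manifest of this kind only detects loss or alteration. It does not replace the metadata and documentation that the excerpt identifies as necessary for understanding the data.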
72. BCIS (a predecessor):
the Biodiversity Conservation Information System
• Initiated in 1995
• 12 Partner Organizations
• Experimented with Data Sharing
• Published Principles of Data Management (in 3
languages)
74. The Conservation Commons
promotes and enables
conscious, effective and equitable sharing
of knowledge resources
to advance conservation.
75. PRINCIPLES OF THE CONSERVATION COMMONS
Open Access
The Conservation Commons promotes free and open access to data, information
and knowledge for all conservation purposes.
Mutual Benefit
The Conservation Commons welcomes and encourages participants to both use
resources and to contribute data, information and knowledge.
Rights and Responsibilities
Contributors to the Conservation Commons have full right to attribution for any
uses of their data, information, or knowledge, and the right to ensure that the
original integrity of their contribution to the Commons is preserved. Users of the
Conservation Commons are expected to comply, in good faith, with terms of uses
specified by contributors.
http://www.conservationcommons.org/section.php?section=principle&sous-section=endorsement&langue=en
76. Organizations that have formally endorsed the Principles
American Museum of Natural History
National Geographic Society
ARKive: The Wildscreen Trust (UK) (Website of the year)
Nature Protection Trust of Seychelles
BirdLife International
Nature Serve *
BP
PALNet - Protected Areas Learning Network (from WCPA of IUCN)
Centre for Sustainable Watersheds (Canada)
Philippine Society for the Protection of Animals (Web link not available)
Chevron-Texaco
Réseau Africain pour la conservation de la Mangrove (RAM)
Chevron-Texaco Specific Endorsement Letter
Red Hat
CIFOR
Regional Centre for Development Cooperation (RCDC), Centre for Forestry and Gover
CONABIO - Mexico
Rio Tinto
Conservation Biology Institute, USA
Salim Ali Centre for Ornithology and Natural History (SACON-India)
Conservation International *
Shell Exploration
CRIA - Brazil *
Society for Conservation GIS
DIDG Information Systems Ltd. (Australia)
South African National Biodiversity Institute - SANBI *
Earth Conservation Toolbox
The African Conservation Foundation
Environmental Education Center - Russia "Zapovedniks"
The Big Sky Conservation Institute
Erawan Interactive: Digital Publishing
The Natural History Museum, London
ETI BioInformatics
The Nature Conservancy *
Fauna & Flora International
The Rainforest Alliance
Friends of Nature - Bolivia
The Smithsonian Institution
GBIF - Global Biodiversity Information Facility *
The World Conservation Union, Pakistan
Global Invasive Species Programme (GISP)
The Zoological Society of London
Global Transboundary Protected Areas Network of IUCN
TRAFFIC International
GreenFacts
TROPI-DRY: forest research network (based in U.Alberta)
UNDP
INBio, National Biodiversity Institute of Costa Rica
UNEP WCMC
Information Center for the Environment (ICE), U. of California, Davis
Unesco
INSnet, Internetwork for Sustainability
University of Maryland - Global Land Cover Facility *
Instituto de Biología, U.N.A.M. Mexico
US NASA *
Instituto de Investigación de Recursos Biológicos Alexander von Humboldt (Colombia)
Wetlands of India (hosted by SACON-India)
International Center for Himalayan Biodiversity (link unavailable for now)
Wild Bird Club of the Philippines
International Commission on Zoological Nomenclature
Wildlife Conservation Society
Invasive Species Specialist Group of IUCN/SSC (Species Survival Commission)
World Commission on Protected Areas (WCPA of IUCN)
IUCN - The World Conservation Union *
WWF Brazil
My Nature (based in Romania)
WWF International
77. Commons-Consistent Initiatives and Projects
• CONSERVEONLINE SEE: http://conserveonline.org/
• Global Biodiversity Information Facility (GBIF) SEE: http://www.gbif.org/
• World Database on Protected Areas (WDPA) SEE: http://www.unep-wcmc.org/wdpa/
• Biodiversity Heritage Library (BHL) SEE: http://bhl.si.edu/
• Protected Areas Learning Network (PALNet) SEE: http://www.parksnet.org/
New Initiatives:
Development of open data standards for Biodiversity (with OASIS
SEE: http://www.oasis-open.org/home/index.php )
Conservation GIS developments (GLCF / Univ of Md.)
World Conservation Base Map
http://conserveonline.org/workspaces/conservation.basemap
Development of model contractual language supporting commons principles
San Francisco Bay Conservation Commons (Calif. Conservation Commons?)
SEE: http://sfbayarea.calconservationcommons.net/
78. As a result of the Darwin Core analysis…
GBIF UDDI Registry
* registration
* update information
________________________________________
Data Providers 259
Datasets 7481
Searchable Records 147,539,975
http://www.gbif.org/ [clipped Oct 8, 2008]
79. How do we Incentivize Change ?
• Individually
• Professionally / Disciplinarily
• Organizationally / Institutionally
81. Cost Benefit Calculations of Change
High Cost
Cell A (Tangible Personal Benefit)
-- Clear, direct benefits
-- Change is difficult
-- Balancing communication with a strong support system is key
Cell B (Intangible Societal Benefit)
-- Intangible, indirect benefits
-- Change is difficult
-- Try to reposition into “Cell D” – leveraging enthusiasm / supply-side persuasion
Low Cost
Cell C (Tangible Personal Benefit)
-- Clear, direct benefits
-- Change is easy
-- Communication & information are key
-- Convenience is key
Cell D (Intangible Societal Benefit)
-- Intangible direct benefits
-- Change is easy
-- Ultimate benefit should be stressed
Adapted from VK Rangan et al. “Do better at doing good,” in Harvard Business Review on Non-Profits Harvard, Cambridge,
1999, p. 173 ff.
82. Personal Incentives for Sharing?
(The “Reputational Economy”)
• Ethics and the ethos of conservation or of
science
– Ethical imperative
• The “Reputation Economy”
– Personal recognition: priority/ prestige ( evidence
of substantial increases in citation)
– Professional credential for hiring and for job
security (tenure & promotion) (also requires
professional/disciplinary change)
83. Individual’s willingness to share:
the Core functions of Scholarly Communication
• “Registration, which allows claims of precedence for a
scholarly finding.
• “Certification, which establishes the validity of a registered
scholarly claim.
• “Awareness, which allows participants in the scholarly system
to remain aware of new claims and findings.
• “Archiving, which preserves the scholarly record over time.
• “Rewarding, which rewards participants for their
performance in the communication system based on metrics
derived from that system.
Roosendaal, H., Geurts, P in Cooperative Research Information Systems in Physics (Oldenburg, Germany, 1997).
84. The Benefits of Open Access
“The influence of OA is more modest than many
have proposed, at ~8% for recently published
research, but our work provides clear support
for its ability to widen the global circle of
those who can participate in science and
benefit from it. “
J. A. Evans and J. Reimer, Open access and global participation in science.
Science v. 323 20 February, 2009 p. 1025.
86. • Expectations of sharing vary by discipline
• In “big science” (astrophysics / astronomy /
meteorology / oceanography / genomics) sharing is
expected (if not required) and contributions to a
common fund of knowledge are assumed (See also:
GENBANK )
– Standards are relatively clear
– Mechanisms for sharing are well-developed
• In “small science” such capacity is weaker
87. Small Science: Data Deposit and Access
• Data are typically held in many formats
• Discovery of data is very weakly supported by
standards-development
• Access to and use of data are highly variable
• However, progress has been made respecting
museum specimen data in the past 20 years [SEE for
ex.: GBIF and many allied projects]
• Some progress has been made respecting
observational and other data
• Ecological and conservation field data remain highly
problematic
88. Data Citation and Access?
-- Even common standards for data citation are weak
Hence for example:
M. Altman and G. King “A Proposed Standard for the
Scholarly Citation of Quantitative Data” D-Lib Magazine
March/April 2007 Vol.13:3/4
http://www.dlib.org/dlib/march07/altman/03altman.html
90. The Social Enterprise Spectrum
         Purely Philanthropic |                           | Purely Commercial
Motives: Appeal to Goodwill   | Mixed Motives             | Appeal to Self Interest
Methods: Mission Driven       | Mission and Market Driven | Market Driven
Goals:   Social Value         | Social and Economic Value | Economic Value
JG Dees, “Enterprising Non-profits" in Harvard Business Review on Non-Profits Harvard, Cambridge, 1999, p.147
91. Perhaps, an Ethical Spectrum ? –
Support for Scientific Knowledge Commons
Human Health | Agriculture | Biotechnology
Earth Science / Conservation | Education | Nuclear Technology
93. “NATIVE” METADATA
DEAD HARBOR SEAL and 5 CALIFORNIA CONDORS !!!
94.
95.
96. The Science of Science Policy: a Federal Research Roadmap. Report on the
Science of Science Policy to the Subcommittee on Social, Behavioral and
Economic Sciences. Committee on Science. National Science and Technology
Council. Office of Science and Technology Policy. November, 2008. p.11.
103. Disintermediation of the traditional value chain:
“…a clash of business models.” -- Kevin Kelly
“But a new regime of digital technology has now disrupted all business
models based on mass-produced copies, including individual livelihoods
of artists. The contours of the electronic economy are still emerging, but while they do,
the wealth derived from the old business model is being spent to try to protect that old
model, through legislation and enforcement. Laws based on the mass-produced
copy artifact are being taken to the extreme, while desperate measures
to outlaw new technologies in the marketplace "for our protection" are
introduced in misguided righteousness. (This is to be expected. The fact is, entire
industries and the fortunes of those working in them are threatened with demise.
Newspapers and magazines, Hollywood, record labels, broadcasters and many hard-working
and wonderful creative people in those fields have to change the model of how they earn
money. Not all will make it.)”
Kevin Kelly, “Scan This Book!” NYT. Published: May 14, 2006
105. Ralph Baxter, CEO of security company ClusterSeven: "Although fraud is not the primary reason
for the precarious state of the current economy, it is still a cause of concern to banks because
most of them incorrectly believe their current security measures are adequate and they are
preoccupied with surviving and may have inadvertently lowered their guard when it comes
to fraud."
• “Spreadsheets, where fraud is often committed, are very accident prone, especially when
they have thousands of lines of data. Baxter notes, "If for example, someone changes one
cell to boost a future bonus, the bank will still need to prove the employee did not make an
'honest' mistake and intended to commit fraud."
• “To make matters worse in detecting this kind of fraud, the departments responsible for
rooting out fraud tend to have very high turnover and are considered "low priority" for
funding and training. Baxter says he sees morale is usually low, and the high turnover
requires higher than average training resources, which aren't often available. This further
reduces the effectiveness of institutions' security measures.
106. There are three types of fraud that are growing in popularity:
Presentation fraud - is an increasingly common form of criminal activity and involves modifying
the way a spreadsheet is viewed. Sometimes whole lines of data are made invisible, or a
number in a cell is displayed using a white font on a similarly colored background.
"Fraudsters with a great deal of experience using Excel can lay a false number over the real
one. This type of fraud is quick and easy to do and occurs right before bonuses are
calculated," Baxter says.
Adjustment fraud - involves incorrectly recording numbers on a spreadsheet as part of the
process of updating information about the markets a bank is involved in. Ongoing
adjustments are a normal part of the banking business and an employee who is committing
adjustment fraud may actually appear to be doing a very thorough job. This type of fraud
involves making multiple false data entries over a period of time and ultimately removing all
evidence of fraud by the end of the manipulation process.
Gradual fabrication fraud - involves inserting false data that is only slightly higher or lower than
the actual number so that it does not attract attention from other employees or auditors.
This scheme is meant to slowly inflate a bank's assets or worth. Once the false numbers have
been accepted and a higher bonus check issued, the employee corrects the false number
slowly, over time, once again to avoid raising any suspicion.
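Adjustment fraud and gradual fabrication fraud, as described above, both work by changing cell values between one version of a spreadsheet and the next. A minimal sketch of one way to surface such changes is to compare two saved snapshots cell by cell. Everything here is an assumption for illustration and is not drawn from the quoted article: the snapshots are CSV exports, the function names `load_cells` and `diff_snapshots` are hypothetical, and the 2% threshold for a "small" change is arbitrary. Because CSV carries no formatting, this approach cannot see presentation fraud (hidden rows or white-on-white text).

```python
import csv
import io


def load_cells(csv_text: str) -> dict[tuple[int, int], str]:
    """Index every cell of a CSV snapshot by (row, column)."""
    cells = {}
    for r, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for c, value in enumerate(row):
            cells[(r, c)] = value.strip()
    return cells


def diff_snapshots(before: str, after: str, small_change: float = 0.02) -> list[dict]:
    """List every cell whose value differs between two snapshots.

    Numeric edits within `small_change` (relative) are flagged separately,
    since changes that are "only slightly higher or lower" are the ones
    least likely to be noticed by a reviewer.
    """
    old, new = load_cells(before), load_cells(after)
    findings = []
    for key in sorted(old.keys() | new.keys()):
        was, now = old.get(key), new.get(key)
        if was == now:
            continue
        finding = {
            "row": key[0],
            "col": key[1],
            "before": was,
            "after": now,
            "small_numeric_change": False,
        }
        try:
            a, b = float(was), float(now)
            if a != 0 and abs(b - a) / abs(a) <= small_change:
                finding["small_numeric_change"] = True
        except (TypeError, ValueError):
            pass  # non-numeric, added, or removed cell
        findings.append(finding)
    return findings


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after an update cycle.
    monday = "desk,value\nrates,1000\nfx,2500\n"
    friday = "desk,value\nrates,1010\nfx,4000\n"
    for item in diff_snapshots(monday, friday):
        print(item)
```

A diff of this kind shows that a value changed, not why. As the Baxter quotation notes, distinguishing an honest mistake from intent remains a separate problem.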
108. “Barclays Spreadsheet Error
Results In Lehman Chaos”
“It pays to have good spreadsheet skills. We're just now learning
that Barclays wound up with scores of Lehman Brothers
trading positions that it never meant to buy when a pair of
very junior lawyers attempted to reformat an Excel
spreadsheet and convert it into a pdf document. The result
was that a "hidden" column of 179 contracts not intended to
be purchased became unhidden, and when Barclays filed the
document with the court it wound up picking up the
contracts.”
http://www.businessinsider.com/2008/10/barclays-excel-error-results-in-lehman-chaos
John Carney|Oct. 16, 2008, 8:49 AM