What does “Full Life-Cycle” Data
Management Mean ?
“BIG DATA”
US Office of Personnel Management
March 14, 2013
“As required by the National Archives and
Records Administration (NARA) in 36 CFR
Chapter XII, Subchapter B, Records
Management, Federal agencies are
responsible for creating and maintaining
authentic, reliable, and usable records and
ensure that they remain so for the length of
their authorized retention period.”
http://www.archives.gov/records-mgmt/toolkit/pdf/ID373.pdf
First, a brief digression concerning graphics…
Edward Tufte’s favorite…
DISCRETION…
Exercise care in the selection of graphic formats
– not all graphics enhance understanding
some may confuse…
Lacking effective compound graphics, simplicity
and the use of multiple graphic images may be
more effective.
The New York Times often produces exemplary
graphics that compress complex data and
complex relationships…
NYT: “LEADING CAUSES OF CANCER DEATHS”
http://www.nytimes.com/imagepages/2007/07/29/health/29cancer.graph.web.html
“Data” ? [technical definition]
“…’data’ are defined as any information that can be stored in
digital form and accessed electronically, including, but not
limited to, numeric data, text, publications, sensor streams,
video, audio, algorithms, software, models and simulations,
images, etc.”-- Program Solicitation 07-601
“Sustainable Digital Data Preservation and Access Network Partners (DataNet)”
Taken in this broadest possible sense, “data” are thus simply
electronic coded forms of information. And virtually anything
can be represented as “data” so long as it is electronically
machine-readable.
“Data” [epistemic definition – addressing the meaning of data]
“Measurements, observations or descriptions of
a referent -- such as an individual, an event, a
specimen in a collection or an
excavated/surveyed object -- created or
collected through human interpretation
(whether directly “by hand” or through the use
of technologies)”
-- AnthroDPA Working Group on Metadata (May, 2009)
[funded by Wenner-Gren Foundation and US NSF]
“Experiments to determine the density of the earth,” by Henry Cavendish, ESQ., F.R.S. AND A.S. Read
June 21, 1798 (From the Philosophical Transactions of the Royal Society of London for the year
1798, Part II. , pp. 469-526)
From: http://www.archive.org/details/lawsofgravitatio00mackrich
USDA – NATURAL RESOURCES CONSERVATION SERVICE
sbid battery datetime heater_voltage Manz1Sap1 Manz1Sap2 Manz1Sap3 Manz1Sap4 Manz2Sap5 Manz2Sap6 Manz2Sap7 Manz3Sap10 Manz3Sap8 Manz3Sap9 Manz4Sap11 timestamp Datagap Julian
2 12.365 1196796112 2018.8 0.5585 0.51029 0.55517 0.54354 0.6067 0.52858 0.55351 0.59008 0.59506 0.60337 0.56514 12/4/07 11:21 4.47351
3 12.348 1196796232 2017.9 0.55682 0.51028 0.5535 0.54352 0.60669 0.52857 0.55017 0.59007 0.59505 0.60336 0.56513 12/4/07 11:23 0 4.47490
4 12.357 1196796352 2018.6 0.55514 0.51027 0.55348 0.54351 0.60501 0.52855 0.55016 0.59005 0.59504 0.60501 0.56512 12/4/07 11:25 0 4.47628
5 12.354 1196796472 2017.6 0.55514 0.51026 0.55181 0.5435 0.60334 0.52855 0.54849 0.59004 0.59503 0.60334 0.56511 12/4/07 11:27 0 4.47767
6 12.334 1196796592 2018.3 0.55347 0.51026 0.55015 0.5435 0.60333 0.52854 0.54682 0.59004 0.59502 0.605 0.56511 12/4/07 11:29 0 4.47906
7 12.34 1196796712 2018.5 0.55014 0.50859 0.55014 0.54349 0.60332 0.53019 0.54349 0.59003 0.59501 0.60498 0.56676 12/4/07 11:31 0 4.48045
8 12.337 1196796832 2017.8 0.55013 0.50692 0.55013 0.54348 0.60332 0.53019 0.54182 0.59002 0.59501 0.60498 0.56675 12/4/07 11:33 0 4.48184
9 12.328 1196796952 2017.5 0.5468 0.50691 0.5468 0.54347 0.60331 0.53018 0.53849 0.59001 0.595 0.60497 0.56674 12/4/07 11:35 0 4.48323
10 12.323 1196797072 2017 0.54679 0.50524 0.54679 0.54347 0.59998 0.53017 0.53682 0.59 0.59499 0.60496 0.56674 12/4/07 11:37 0 4.48462
11 12.328 1196797192 2018.9 0.54679 0.50191 0.54512 0.5418 0.59665 0.53017 0.53349 0.59 0.59498 0.60496 0.56673 12/4/07 11:39 0 4.48601
12 12.319 1196797312 2017.7 0.54345 0.49857 0.54178 0.54178 0.59663 0.53015 0.53015 0.58998 0.5933 0.60327 0.56671 12/4/07 11:41 0 4.48740
13 12.311 1196797432 2017.3 0.54343 0.4969 0.54011 0.54177 0.59661 0.53014 0.52848 0.58997 0.59329 0.6016 0.5667 12/4/07 11:43 0 4.48878
14 12.316 1196797552 2018.6 0.5401 0.49357 0.53678 0.54176 0.59328 0.53013 0.5268 0.58995 0.59328 0.60325 0.56669 12/4/07 11:45 0 4.49017
15 12.31 1196797672 2016.8 0.53844 0.4919 0.53511 0.54176 0.59494 0.53013 0.52514 0.58995 0.59328 0.60325 0.56503 12/4/07 11:47 0 4.49156
16 12.31 1196797792 2017.1 0.53676 0.48856 0.53343 0.54174 0.59326 0.53011 0.5218 0.58993 0.59326 0.60323 0.56501 12/4/07 11:49 0 4.49295
17 12.31 1196797912 2017.1 0.53342 0.48523 0.5301 0.54173 0.59324 0.5301 0.51846 0.58826 0.59324 0.60321 0.56499 12/4/07 11:51 0 4.49434
18 12.301 1196798031 2017.5 0.53174 0.48521 0.52842 0.53839 0.59156 0.53008 0.51845 0.58824 0.59323 0.6032 0.56498 12/4/07 11:53 0 4.49573
19 12.301 1196798151 2016.3 0.53007 0.48188 0.52509 0.53838 0.59155 0.53007 0.51512 0.58823 0.59321 0.60152 0.5633 12/4/07 11:55 0 4.49712
20 12.303 1196798271 2016.6 0.5284 0.47855 0.52175 0.53837 0.59154 0.5284 0.5151 0.58821 0.59154 0.60151 0.56163 12/4/07 11:57 0 4.49851
manzanita_sapflow_12-5-07_to_7-7-08.xls
instantaneous sap flow data (as temperature differences on a constant temperature heat
dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to
correlate physiological activity with below-ground measures of root growth and CO2 production.
Datum: “0.59998”
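A minimal sketch of why the datum "0.59998" means little on its own: the value only becomes interpretable once it is tied back to its column name, its row, and the dataset description. The code parses the sbid 10 row of the sap flow table using the column names from the spreadsheet header. Treating the "datetime" column as Unix epoch seconds is an assumption (it is not stated in the native metadata), and `parse_row` is a hypothetical helper written for this illustration.

```python
from datetime import datetime, timezone

# Column names as they appear in the spreadsheet header.
HEADER = (
    "sbid battery datetime heater_voltage Manz1Sap1 Manz1Sap2 Manz1Sap3 "
    "Manz1Sap4 Manz2Sap5 Manz2Sap6 Manz2Sap7 Manz3Sap10 Manz3Sap8 "
    "Manz3Sap9 Manz4Sap11 timestamp Datagap Julian"
).split()

# The row with sbid 10, copied from the table.
ROW_10 = (
    "10 12.323 1196797072 2017 0.54679 0.50524 0.54679 0.54347 0.59998 "
    "0.53017 0.53682 0.59 0.59499 0.60496 0.56674 12/4/07 11:37 0 4.48462"
)


def parse_row(line):
    """Map one whitespace-separated row onto the header names (hypothetical helper)."""
    tokens = line.split()
    # The timestamp column contains a space ("12/4/07 11:37"), so rejoin its two tokens.
    i = HEADER.index("timestamp")
    tokens[i:i + 2] = [" ".join(tokens[i:i + 2])]
    if len(tokens) != len(HEADER):
        raise ValueError(f"expected {len(HEADER)} fields, got {len(tokens)}")
    return dict(zip(HEADER, tokens))


record = parse_row(ROW_10)
datum = float(record["Manz2Sap5"])

# Assumption: "datetime" holds seconds since the Unix epoch (UTC).
logged_at = datetime.fromtimestamp(int(record["datetime"]), tz=timezone.utc)

print(f"datum={datum} column=Manz2Sap5 sbid={record['sbid']} logged_at={logged_at.isoformat()}")
```

Under that assumption the reading falls on 4 December 2007 at 19:37 UTC, which would agree with the "12/4/07 11:37" timestamp column if the logger clock ran eight hours behind UTC. Without the header and the description, none of this can be recovered from "0.59998" alone.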
DATA SETS: some examples with “native metadata”
2-d_soil_temps.csv
surface, and sub-surface soil temperatures (at 2cm and 8cm depths) measured at one location for a few days in order to
calibrate a model of temperature propagation. Surface temperature was measured with an infrared thermometer,
subsurface temperatures with a thermocouple.
----------------------------
5-minute_light_data_for_4_continuous_days_plus_reference.xls
PPF (photosynthetic photon flux = photosynthetically active radiation 400-700nm) measured with an array of photodiodes
calibrated to a Licor sensor, along a linear transect for a few days. used to get an idea of how much light plants along
the transect are receiving.
----------------------------
CO2_of_air_at_different_heights_July_9.xls
concentration of CO2 in the air during the evening for one day, measured with a Licor infrared gas analyzer and a series of
relays and tubes with a pump. used to examine the gradient of CO2 coming from the soil when the air is still during the
evening.
----------------------------
Fern_light_response.xls
Light response curves for bracken ferns, measured with a Licor photosynthesis system. Fronds are exposed to different light
levels and their instantaneous photosynthesis and conductance is measured. used in conjunction with the induction
data (below) for physiological characterization of the ferns.
----------------------------
La_Selva_species_photosyntheis_table.xls
incomplete data set on instantaneous photosynthesis rates for various tropical understory and epiphytic species grown in a
shade house in Costa Rica.
----------------------------
manzanita_sapflow_12-5-07_to_7-7-08.xls
instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple
branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground
measures of root growth and CO2 production.
----------------------------
moisture_release_curves.xls
percentage of water content, water potential (in MegaPascals) and temperature of soil samples, measured in the laboratory
for calibration of water content with water potential. soil is from the James Reserve in California.
----------------------------
Photosynthetic_induction.xls
a time-course of photosynthetic induction for a leaf over 35 minutes. instantaneous photosynthesis measured as µmol CO2
m/2/s and light level is probably 1000 micromoles. used to determine physiological characteristics of bracken ferns.
----------------------------
run_2_24-h_data_for_mesh.xls
measurements of micrometeorological parameters on a moving shuttle, going from a clearing across a forest edge and into
the forest for about 30 meters. Pyranometers facing up and down, pyrgeometer facing up and down, PAR, air
temperature, relative humidity. Also data from a station fixed in the clearing and some derived variables calculated.
used for examining edge effects in forests.
----------------------------
Segment_of_wallflower_compare_colorspaces_blur.xls
pixel counts from images of wallflowers that were segmented into flower/not-flower under different color spaces.
segmentation was made using a probability matrix of hand-segmented images. used to automatically count flowers in
images collected after this training data was collected (and used to determine the best color space for this task).
Data Development:
“Data Reduction - Processing Level Definitions” (an example)
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19860021622_1986021622.pdf
Report of the EOS Data Panel Vol IIA, NASA, 1986 (Tech Memorandum 87777)
Tom Moritz, OPM “Big Data” July, 2012
Data in Public Service
The Federal government manages data in
satisfaction of three primary requirements:
1) To account transparently for government
operations
2) To provide citizen access to the products of
government activities
3) To fulfill mandated tasks for which the
government has no original data (this
requires data acquisition)
The basic goal is to make all data held by the US
government fully reliable and “audit-worthy”.
All data and all derived data products should be able
to withstand exacting examination and testing.
All descriptive information required for auditing
should be fully disclosed, readily available and
easily accessible in standard reporting formats.
http://www2.cec.org/nampan/species/vaquita
VAQUITA STAKEHOLDERS
• AGS Alto Golfo Sustentable
• ASM American Society of Mammalogists
• CEC Commission for Environmental Cooperation
• CEDO Intercultural Center for the Study of
Deserts and Oceans
• CI Conservation International
• CIRVA International Committee for the Recovery
of the Vaquita
• CICESE Centro de Investigación Científica y de
Educación Superior de Ensenada
• CILA International Boundary and Water
Commission
• CITES Convention on International Trade in
Endangered Species of Wild Fauna and Flora
• Conagua National Water Commission
• Conanp National Commission for Protected
Natural Areas, Semarnat (Comisión Nacional de
Áreas Naturales Protegidas, Semarnat)
• Conapesca National Fisheries and Aquaculture
Commission, Sagarpa (Comisión Nacional de Pesca y
Acuacultura, Sagarpa)
• Profepa Federal Attorney for Environmental
Protection
• Secretariat of Agriculture, Livestock, Rural
Development, Fisheries, and Food (Mexico)
• Salud Secretariat of Health (Mexico)
• COSEWIC Committee on the Status of
Endangered Wildlife in Canada
• Department of Fisheries and Oceans (Canada)
• United States Department of the Interior
• European Cetacean Society
• US Environmental Protection Agency
• US Food and Drug Administration
• GEF Global Environment Facility
• IBWC International Boundary and Water
Commission
• National Institute of Ecology, Semarnat
• Inapesca National Fisheries Institute, Sagarpa
• IUCN World Conservation Union
• International Whaling Commission
• Local Economic and Employment Development
program
• United States Marine Mammal Commission
• Marine Stewardship Council
• NAMPAN North American Marine Protected
Areas Network (CEC)
• US National Academy of Sciences
• North American Wildlife Enforcement Group
(CEC)
• US National Marine Fisheries Service, NOAA,
Department of Commerce
• US National Oceanic and Atmospheric
Administration, Department of Commerce
• United States National Ocean Service (NOAA)
• PACE Species Conservation Action Programs,
Conanp
• PGR Attorney General Office (Mexico)
• POEMGC Marine Ecological Planning of the Gulf
of California Program, Semarnat
• Procer Conservation Program for Species at Risk
• Secretariat of Economy (Mexico)
• Sectur Secretariat of Tourism (Mexico)
• Sedesol Secretariat for Social Development
(Mexico)
• Semar Secretariat of the Navy
• Semarnat Secretariat of the Environment and
Natural Resources
• Society for Marine Mammalogy
• Solamac Latin American Society for Aquatic
Mammals
• Somemma Mexican Society for Marine
Mammalogy
• SWFSC Southwest Fisheries Science Center (US
NMFS, NOAA)
• The Nature Conservancy
• Universidad Autónoma de Baja California Sur
• University of California
• United Nations
• United States Coast Guard
• United States Fish and Wildlife Service
• World Wildlife Fund
Values: “Data Quality” ???
In the most general colloquial terms, “Data Quality” is the fundamental issue
of concern to scientists, policy makers, managers/decision makers and
the general public.
“Quality” can be considered in terms of three primary values:
• Validity: logical in terms of intended hypothesis to be tested (all potential
types of data that could be chosen should be weighed for probative
value…)
• Competence (Reliability): consideration of the proper choice of expert
staff, methods, apparatus/gear, calibration, deployment and operation
• Integrity: the maintenance of original integrity of data as well as tracking
and documenting of all transformations and sequences of transformation
of data
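The "Integrity" value above can be sketched in code: keep the original values untouched, fingerprint the data at every step, and log each transformation so the whole sequence can be checked later. This is only an illustration under stated assumptions. `TrackedSeries`, `fingerprint`, and the rounding step are hypothetical names and operations, not part of any cited standard; the three input values are Manz2Sap5 readings from the sap flow table earlier in the deck.

```python
import hashlib
import json


def fingerprint(values):
    """SHA-256 digest of a compact JSON encoding of the values."""
    payload = json.dumps(values, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class TrackedSeries:
    """Hypothetical container that documents every transformation of a series."""

    def __init__(self, values, source):
        self.original = tuple(values)   # never modified
        self.values = list(values)      # working copy
        self.log = [{"step": "original", "note": source,
                     "output_sha256": fingerprint(self.values)}]

    def transform(self, step, func, note=""):
        """Apply func to every value and record input and output fingerprints."""
        before = fingerprint(self.values)
        self.values = [func(v) for v in self.values]
        self.log.append({"step": step, "note": note,
                         "input_sha256": before,
                         "output_sha256": fingerprint(self.values)})
        return self

    def is_consistent(self):
        """True if the log links the original data to the current values without gaps."""
        chained = all(later["input_sha256"] == earlier["output_sha256"]
                      for earlier, later in zip(self.log, self.log[1:]))
        starts_ok = self.log[0]["output_sha256"] == fingerprint(list(self.original))
        ends_ok = self.log[-1]["output_sha256"] == fingerprint(self.values)
        return chained and starts_ok and ends_ok


series = TrackedSeries([0.59998, 0.59665, 0.59663], source="Manz2Sap5, sbid 10-12")
series.transform("round_3dp", lambda v: round(v, 3), note="hypothetical rounding step")
ok_before = series.is_consistent()

series.values[0] = 0.7          # an undocumented edit, bypassing the log
ok_after = series.is_consistent()

print(ok_before, ok_after)
```

The point of the design is that an auditor needs only the log and the current data to detect an undocumented change; the undocumented edit at the end breaks the chain.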
Auditing – A Case History
“InterAcademy Council Names IPCC Review Committee”
“AMSTERDAM, Netherlands – The InterAcademy Council (IAC), an
organization of the world’s science academies, announced today that
Harold T. Shapiro, an economist and former president of Princeton
University and the University of Michigan, will chair a 12-member
committee to conduct an independent review of the procedures and
processes of the Intergovernmental Panel on Climate Change (IPCC). The
review was requested in March by U.N. Secretary-General Ban Ki-moon
and IPCC Chair Rajendra K. Pachauri.
“The committee will review IPCC procedures for preparing its assessment
reports. Among the issues to be reviewed are data quality assurance and
control; the type of literature that may be cited in IPCC reports; expert and
government review of IPCC materials; handling of the full range of
scientific views; and the correction of errors that are identified after a
report has been completed. The committee also will review overall IPCC
processes, including management functions and communication strategies
(the full statement of task is available at
www.interacademycouncil.net/ipccreview).”
http://reviewipcc.interacademycouncil.net/IACNamesIPCCReviewCommittee.html
Climate Change Assessments:
Review of the Processes and Procedures of the IPCC
(InterAcademy Council)
U.N. Press Conference Aug. 30, 2010
“Opening Statement”
by Harold T. Shapiro
President Emeritus and Professor of Economics
and Public Affairs, Princeton University and
Chair, InterAcademy Council Committee to
Review the IPCC
http://reviewipcc.interacademycouncil.net/OpeningStatement.html
US BLM Manual 1283
“Data Administration and Management”
“Every employee is responsible for the quality, integrity,
relevancy, accuracy, and currency of the data that is
created, collected, or maintained, whether the data are
in manual (paper copy) or electronic format. Managers
will employ good data management practices to
manage the data collected and maintained by their
program specialists. The program specialist who uses,
manages, and distributes the data must ensure that
data are collected according to established standards
and maintained to ensure accuracy and integrity. This
section identifies specific responsibilities in support of
the data management program.”
Rel. No. 1-1742 Supersedes Rel. No. 1-1678 Date: 7/10/2012
http://www.blm.gov/pgdata/etc/medialib/blm/wo/Information_Resources_Management/policy/blm_manual.Par.77674.File.dat/BLM_1283_manual_final.pdf
A Gallery of Efforts to Depict
Full Life Cycle Data Management
Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model." DDI
Alliance. 2004. Accessed on 11 August 2008.
http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pdf
US NSF “DataNet” Program
“the full data preservation and access lifecycle”
• “acquisition”
• “documentation”
• “protection”
• “access”
• “analysis and dissemination”
• “migration”
• “disposition”
“Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601
US National Science Foundation Office of Cyberinfrastructure Directorate for Computer & Information
Science & Engineering
“JISC DCC Curation Lifecycle Model”
http://www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf
http://wiki.esipfed.org/images/c/c4/IWGDD.ppt
Interagency Working
Group on Digital Data
IWGDD “DIGITAL DATA LIFE CYCLE”
Exhibit B-2. Life Cycle Functions for Digital Data*
• Plan
-- Determine what data need to be created or collected to support a research agenda or a mission function
-- Identify and evaluate existing sources of needed data
-- Identify standards for data and metadata format and quality
-- Specify actions and responsibilities for managing the data over their life cycle
• Create
-- Produce or acquire data for intended purposes
-- Deposit data where they will be kept, managed and accessed for as long as needed to support their intended
purpose
-- Produce derived products in support of intended purposes; e.g., data summaries, data aggregations, reports,
publications
• Keep
-- Organize and store data to support intended purposes
-- Integrate updates and additions into existing collections
-- Ensure the data survive intact for as long as needed
• Acquire and implement technology
-- Refresh technology to overcome obsolescence and to improve performance
-- Expand storage and processing capacity as needed
-- Implement new technologies to support evolving needs for ingesting, processing, analysis, searching and accessing
data
• Disposition
-- Exit Strategy: plan for transferring data to another entity should the current repository no longer be able to keep it
-- Once intended purposes are satisfied, determine whether to destroy data or transfer to another organization
suited to addressing other needs or opportunities
http://www.nitrd.gov/about/harnessing_power_web.pdf
http://www.dataone.org/best-practices
DataOne:
The Data Life Cycle: An Overview
The data life cycle has eight components:
Plan: description of the data that will be compiled, and how the data will be
managed and made accessible throughout its lifetime
Collect: observations are made either by hand or with sensors or other
instruments and the data are placed into digital form
Assure: the quality of the data are assured through checks and inspections
Describe: data are accurately and thoroughly described using the appropriate
metadata standards
Preserve: data are submitted to an appropriate long-term archive (i.e. data
center)
Discover: potentially useful data are located and obtained, along with the
relevant information about the data (metadata)
Integrate: data from disparate sources are combined to form one
homogeneous set of data that can be readily analyzed
Analyze: data are analyzed
DataOne Best Practices Primer:
http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
W. K. Michener “Meta-information concepts for ecological data management”
Ecological Informatics 1 (2006) 3-7
http://tinyurl.com/d49f3vm
Federal Geographic
Data Committee
“Stages of the Geospatial
Data Lifecycle pursuant to
OMB Circular A–16, sections
8(e)(d), 8(e)(f), and 8(e)(g)”
http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
“The Geospatial Data Lifecycle is not intended to
be rigidly sequential or linear. The quality
assurance and (or) quality control (QA/QC)
functions for the data should be included at
every stage of the Geospatial Data Lifecycle.”
[emphasis added]
-- “Stages of the Geospatial Data Lifecycle pursuant to OMB Circular A–16, sections
8(e)(d), 8(e)(f), and 8(e)(g)”
http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
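A minimal sketch of the two points in the quotation above: stages can be visited in any order (and revisited), and a QA/QC check runs on every entry. The stage names are borrowed from the DataOne list earlier in this deck, since the FGDC stage names are not reproduced here. `qa_check`, `Lifecycle`, the two QA rules, and the sample values are all hypothetical placeholders.

```python
from enum import Enum


class Stage(Enum):
    PLAN = "plan"
    COLLECT = "collect"
    ASSURE = "assure"
    DESCRIBE = "describe"
    PRESERVE = "preserve"
    DISCOVER = "discover"
    INTEGRATE = "integrate"
    ANALYZE = "analyze"


def qa_check(dataset):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if any(v is None for v in dataset["values"]):
        problems.append("missing value")
    if not dataset.get("description"):
        problems.append("no description")
    return problems


class Lifecycle:
    """Records each stage visit together with the QA result at that moment."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.history = []

    def enter(self, stage):
        # QA/QC is not a separate stage here: it runs on every entry.
        problems = qa_check(self.dataset)
        self.history.append((stage, problems))
        return problems


# Placeholder values, not taken from any real dataset.
dataset = {"values": [12.3, None, 12.4], "description": ""}
cycle = Lifecycle(dataset)

cycle.enter(Stage.COLLECT)                      # both problems are reported
dataset["description"] = "placeholder description"
cycle.enter(Stage.DESCRIBE)                     # the gap in the values remains
dataset["values"] = [12.3, 12.2, 12.4]          # go back and re-collect
cycle.enter(Stage.COLLECT)                      # clean

for stage, problems in cycle.history:
    print(stage.value, problems)
```

Nothing in `Lifecycle` enforces an order, which is the design choice being illustrated: the history shows COLLECT, DESCRIBE, COLLECT, with the QA findings shrinking at each visit.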
Interagency Science Working Group
National Archives and Records Administration
http://www.archives.gov/records-mgmt/toolkit/pdf/ID373.pdf
“Establishing Trustworthy Digital Repositories: A Discussion Guide Based on the ISO Open
Archival Information System (OAIS) Standard Reference Model January 19, 2011”
“Sustainable data curation”
“There are several main elements necessary to sustain data curation:
• “Robust data storage facilities (hardware and software) that are capable of
accurately handling data migration across generations of media.
• “Backup plans, that are tested, so irreplaceable data are not at risk.
Unintended data loss can occur for many reasons: some major causes are:
poor stewardship leading to the loss of metadata to understand where the
data is located and documentation to understand the content, physical
facility and equipment failure (fire, flood, irrecoverable hardware crashes),
accidental data overwrite or deletion.
• “Science-educated staff with knowledge to match the data discipline is
important for checking data integrity, choosing archive organization, creating
adequate metadata, consulting with users, and designing access systems that
meet user expectations. Staff responsible for stewardship and curation must
understand the digital data content and potential scientific uses.”
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge
sharing,” from the 4th International Digital Curation Conference December 2008, page 10.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
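A minimal sketch of what a "tested" backup can mean in practice: copy the file, then confirm the copy by comparing checksums, and confirm that the same check catches later corruption. The file name, its contents, and the helper names `sha256_of` and `backup_and_verify` are hypothetical; a real archive would add off-site copies and scheduled re-verification.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy source into backup_dir and fail loudly if the copy differs."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} does not match the original")
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Made-up file and values, used only to exercise the check.
    original = Path(tmp) / "example_readings.csv"
    original.write_text("depth_cm,temp_c\n2,18.4\n8,16.9\n")

    copy = backup_and_verify(original, Path(tmp) / "backup")
    intact = sha256_of(original) == sha256_of(copy)

    # Simulate an accidental overwrite of the backup copy.
    copy.write_text("corrupted")
    detected = sha256_of(original) != sha256_of(copy)

print(intact, detected)
```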
Sustainable data curation (cont.)
• “Non-proprietary data formats that will ensure data access capability for
many decades and will help avoid data losses resulting from software
incompatibilities…
• “Consistent staffing levels and people dedicated to best practices in
archiving, access, and stewardship…
• “National and International partnerships and interactions greatly aids in
shared achievements for broad scale user benefits, e.g. reanalyses,
TIGGE…
• “Stable funding not focused on specific projects, but data management in
general…”
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge
sharing,” from the 4th International Digital Curation Conference December 2008, page 10-11.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
Database Lifecycle Management
“The Database Lifecycle Management covers the entire
lifecycle of the databases, including:
• Discovery and Inventory tracking: the ability to discover
your assets, and track them
• Initial provisioning, the ability to rollout databases in
minutes
• Ongoing Change Management, End-to-end management of
patches, upgrades, schema and data changes
• Configuration Management, track inventory, configuration
drift and detailed configuration search
• Compliance Management, reporting and management of
industry and regulatory compliance standards
• Site level Disaster Protection Automation”
http://www.oracle.com/technetwork/oem/pdf/511949.pdf
[Diagram: data life-cycle terms]
Design, Define, Conceptualise, Plan, Produce, Create, Acquire, Receive, Collect,
Preserve, Protect, Curate, Maintain, Archive, Appraise, Select, Analyze, Distribute,
Access, Use, Reuse, Store, Discover, Dispose, Transform, Describe, Repurpose,
Metadata standards, Add Metadata, Assure
“Data Quality” ???
“In the most general colloquial terms, ‘Data Quality’ is the fundamental issue
of concern to scientists, policy makers, managers/decision makers and the
general public.
‘Data Quality’ can be considered in terms of three primary values:
• Validity: logical in terms of intended hypothesis to be tested (all potential
types of data that could be chosen should be weighed for probative
value…)
• Competence (Reliability): consideration of the proper choice of expert
staff, methods, apparatus/gear, calibration, deployment and operation
• Integrity: the maintenance of original integrity of data as well as tracking
and documenting of all recording, migration, transformations and
sequences of transformation of data”
Tom Moritz, OPM “Big Data” July, 2012
“…the “validation” of any scientific hypotheses rests
upon the sum integrity of all original data and
of all sequences of data transformation
to which original data have been subject.”
– Tom Moritz
“The Burden of Proof”
http://imsgbif.gbif.org/CMS_NEW/get_file.php?FILE=2b032cf8212d19a720f21465df0686
Tom Moritz
Los Angeles
tom.moritz@gmail.com
310 963 0199
http://www.linkedin.com/in/tmoritz

Epistemology, ontology, knowledge xEpistemology, ontology, knowledge x
Epistemology, ontology, knowledge x
 
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)Ids 330 "Environmental Leadership" Basic Introduction (University of the West)
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)
 
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary Theory
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary TheoryCharles Darwin: The Galapagos Finches and the Emergence of Evolutionary Theory
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary Theory
 
Trauma and violence
Trauma and violenceTrauma and violence
Trauma and violence
 
Children and Trauma in the International World (UWest Psych 490 November 7, 2...
Children and Trauma in the International World (UWest Psych 490 November 7, 2...Children and Trauma in the International World (UWest Psych 490 November 7, 2...
Children and Trauma in the International World (UWest Psych 490 November 7, 2...
 

Dernier

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Dernier (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

"What does 'Full Life-Cycle' Data Management Mean ?"

  • 1. What does “Full Life-Cycle” Data Management Mean ? “BIG DATA” US Office of Personnel Management March 14, 2013
  • 2. “As required by the National Archives and Records Administration (NARA) in 36 CFR Chapter XII, Subchapter B, Records Management, Federal agencies are responsible for creating and maintaining authentic, reliable, and usable records and ensure that they remain so for the length of their authorized retention period.” http://www.archives.gov/records-mgmt/toolkit/pdf/ID373.pdf
  • 3. First, a brief digression concerning graphics… Edward Tufte’s favorite…
  • 4. DISCRETION… Exercise care in the selection of graphic formats – not all graphics enhance understanding; some may confuse… Lacking effective compound graphics, simplicity and the use of multiple graphic images may be more effective. The New York Times often produces exemplary graphics that compress complex data and complex relationships…
  • 5. NYT: “LEADING CAUSES OF CANCER DEATHS” http://www.nytimes.com/imagepages/2007/07/29/health/29cancer.graph.web.html
  • 6. “Data” ? [technical definition] “…’data’ are defined as any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc.”-- Program Solicitation 07-601 “Sustainable Digital Data Preservation and Access Network Partners (DataNet)” Taken in this broadest possible sense, “data” are thus simply electronic coded forms of information. And virtually anything can be represented as “data” so long as it is electronically machine-readable.
  • 7. “Data” [epistemic definition – addressing the meaning of data] “Measurements, observations or descriptions of a referent -- such as an individual, an event, a specimen in a collection or an excavated/surveyed object -- created or collected through human interpretation (whether directly “by hand” or through the use of technologies)” -- AnthroDPA Working Group on Metadata (May, 2009) [funded by Wenner-Gren Foundation and US NSF]
  • 8. “Experiments to determine the density of the earth,” by Henry Cavendish, ESQ., F.R.S. AND A.S. Read June 21, 1798 (From the Philosophical Transactions of the Royal Society of London for the year 1798, Part II. , pp. 469-526) From: http://www.archive.org/details/lawsofgravitatio00mackrich
  • 9. USDA – NATURAL RESOURCES CONSERVATION SERVICE
  • 10. sbid battery datetime heater_voltage Manz1Sap1 Manz1Sap2 Manz1Sap3 Manz1Sap4 Manz2Sap5 Manz2Sap6 Manz2Sap7 Manz3Sap10 Manz3Sap8 Manz3Sap9 Manz4Sap11 timestamp Datagap Julian
      2 12.365 1196796112 2018.8 0.5585 0.51029 0.55517 0.54354 0.6067 0.52858 0.55351 0.59008 0.59506 0.60337 0.56514 12/4/07 11:21 4.47351
      3 12.348 1196796232 2017.9 0.55682 0.51028 0.5535 0.54352 0.60669 0.52857 0.55017 0.59007 0.59505 0.60336 0.56513 12/4/07 11:23 0 4.47490
      4 12.357 1196796352 2018.6 0.55514 0.51027 0.55348 0.54351 0.60501 0.52855 0.55016 0.59005 0.59504 0.60501 0.56512 12/4/07 11:25 0 4.47628
      5 12.354 1196796472 2017.6 0.55514 0.51026 0.55181 0.5435 0.60334 0.52855 0.54849 0.59004 0.59503 0.60334 0.56511 12/4/07 11:27 0 4.47767
      6 12.334 1196796592 2018.3 0.55347 0.51026 0.55015 0.5435 0.60333 0.52854 0.54682 0.59004 0.59502 0.605 0.56511 12/4/07 11:29 0 4.47906
      7 12.34 1196796712 2018.5 0.55014 0.50859 0.55014 0.54349 0.60332 0.53019 0.54349 0.59003 0.59501 0.60498 0.56676 12/4/07 11:31 0 4.48045
      8 12.337 1196796832 2017.8 0.55013 0.50692 0.55013 0.54348 0.60332 0.53019 0.54182 0.59002 0.59501 0.60498 0.56675 12/4/07 11:33 0 4.48184
      9 12.328 1196796952 2017.5 0.5468 0.50691 0.5468 0.54347 0.60331 0.53018 0.53849 0.59001 0.595 0.60497 0.56674 12/4/07 11:35 0 4.48323
      10 12.323 1196797072 2017 0.54679 0.50524 0.54679 0.54347 0.59998 0.53017 0.53682 0.59 0.59499 0.60496 0.56674 12/4/07 11:37 0 4.48462
      11 12.328 1196797192 2018.9 0.54679 0.50191 0.54512 0.5418 0.59665 0.53017 0.53349 0.59 0.59498 0.60496 0.56673 12/4/07 11:39 0 4.48601
      12 12.319 1196797312 2017.7 0.54345 0.49857 0.54178 0.54178 0.59663 0.53015 0.53015 0.58998 0.5933 0.60327 0.56671 12/4/07 11:41 0 4.48740
      13 12.311 1196797432 2017.3 0.54343 0.4969 0.54011 0.54177 0.59661 0.53014 0.52848 0.58997 0.59329 0.6016 0.5667 12/4/07 11:43 0 4.48878
      14 12.316 1196797552 2018.6 0.5401 0.49357 0.53678 0.54176 0.59328 0.53013 0.5268 0.58995 0.59328 0.60325 0.56669 12/4/07 11:45 0 4.49017
      15 12.31 1196797672 2016.8 0.53844 0.4919 0.53511 0.54176 0.59494 0.53013 0.52514 0.58995 0.59328 0.60325 0.56503 12/4/07 11:47 0 4.49156
      16 12.31 1196797792 2017.1 0.53676 0.48856 0.53343 0.54174 0.59326 0.53011 0.5218 0.58993 0.59326 0.60323 0.56501 12/4/07 11:49 0 4.49295
      17 12.31 1196797912 2017.1 0.53342 0.48523 0.5301 0.54173 0.59324 0.5301 0.51846 0.58826 0.59324 0.60321 0.56499 12/4/07 11:51 0 4.49434
      18 12.301 1196798031 2017.5 0.53174 0.48521 0.52842 0.53839 0.59156 0.53008 0.51845 0.58824 0.59323 0.6032 0.56498 12/4/07 11:53 0 4.49573
      19 12.301 1196798151 2016.3 0.53007 0.48188 0.52509 0.53838 0.59155 0.53007 0.51512 0.58823 0.59321 0.60152 0.5633 12/4/07 11:55 0 4.49712
      20 12.303 1196798271 2016.6 0.5284 0.47855 0.52175 0.53837 0.59154 0.5284 0.5151 0.58821 0.59154 0.60151 0.56163 12/4/07 11:57 0 4.49851
      manzanita_sapflow_12-5-07_to_7-7-08.xls instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground measures of root growth and CO2 production.
      Datum: “0.59998”
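One way to read the sample rows above is as a small integrity check: each row carries both a Unix epoch value (`datetime`) and a human-readable `timestamp`, so the two can be compared. The sketch below is illustrative only. It assumes the `timestamp` column is local time at UTC-8, an offset inferred from the sample values and not stated on the slide, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the logger's "timestamp" column is local time at UTC-8.
# The offset is inferred from the sample rows; the slide does not state it.
ASSUMED_LOCAL_TZ = timezone(timedelta(hours=-8))


def epoch_to_logger_timestamp(epoch_seconds: int) -> str:
    """Render a Unix epoch value in the M/D/YY H:MM style of the 'timestamp' column."""
    local = datetime.fromtimestamp(epoch_seconds, tz=ASSUMED_LOCAL_TZ)
    return (
        f"{local.month}/{local.day}/{local.year % 100:02d} "
        f"{local.hour}:{local.minute:02d}"
    )


def is_consistent(epoch_seconds: int, timestamp_text: str) -> bool:
    """True when the epoch value and the printed timestamp describe the same minute."""
    return epoch_to_logger_timestamp(epoch_seconds) == timestamp_text


# Two (datetime, timestamp) pairs copied from the sample rows (sbid 2 and sbid 10).
samples = [(1196796112, "12/4/07 11:21"), (1196797072, "12/4/07 11:37")]
results = [is_consistent(epoch, text) for epoch, text in samples]
```

A row whose two time fields disagree would be a candidate for the kind of audit the presentation argues for.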
  • 11. DATA SETS some examples with “native metadata”
      2-d_soil_temps.csv: surface, and sub-surface soil temperatures (at 2cm and 8cm depths) measured at one location for a few days in order to calibrate a model of temperature propagation. Surface temperature was measured with an infrared thermometer, subsurface temperatures with a thermocouple.
      5-minute_light_data_for_4_continuous_days_plus_reference.xls: PPF (photosynthetic photon flux = photosynthetically active radiation 400-700nm) measured with an array of photodiodes calibrated to a Licor sensor, along a linear transect for a few days. used to get an idea of how much light plants along the transect are receiving.
      CO2_of_air_at_different_heights_July_9.xls: concentration of CO2 in the air during the evening for one day, measured with a Licor infrared gas analyzer and a series of relays and tubes with a pump. used to examine the gradient of CO2 coming from the soil when the air is still during the evening.
      Fern_light_response.xls: Light response curves for bracken ferns, measured with a Licor photosynthesis system. Fronds are exposed to different light levels and their instantaneous photosynthesis and conductance is measured. used in conjunction with the induction data (below) for physiological characterization of the ferns.
      La_Selva_species_photosyntheis_table.xls: incomplete data set on instantaneous photosynthesis rates for various tropical understory and epiphytic species grown in a shade house in Costa Rica.
      manzanita_sapflow_12-5-07_to_7-7-08.xls: instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground measures of root growth and CO2 production.
      moisture_release_curves.xls: percentage of water content, water potential (in MegaPascals) and temperature of soil samples, measured in the laboratory for calibration of water content with water potential. soil is from the James Reserve in California.
      Photosynthetic_induction.xls: a time-course of photosynthetic induction for a leaf over 35 minutes. instantaneous photosynthesis measured as µmol CO2 m/2/s and light level is probably 1000 micromoles. used to determine physiological characteristics of bracken ferns.
      run_2_24-h_data_for_mesh.xls: measurements of micrometeorological parameters on a moving shuttle, going from a clearing across a forest edge and into the forest for about 30 meters. Pyranometers facing up and down, pyrgeometer facing up and down, PAR, air temperature, relative humidity. Also data from a station fixed in the clearing and some derived variables calculated. used for examining edge effects in forests.
      Segment_of_wallflower_compare_colorspaces_blur.xls: pixel counts from images of wallflowers that were segmented into flower/not-flower under different color spaces. segmentation was made using a probability matrix of hand-segmented images. used to automatically count flowers in images collected after this training data was collected (and used to determine the best color space for this task).
  • 12. Data Development: “Data Reduction - Processing Level Definitions” (an example) http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19860021622_1986021622.pdf Report of the EOS Data Panel Vol IIA, NASA, 1986 (Tech Memorandum 87777) Tom Moritz, OPM “Big Data” July, 2012
  • 13. Data in Public Service The Federal government manages data in satisfaction of three primary requirements: 1) To account transparently for government operations 2) To provide citizen access to the products of government activities 3) To fulfill mandated tasks for which the government has no original data (this requires data acquisition)
  • 14.
  • 15. The basic goal is to make all data held by the US government fully reliable and “audit-worthy”. All data and all derived data products should be able to withstand exacting examination and testing. All descriptive information required for auditing should be fully disclosed, readily available and easily accessible in standard reporting formats.
  • 17. VAQUITA STAKEHOLDERS
      • AGS Alto Golfo Sustentable
      • ASM American Society of Mammalogists
      • CEC Commission for Environmental Cooperation
      • CEDO Intercultural Center for the Study of Deserts and Oceans
      • CI Conservation International
      • CIRVA International Committee for the Recovery of the Vaquita
      • CICESE Centro de Investigación Científica y Educación Superior de Ensenada
      • CILA International Boundary and Water Commission
      • CITES Convention on International Trade in Endangered Species of Wild Fauna and Flora
      • Conagua National Water Commission
      • Conanp National Commission for Protected Natural Areas, Semarnat (Comisión Nacional de Áreas Naturales Protegidas, Semarnat)
      • Conapesca National Fisheries and Aquaculture Commission, Sagarpa (Comisión Nacional de Pesca y Acuacultura, Sagarpa)
      • Profepa Federal Attorney for Environmental Protection
      • Secretariat of Agriculture, Livestock, Rural Development, Fisheries, and Food (Mexico)
      • Salud Secretariat of Health (Mexico)
      • COSEWIC Committee on the Status of Endangered Wildlife in Canada
      • Department of Fisheries and Oceans (Canada)
      • United States Department of the Interior
      • European Cetacean Society
      • US Environmental Protection Agency
      • US Food and Drug Administration
      • GEF Global Environmental
      • IBWC International Boundary and Water Commission
      • National Institute of Ecology, Semarnat
      • Inapesca National Fisheries Institute, Sagarpa
      • IUCN World Conservation Union
      • International Whaling Commission
      • Local Economic and Employment Development program
      • United States Marine Mammal Commission
  • 18.
      • Marine Stewardship Council
      • NAMPAN North American Marine Protected Areas Network (CEC)
      • US National Academy of Sciences
      • North American Wildlife Enforcement Group (CEC)
      • US National Marine Fisheries Service, NOAA, Department of Commerce
      • US National Oceanic and Atmospheric Administration, Department of Commerce
      • United States National Ocean Service (NOAA)
      • PACE Species Conservation Action Programs, Conanp
      • PGR Attorney General Office (Mexico)
      • POEMGC Marine Ecological Planning of the Gulf of California Program, Semarnat
      • Procer Conservation Program for Species at Risk
      • Secretariat of Economy (Mexico)
      • Sectur Secretariat of Tourism (Mexico)
      • Sedesol Secretariat for Social Development (Mexico)
      • Semar Secretariat of the Navy
      • Semarnat Secretariat of the Environment and Natural Resources
      • Society for Marine Mammalogy
      • Solamac Latin American Society for Aquatic Mammals
      • Somemma Mexican Society for Marine Mammalogy
      • SWFSC Southwest Fisheries Science Center (US NMFS, NOAA)
      • The Nature Conservancy
      • Universidad Autónoma de Baja California Sur
      • University of California
      • United Nations
      • United States Coast Guard
      • United States Fish and Wildlife Service
      • World Wildlife Fund
  • 19. Values: “Data Quality” ??? In the most general colloquial terms, “Data Quality” is the fundamental issue of concern to scientists, policy makers, managers/decision makers and the general public. “Quality” can be considered in terms of three primary values:
      • Validity: logical in terms of the intended hypothesis to be tested (all potential types of data that could be chosen should be weighed for probative value…)
      • Competence (Reliability): consideration of the proper choice of expert staff, methods, apparatus/gear, calibration, deployment and operation
      • Integrity: the maintenance of the original integrity of data as well as tracking and documenting of all transformations and sequences of transformation of data
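The “Integrity” value above (keeping the original intact and documenting every transformation) can be sketched with checksums. This is a minimal illustration and not a method prescribed by the presentation: the `ProvenanceLog` class and its fields are hypothetical, and SHA-256 is simply one commonly used digest.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Checksum used to detect any change to the bytes of a dataset."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical record of the original checksum plus every later transformation."""
    original_digest: str
    steps: list = field(default_factory=list)

    def record(self, description: str, before: bytes, after: bytes) -> None:
        # Store what was done together with the digests of its input and output.
        self.steps.append({
            "description": description,
            "input_digest": sha256_hex(before),
            "output_digest": sha256_hex(after),
        })

    def chain_is_unbroken(self) -> bool:
        # Each step must start from exactly the output of the previous step.
        expected = self.original_digest
        for step in self.steps:
            if step["input_digest"] != expected:
                return False
            expected = step["output_digest"]
        return True


# Illustrative values only: three readings and a documented rounding step.
raw = b"0.59998,0.53017,0.53682\n"
log = ProvenanceLog(original_digest=sha256_hex(raw))
rounded = b"0.600,0.530,0.537\n"
log.record("round to 3 decimals", raw, rounded)
```

If a later step is recorded against bytes that do not match the previous output, `chain_is_unbroken()` returns `False`, which flags an undocumented change somewhere in the sequence.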
  • 20. Auditing – A Case History “InterAcademy Council Names IPCC Review Committee” “AMSTERDAM, Netherlands – The InterAcademy Council (IAC), an organization of the world’s science academies, announced today that Harold T. Shapiro, an economist and former president of Princeton University and the University of Michigan, will chair a 12-member committee to conduct an independent review of the procedures and processes of the Intergovernmental Panel on Climate Change (IPCC). The review was requested in March by U.N. Secretary-General Ban Ki-moon and IPCC Chair Rajendra K. Pachauri. “The committee will review IPCC procedures for preparing its assessment reports. Among the issues to be reviewed are data quality assurance and control; the type of literature that may be cited in IPCC reports; expert and government review of IPCC materials; handling of the full range of scientific views; and the correction of errors that are identified after a report has been completed. The committee also will review overall IPCC processes, including management functions and communication strategies (the full statement of task is available at www.interacademycouncil.net/ipccreview).” http://reviewipcc.interacademycouncil.net/IACNamesIPCCReviewCommittee.html
  • 21. Climate Change Assessments: Review of the Processes and Procedures of the IPCC (InterAcademy Council) U.N. Press Conference Aug. 30, 2010 “Opening Statement” by Harold T. Shapiro President Emeritus and Professor of Economics and Public Affairs, Princeton University and Chair, InterAcademy Council Committee to Review the IPCC http://reviewipcc.interacademycouncil.net/OpeningStatement.html
  • 22. US BLM Manual 1283 “Data Administration and Management” “Every employee is responsible for the quality, integrity, relevancy, accuracy, and currency of the data that is created, collected, or maintained, whether the data are in manual (paper copy) or electronic format. Managers will employ good data management practices to manage the data collected and maintained by their program specialists. The program specialist who uses, manages, and distributes the data must ensure that data are collected according to established standards and maintained to ensure accuracy and integrity. This section identifies specific responsibilities in support of the data management program.” Rel. No. 1-1742 Supersedes Rel. No. 1-1678 Date: 7/10/2012 http://www.blm.gov/pgdata/etc/medialib/blm/wo/Information_Resources_Management/policy/blm_manual.Par.77674.File.dat/BLM_1283_manual_final.pdf
  • 23. A Gallery of Efforts to Depict Full Life Cycle Data Management
  • 24. Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. Accessed on 11 August 2008. http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pdf
  • 25. US NSF “DataNet” Program “the full data preservation and access lifecycle”
      • “acquisition”
      • “documentation”
      • “protection”
      • “access”
      • “analysis and dissemination”
      • “migration”
      • “disposition”
      “Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601 US National Science Foundation Office of Cyberinfrastructure Directorate for Computer & Information Science & Engineering
  • 27. “JISC DCC Curation Lifecycle Model” http://www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf
  • 29. IWGDD “DIGITAL DATA LIFE CYCLE” Exhibit B-2. Life Cycle Functions for Digital Data*
      • Plan
        -- Determine what data need to be created or collected to support a research agenda or a mission function
        -- Identify and evaluate existing sources of needed data
        -- Identify standards for data and metadata format and quality
        -- Specify actions and responsibilities for managing the data over their life cycle
      • Create
        -- Produce or acquire data for intended purposes
        -- Deposit data where they will be kept, managed and accessed for as long as needed to support their intended purpose
        -- Produce derived products in support of intended purposes; e.g., data summaries, data aggregations, reports, publications
      • Keep
        -- Organize and store data to support intended purposes
        -- Integrate updates and additions into existing collections
        -- Ensure the data survive intact for as long as needed
      • Acquire and implement technology
        -- Refresh technology to overcome obsolescence and to improve performance
        -- Expand storage and processing capacity as needed
        -- Implement new technologies to support evolving needs for ingesting, processing, analysis, searching and accessing data
      • Disposition
        -- Exit Strategy: plan for transferring data to another entity should the current repository no longer be able to keep it
        -- Once intended purposes are satisfied, determine whether to destroy data or transfer to another organization suited to addressing other needs or opportunities
      http://www.nitrd.gov/about/harnessing_power_web.pdf
  • 31. DataOne: The Data Life Cycle: An Overview The data life cycle has eight components:
      Plan: description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime
      Collect: observations are made either by hand or with sensors or other instruments and the data are placed into digital form
      Assure: the quality of the data are assured through checks and inspections
      Describe: data are accurately and thoroughly described using the appropriate metadata standards
      Preserve: data are submitted to an appropriate long-term archive (i.e. data center)
      Discover: potentially useful data are located and obtained, along with the relevant information about the data (metadata)
      Integrate: data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed
      Analyze: data are analyzed
      DataOne Best Practices Primer: http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
  • 32. W. K. Michener “Meta-information concepts for ecological data management” Ecological Informatics 1 (2006) 3-7 http://tinyurl.com/d49f3vm
  • 33. Federal Geographic Data Committee “Stages of the Geospatial Data Lifecycle pursuant to OMB Circular A–16, sections 8(e)(d), 8(e)(f), and 8(e)(g)” http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
  • 34. “The Geospatial Data Lifecycle is not intended to be rigidly sequential or linear. The quality assurance and (or) quality control (QA/QC) functions for the data should be included at every stage of the Geospatial Data Lifecycle.” [emphasis added] --”Stages of the Geospatial Data Lifecycle pursuant to OMB Circular A–16, sections 8(e)(d), 8(e)(f), and 8(e)(g)” http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
  • 35. Interagency Science Working Group National Archives and Records Administration http://www.archives.gov/records-mgmt/toolkit/pdf/ID373.pdf “Establishing Trustworthy Digital Repositories: A Discussion Guide Based on the ISO Open Archival Information System (OAIS) Standard Reference Model January 19, 2011”
  • 36. “Sustainable data curation” “There are several main elements necessary to sustain data curation: • “Robust data storage facilities (hardware and software) that are capable of accurately handling data migration across generations of media. • “Backup plans, that are tested, so irreplaceable data are not at risk. Unintended data loss can occur for many reasons: some major causes are: poor stewardship leading to the loss of metadata to understand where the data is located and documentation to understand the content, physical facility and equipment failure (fire, flood, irrecoverable hardware crashes), accidental data overwrite or deletion. • “Science-educated staff with knowledge to match the data discipline is important for checking data integrity, choosing archive organization, creating adequate metadata, consulting with users, and designing access systems that meet user expectations. Staff responsible for stewardship and curation must understand the digital data content and potential scientific uses.” C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference December 2008, page 10. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
  • 37. Sustainable data curation (cont.) • “Non-proprietary data formats that will ensure data access capability for many decades and will help avoid data losses resulting from software incompatibilities… • “Consistent staffing levels and people dedicated to best practices in archiving, access, and stewardship… • “National and International partnerships and interactions greatly aids in shared achievements for broad scale user benefits, e.g. reanalyses, TIGGE… • “Stable funding not focused on specific projects, but data management in general…” C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference December 2008, page 10-11. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
  • 38. Database Lifecycle Management “The Database Lifecycle Management covers the entire lifecycle of the databases, including: • Discovery and Inventory tracking: the ability to discover your assets, and track them • Initial provisioning, the ability to rollout databases in minutes • Ongoing Change Management, End-to-end management of patches, upgrades, schema and data changes • Configuration Management, track inventory, configuration drift and detailed configuration search • Compliance Management, reporting and management of industry and regulatory compliance standards • Site level Disaster Protection Automation” http://www.oracle.com/technetwork/oem/pdf/511949.pdf
  • 40. “Data Quality” ??? “In the most general colloquial terms, ‘Data Quality’ is the fundamental issue of concern to scientists, policy makers, managers/decision makers and the general public. ‘Data Quality’ can be considered in terms of three primary values: • Validity: logical in terms of intended hypothesis to be tested (all potential types of data that could be chosen should be weighed for probative value...) • Competence (Reliability): consideration of the proper choice of expert staff, methods, apparatus/gear, calibration, deployment and operation • Integrity: the maintenance of original integrity of data as well as tracking and documenting of all recording, migration, transformations and sequences of transformation of data” Tom Moritz, OPM “Big Data” July, 2012
  • 41. “…the ‘validation’ of any scientific hypotheses rests upon the sum integrity of all original data and of all sequences of data transformation to which original data have been subject.” – Tom Moritz “The Burden of Proof” http://imsgbif.gbif.org/CMS_NEW/get_file.php?FILE=2b032cf8212d19a720f21465df0686
  • 42. Tom Moritz Los Angeles tom.moritz@gmail.com 310 963 0199 http://www.linkedin.com/in/tmoritz

Editor's notes

  1. All data go through processes of development. This 1986 NASA publication is still an excellent guide to the basics of scientific data management…
  2. The text accompanying the DCC model is very helpful in differentiating “full life cycle” actions / “sequential actions” and “occasional actions” -- the graphic is much less effective…
  3. The accompanying text is more helpful but still not comprehensive…
  4. Michener’s chart from 2006 makes a better effort at suggesting constant elements and feedback loops…
  5. This Oracle “model” focuses on “databases” – not on “data” per se…