Critical infrastructure to promote data synthesis
1. Critical Infrastructure
to Promote Data
Synthesis into
Evidence-Based
Nutrient Management
Sylvie M. Brouder
Jeff J. Volenec
Agronomy Department, Purdue University, West Lafayette, IN
72nd SWCS International Annual Conference, Madison, WI
July 31, 2017
5. Data in the stacks: Libraries and
the research data afterlife
6. Why Libraries? The skill sets, the
thought process, professional value
system (“public good”), & public
expectation of infallibility /
persistence are right
for the problem…
7. I like Perez
He has an ugly girlfriend
An ugly girlfriend means he has no confidence
Inexplicable human behavior = buy lottery tickets but also insurance
Vanity Fair, 12/2011: Michael Lewis asks “Why do professional baseball executives make such colossal mistakes?”
Best metric: on-base %
Moneyball = my science / knowledge translation epiphany…
9. Agriculture and the non-big data problem: Short
data life cycles, long-tail data, and data lost to the
dark side… enlightenment from medicine
[Figure: data size plotted against number of data sets, with regions labeled organized big data, long-tail data, and dark data, and a marked “literature limit”]
Schematic adapted from Ferguson et al., 2014, “Big data from small data: data sharing in the long-tail of neuroscience”
10. Examples of valuable but dark data in Agricultural Research ~
Recommendations must come from the “preponderance” of all
evidence (not just the novel result that makes it to a journal…)
Dark research data?
• Orphaned data ~ data collected
but not used in experimental
analysis (increasingly prevalent)
• Null or failed studies (reproving
the null hypothesis) ~ no impact
studies need to contribute to a
“preponderance” of evidence
• Confirmatory studies ~ not novel
so may not be publishable but still
needed for preponderance of
evidence
Dark non-research data
• Data from on-farm
collaboratives and farmer-driven
research efforts
• Data collected by farmers, CCAs,
etc. in current management
protocols (e.g. farm records)
• Monitoring data off equipment,
etc.
• Other??
11. The long and winding road,
That leads to your door… J. Lennon, P. McCartney
Non-compliant
“Digital Natives”
Persistent Players: M.S. Bracke, J.J. Volenec, R. Turco, S. Brandt, T.S. Murrell
Assoc. Dean Plaut, Dean Mullins, M. Witt, P. Fixen, J. Carlson, …
2004 proposal rejection
Natural
hazards
Encouraging
directional
indicators
12. My current vision for evidence-based nutrient recommendations: 10 steps
to real-time data uptake, analysis & customized recommendations (working
backwards)
10. Customized,
credible, nutrient
management
recommendation …
Self-improving
References the
users’ data
Can be modified for
non agronomic
priorities (risk
consideration, time
horizons, etc.)
13. Steps 6 – 9: The cool stuff via the Analytical Framework
6. Automatic reanalysis w/ accruing data
7. Machine learning / artificial intelligence
strategies to minimize human resources
8. Combination analytical strategies that
are directed by scientist using proven
theories & data mining (“unsupervised”)
strategies to surface overlooked linkages,
drivers & proxy measures
9. Tools for “unpacking” the analytical
result to explore new/unexpected
results & discoveries
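Step 6 (automatic reanalysis w/ accruing data) can be illustrated with a small sketch. The slides do not say which statistical method the framework would use; the example below assumes a simple fixed-effect, inverse-variance pooled estimate, and the class name `RunningPooledEffect` is hypothetical. The point is only that the pooled result can be refreshed each time a new study arrives, without reloading everything that came before.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningPooledEffect:
    """Fixed-effect (inverse-variance) pooled estimate, updated one study at a time.

    Assumption: each study contributes an effect size and its sampling variance.
    """
    sum_w: float = 0.0    # running sum of weights, w = 1 / variance
    sum_wy: float = 0.0   # running sum of weight * effect
    n_studies: int = 0

    def add_study(self, effect: float, variance: float) -> None:
        """Fold one new study into the running totals."""
        if variance <= 0:
            raise ValueError("variance must be positive")
        w = 1.0 / variance
        self.sum_w += w
        self.sum_wy += w * effect
        self.n_studies += 1

    @property
    def estimate(self) -> float:
        """Current pooled effect (weighted mean of all studies so far)."""
        if self.n_studies == 0:
            raise ValueError("no studies added yet")
        return self.sum_wy / self.sum_w

    @property
    def std_error(self) -> float:
        """Standard error of the pooled effect."""
        if self.n_studies == 0:
            raise ValueError("no studies added yet")
        return math.sqrt(1.0 / self.sum_w)


pooled = RunningPooledEffect()
pooled.add_study(effect=1.0, variance=0.25)
pooled.add_study(effect=2.0, variance=0.25)
first_estimate = pooled.estimate      # two equally weighted studies
pooled.add_study(effect=3.0, variance=0.5)
updated_estimate = pooled.estimate    # refreshed after the third study accrues
```

A real framework would likely need random-effects models and moderators; this sketch only shows the "accrue, then re-estimate" loop.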
14. The Foundation: A bit less cool but essential…
1. User enters data via web
portal
2. Portal has embedded
workflows for ease of use &
auto quality assurance/quality
control (QA/QC)
3. Data anonymized at entry
according to mutually
acceptable terms & conditions
4. User data combined with
existing research data
5. Data archived and preserved
in a “trusted” repository
The Data Repository….
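Foundation steps 1 to 5 can be sketched as a tiny intake pipeline. Everything specific here is an assumption for illustration: the field names, the plausible ranges, and the use of a salted hash to replace the farm identifier are not from the slides, and a salted hash is pseudonymization rather than full anonymization. The in-memory list stands in for the "trusted" repository.

```python
import hashlib

# Hypothetical minimum data set and plausibility ranges (illustrative only).
REQUIRED_FIELDS = ("farm_id", "crop", "n_rate_kg_ha", "yield_kg_ha")
PLAUSIBLE_RANGES = {
    "n_rate_kg_ha": (0.0, 400.0),
    "yield_kg_ha": (0.0, 25000.0),
}


def qa_qc(record: dict) -> list:
    """Step 2: return a list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            problems.append(f"missing {name}")
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(name)
        if isinstance(value, (int, float)) and not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems


def anonymize(record: dict, salt: str) -> dict:
    """Step 3: replace the farm identifier with a salted hash (a pseudonym)."""
    out = dict(record)
    raw = (salt + str(record["farm_id"])).encode("utf-8")
    out["farm_id"] = hashlib.sha256(raw).hexdigest()[:12]
    return out


def ingest(record: dict, salt: str, archive: list) -> list:
    """Steps 1-5: check the record, anonymize it, and add it to the archive."""
    problems = qa_qc(record)
    if problems:
        return problems            # rejected; nothing is stored
    archive.append(anonymize(record, salt))
    return []


archive = []  # stands in for existing research data plus new user data
good = {"farm_id": "IN-0042", "crop": "maize", "n_rate_kg_ha": 180, "yield_kg_ha": 11500}
bad = {"farm_id": "IN-0043", "crop": "maize", "n_rate_kg_ha": 180, "yield_kg_ha": -5}
good_result = ingest(good, salt="demo-salt", archive=archive)
bad_result = ingest(bad, salt="demo-salt", archive=archive)
```

Running QA/QC before anonymization means a rejected record never reaches storage, which keeps the archive free of entries that would later need to be retracted.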
15. Impediments/Challenges Confronting Data
Generators and Downstream Data Users
Meta-data standards
Data standards
Minimum data sets
Provenance
Repositories
Data publishing
Dataset versioning
Data discovery and retrieval
Data granularity
Scholarship of data publishing
Data ownership
Business models for data
Education about data management, including re-education
16. Our Focus: Pressing technological challenges to informatics
for all agronomic efforts concern data workflow…
• Data dispersion
– Take advantage of small
datasets collected by many
researchers (not everything
is “BIG”)
• Data heterogeneity
– Varied protocols reflecting
local culture & variation in
1° (primary) purpose
• Data provenance
– Need to track data through
multi-step process of
aggregation, modeling,
analysis
Storage is not enough!!!!!
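The provenance bullet above (tracking data through aggregation, modeling, analysis) can be made concrete with a small sketch. This is an illustration under stated assumptions, not how PURR or the 4R-RR record provenance: the names `ProvenanceLog` and `ProvenanceRecord` are hypothetical, and datasets are identified by a hash of their JSON-serialized content.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    step: str             # what was done, e.g. "aggregate"
    content_hash: str     # identifies the dataset produced by this step
    parents: tuple = ()   # hashes of the datasets this one was derived from


def content_hash(rows) -> str:
    """Hash the dataset content so any change yields a new identifier."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ProvenanceLog:
    def __init__(self):
        self._records = {}

    def register(self, step, rows, parents=()):
        """Record a dataset and the datasets it was derived from."""
        h = content_hash(rows)
        self._records[h] = ProvenanceRecord(step, h, tuple(parents))
        return h

    def lineage(self, h):
        """Return the step names from this dataset back to its raw sources."""
        seen, steps, stack = set(), [], [h]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            record = self._records[current]
            steps.append(record.step)
            stack.extend(record.parents)
        return steps


log = ProvenanceLog()
site_a = log.register("collect: site A", [{"yield": 1.0}])
site_b = log.register("collect: site B", [{"yield": 2.0}])
merged = log.register("aggregate", [{"yield": 1.0}, {"yield": 2.0}], parents=[site_a, site_b])
result = log.register("analyze", [{"mean_yield": 1.5}], parents=[merged])
steps = log.lineage(result)
```

Because identifiers come from content, two datasets with identical rows collapse into one record in this sketch; a production system would also store who, when, and with which code version.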
17. What is a data repository
• It is: an emerging mechanism
for extending data lifecycles
• Moving beyond storage to
preservation & curation
• Example: Research Repositories,
Data Publications
• It is not: just storage, nor a
website, a database, a network…
18. Repository Issues – No Perfect Solution (yet) for Data, a Public
Good
Examples considered:
DataONE (NSF): soft money, renewed for a second 5-yr term; become a node?
DRYAD: http://www.datadryad.org/; requires linkage to a publication; what
happens to unpublished, negative results critical to systematic reviews?
Professional Societies: Association of Crops, Soils, & Environmental Science
Societies (ACSESS) - expand Digital Library of e-pubs into a repository?
Enhance data discovery.
New Ag Data Commons at the USDA National Agricultural Library
Purdue Univ. Research Repository (PURR) & the 4R-RR: Attached to an
Institution with a long legacy; storage for at least 10 yrs, then what?
19. Where are we (PURR / 4R-RR) focusing in the
“data value chain” ~ working behind or upstream
of the “interoperability curtain”
Conceive
• Exp. Des.
• Data Mgt. Planning
Collect
• Clean
• Rectify
Describe
• Data Dictionaries
• Meta Data
Discover
Aggregate
• Code / APIs
• Derivative Data
Synthesize
• BD Analytics
• Statistical Meta-Analysis
Create New Knowledge
Interoperability
Prepare (Preprocess) Data → Create tools & workflows: largely out of sight; sparsely populated w/ expertise & solutions
Produce “Transformative” (Headline) Results → Advance Science: high visibility; crowded w/ expertise & solutions
20. PURR / 4R-RR Goals: Facilitating best practices
for data sharing…
• Discoverable ~ findable with common search engines
• Accessible ~ downloadable and subject to manipulation
• Intelligible ~ human and machine readable, suitably described, access
rights clearly stated
• Assessable ~ provenance clear & quality/reliability should be evident
• Usable ~ data should be in a generically “actionable” format (not a
pdf!)
• No-nos:
• Simply posting to a website (non-persistent)
• Requirements: New curriculum and infrastructure…
21. Purdue University Research Repository (PURR)
most useful agronomic tool since the RCBD
PURR can assign
unique DOI to aid
data discovery and
provenance
PURR is a “Hub”
Cyber-environment;
includes tools,
models, workspace
along with storage
and publication
capabilities.
So much more than “data storage”….
22. Purdue University Research Repository: What libraries were/are
to books, PURR is/will be to data (plus so much more!)
23. You can search for (“google”, web of
science, …) data published via PURR
NAL terms; important unique terms (Grant #)
24. The workflow is predetermined when
publishing ~ you are prompted to be
comprehensive in the info you provide ~
PU Lib. Information Specialists review it
prior to publication…
26. LOCKSS: PURR relieves the researcher of the
responsibilities of ensuring data security
Per PURR Policy:
You cannot post
sensitive data
unless you have
removed
identifiers…
27. The 4R Fund Research Repository: Foundational infrastructure for collaboration & synthesis
in nutrient management research & recommendation development (a repository w/in a
repository)
Scott Brandt,
Purdue
University
Libraries
And not
by me…
Includes librarians
who possess the
professional skills
to design
workflows that will
help organize &
store things
(data!!) so
something can be
discovered /
accessed / used.
PURR
Process:
Plan,
Collaborate,
Publish,
Archive
28. Key attribute: Linking of project with archival space ~
data are not accessible to others until you “publish”
Write Data Mngmt. Plan → Create Project in PURR → Collab. w/ Research Team → Upload Data, Working Files → Finalize Dataset (version) → Upload Support. Materials → Create Data Pub. → Publish w/ DOI → PURR Archives 10+ Yr
Before publishing: Private ~ viewable only to your “team”
After publishing: Searchable, Accessible, Retrievable, Reusable
Policy discussion point with the 4R Fund/IPNI: how long of an embargo post
project completion….????
29. Hands-on Help: Ag. Research Librarians will help 4R-funded Researchers with
Workflows, Policies and Procedures for Curation, Preservation and Publication of
Their Data Including:
Persistent data formats
Licensing data (privacy requirements & policy)
Meta-data / other tags for data discovery
Versioning of accruing data sets
Supporting documentation
Data publishing
Assigning DOIs
PURR has a 10yr commitment to data set preservation / options beyond 10yr
Policies / mechanisms for novel public/private partnerships for data stewardship
Business models for open access data
30. Hands-on help from Agronomists: Best practices
have to become easier to do than not to do…
• Tough Lessons: Single biggest mistake we
have & can make is “build it and they will
come” & not providing enough help
• Many datasets need “special treatment”
• 4R-RR “Data Buddy”
Work one-on-one w/ PIs to help transition
their data from their computers to PURR
Assist with standards: data and meta-data
Make certain minimum data sets are
acquired.
• Challenge: “Data buddies” are
hard to find!
• Youth: not enough wisdom about
the culture of the science
• Established Scientists: don’t have
the data skills, time or both
Meadow Creek Students Partner as Data Buddies
Image: http://www.hebisd.edu/media/images/articles/2763f.jpg
31. Final thought: Description not prescription
•Second biggest
mistake…?
•Templates!!!!
•Solution:
•Data dictionaries
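The "description not prescription" idea can be sketched as follows: instead of forcing every researcher into one template, each dataset keeps its own column names and ships with a data dictionary that describes them, and harmonization happens downstream. The column names, concept names, and the `harmonize` helper are hypothetical; the only fixed fact used is the unit conversion 1 lb/ac ≈ 1.12085 kg/ha.

```python
from dataclasses import dataclass

LB_AC_TO_KG_HA = 1.12085  # 0.453592 kg per lb divided by 0.404686 ha per ac


@dataclass(frozen=True)
class VariableDescription:
    concept: str       # shared concept the column maps to
    unit: str          # unit as recorded by the researcher
    description: str   # free-text meaning of the column


# Two researchers, two naming conventions, no shared template (hypothetical).
site_a_dictionary = {
    "Nrate": VariableDescription("n_rate", "lb/ac", "Fertilizer N applied"),
}
site_b_dictionary = {
    "n_applied_kg": VariableDescription("n_rate", "kg/ha", "Fertilizer N applied"),
}

# Conversion factors keyed by (recorded unit, target unit).
UNIT_FACTORS = {
    ("lb/ac", "kg/ha"): LB_AC_TO_KG_HA,
    ("kg/ha", "kg/ha"): 1.0,
}


def harmonize(row: dict, dictionary: dict, target_unit: str = "kg/ha") -> dict:
    """Map described columns to shared concepts in a common unit."""
    out = {}
    for column, value in row.items():
        described = dictionary.get(column)
        if described is None:
            continue  # undescribed columns are skipped, not guessed at
        factor = UNIT_FACTORS[(described.unit, target_unit)]
        out[described.concept] = value * factor
    return out


row_a = harmonize({"Nrate": 100}, site_a_dictionary)
row_b = harmonize({"n_applied_kg": 150, "notes": "late planting"}, site_b_dictionary)
```

The design choice is that the researcher only has to describe what was recorded; the burden of reconciling names and units moves to reusable code rather than to a rigid entry form.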