Dark Data In the Long Tail of Science: Examples in Biology. September 2, 2009, National Institute of Standards and Technology. P. Bryan Heidorn, NSF, University of Illinois, University of Arizona
Introduction
Cyberinfrastructure Vision
Recognition of need for data curation
Interagency Working Group on Digital Data
New Information Disciplines
Library Skills
Economics of the long tail
Naive View of Science Data: GenBank, PDB. f(x) = a x^k + o(x^k). Power Law of Science Data: f(x) = a x^k + o(x^k) | x < .20. [Plot labels: Data Volume; Science Projects and Initiatives]
Does NSF’s Data Follow the Power Law? I do not know, but if $1 = X bytes…
20-80 Rule: The small are big!
Share of grants | Number of Grants | Total Dollars | Range
20% | 1869 | $1,199,088,125 | $6,892,810-$350,000
80% | 7478 | $938,548,595 | $350,000-$831
Total Grants | 9347 | $2,137,636,716 |
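The arithmetic behind "the small are big" can be checked with a short sketch. It uses only the grant counts and dollar totals printed on the slide; the variable names and the `share` helper are illustrative, and the total is recomputed from the two parts (their sum differs from the slide's printed total by $4).

```python
# Figures copied from the "20-80 Rule" slide (NSF grants).
head = {"grants": 1869, "dollars": 1_199_088_125}  # largest 20% of grants
tail = {"grants": 7478, "dollars": 938_548_595}    # remaining 80% of grants

total_grants = head["grants"] + tail["grants"]
total_dollars = head["dollars"] + tail["dollars"]


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


tail_grant_share = share(tail["grants"], total_grants)
tail_dollar_share = share(tail["dollars"], total_dollars)

# Mean award size in each group, for comparison.
mean_head = head["dollars"] / head["grants"]
mean_tail = tail["dollars"] / tail["grants"]

print(f"Tail: {tail_grant_share}% of grants, {tail_dollar_share}% of dollars")
print(f"Mean award: head ${mean_head:,.0f}, tail ${mean_tail:,.0f}")
```

Under these figures the smaller 80% of grants still account for a bit under half of the dollars, which is the point the slide makes.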
Hubble Space Telescope composite image: "ring" of dark matter in the galaxy cluster Cl 0024+17
Related Ideas
Why is the tail also important
Technical Solutions: Move the tail to the head (increase k)
Solutions
Institutional Solutions. Library director John Hanson told the Associated Press that a couple of dozen people are cited each year for failure to return materials or pay fines. The incident cost Dalibor about $30 for the two overdue paperbacks. It cost her mother $172 to free her. Book and Bake Sale at the Mary E. Tippitt Memorial Library in Townsend. Sailing yacht Maltese Falcon, owned by Tom Perkins.
Organizational Solutions
Questions about the long-tail
Barriers
My Solutions
Automatic Metadata Extraction (Darwin Core) From Museum Specimen Labels. 2008 Dublin Core Conference. P. Bryan Heidorn, Qin Wei, University of Illinois at Urbana-Champaign
… <co> Curtis, </co><hdlc> North American Pl </hdlc><cnl> No.</cnl><cn> 503*</cn> <gn> Polygala</gn><sp> ambigua,</sp><sa> Nutt.,</sa><val> var.</val> <hb> Coral soil,</hb><lc> Cudjoe Key, South Florida. </lc><col> Legit</col><co> A. H. Curtiss.</co><dt>February</dt>…
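As a rough illustration of how a tagged label like the one on this slide could be read into field/value pairs, here is a minimal sketch. The tag names are taken as they appear on the slide and are not interpreted; `TAGGED` and `parse_label` are hypothetical names, and this is not the authors' tooling.

```python
import re

# Fragment of the tagged specimen label shown on the slide, copied as-is.
record = (
    "<co> Curtis,  </co><hdlc>  North American Pl </hdlc><cnl> No.</cnl>"
    "<cn> 503*</cn> <gn> Polygala</gn><sp> ambigua,</sp><sa> Nutt.,</sa>"
    "<val> var.</val> <hb> Coral soil,</hb><lc> Cudjoe Key, South Florida. </lc>"
    "<col> Legit</col><co> A. H. Curtiss.</co><dt>February</dt>"
)

# Matches <tag>text</tag> where the closing tag repeats the opening tag name.
TAGGED = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_label(text):
    """Return (tag, text) pairs in document order, with whitespace trimmed."""
    return [(tag, body.strip()) for tag, body in TAGGED.findall(text)]


pairs = parse_label(record)
for tag, value in pairs:
    print(f"{tag:5s} {value}")
```

Keeping the pairs as an ordered list rather than a dictionary preserves repeated tags (here `co` occurs twice) and the order of fields on the label.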
The problem
Why care about the specimens?
http://www.ncdc.noaa.gov/img/climate/globalwarming/ar4-fig-3-9.gif
Why care
A real-life example:  Baronia brevicornis  and its single food plant,  Acacia cochliacantha (Soberon)
B. brevicornis  Abiotic Niche using BS Garp
Natural History Specimens
Sample records
Sample OCR Output
Label Labels
Label Labels
Example Training Record
Supervised Learning Framework [diagram; box labels: Training Phase, Application Phase, Unclassified Labels, Human Editing, Gold Classified Labels, Machine Learner, Trained Model, Segmentation, Segmented Text, Machine Classifier, Silver Classified Labels]
Herbis Experimental Data
Performances of NB and HMM
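To make the NB (Naive Bayes) side of this comparison concrete, the following is a toy token classifier with add-one smoothing. Everything in it is an assumption for illustration: the three features, the nine training tokens (one per tag, taken from the sample label shown earlier), and the class and function names. It is not the feature set or implementation evaluated in the talk, and the HMM is not sketched here.

```python
import math
from collections import Counter, defaultdict


def features(token):
    """Hypothetical token features chosen for illustration only."""
    return [
        "word=" + token.lower().strip(".,*"),
        "has_digit=" + str(any(ch.isdigit() for ch in token)),
        "capitalized=" + str(token[:1].isupper()),
    ]


class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        """Count classes and features over (token, label) pairs."""
        for token, label in examples:
            self.class_counts[label] += 1
            for f in features(token):
                self.feature_counts[label][f] += 1
                self.vocabulary.add(f)

    def predict(self, token):
        """Return the label with the highest smoothed log-probability."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for f in features(token):
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# One training token per tag, taken from the sample label.
TRAINING = [
    ("Curtiss.", "co"), ("No.", "cnl"), ("503*", "cn"), ("Polygala", "gn"),
    ("ambigua,", "sp"), ("Nutt.,", "sa"), ("var.", "val"), ("Legit", "col"),
    ("February", "dt"),
]

model = NaiveBayes()
model.train(TRAINING)
print(model.predict("Polygala"))  # a token seen in training
print(model.predict("612"))       # an unseen number, classified by its shape features
```

With so little training data the model mostly memorizes, but the unseen token shows how shape features such as "contains a digit" let the classifier generalize beyond exact word matches.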
Element Identifiers
Improved Performance With Field Element Identifiers
 
Learning with pre-categorization [diagram; box labels: Gold Labels, Categorization, Class 1 Labels, Class 2 Labels, Class n Labels, Machine Learner, Model 1, Model 2, Model n, Unclassified Labels, Machine Classification, Classified Labels]
FIG. 5. Improved Performance of Specialist Model: Specialist100 Curtiss vs. 100 General
Machine Learning in BioGeomancer’s Locality Specification. P. Bryan Heidorn (1), Hong Zhang (1), Eugene Chung (2) and BGWG. (1) Graduate School of Library and Information Science, (2) Linguistics, University of Illinois. SPNHC & NSCA 2006
BioGeomancer Working Group (BGWG) http://203.202.1.217/bgwebsite/index.html
Participants
Example Locality Types
Record # | Specification of Location | Locality Type
43 | dario 7 mi wnw of; RIO VIEJO | FOH; F
86 | near Aleutian Islands; S of Amukta Pass | NF; FH
100 | INDIAN CREEK, 11 MI. W HWY 160 | P; POH
109 | TIESMA RD, 1.5 MI NW EDGEWATER; OFF LAKE MICHIGAN R | P; FOH; NP
160 | WALTMAN, 9 MI N, 2.5 MI W OF | FOO
181 | 0.4 mi N Collinston on LA 138 | FPOH
204 | Seward Peninsula; vic. Bluff, S coast | F; NF; FS
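Several of the locality strings above contain a distance, a unit, and a compass heading. The sketch below pulls those out with a hand-written regular expression. This is only an illustration of what such a specification contains: it is not the machine-learning approach the talk describes, it does not interpret the locality type codes, and `OFFSET` and `offsets` are hypothetical names.

```python
import re

# "<distance> <unit> <compass heading>", e.g. "9 MI N" or "7 mi wnw".
OFFSET = re.compile(
    r"(\d+(?:\.\d+)?)\s*(mi|km)\.?\s+([NSEW]{1,3})\b",
    re.IGNORECASE,
)


def offsets(locality):
    """Return (distance, unit, heading) tuples found in a locality string."""
    return [
        (float(dist), unit.lower(), heading.upper())
        for dist, unit, heading in OFFSET.findall(locality)
    ]


examples = [
    "0.4 mi N Collinston on LA 138",
    "WALTMAN, 9 MI N, 2.5 MI W OF",
    "INDIAN CREEK, 11 MI. W HWY 160",
    "Seward Peninsula; vic. Bluff, S coast",
]
for text in examples:
    print(text, "->", offsets(text))
```

Strings with no numeric offset, such as the Seward Peninsula record, return an empty list, which hints at why a fixed pattern alone is not enough for the full range of locality types.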
 
FRAME
Xiaoya Tang and P. Bryan Heidorn. User query: Long leaves. Description of leaf length in texts: "… Leaves 20–75, many-ranked, spreading and recurved, not twisted, gray-green (rarely variegated with linear cream stripes), to 1 m × 1.5–3.5 cm, … Inflorescences: … spikes very laxly 6–11-flowered, erect to spreading, 2–3-pinnate, …"
Information Extraction From FNA
Original documents: … Leaf blade obovate to nearly orbiculate, 3--9 × 3--8 cm, leathery, base obtuse to broadly cuneate, margins flat, coarsely and often irregularly doubly serrate to nearly dentate, …
Knowledge bases: PartBlade: Leaf blade, Blades, blade, …
User log analysis. Templates for useful information: Leaf_Shape, Leaf_Margin, Leaf_Apex, Leaf_Base, Blade_Dimension, …
Extraction Rules:
Pattern:: * <PartBlade> ' ' <leafShape> * ( <leafShape> ) ',' *
Output:: leaf {leafShape $1}
Pattern:: * <PartBlade> * ', ' ( <Range> ' ' * <LengUnit> ) * <PartBase>
Output:: leaf {bladeDimension $1}
Structured information: Leaf_Shape obovate; Leaf_Shape orbiculate; Blade_Dimension 3—9 x 3—8 cm; …
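The two extraction rules on this slide can be loosely approximated with Python regular expressions. This is a sketch under stated assumptions, not the authors' rule engine: the `LEAF_SHAPES` list beyond "obovate" and "orbiculate" is invented for illustration, `extract` is a hypothetical function, and unlike the slide's first pattern it collects every shape term rather than one capture group.

```python
import re

SENTENCE = (
    "Leaf blade obovate to nearly orbiculate, 3--9 × 3--8 cm, leathery, "
    "base obtuse to broadly cuneate, margins flat"
)

# Tiny stand-ins for the slide's knowledge bases; the shape list is an assumption.
PART_BLADE = ["Leaf blade", "Blades", "blade"]
LEAF_SHAPES = ["obovate", "orbiculate", "ovate", "elliptic"]

part = "|".join(re.escape(p) for p in PART_BLADE)
shape = "|".join(LEAF_SHAPES)
rng = r"\d+(?:\.\d+)?--\d+(?:\.\d+)?"  # a numeric range such as 3--9


def extract(sentence):
    """Return (template_slot, value) pairs found in one description sentence."""
    facts = []
    # Rule 1: shape terms between the blade term and the first comma.
    m = re.search(rf"(?:{part})\s+([^,]*),", sentence)
    if m:
        for s in re.findall(rf"\b({shape})\b", m.group(1)):
            facts.append(("Leaf_Shape", s))
    # Rule 2: "range × range unit" right after the first comma.
    m = re.search(
        rf"(?:{part})[^,]*,\s*({rng}\s*×\s*{rng}\s*(?:mm|cm|m))\b", sentence
    )
    if m:
        facts.append(("Blade_Dimension", m.group(1)))
    return facts


for slot, value in extract(SENTENCE):
    print(slot, value)
```

The output slots mirror the template names on the slide, so each extracted value lands in the field a user would search on.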
Results – System Performance
NT: number of tasks accomplished in total
NTH: number of tasks accomplished per hour
TSR: task success rate
SSR: search success rate
NSST: number of searches to accomplish a task
TST: time spent to accomplish a task
NDVST: number of documents viewed to accomplish a task
Group | NT | NTH | TSR | SSR
SEARFA | 6.75 | 8.078 | 0.860 | 0.210
SEARF | 4.50 | 3.598 | 0.568 | 0.053
Sig. (ANOVA) | 0.005 | 0.005 | 0.000 | 0.011
Measure | values (column headings not preserved)
NSST | 4.779 | 9.584 | 0.000
TST | 338.8 | 435.2 | 0.72
NDVST | 11.16 | 14.75 | 0.162
Education Programs
Biological Information Specialists
Master of Science in Biological Informatics
What does a BIS need to know?
UIUC bioinformatics core coursework
Sample of existing LIS courses
MSLIS Data Curation Concentration
New research directions
Example Service
JRS Biodiversity Foundation
JRS Biodiversity Foundation
JRS Biodiversity Foundation
National Science Foundation
Taxonomic Database Working Group

Dark Data In the Long Tail of Science: Examples in Biology

  • 1. Dark Data In the Long Tail of Science: Examples in Biology. September 2, 2009, National Institute of Standards and Technology. P. Bryan Heidorn, NSF, University of Illinois, University of Arizona
  • 9. Figure panels: Naive View of Science Data (GenBank, PDB); Power Law of Science Data. Formulas shown: f(x) = ax^k + o(x^k) and f(x) = ax^k + o(x^k) | X < .20. Axes: Data Volume; Science Projects and Initiatives
  • 10. Does NSF’s Data Follow the Power Law? I do not know, but if $1 = X bytes...
  • 11. 20-80 Rule: The small are big!
        (share of grants) | 20% | 80%
        Range | $6,892,810-$350,000 | $350,000-$831
        Total Dollars | $1,199,088,125 | $938,548,595
        Number Grants | 1869 | 7478
        Total Grants: 9347, $2,137,636,716
  • 22. Automatic Metadata Extraction (Darwin Core) From Museum Specimen Labels. 2008 Dublin Core Conference. P. Bryan Heidorn, Qin Wei, University of Illinois at Urbana-Champaign. Example of tagged label text: … <co> Curtis, </co><hdlc> North American Pl </hdlc><cnl> No.</cnl><cn> 503*</cn> <gn> Polygala</gn><sp> ambigua,</sp><sa> Nutt.,</sa><val> var.</val> <hb> Coral soil,</hb><lc> Cudjoe Key, South Florida. </lc><col> Legit</col><co> A. H. Curtiss.</co><dt>February</dt>…
  • 27. A real-life example: Baronia brevicornis and its single food plant, Acacia cochliacantha (Soberon)
  • 28. B. brevicornis Abiotic Niche Using BS GARP
  • 35. Supervised Learning Framework (diagram). Phases: Training Phase; Application Phase. Elements: Gold Classified Labels; Machine Learner; Trained Model; Unclassified Labels; Segmentation; Segmented Text; Machine Classifier; Silver Classified Labels; Human Editing
  • 39. Improved Performance With Field Element Identifiers
  • 41. Learning with pre-categorization (diagram). Training elements: Gold Labels; Categorization; Class 1 Labels, Class 2 Labels, ... Class n Labels; Machine Learner (one per class); Model 1, Model 2, ... Model n. Application elements: Unclassified Labels; Categorization; Class 1 Labels, Class 2 Labels, ... Class n Labels; Machine Classification (one per class); Classified Labels
  • 42. FIG. 5. Improved Performance of Specialist Model. Specialist 100 Curtiss vs. 100 General
  • 43. Machine Learning in BioGeomancer’s Locality Specification. P. Bryan Heidorn (1), Hong Zhang (1), Eugene Chung (2) and BGWG. (1) Graduate School of Library and Information Science, (2) Linguistics, University of Illinois. SPNHC & NSCA 2006
  • 46. Example Locality Types
        Record # | Specification of Location | Locality Type
        43 | dario 7 mi wnw of; RIO VIEJO | FOH; F
        86 | near Aleutian Islands; S of Amukta Pass | NF; FH
        100 | INDIAN CREEK, 11 MI. W HWY 160 | P; POH
        109 | TIESMA RD, 1.5 MI NW EDGEWATER; OFF LAKE MICHIGAN R | P; FOH; NP
        160 | WALTMAN, 9 MI N, 2.5 MI W OF | FOO
        181 | 0.4 mi N Collinston on LA 138 | FPOH
        204 | Seward Peninsula; vic. Bluff, S coast | F; NF; FS
  • 50. Information Extraction From FNA (diagram)
        Original documents: ... Leaf blade obovate to nearly orbiculate, 3--9 × 3--8 cm, leathery, base obtuse to broadly cuneate, margins flat, coarsely and often irregularly doubly serrate to nearly dentate, ...
        Knowledge bases: ... PartBlade: Leaf blade, Blades, blade, ...
        Extraction Rules:
          Pattern:: * <PartBlade> ' ' <leafShape> * ( <leafShape> ) ',' *   Output:: leaf {leafShape $1}
          Pattern:: * <PartBlade> * ', ' ( <Range> ' ' * <LengUnit> ) * <PartBase>   Output:: leaf {bladeDimension $1}
        User log analysis
        Templates for useful information: Leaf_Shape, Leaf_Margin, Leaf_Apex, Leaf_Base, Blade_Dimension, ...
        Structured information: Leaf_Shape obovate; Leaf_Shape orbiculate; Blade_Dimension 3—9 x 3—8 cm; ...
  • 51. Results – System Performance
        NT: number of tasks accomplished in total
        NTH: number of tasks accomplished per hour
        TSR: task success rate
        SSR: search success rate
        NSST: number of searches to accomplish a task
        TST: time spent to accomplish a task
        NDVST: number of documents viewed to accomplish a task
        Group | NT | NTH | TSR | SSR
        SEARFA | 6.75 | 8.078 | 0.860 | 0.210
        Sig.(ANOVA) | 0.005 | 0.005 | 0.000 | 0.011
        SEARF | 4.50 | 3.598 | 0.568 | 0.053
        NSST | 4.779 | 9.584 | 0.000
        TST | 338.8 | 435.2 | 0.72
        NDVST | 11.16 | 14.75 | 0.162
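The 20-80 split on slide 11 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are assumptions, and the figures are copied from the slide's table. It computes what share of grants and of dollars falls in the tail of smaller awards.

```python
# Figures as transcribed from slide 11 (variable names are illustrative).
top_grants, top_dollars = 1869, 1_199_088_125    # largest 20% of grants
tail_grants, tail_dollars = 7478, 938_548_595    # remaining 80% of grants

total_grants = top_grants + tail_grants
# Summed from the two columns; this differs by a few dollars from the
# total printed on the slide ($2,137,636,716), presumably due to rounding.
total_dollars = top_dollars + tail_dollars

tail_grant_share = tail_grants / total_grants
tail_dollar_share = tail_dollars / total_dollars

print(f"Tail: {tail_grant_share:.1%} of grants, {tail_dollar_share:.1%} of dollars")
```

On these figures the smaller 80% of grants account for roughly 44% of the dollars, which is the sense in which "the small are big".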
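The tagged label text on slide 22 can be read back into (tag, text) pairs with a short regular expression. This is a minimal sketch, not the authors' extraction system: the function name `tagged_fields` is hypothetical, the sample string is part of the fragment shown on the slide, and no meaning is assigned to the tag names because the slide does not define them.

```python
import re

# Part of the tagged fragment shown on slide 22.
label = (
    "<co> Curtis, </co><hdlc> North American Pl </hdlc><cnl> No.</cnl>"
    "<cn> 503*</cn> <gn> Polygala</gn><sp> ambigua,</sp><sa> Nutt.,</sa>"
    "<val> var.</val>"
)

def tagged_fields(markup):
    """Return (tag, text) pairs for every <tag>text</tag> span, in order."""
    # \1 forces the closing tag to match the opening tag name.
    pairs = re.findall(r"<(\w+)>(.*?)</\1>", markup)
    return [(tag, text.strip()) for tag, text in pairs]

fields = tagged_fields(label)
for tag, text in fields:
    print(f"{tag}: {text}")
```

Keeping the output as an ordered list rather than a dictionary preserves repeated tags, such as the two `<co>` spans in the full slide fragment.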
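The two extraction rules on slide 50 can be approximated with ordinary regular expressions. The sketch below is a simplified stand-in under stated assumptions, not the pattern language shown on the slide: the vocabularies contain only the terms visible on the slide, the names `PART_BLADE`, `LEAF_SHAPES`, `LENGTH_UNITS` and `extract` are hypothetical, and the dimension rule ignores the `<PartBase>` context that the slide's second pattern requires.

```python
import re

# Tiny knowledge base limited to terms visible on the slide (assumption).
PART_BLADE = ["Leaf blade", "Blades", "blade"]
LEAF_SHAPES = ["obovate", "orbiculate"]
LENGTH_UNITS = ["cm"]

def extract(text):
    """Return (template_slot, value) pairs found in one description."""
    part = "|".join(re.escape(p) for p in PART_BLADE)
    shape = "|".join(LEAF_SHAPES)
    unit = "|".join(LENGTH_UNITS)
    facts = []
    # Shape rule: look only at the clause between the blade term and the first comma.
    clause = re.search(rf"(?:{part})\s+([^,]*),", text)
    if clause:
        for found in re.findall(rf"\b({shape})\b", clause.group(1)):
            facts.append(("Leaf_Shape", found))
    # Dimension rule: a "length x width" range followed by a length unit.
    dim = re.search(rf"\d+--\d+\s*[x×]\s*\d+--\d+\s*(?:{unit})", text)
    if dim:
        facts.append(("Blade_Dimension", dim.group(0)))
    return facts

sentence = ("Leaf blade obovate to nearly orbiculate, 3--9 × 3--8 cm, leathery, "
            "base obtuse to broadly cuneate, margins flat")
facts = extract(sentence)
for slot, value in facts:
    print(slot, value)
```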

Editor's notes

  1. Change to new front image
  2. Add jobs from the Interagency Working Group report.
  3. Rework with new librarian image
  4. Insert Lake Victoria overlay
  5. Insert Lake Victoria overlay
  6. Insert Lake Victoria overlay