SlideShare une entreprise Scribd logo
1  sur  18
Télécharger pour lire hors ligne
INRA's Big Data perspectives
and implementation challenges
Pascal Neveu
UMR MISTEA
INRA - Montpellier
11 avril 2013
Pascal Neveu 2
What is INRA?
French Institute for Agricultural Research
• Specialties:
– Agriculture
– Environment
– Food
● 10000 Persons
(4000 researchers)
● 18 Centres (> 40 sites)
11 avril 2013
Pascal Neveu 3
What is INRA ?
Technical staff -> data producer
INRA Data characteristics:
- Spread over territory
- Many disciplines
- National and international
collaborations
Pascal Neveu / AG MIA 2014 4
Raises integrated issues and challenges:
– How to adapt agriculture to climate change?
– How agriculture impacts environment?
– Agroecology «producing and supplying food in a
different way »
– Global food security and needs of adaptation
– Plant treatment and food safety
– ...
Agronomic Sciences
11 avril 2013
Pascal Neveu 5
Data challenges in Science
Modern science must deal:
● More data production
● A lot of experimental datasets available on the Web
● More collaborative and integrative approaches
→ Management, sharing and data analysis play an
increasing role in research
Discover, combine and analyse
these data
→ Big Data Challenges
11 avril 2013
Pascal Neveu 6
Agronomic Big data
V characteristics
● Volume: massive data and growing size
→ hard to store, manage and analyze
● Variety and Complexity: different sources, scales, disciplines
different semantics, schemas and formats etc.
→ hard to understand, combine, integrate,
● Velocity: speed of data generation
→ have to be process on line
● Validity, Veracity, Vulnerability, Volatility, Visibility,
Visualisation, etc.
11 avril 2013
Pascal Neveu 7
Why Big Data is important in
Agronomic Sciences?
Production of a lot of heterogeneous data for understanding
● Open new insights
● Allow to know:
– Which theories are consistent and which ones are not!
– When data did not quite match what we expect…
→ Decision support needs
(integrative and predictive approaches)
11 avril 2013
Pascal Neveu 8
Illustration: High throughpout phenotyping
High throughput?
Many Environments
Many Plant Genotypes
High frequency and
many trait observations
of Phenotypes
Interactions
11 avril 2013
Pascal Neveu 9
Why high throughput phenotyping
is important for agriculture?
● Adaptation to climate change
● More efficient use of natural resources (including water
and soil) in our farming practices
● Sustainable management and equity
● Food security
Crop performance (yields are globally decreasing)
● …
Genotyping and Phenotyping
Plant phenotyping has become a bottleneck for progress in plant
science and plant breeding
11 avril 2013
Pascal Neveu 10
Phenome
High throughput plant phenotyping
French Infrastructure
9 multi-species plateforms
● 2 controlled platforms
● 5 field platforms
● 2 high throughout omics
11 avril 2013
Pascal Neveu 11
5 Field Platforms
Various scales and data types
● Cell, organ, plant, canopy, population
● Images, hyperspectral, spectral, sensors, actuators, human
readings...
Thousands of micro-plots
time
11 avril 2013
Pascal Neveu 12
2 Controlled Platforms
-1
Time (d)
0 20 40 60 80 100
0
10
20
30
40
50
60
Plantbiomassg
Various scales and data types
time
11 avril 2013
Pascal Neveu 13
2 « Omics » platforms
●
Grinding
weighting (-80°C)
Extraction
Fractionation
Pipetting
Incubation Reading
Various data complex types
composition and the structure of
biopolymers
Quantification of metabolites and
enzyme activities
11 avril 2013
Pascal Neveu 14
Data management challenges
in Phenome: Volume growth
40 Tbytes in 2013, 100 Tbytes in 2014, …
● Volume is a relative concept
– Exponential growth makes hard
● Storage
● Management
● Analysis
Phenome HPC and Storage→ Cloud (FranceGrille, EGI)
– Easy to use with a sort of « unlimited scalability »
– On-demand infrastructure and Elasticity (season)
– Virtualization technologies
– Data-Based parallelism
(same operation on different data)
11 avril 2013
Pascal Neveu 15
Data management challenges
in Phenome: Variety
– Can be produce by differents communities (geneticians,
ecophysiologists, farmers, breeders, etc)
– Data integration needs extensive connections to other
types of data (genotypes, environments, experimental
methods, etc.)
– Different semantics, data schemas, …
– Can be associated in many ways (environments,
individuals, populations, etc.)
● Extremely diverse data
→ Web API, Ontology sets, NoSQL and
Semantic Web methods
11 avril 2013
Pascal Neveu 16
Data management in Phenome:
Velocity
– Controlled platforms produce tens of thousands
images/day (200 days per year)
– Field platforms produce tens of thousands images/day
(100 days per year)
– Omics platforms produce tens of Gbytes/day
(300 days per year)
Scientific Workflow
– OpenAlea /provenance module (Virtual Plant INRIA team)
– Scifloware (Zenith INRIA team)
11 avril 2013
Pascal Neveu 17
Data management challenges
in Phenome: Validity
Data cleaning
● Automatically diagnose and manage:
– Consistency?, duplicate? Wrong?
– annotation consistency?
– Outliers?
– Disguised missing data?
– ...
Some approaches
– Unsupervised Curve clustering (Zenith INRIA team)
– Curve fitting over dynamic constrains
– Clustering of Image histograms
11 avril 2013
Pascal Neveu 18
Conclusion
High throughput phenotyping data:
– Hard to produce
– Hard to manage
– Also hard to analyse
Thank you for your attention

Contenu connexe

Similaire à SC2 Workshop 1: INRA's Big Data perspectives and implementation challenges

Smart opendata ISESS 2013
Smart opendata ISESS 2013Smart opendata ISESS 2013
Smart opendata ISESS 2013
Karel Charvat
 
10th e concertation-brussels-06march2013-v2
10th e concertation-brussels-06march2013-v210th e concertation-brussels-06march2013-v2
10th e concertation-brussels-06march2013-v2
Alex Hardisty
 

Similaire à SC2 Workshop 1: INRA's Big Data perspectives and implementation challenges (20)

Smart opendata ISESS 2013
Smart opendata ISESS 2013Smart opendata ISESS 2013
Smart opendata ISESS 2013
 
Data requirements of researchers in agri-food sector
Data requirements of researchers in agri-food sectorData requirements of researchers in agri-food sector
Data requirements of researchers in agri-food sector
 
Agro-Know & the European agricultural research information ecosystem
Agro-Know & the European agricultural research information ecosystemAgro-Know & the European agricultural research information ecosystem
Agro-Know & the European agricultural research information ecosystem
 
Vince smith-delivering biodiversity knowledge in the information age-notext
Vince smith-delivering biodiversity knowledge in the information age-notextVince smith-delivering biodiversity knowledge in the information age-notext
Vince smith-delivering biodiversity knowledge in the information age-notext
 
Big Data in Agriculture, the SemaGrow and agINFRA experience
Big Data in Agriculture, the SemaGrow and agINFRA experienceBig Data in Agriculture, the SemaGrow and agINFRA experience
Big Data in Agriculture, the SemaGrow and agINFRA experience
 
ITEM 12. Soils4EU - Identification of priority areas for improving consisten...
ITEM 12. Soils4EU -  Identification of priority areas for improving consisten...ITEM 12. Soils4EU -  Identification of priority areas for improving consisten...
ITEM 12. Soils4EU - Identification of priority areas for improving consisten...
 
10th e concertation-brussels-06march2013-v2
10th e concertation-brussels-06march2013-v210th e concertation-brussels-06march2013-v2
10th e concertation-brussels-06march2013-v2
 
Data intensive agricultural sciences : requirements based on Aginfra+ Project...
Data intensive agricultural sciences : requirements based on Aginfra+ Project...Data intensive agricultural sciences : requirements based on Aginfra+ Project...
Data intensive agricultural sciences : requirements based on Aginfra+ Project...
 
Ontology-Based Services and Knowledge Management in the Agricultural Domain, ...
Ontology-Based Services and Knowledge Management in the Agricultural Domain, ...Ontology-Based Services and Knowledge Management in the Agricultural Domain, ...
Ontology-Based Services and Knowledge Management in the Agricultural Domain, ...
 
AgroPortal : a vocabulary and ontology repository for agronomy, plant science...
AgroPortal : a vocabulary and ontology repository for agronomy, plant science...AgroPortal : a vocabulary and ontology repository for agronomy, plant science...
AgroPortal : a vocabulary and ontology repository for agronomy, plant science...
 
Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...
 
The case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesThe case for cloud computing in Life Sciences
The case for cloud computing in Life Sciences
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
Stories of “Glocality"—Nations in a Global Infrastructure
Stories of “Glocality"—Nations in a Global InfrastructureStories of “Glocality"—Nations in a Global Infrastructure
Stories of “Glocality"—Nations in a Global Infrastructure
 
2nd e-ROSA Stakeholder workshop: Hansen, Global Wheat Rust
2nd e-ROSA Stakeholder workshop: Hansen, Global Wheat Rust2nd e-ROSA Stakeholder workshop: Hansen, Global Wheat Rust
2nd e-ROSA Stakeholder workshop: Hansen, Global Wheat Rust
 
Farm level case studies Tanzania
Farm level case studies TanzaniaFarm level case studies Tanzania
Farm level case studies Tanzania
 
OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...
 
Requirements for Open Sharing of Archaeological Research Data
Requirements for Open Sharing of Archaeological Research DataRequirements for Open Sharing of Archaeological Research Data
Requirements for Open Sharing of Archaeological Research Data
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline intro
 
Masterclass Green Energy by Inria
Masterclass Green Energy by InriaMasterclass Green Energy by Inria
Masterclass Green Energy by Inria
 

Plus de BigData_Europe

Plus de BigData_Europe (20)

Luigi Selmi - The Big Data Integrator Platform
Luigi Selmi - The Big Data Integrator PlatformLuigi Selmi - The Big Data Integrator Platform
Luigi Selmi - The Big Data Integrator Platform
 
Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4
 
Rajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO ProjectRajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO Project
 
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
 
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
 
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
 
BDE SC3.3 Workshop - BDE review: Scope and Opportunities
 BDE SC3.3 Workshop -  BDE review: Scope and Opportunities BDE SC3.3 Workshop -  BDE review: Scope and Opportunities
BDE SC3.3 Workshop - BDE review: Scope and Opportunities
 
BDE SC3.3 Workshop - Agenda
 BDE SC3.3 Workshop - Agenda BDE SC3.3 Workshop - Agenda
BDE SC3.3 Workshop - Agenda
 
BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
 BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re... BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
 
BDE SC3.3 Workshop - Data management in WT testing and monitoring
 BDE SC3.3 Workshop - Data management in WT testing and monitoring  BDE SC3.3 Workshop - Data management in WT testing and monitoring
BDE SC3.3 Workshop - Data management in WT testing and monitoring
 
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition Monitoring
 BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition Monitoring
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
 
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
 BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics  BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
 
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
 
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
 
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
 
BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)
 

Dernier

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 

Dernier (20)

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

SC2 Workshop 1: INRA's Big Data perspectives and implementation challenges

  • 1. INRA's Big Data perspectives and implementation challenges Pascal Neveu UMR MISTEA INRA - Montpellier
  • 2. 11 avril 2013 Pascal Neveu 2 What is INRA? French Institute for Agricultural Research • Specialties: – Agriculture – Environment – Food ● 10000 Persons (4000 researchers) ● 18 Centres (> 40 sites)
  • 3. 11 avril 2013 Pascal Neveu 3 What is INRA ? Technical staff -> data producer INRA Data characteristics: - Spread over territory - Many disciplines - National and international collaborations
  • 4. Pascal Neveu / AG MIA 2014 4 Raises integrated issues and challenges: – How to adapt agriculture to climate change? – How agriculture impacts environment? – Agroecology «producing and supplying food in a different way » – Global food security and needs of adaptation – Plant treatment and food safety – ... Agronomic Sciences
  • 5. 11 avril 2013 Pascal Neveu 5 Data challenges in Science Modern science must deal: ● More data production ● A lot of experimental datasets available on the Web ● More collaborative and integrative approaches → Management, sharing and data analysis play an increasing role in research Discover, combine and analyse these data → Big Data Challenges
  • 6. 11 avril 2013 Pascal Neveu 6 Agronomic Big data V characteristics ● Volume: massive data and growing size → hard to store, manage and analyze ● Variety and Complexity: different sources, scales, disciplines different semantics, schemas and formats etc. → hard to understand, combine, integrate, ● Velocity: speed of data generation → have to be process on line ● Validity, Veracity, Vulnerability, Volatility, Visibility, Visualisation, etc.
  • 7. 11 avril 2013 Pascal Neveu 7 Why Big Data is important in Agronomic Sciences? Production of a lot of heterogeneous data for understanding ● Open new insights ● Allow to know: – Which theories are consistent and which ones are not! – When data did not quite match what we expect… → Decision support needs (integrative and predictive approaches)
  • 8. 11 avril 2013 Pascal Neveu 8 Illustration: High throughpout phenotyping High throughput? Many Environments Many Plant Genotypes High frequency and many trait observations of Phenotypes Interactions
  • 9. 11 avril 2013 Pascal Neveu 9 Why high throughput phenotyping is important for agriculture? ● Adaptation to climate change ● More efficient use of natural resources (including water and soil) in our farming practices ● Sustainable management and equity ● Food security Crop performance (yields are globally decreasing) ● … Genotyping and Phenotyping Plant phenotyping has become a bottleneck for progress in plant science and plant breeding
  • 10. 11 avril 2013 Pascal Neveu 10 Phenome High throughput plant phenotyping French Infrastructure 9 multi-species plateforms ● 2 controlled platforms ● 5 field platforms ● 2 high throughout omics
  • 11. 11 avril 2013 Pascal Neveu 11 5 Field Platforms Various scales and data types ● Cell, organ, plant, canopy, population ● Images, hyperspectral, spectral, sensors, actuators, human readings... Thousands of micro-plots time
  • 12. 11 avril 2013 Pascal Neveu 12 2 Controlled Platforms -1 Time (d) 0 20 40 60 80 100 0 10 20 30 40 50 60 Plantbiomassg Various scales and data types time
  • 13. 11 avril 2013 Pascal Neveu 13 2 « Omics » platforms ● Grinding weighting (-80°C) Extraction Fractionation Pipetting Incubation Reading Various data complex types composition and the structure of biopolymers Quantification of metabolites and enzyme activities
  • 14. 11 avril 2013 Pascal Neveu 14 Data management challenges in Phenome: Volume growth 40 Tbytes in 2013, 100 Tbytes in 2014, … ● Volume is a relative concept – Exponential growth makes hard ● Storage ● Management ● Analysis Phenome HPC and Storage→ Cloud (FranceGrille, EGI) – Easy to use with a sort of « unlimited scalability » – On-demand infrastructure and Elasticity (season) – Virtualization technologies – Data-Based parallelism (same operation on different data)
  • 15. 11 avril 2013 Pascal Neveu 15 Data management challenges in Phenome: Variety – Can be produce by differents communities (geneticians, ecophysiologists, farmers, breeders, etc) – Data integration needs extensive connections to other types of data (genotypes, environments, experimental methods, etc.) – Different semantics, data schemas, … – Can be associated in many ways (environments, individuals, populations, etc.) ● Extremely diverse data → Web API, Ontology sets, NoSQL and Semantic Web methods
  • 16. 11 avril 2013 Pascal Neveu 16 Data management in Phenome: Velocity – Controlled platforms produce tens of thousands images/day (200 days per year) – Field platforms produce tens of thousands images/day (100 days per year) – Omics platforms produce tens of Gbytes/day (300 days per year) Scientific Workflow – OpenAlea /provenance module (Virtual Plant INRIA team) – Scifloware (Zenith INRIA team)
  • 17. 11 avril 2013 Pascal Neveu 17 Data management challenges in Phenome: Validity Data cleaning ● Automatically diagnose and manage: – Consistency?, duplicate? Wrong? – annotation consistency? – Outliers? – Disguised missing data? – ... Some approaches – Unsupervised Curve clustering (Zenith INRIA team) – Curve fitting over dynamic constrains – Clustering of Image histograms
  • 18. 11 avril 2013 Pascal Neveu 18 Conclusion High throughput phenotyping data: – Hard to produce – Hard to manage – Also hard to analyse Thank you for your attention