Big Data in Agriculture, the SemaGrow and agINFRA experience

Presentation of the SemaGrow and agINFRA projects during the EDBT/ICDT 2014 Special Track on Big Data Management Challenges and Solutions in the Context of European Projects, 27th of March 2014
http://www.edbticdt2014.gr/index.php/eu-projects-track

  1. Big data in agriculture. Andreas Drakos, Project Manager, Agro-Know
  2. Presentation Outline
     • The importance of Big Data in Agriculture
     • Major challenges
     • The agINFRA and SemaGrow solutions
     • Supporting Global Initiatives
     EDBT Special Track Big Data, Athens, March 2014
  3. INTRO TO OPEN DATA IN AGRICULTURE
     Source: http://www.agricorner.com/shareholder-demands-to-shape-modern-agriculture/
  4. Agriculture data to solve major societal challenges
     • All demographic and food demand projections suggest that, by 2050, the planet will face severe food crises due to our inability to meet agricultural demand. By 2050:
       – 9.3 billion global population, 34% higher than today
       – 70% of the world’s population will be urban, compared to 49% today
       – food production (net of food used for biofuels) must increase by 70%
     • According to these projections, reaching the forecast food levels by 2050 will require a total investment of USD 83 billion per annum
  5. Open Data in Agriculture
     • In an era of Big Data, one of the most promising routes to bootstrap innovation in agriculture is the use of Open Data:
       – e.g. provisioning, maintaining, enriching with relevant metadata, and making openly available a vast amount of information
     • The use and wide dissemination of these data sets is strongly advocated by a number of global and national policy makers such as:
       – The New Alliance for Food Security and Nutrition G-8 initiative
       – Food & Agriculture Organization of the UN
       – DEFRA & DFID in the UK
       – USDA & USAID in the US
  6. Open Data in agriculture: a political priority
     “How Open Data can be harnessed to help meet the challenge of sustainably feeding nine billion people by 2050”
     April 2013, Washington, D.C., USA
  7. A huge market, globally
     Food & Agricultural commodities production, http://faostat.fao.org
  8. Some figures
     • Food - Gross Production Value globally in 2011: $2,318,966,621
     • Agriculture - Gross Production Value globally in 2011: $2,405,001,443
     • Investment in agriculture - Gross Capital Stock globally: $5,356,830,000
     … they are big
  9. Open data for businesses
  10. Farmers starting to capitalize on Big Data technology
      • Freeing farmers from the constraints of uncertain factors
        – Dairy farm in UK with ‘connected’ herd
          • anticipating the risks of epidemics and spotting random factors in milk production
        – Monsanto’s new acquisition protects farmers from weather issues
      • The spread of smart sensors
        – Wine-growers in Spain reduced application of fertilizers and fungicides by 20%, accompanied by a 15% improvement in overall productivity using humidity sensors
  11. [No text on this slide]
  12. BIG DATA IN AGRICULTURE
  13. Agricultural data types I
      • Publications, theses, reports, other grey literature
      • Educational material and content, courseware
      • Research data
        – Primary data, such as measurements & observations
          • structured, e.g. datasets as tables
          • digitized, e.g. images, videos
        – Secondary data, such as processed elaborations, e.g. dendrograms, pie charts, models
      • Sensor data
  14. Agricultural data types II
      • Provenance information, incl. authors, their organizations and projects
      • Experimental protocols & methods
      • Social data, tags, ratings, etc.
      • Germplasm data
      • Soil maps
      • Statistical data
      • Financial data
  15. Big Data demand…
      • Storage
        – High volume storage
        – Impractical or impossible to use centralized storage
      • Distribution
      • Federation
      • Computational power
        – For efficient discovering / querying
        – For aggregating and processing
        – For joining
  16. Rationale: Problem statement
      • Enable the inclusion of:
        – Large, live, constantly updated datasets and streams
        – Heterogeneous data
      • Involve publishers that:
        – cannot or will not directly and immediately make the transition to standards and best practices
      Open Agricultural Data Liaison Meeting, 30-31/10/2013
  17. Use Cases (DLO): Heterogeneous Data Collections & Streams
      • Big data:
        – Sensor data: soil data, weather
        – GIS data: land usage, forest and natural resources management data
        – Historical data: crop yield, economic data
        – Forecasts: climate change models
      • Problem:
        – Combine heterogeneous sources to analyze past food production and forecast future trends
        – Cannot clone and translate: large scale, live data streams
        – Cannot immediately and directly effect a radical re-design of all sensing and processing currently in place
      3rd Plenary & ESG Meeting, 21/10/2013
  18. Use Cases (FAO): Reactive Data Analysis
      • Big data:
        – Document collections: past experiences, analysis and research results
        – Databases: climate conditions and crop yield observations, economic data (land and food prices)
      • Problem:
        – Retrieving complete and accurate information to compile reports
          • Raw data and reports, scientific publications, etc.
        – Wastes human resources that could analyze data and synthesize useful knowledge and advice for food production
          • Too much time spent cross-relating responses from different sources
        – Too many different organizations and processes rely on the different schemas for re-design to be viable
        – Cloning is inefficient: large and constantly updated stores
  19. Use Cases (AK): Reactive Resource Discovery
      • Big data:
        – Multimedia content about agriculture and biodiversity
      • Problem:
        – Real-time retrieval of relevant content
        – Used to compile educational activities
        – Schema heterogeneity:
          • Different providers (Organic.Edunet, Europeana, VOA3R, etc.)
        – Too many different organizations and processes rely on the different schemas for re-design to be viable
        – Cloning is inefficient: large and constantly updated stores
  20. THE AGINFRA & SEMAGROW SOLUTIONS
  21. The agINFRA project
      • e-infrastructure for agricultural research resources (content/data) and services
      • Higher interoperability between agricultural and other data resources (linked data)
      • Improved research data services and tools using Grid and Cloud resources
  22. agINFRA Grid & Cloud resources
      • PARADOX cluster: 704 CPUs; 50 TB
      • Roma Tre cluster: 350 CPUs; 100 TB
      • Catania cluster: 800 CPUs; 700 TB
      • SZTAKI cluster: 8 CPUs
      • PARADOX upgrade: 1696 CPUs; 100 TB
      • Total: 3.5 kCPU; 0.9 PB
  23. The SemaGrow project
      • Develop novel algorithms and methods for querying distributed triple stores
      • Overcome problems stemming from heterogeneity and unbalanced distribution of data
      • Develop scalable and robust semantic indexing algorithms that can serve detailed and accurate data summaries and other data source annotations about extremely large datasets
  24. The SemaGrow Stack
      • Integrates the components in order to offer a single SPARQL endpoint that federates a number of heterogeneous data sources
      • Targets the federation of independently provided data sources
      • Uses POWDER to mass-annotate large subspaces
        – W3C recommendation; exploits natural groupings of URIs to annotate all resources in a subset of the URI space
  25. Moving Forward
      [Diagram with two panels: "NOW (2012): case of agricultural infrastructures" and "2015 (agINFRA): case of agricultural infrastructures"]
      • Labels in extraction order: Harvester; OAI-PMH Service Provider #1 ... #n, each with Schema #1 ... #n; Indexer; Aggregated XML Repository; Web Portals; Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...; AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...; RDF Triple Store; Common Schema; SPARQL endpoint (Data Source #1) ... SPARQL endpoint (Data Source #n); Indexer; Web Portals; SPARQL endpoint
  26. What Semantic Web can bring into the picture
      [Architecture diagram. Component labels: Client, SemaGrow SPARQL endpoint, Query Manager, Query Decomposer (Query Decomposition), Data Source(s) Selector, Resource Discovery, Resource Selector, Query Pattern Discovery Service, Query Transformation Service (Schema Mappings), Query Results Merger, Federated endpoint Wrapper, POWDER Inference Layer, P-Store, SPARQL endpoint (Data Source #1) ... SPARQL endpoint (Data Source #n). Data labels: Instance Statistics, Data Summaries, Load Info, Semantic Proximity, Reactivity parameters, Candidate Source(s) List, query patterns, equivalent patterns, query fragments, transformed query, query results]
      • One Data Access Point for the entire Data Cloud
        – Enabling Service-Data level agreements with Data providers
      • Application-level Vocabularies / Thesauri / Ontologies
        – Enabling different application facets for different communities of users over the SAME data pool
      • Going beyond existing Distributed Triple Store Implementations
        – Link Heterogeneous but Semantically Connected Data
        – Index Extremely Large Information Volumes (Peta Sizes)
        – Improve Information Retrieval response
      • Data (+Metadata) physically stored in Data Provider
        – No need for harvesting
      • Vocabularies / Thesauri / Ontologies of Data Provider choice
        – No need for aligning according to common schemas
  27. SUPPORTING GLOBAL INITIATIVES
  28. Supporting global initiatives
      • Global Open Data for Agriculture and Nutrition (GODAN): godan.info
      • Research Data Alliance (RDA): rd-alliance.org
        – Agricultural Data Interoperability Interest Group
        – Wheat Data Interoperability Working Group
      • CIARD, a global movement dedicated to open agricultural knowledge: www.ciard.net
      • e-Conference on Germplasm Data Interoperability
  29. Thank you! Contact: Andreas Drakos, drakos@agroknow.gr
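
Slides 24 and 26 describe a single SPARQL endpoint that decomposes a query, selects candidate sources using instance statistics, and merges the partial results. The following is a minimal Python sketch of that flow under strong assumptions: the sources are in-memory lists of triples rather than real SPARQL endpoints, and every name in it (`SOURCES`, `build_summaries`, `select_sources`, `evaluate`, `merge`, `federated_query`, and the `ex:` and `crop:` identifiers) is hypothetical and not taken from the SemaGrow code base.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for remote data sources.
# Each source holds (subject, predicate, object) triples.
SOURCES = {
    "source_1": [
        ("crop:wheat", "ex:yield", "3.1"),
        ("crop:maize", "ex:yield", "5.5"),
    ],
    "source_n": [
        ("crop:wheat", "ex:price", "210"),
        ("crop:maize", "ex:price", "180"),
    ],
}


def build_summaries(sources):
    """Toy 'instance statistics': triple count per predicate, per source."""
    summaries = {}
    for name, triples in sources.items():
        counts = defaultdict(int)
        for _, predicate, _ in triples:
            counts[predicate] += 1
        summaries[name] = dict(counts)
    return summaries


def select_sources(pattern, summaries):
    """Return the sources whose summary mentions the pattern's predicate."""
    _, predicate, _ = pattern
    return [name for name, stats in summaries.items() if predicate in stats]


def evaluate(pattern, source_name, sources):
    """Evaluate one (subject variable, predicate, object variable) pattern at one source."""
    s_var, predicate, o_var = pattern
    return [
        {s_var: s, o_var: o}
        for s, p, o in sources[source_name]
        if p == predicate
    ]


def merge(left, right):
    """Join two lists of variable bindings on the variables they share."""
    joined = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                joined.append({**a, **b})
    return joined


def federated_query(patterns, sources):
    """Decompose into patterns, pick sources per pattern, merge the results."""
    summaries = build_summaries(sources)
    results = None
    for pattern in patterns:
        bindings = []
        for name in select_sources(pattern, summaries):
            bindings.extend(evaluate(pattern, name, sources))
        results = bindings if results is None else merge(results, bindings)
    return results or []


rows = federated_query(
    [("?crop", "ex:yield", "?yield"), ("?crop", "ex:price", "?price")],
    SOURCES,
)
```

Here `rows` joins yield data from one source with price data from another without copying either store. A real federation would also have to weigh load information and schema mappings, which this sketch leaves out.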

Editor's notes

  • G-8 International Conference on Open Data for Agriculture: https://sites.google.com/site/g8opendataconference/home
  • http://www.atelier.net/en/trends/articles/farmers-starting-capitalize-big-data-technology_424444
  • Mention Velocity, Variety, Volume, Value, Viscosity, Virality
  • Overcome problems stemming from heterogeneity and from the fact that the distribution of data over nodes is not determined by the needs of better load balancing and more efficient resource discovery, but by data providers
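
Because data providers decide where data lives, a federation needs some description of what each source holds. Slide 24 names POWDER for annotating whole subsets of the URI space; the sketch below illustrates only the underlying idea of grouping resources by URI prefix. It does not use POWDER syntax, and the prefixes, endpoint names, and the `annotate` helper are all assumptions made for illustration.

```python
# Hypothetical annotations: each URI prefix describes every resource under it.
ANNOTATIONS = {
    "http://example.org/agris/": {"source": "endpoint_1", "schema": "AGRIS AP"},
    "http://example.org/edunet/": {"source": "endpoint_2", "schema": "IEEE LOM"},
    "http://example.org/edunet/video/": {"source": "endpoint_3", "schema": "IEEE LOM"},
}


def annotate(uri, annotations=ANNOTATIONS):
    """Return the annotation of the longest matching prefix, or None if no prefix matches."""
    matches = [prefix for prefix in annotations if uri.startswith(prefix)]
    if not matches:
        return None
    return annotations[max(matches, key=len)]


video = annotate("http://example.org/edunet/video/42")
record = annotate("http://example.org/agris/rec1")
unknown = annotate("http://other.example.net/x")
```

One entry per prefix stands in for an annotation on every individual resource, which is what makes this kind of grouping attractive for very large datasets.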