Biodiversity Data Publishing Software for the Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008). Stockholm, 3rd December 2008. Dag Endresen (Bioversity/NordGen).
1. Biodiversity Data Provider Software: Hands-on exercises with TAPIR. Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008). Dag Terje Filip Endresen, Nordic Genetic Resources Center (Sweden) / Bioversity International (Italy)
2. Fallacies of Distributed Computing The network is reliable. Latency is zero. Bandwidth is infinite. The network is secure. Topology doesn't change. There is one administrator. Transport cost is zero. The network is homogeneous. This list of fallacies came about at Sun Microsystems around 1994.
4. TAPIR TAPIR - TDWG Access Protocol for Information Retrieval. During the 2004 TDWG meeting in Christchurch, NZ, work started on a unified protocol, which was named TAPIR. TAPIR is based on the protocols of the two existing data provider software packages, BioCASE and DiGIR.
6. BioCASE 2.5.0RC The BioCASE provider software is a product of the EU-funded BioCASE project (2001-2004), developed at BGBM in Berlin. Last updated in April 2008, with support for Python version 2.5 and fewer required external libraries. Implement the BioCASE provider to share data as ABCD 2.06. http://www.biocase.org
7. BioCASE 2.5.0RC
   1. Make sure you have Python 2.5 installed (command line: python -V)
   2. Download the latest provider software from http://www.biocase.org
   3. Uncompress the BioCASE provider software to a folder on your system [provider_software_2.5.0RC.tar.gz] (tar -xzvf provider_...tar.gz)
   4. Run setup.py (python setup.py)
   5. Configure your web server to mount biocase/www as http://localhost/biocase/
   Hint: You will find an example for httpd.conf as the last terminal output from running setup.py
8. BioCASE 2.5.0RC
   6. Visit the library test page: http://localhost/biocase/utilities/testlibs.cgi
   6a. Download the latest 4Suite from http://4suite.org/ and uncompress and install it [4Suite-XML-1.0.2.tar.bz2]
   6b. Install additional Python libraries, including the desired database driver. For each Python package: (python setup.py install)
   6c. Graphviz is useful to visualize the database table structure.
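As a rough command-line counterpart to the library test page, the sketch below checks whether a list of Python modules can be imported. It is written for a current Python 3 interpreter (not Python 2.5), and the module names in the example are placeholders chosen for illustration, not the list that testlibs.cgi actually checks.

```python
import importlib


def check_libraries(module_names):
    """Return a dict mapping each module name to True if it can be imported."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Placeholder list: replace with the modules your own installation needs,
    # for example the database driver for your database.
    wanted = ["xml.dom", "sqlite3"]
    for name, ok in check_libraries(wanted).items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```

A module reported as missing would then be installed as described in step 6b.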
13. BioCASE 2.5.0RC 8. Query Form: The manual query form is a good way to see exactly how the wrapper software works. http://localhost/biocase/utilities/queryforms/qf_manual.cgi?dsa=sesto
16. PyWrapper 3.1.0 alpha (development version, works with Python 2.5). PyWrapper is tested and verified to work on Windows, Mac OS X and Linux.
17. Required configuration
   Web server: Any CGI-compliant web server: Apache, IIS etc. (The built-in CherryPy web server can also be used.)
   Database: Major databases are supported, including MySQL, Oracle, SQL Server, Sybase, Access and PostgreSQL. In theory, any database with a Python library should work.
   Python: PyWrapper is developed in the Python programming language. (The latest version from the SVN code repository works with Python version 2.5.)
   Apache, MySQL and Python are open source software, free to use, even for commercial products.
18. Installation http://trac.pywrapper.org/pywrapper/wiki/InstallationGuide
   1. Download the latest PyWrapper3 installer. Use SVN export or checkout for Python 2.5 support.
   2. Uncompress to a folder of your choice. Example: “/usr/local/pywrapper3/”. Example: “C:\pywrapper”.
   Local installation: If you have a Subversion client installed, you may use the automatic installer. (A local Python and its libraries are installed to your pywrapper folder.)
      prompt$ svn export svn://svn.pywrapper.org:80/pywrapper/trunk pywrapper
      prompt$ cd pywrapper/tools
      prompt$ /bin/sh install.sh
   This requires a bash shell, and probably a Unix-like system such as FreeBSD, Linux or Mac OS X.
   3. Execute pywrapper/setup.py. Example: prompt$ python setup.py (Mac OS X, Linux). On Windows, locate setup.py and double-click it.
19. Start standalone server
   Execute start_server.py (default port is 8080):
      prompt$ cd webapp/
      prompt$ ./start_server.py 8088 (example to start on port 8088)
   On a Windows system you may do this in an MS-DOS window (or double-click the file, if you accept the default port). Some messages will pass across your screen. Please be patient, this could take a minute. Wait for the message “start server …” and then find PyWrapper at: http://localhost:8088/pywrapper
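Instead of watching the terminal for the start message, one could poll the port. The sketch below is a minimal illustration under the assumptions of this slide (host localhost, path /pywrapper, port given on the command line). The helper names are made up for this example and are not part of PyWrapper.

```python
import socket


def pywrapper_url(port=8080, host="localhost"):
    """Build the address where the standalone server is expected to answer."""
    return f"http://{host}:{port}/pywrapper"


def is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    port = 8088  # the example port used on this slide
    if is_listening("localhost", port):
        print("Server is up, open", pywrapper_url(port))
    else:
        print("Nothing is listening on port", port, "yet")
```

An open port only shows that some process is listening there. Opening the URL in a browser remains the real test.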
20. Configuration After successful installation, you will need to configure your data provider. Follow the instructions on the PyWrapper documentation web page: http://trac.pywrapper.org/pywrapper/wiki/Documentation
   Data sources: If you provide several datasets or several databases, each is configured as an individual data source (dsa).
   Database connection: Needed for PyWrapper to access your database.
   Database structure: Define the relevant database tables, the primary keys and the foreign keys.
   Data model: Map your database model to the standard represented by the XML Schemas you choose.
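The four configuration parts can be pictured with a small, self-contained sketch. The dictionary layout, the table and column names and the single demo row below are all invented for illustration. This is not the PyWrapper configuration format. It only shows the idea of declaring tables, keys and a mapping from database columns to standard concepts (here three Darwin Core term names).

```python
import sqlite3

# Hypothetical description of one data source (NOT PyWrapper's real format).
DATA_SOURCE = {
    "name": "demo_collection",
    "root_table": "accession",
    "tables": {
        "accession": {"primary_key": "id", "foreign_keys": {"taxon_id": "taxon.id"}},
        "taxon": {"primary_key": "id", "foreign_keys": {}},
    },
    # Data model: standard concept -> database column
    "mapping": {
        "catalogNumber": "accession.accession_number",
        "country": "accession.origin_country",
        "scientificName": "taxon.full_name",
    },
}


def build_query(source):
    """Build a SELECT returning one column per mapped concept."""
    select = ", ".join(
        f'{column} AS "{concept}"'
        for concept, column in sorted(source["mapping"].items())
    )
    root = source["root_table"]
    joins = [
        f"JOIN {target.split('.')[0]} ON {root}.{fk_column} = {target}"
        for fk_column, target in source["tables"][root]["foreign_keys"].items()
    ]
    return f"SELECT {select} FROM {root} " + " ".join(joins)


def fetch_records(connection, source):
    """Run the generated query and return each row as a concept -> value dict."""
    connection.row_factory = sqlite3.Row
    return [dict(row) for row in connection.execute(build_query(source))]


def demo_connection():
    """Create an in-memory database with one made-up row for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (id INTEGER PRIMARY KEY, full_name TEXT);
        CREATE TABLE accession (
            id INTEGER PRIMARY KEY,
            accession_number TEXT,
            origin_country TEXT,
            taxon_id INTEGER REFERENCES taxon(id)
        );
        INSERT INTO taxon VALUES (1, 'Hordeum vulgare L.');
        INSERT INTO accession VALUES (1, 'DEMO 1', 'Sweden', 1);
        """
    )
    return conn


if __name__ == "__main__":
    for record in fetch_records(demo_connection(), DATA_SOURCE):
        print(record)
```

In the real software, the configuration tool collects the same kinds of information (connection, tables, keys, mapping), and the wrapper generates the database queries from it.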
21. Screen examples PyWrapper comes with a graphical, web-based configuration tool. For more information and more screenshots from the configuration of PyWrapper, see: http://circa.gbif.net/Public/irc/gbif/nodes/library?l=/meetings/2007_10_amsterdam/tapir_pywrapper3/_EN_1.0_&a=i
23. TapirLink 0.6.1
   Home: http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink
   Primary developers: Renato De Giovanni, Dave Vieglais
   Download: http://sourceforge.net/project/showfiles.php?group_id=38190
   Uncompress the PHP source code, e.g. to /usr/local/tapir/tapirlink
   Mount the admin and www directories in your web server. Example: Apache “httpd.conf”
      Alias /tapirlink "/usr/local/tapir/tapirlink/www"
      Alias /tapirlink-admin "/usr/local/tapir/tapirlink/admin"
      <Location /tapirlink>
         Order allow,deny
         Allow from all
      </Location>
      <Location /tapirlink-admin>
         Order allow,deny
         Allow from all
      </Location>
   Permissions: read on all directories; write on cache, config, log, statistics
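The permission requirements can be checked with a short script. The sketch below assumes that the four writable directories sit directly under the TapirLink folder, and the function name is made up for this example. Note that os.access tests the permissions of the user running the script, so to be meaningful it would have to be run as the web server user.

```python
import os

# Directories that need write permission, as listed on this slide
WRITABLE_DIRS = ("cache", "config", "log", "statistics")


def check_permissions(base_dir):
    """Return a list of problems found; an empty list means all checks passed."""
    if not os.access(base_dir, os.R_OK | os.X_OK):
        return [f"not readable: {base_dir}"]

    problems = []
    # Read (and traverse) permission on all directories below base_dir
    for root, dirs, _files in os.walk(base_dir):
        for name in dirs:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK | os.X_OK):
                problems.append(f"not readable: {path}")

    # Write permission on the directories TapirLink writes to
    for name in WRITABLE_DIRS:
        path = os.path.join(base_dir, name)
        if not os.path.isdir(path):
            problems.append(f"missing directory: {path}")
        elif not os.access(path, os.W_OK):
            problems.append(f"not writable: {path}")
    return problems


if __name__ == "__main__":
    # Example path from this slide; adjust to your own installation
    for problem in check_permissions("/usr/local/tapir/tapirlink"):
        print(problem)
```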
24. TapirLink 0.6.1 Start by adding a new resource: http://localhost/tapirlink-admin/ Step 1: Describe your new resource
30. TapirLink 0.6.1 Test the resource with the client form: http://localhost/tapirlink/tapir_client.php The XML client form is a good way to see exactly how the wrapper software works.
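Besides the XML client form, a resource can be tested with simple key-value requests. The sketch below assumes the key-value-pair form of TAPIR requests, where the operation is passed in an "op" parameter. It also assumes that a TapirLink resource is reached at tapir.php followed by the resource code. The resource code "demo" is hypothetical, so check the access point shown in your own TapirLink admin pages.

```python
from urllib.parse import urlencode

# The five TAPIR operations
TAPIR_OPERATIONS = ("ping", "metadata", "capabilities", "inventory", "search")


def tapir_request_url(access_point, operation, **parameters):
    """Build a key-value-pair TAPIR request URL for the given operation."""
    if operation not in TAPIR_OPERATIONS:
        raise ValueError(f"unknown TAPIR operation: {operation}")
    query = {"op": operation}
    query.update(parameters)
    return f"{access_point}?{urlencode(query)}"


if __name__ == "__main__":
    # Assumed access point layout; "demo" is a made-up resource code
    access_point = "http://localhost/tapirlink/tapir.php/demo"
    for operation in ("ping", "capabilities", "metadata"):
        print(tapir_request_url(access_point, operation))
```

Pasting one of the printed URLs into a browser should return an XML response from the provider if the resource is configured.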
38. Example of OAI-PMH service request OAI-PMH requests are expressed as HTTP requests. OAI-PMH requests must be submitted using either the HTTP GET or POST methods. http://an.oa.org/OAI-script?verb=GetRecord&identifier=oai:arXiv.org:hep-th/9901001&metadataPrefix=oai_dc
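The example request can be assembled programmatically. The sketch below only builds the URL and does not send it, since an.oa.org is a placeholder host. The function name is made up for this example. Note that urlencode percent-encodes the ":" and "/" characters in the identifier, which is the encoded form of the same request.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL: the verb first, then the other arguments."""
    query = [("verb", verb)] + sorted(arguments.items())
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    url = oai_request_url(
        "http://an.oa.org/OAI-script",  # placeholder base URL from this slide
        "GetRecord",
        identifier="oai:arXiv.org:hep-th/9901001",
        metadataPrefix="oai_dc",
    )
    print(url)
```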
39. Example of OAI-PMH service response OAI-PMH responses are formatted as HTTP responses, with the Content-Type text/xml.
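A GetRecord response could be read as sketched below. The sample XML is hand-written for illustration: the element names and namespaces are those of OAI-PMH 2.0 and Dublin Core, but the dates and the title are placeholders, not real data from any repository.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response with placeholder values
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-12-03T10:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:hep-th/9901001"
           metadataPrefix="oai_dc">http://an.oa.org/OAI-script</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:hep-th/9901001</identifier>
        <datestamp>2008-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Placeholder title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def parse_get_record(xml_text):
    """Extract identifier, datestamp and Dublin Core title from a GetRecord response."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NAMESPACES)
    if record is None:
        raise ValueError("response contains no record")
    return {
        "identifier": record.findtext("oai:header/oai:identifier", namespaces=NAMESPACES),
        "datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NAMESPACES),
        "title": record.findtext(".//dc:title", namespaces=NAMESPACES),
    }


if __name__ == "__main__":
    print(parse_get_record(SAMPLE_RESPONSE))
```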
40. OAI-PMH protocol, metadata formats Request types (verb): Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords. For purposes of interoperability, the metadataPrefix 'oai_dc' is reserved for Dublin Core. Communities adopt their own metadataPrefixes for their own metadata formats. Relevant formats/schemas for Biodiversity Informatics are Darwin Core and ABCD.
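The six verbs differ in which arguments they require. The following sketch lists the required arguments per verb and reports what is missing from a planned request. It is deliberately simplified: it ignores optional arguments (from, until, set) and the resumptionToken, which replaces the other arguments when a list is continued. The function name is made up for this example.

```python
# Required arguments per OAI-PMH verb (simplified, see the note above)
REQUIRED_ARGUMENTS = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "GetRecord": {"identifier", "metadataPrefix"},
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
}


def missing_arguments(verb, arguments):
    """Return the sorted names of required arguments that were not supplied."""
    if verb not in REQUIRED_ARGUMENTS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    return sorted(REQUIRED_ARGUMENTS[verb] - set(arguments))


if __name__ == "__main__":
    print(missing_arguments("GetRecord", {"identifier": "oai:arXiv.org:hep-th/9901001"}))
    print(missing_arguments("ListRecords", {"metadataPrefix": "oai_dc"}))
```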
45. Decentralized network [Diagram. Labels: USER; Web Services; GBIF (Global Biodiversity Information Facility); ALIS (Accession Level Information System); Svalbard Global Seed Vault (Safe Backup); USDA GRIN (USA) (USDA ARS National Germplasm Repositories...); SINGER (CGIAR) (CGIAR International Future Harvest gene banks...); EURISCO (Europe); MCPD; IHAR (Poland); WUR CGN (Netherlands); NordGen (Northern Europe); IPK Gatersleben (Germany); (Other European gene banks...)]
46. Crop Wild Relatives National datasets are shared with the central CWR data index. The national datasets, as well as access to other international datasets, are provided from the CWR data portal. http://www.cropwildrelatives.org [Diagram labels: LKA, ARM, BOL, MDG, UZB, EURISCO, SINGER]
53. Outlook The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community. Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work. Use of GBIF technology and integration of PGR collection data into GBIF allows PGR users to simultaneously search PGR collections and other biodiversity collections, and to get access to the data (and possibly the material) of relevant biodiversity collections. New data portals for a specific crop, a regional thematic network or a similar subset of the global biodiversity datasets can be established with rather little effort. This requires only that all the relevant datasets are provided by GBIF-compatible web services (like the BioCASE PyWrapper).
54. Participating in global and national biodiversity projects, and sharing your institute's datasets with them, is important for your public and scientific visibility, for promoting the use (and usefulness) of your data, and ultimately for the continued funding of your institutional activities.
55. Special thanks to Bioversity International [http://www.bioversityinternational.org] GBIF, Global Biodiversity Information Facility [http://www.gbif.org] BioCASE, The Biological Collection Access Service for Europe [http://www.biocase.org] TDWG, Biodiversity Information Standards [http://www.tdwg.org]
56. Special thanks to: BioCASE and PyWrapper3 software: Markus Döring, Javier de la Torre. DiGIR and TapirLink software: Renato De Giovanni, Dave Vieglais.
Image source: University of Ottawa, Distributed Computing Research Group: http://www.genie.uottawa.ca/research/rsrch_site.php?lang=e&id=90 (Google Images). See also: http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
For more details, see: GBIF NODES meeting 2007 in Amsterdam, Agenda 09, Technical Training session - TAPIR/PyWrapper3: http://circa.gbif.net/Public/irc/gbif/nodes/library?l=/meetings/2007_10_amsterdam/tapir_pywrapper3/_EN_1.0_&a=i
In some cases with proxies: svn co svn://svn.pywrapper.org/pywrapper/trunk pywrapper
IMAGE source: http://commons.wikimedia.org/wiki/Image:Handshake_(Workshop_Cologne_%2706).jpeg; Copyright: GNU Public Licence