SlideShare a Scribd company logo
1 of 24
Download to read offline
January 2010 Beijing, China
                                                          Phil Cryer
                                                          open source systems architect
                                                                  and accidental tourist

Tuesday, January 12, 2010
> BHL hardware architecture


   • storage issues and solutions

   • hardware risks and configuration

   • cloud computing possibilities

   • code and communication




Tuesday, January 12, 2010
> Storage issues


   • more data being created and saved

   • expanding SANs can be expensive

   • storage has not kept up with
     Moore’s Law

   • backups are usually for disaster
     recovery, not redundancy or failover




Tuesday, January 12, 2010
> Storage solutions: distributed storage


   • write once, read anywhere

   • replication and fault tolerance

   • error correction

   • automatic redundancy

   • scalable horizontally




Tuesday, January 12, 2010
> Distributed storage - Options


   • hosted storage (cloud, Amazon S3)

   • self hosted with proprietary hardware and
     software (Sun Thumper)

   • self hosted with commodity hardware and
     open source distributed filesystem
     (GlusterFS, Lustre)




Tuesday, January 12, 2010
> Distributed storage - GlusterFS


   • GlusterFS: a cluster file-system
     capable of scaling to several
     petabytes*

   • open source software on commodity
     hardware

   • simple to install and manage

   • very customizable

   • offers seamless expansion



                                         *(1,024 terabytes = 1 petabyte)


Tuesday, January 12, 2010
> Distributed storage - Archival 


   • Fedora-commons is an open source
     repository (not a Linux distribution)

   • accounts for all changes, so it provides
     built-in version control

   • provides disaster recover

   • open standards to mesh with future file
     formats

   • provides open sharing services such as
     OAI-PMH




Tuesday, January 12, 2010
Tuesday, January 12, 2010
> Distributed storage - Mirrored data


   • now we have redundancy

   • in fact, multiple redundant
     copies

   • provides fault tolerance

   • offers load balancing

   • gives us future geographical
     distribution




Tuesday, January 12, 2010
> Distributed processing


   • more abilities available than just storing
     data

   • with distributed storage comes
     distributed processing

   • distributed processing means faster
     answers

   • faster answers mean new questions

   •     lather, rinse, repeat




Tuesday, January 12, 2010
> Distributed processing


   • make existing data more useful

   • image and OCR re-processing

   • distributed web services

   • identifier resolution pools

   • map/reduce frameworks

   • generate new visualizations, text
     mining, NLP and ???




Tuesday, January 12, 2010
Finder                 TaxonFinder    TaxonFinder          TaxonFinder    TaxonFinder   Taxon

                               WebService                          WebService

                                             Load Balancer

                                             Cluster Node

                                                Cluster

                                                    Site




                                                Request




 Tuesday, January 12, 2010
> Hardware configuration: risks


   • computers crash

   • hard drives die

   • networks fail

   • natural disasters occur


                    but...




Tuesday, January 12, 2010
...so plan for it.



Tuesday, January 12, 2010
> Hardware configurations


   • six 5U sized servers hosted at
     MBL in Woods Hole

   • 8 Gigs RAM in each server

   • 24 / 1.5 TB drives in each server

   • 100 TB storage overall

   • fast connection, far greater
     bandwidth than ever before

   • mirror sets (3 and 3) will be
     stored in different locations

                            but...




Tuesday, January 12, 2010
> Some assembly required (optional)


   • while our example uses new,
     faster commodity hardware...

   • it could run on any hardware that
     can run Linux

   • you could chain together old
     "out dated" computers

   • build your own cluster for next to
     nothing (host it in your basement)

   •    solves some infrastructure funding
        issues and provides a working proof
        of concept

   •    hardware vendor neutrality



Tuesday, January 12, 2010
> Our proof of concept

   • we ran a six box cluster to demonstrate
     GlusterFS clustered filesystem

   • ran Debian/GNU Linux

   • simulated hardware failures

   • synced data with a remote cluster (STL to
     MBL/WH)

   • ran map/reduce jobs (distributed computing)

   • defined procedures, configurations and build
     scripts




Tuesday, January 12, 2010
> Distributed storage - Projected costs




                                 $246,000




                            Graph from Backblaze (http://www.backblaze.com)


Tuesday, January 12, 2010
> Cloud computing

   • BHL is participating in a pilot for
     Duraspace with The New York Public
     Library

   • Duraspace would provide a link to
     cloud providers

   • pilot to show feasibility of hosting

   • testing use of image server, other
     services in the cloud

   • cloud could seed new clusters




Tuesday, January 12, 2010
> Code (63 6f 64 65)

    • our code and configurations are open source, hosted on Google Code

                                           http://code.google.com/p/bhl-bits/
                                                                          phil.cryer@gmail.com | My favorites        | Profile | Sign out


                                    bhl-bits                                                                             Search projects
                                    Open Source code from the various Biodiversity
                                    Heritage Library (BHL) projects
                    Project Home   Downloads             Wiki       Issues      Source        Administer
                   Summary | Updates | People

                  This is the repository for code from various                     Starred (view starred projects)
                  Biodiversity Heritage Library (BHL) projects, all of
                  which are released under various Open Source
                  licenses. Our current portal website runs on .NET,           Code license: New BSD License
                  connects to a MSSQL backend database, and
                  utilizes an Apache Tomcat webapp fronted image               Labels:          bhl, linux, djatoka, imageviewer,
                  viewer. Our new citation web site runs under Linux                            java, asp.net, c-sharp
                  (Debian), Drupal, Apache, PHP and MySQL with help
                  from HAProxy, Solr/Lucene, OpenVZ, KVM and other             Links:    Biodiversity Heritage Library
                  software. We'll be releasing code, as well as the all
                  important configuration files for applications we use        Blogs:    BHL Blog
                  and support on our servers. We're open to any and
                                                                                         BHL updates on Twitter
                  all suggestions from the community, and of course
                  accept any patches that improve our code. We                 Feeds: Project feeds
                  appreciate your interest in our projects.

                  Latest updates                                                                                     People details

                                                                               Project owners:
                      Portal - portal is the .NET based code that
Tuesday, January 12, 2010 our current portal site, Biodiversity                           phil.cryer, mlichtenberg@hotmail.com,
                      powers
> Communication


   • our Projects server is a portal for various projects and ideas

                                          http://projects.biodiversitylibrary.org/
                   Home


                  Ten major natural history museum libraries, botanical libraries, and         Latest projects
                  research institutions have joined to form the Biodiversity Heritage
                  Library Project. The group is developing a strategy and operational           Data Hosting Centre Task Group (12/02/2009 02:24 PM)
                  plan to digitize the published literature of biodiversity held in their
                                                                                                Whilst an unprecedented volume of biodiversity data is
                  respective collections. This literature will be available through a global
                  “biodiversity commons.”                                                       currently being generated worldwide, it is perceived that
                                                                                                significant amounts of data get lost or will be lost after
                  The participating libraries have over two million volumes of                  project closure. To investigate the causes of such loss, and
                  biodiversity literature collected over 200 years to support the work of       recommend strategies as to how biodiversity data can be
                  scientists, researchers, and students in their home institutions and
                  throughout the world.                                                         rescued and archived, through ‘data hosting centres’ a Task
                                                                                                Group on ‘Data Hosting Centres’ will be commissioned.
                  The BHL will provide basic,
                  important content for                                                         Consistant, updated Identity (09/21/2009 03:50 AM)
                  immediate research and for                                                    An ongoing project to update BHL's identity, theme and logo
                  multiple bioinformatics                                                       across BHL Portal and Citebank
                  initiatives. For the first time
                  in history, the core of our                                                   Clustered storage (08/25/2009 05:28 AM)
                  natural history and herbaria
                  library collections will be                                                   The project to build a reproducible clustered file-system
                  available to a truly global                                                   distributed across multiple servers, utilizing GlusterFS, to
                  audience. Web-based                                                           house all of BHL Global content.
                  access to these collections will provide a substantial benefit to people
                  living and working in the developing world -- whether scientists or           Website feedback (06/15/2009 11:11 AM)
                  policymakers. The Biodiversity Heritage Library Project strives to            A project to accept feedback from users to cite.bhl.org
                  establish a major corpus of digitized publications on the Web drawn
                  from the historical biodiversity literature. This material will be            MOBOT djatoka implementation (04/13/2009 04:30 PM)
                  available for open access and responsible use as a part of a global
                  Biodiversity Commons.

                  We will work with the global taxonomic community, rights holders,
                  and other interested parties to ensure that this legacy literature is
                  available to all. The Biodiversity Heritage Library Project must be a
                  multi-institutional project because no single natural history museum
                  or botanical garden library holds the complete corpus of legacy
                  literature, even within the individual sub-domains of taxonomy.
                  However, taken together, the proposed consortium of collections
Tuesday, January 12, 2010
                  represents a uniquely comprehensive assemblage of literature.
> Communication


   • each project has its own wiki, allowing for detailed, how-to instructions

               http://projects.biodiversitylibrary.org/wiki/clusteredfilesystem/

                   INSTALL

                  the following is a continuously updated doc and howto for configuring a BHL cluster node on a
                  current Debian GNU/Linux installation. A breakdown of times involved in the physical building of
                  nodes is available at the build times page

                  System configuration
                  1) upgrade system to Squeeze (Lenny uses kernel 2.26, which doesn't fully support ext4)
                  and enable contrib and non-free repos

                  Copy the next 4 lines, and paste them in as 1:

                     echo "deb http://ftp.us.debian.org/debian/ squeeze main contrib non-free
                     deb-src http://ftp.us.debian.org/debian/ squeeze main
                     deb http://security.debian.org/ squeeze/updates main
                     deb-src http://security.debian.org/ squeeze/updates main" > /etc/apt/sources.list


                  and update the system

                     apt-get update
                     aptitude safe-upgrade

                  and now is a good a time as any to make sure the timezone data is all setup correctly (mine wasn't)

                     dpkg-reconfigure tzdata

                  2) reboot to use latest kernel (if needed)

                     /sbin/shutdown -r now

                  3) install needed utiltities and applications

                     apt-get install build-essential flex bison screen nginx vim-nox git-core subversion saidar

Tuesday, January 12, 2010
> Code (63 6f 64 65)

   • our code and configurations are open
     source and hosted on Google Code

   • our projects server shares detailed
     instructions to get you get started

   • get involved

   • join our mailing-lists (bhl-tech, bhl-bits)

   • ask us questions

   • you can do it, and we will help




Tuesday, January 12, 2010
code http://code.google.com/p/bhl-bits
                        projects http://projects.biodiversitylibrary.org

                        questions phil.cryer@mobot.org




Tuesday, January 12, 2010

More Related Content

What's hot

Issues in long-term knowledge retention in engineering
Issues in long-term knowledge retention in engineeringIssues in long-term knowledge retention in engineering
Issues in long-term knowledge retention in engineeringChris Rusbridge
 
Moving an Archive from Tape to Disk: A Case-Study at ICPSR
Moving an Archive from Tape to Disk: A Case-Study at ICPSRMoving an Archive from Tape to Disk: A Case-Study at ICPSR
Moving an Archive from Tape to Disk: A Case-Study at ICPSRBryan Beecher
 
eprints digital library software
eprints digital library softwareeprints digital library software
eprints digital library softwaresonia naomi bandao
 
Digital library software
Digital library softwareDigital library software
Digital library softwareavid
 
ArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management ToolArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management ToolMark Matienzo
 
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014Dag Endresen
 
Greenstone Digital Library Software
Greenstone Digital Library SoftwareGreenstone Digital Library Software
Greenstone Digital Library SoftwareMINTUMATHEW8
 
UK LOCKSS Alliance: Today’s scholarly content, secured for tomorrow
UK LOCKSS Alliance: Today’s scholarly content, secured for tomorrow UK LOCKSS Alliance: Today’s scholarly content, secured for tomorrow
UK LOCKSS Alliance: Today’s scholarly content, secured for tomorrow EDINA, University of Edinburgh
 

What's hot (12)

148 john shaw2006fall
148 john shaw2006fall148 john shaw2006fall
148 john shaw2006fall
 
Issues in long-term knowledge retention in engineering
Issues in long-term knowledge retention in engineeringIssues in long-term knowledge retention in engineering
Issues in long-term knowledge retention in engineering
 
Moving an Archive from Tape to Disk: A Case-Study at ICPSR
Moving an Archive from Tape to Disk: A Case-Study at ICPSRMoving an Archive from Tape to Disk: A Case-Study at ICPSR
Moving an Archive from Tape to Disk: A Case-Study at ICPSR
 
Komputasi Awan
Komputasi AwanKomputasi Awan
Komputasi Awan
 
eprints digital library software
eprints digital library softwareeprints digital library software
eprints digital library software
 
Digital library software
Digital library softwareDigital library software
Digital library software
 
ArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management ToolArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management Tool
 
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
 
Greenstone Digital Library Software
Greenstone Digital Library SoftwareGreenstone Digital Library Software
Greenstone Digital Library Software
 
Dspace
DspaceDspace
Dspace
 
Spark 2013-04-17
Spark 2013-04-17Spark 2013-04-17
Spark 2013-04-17
 
UK LOCKSS Alliance: Today’s scholarly content, secured for tomorrow
UK LOCKSS Alliance: Today’s scholarly content, secured for tomorrow UK LOCKSS Alliance: Today’s scholarly content, secured for tomorrow
UK LOCKSS Alliance: Today’s scholarly content, secured for tomorrow
 

Viewers also liked

Connecting Hardware to Flex (360MAX)
Connecting Hardware to Flex (360MAX)Connecting Hardware to Flex (360MAX)
Connecting Hardware to Flex (360MAX)Justin Mclean
 
10 Lol Cat Laws Of Web Services For Smaller Underfunded Libraries Il2009
10 Lol Cat Laws Of Web Services For Smaller Underfunded Libraries   Il200910 Lol Cat Laws Of Web Services For Smaller Underfunded Libraries   Il2009
10 Lol Cat Laws Of Web Services For Smaller Underfunded Libraries Il2009Sarah Houghton
 
Open Hardware and Libraries
Open Hardware and LibrariesOpen Hardware and Libraries
Open Hardware and LibrariesJason Griffey
 
Lib labreport final2
Lib labreport final2Lib labreport final2
Lib labreport final2Nate Hill
 
ALA Alex
ALA AlexALA Alex
ALA Alexabelden
 
Fact finding : REFERENCE AND INFORMATION SERVICES IN INFORMATION AGENCIES (I...
Fact finding : REFERENCE AND INFORMATION SERVICES IN INFORMATION AGENCIES  (I...Fact finding : REFERENCE AND INFORMATION SERVICES IN INFORMATION AGENCIES  (I...
Fact finding : REFERENCE AND INFORMATION SERVICES IN INFORMATION AGENCIES (I...Kumprinx Amin
 
UiTM Digital Library Services
UiTM Digital Library ServicesUiTM Digital Library Services
UiTM Digital Library ServicesAhmad Faizar
 
DIGITAL LIBRARY ARCHITECTURE
DIGITAL LIBRARY ARCHITECTUREDIGITAL LIBRARY ARCHITECTURE
DIGITAL LIBRARY ARCHITECTUREsarika meher
 
Rfid for library management system printronix
Rfid for library management system printronixRfid for library management system printronix
Rfid for library management system printronixBlaze_Hyd
 
Library management system
Library management systemLibrary management system
Library management systemRaaghav Bhatia
 

Viewers also liked (13)

Connecting Hardware to Flex (360MAX)
Connecting Hardware to Flex (360MAX)Connecting Hardware to Flex (360MAX)
Connecting Hardware to Flex (360MAX)
 
10 Lol Cat Laws Of Web Services For Smaller Underfunded Libraries Il2009
10 Lol Cat Laws Of Web Services For Smaller Underfunded Libraries   Il200910 Lol Cat Laws Of Web Services For Smaller Underfunded Libraries   Il2009
10 Lol Cat Laws Of Web Services For Smaller Underfunded Libraries Il2009
 
Open Hardware and Libraries
Open Hardware and LibrariesOpen Hardware and Libraries
Open Hardware and Libraries
 
Sparklet - Embedded GUI Library
Sparklet - Embedded GUI LibrarySparklet - Embedded GUI Library
Sparklet - Embedded GUI Library
 
Lib labreport final2
Lib labreport final2Lib labreport final2
Lib labreport final2
 
ALA Alex
ALA AlexALA Alex
ALA Alex
 
Fact finding : REFERENCE AND INFORMATION SERVICES IN INFORMATION AGENCIES (I...
Fact finding : REFERENCE AND INFORMATION SERVICES IN INFORMATION AGENCIES  (I...Fact finding : REFERENCE AND INFORMATION SERVICES IN INFORMATION AGENCIES  (I...
Fact finding : REFERENCE AND INFORMATION SERVICES IN INFORMATION AGENCIES (I...
 
Digital library
Digital libraryDigital library
Digital library
 
UiTM Digital Library Services
UiTM Digital Library ServicesUiTM Digital Library Services
UiTM Digital Library Services
 
DIGITAL LIBRARY ARCHITECTURE
DIGITAL LIBRARY ARCHITECTUREDIGITAL LIBRARY ARCHITECTURE
DIGITAL LIBRARY ARCHITECTURE
 
Rfid for library management system printronix
Rfid for library management system printronixRfid for library management system printronix
Rfid for library management system printronix
 
Master Defense Seminar
Master Defense SeminarMaster Defense Seminar
Master Defense Seminar
 
Library management system
Library management systemLibrary management system
Library management system
 

Similar to BHL hardware architecture - storage and clusters

Desktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omicsDesktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omicsDavid Wallom
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it worldChris Dwan
 
Storing and distributing data
Storing and distributing dataStoring and distributing data
Storing and distributing dataPhil Cryer
 
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardUsing Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardDocker, Inc.
 
Clustered and distributed
 storage with
 commodity hardware 
and open source ...
Clustered and distributed
 storage with
 commodity hardware 
and open source ...Clustered and distributed
 storage with
 commodity hardware 
and open source ...
Clustered and distributed
 storage with
 commodity hardware 
and open source ...Phil Cryer
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systemsSri Prasanna
 
Building A Scalable Open Source Storage Solution
Building A Scalable Open Source Storage SolutionBuilding A Scalable Open Source Storage Solution
Building A Scalable Open Source Storage SolutionPhil Cryer
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsJon Voss
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?Ivan Herman
 
IASSIT Kansa Presentation
IASSIT Kansa PresentationIASSIT Kansa Presentation
IASSIT Kansa Presentationekansa
 
BHL in the Cloud: A Pilot Project with DuraCloud
BHL in the Cloud: A Pilot Project with DuraCloudBHL in the Cloud: A Pilot Project with DuraCloud
BHL in the Cloud: A Pilot Project with DuraCloudChris Freeland
 
The Reality of the Cloud: Implications of Cloud Computing for Mobile Library ...
The Reality of the Cloud: Implications of Cloud Computing for Mobile Library ...The Reality of the Cloud: Implications of Cloud Computing for Mobile Library ...
The Reality of the Cloud: Implications of Cloud Computing for Mobile Library ...University of Missouri
 
Biodiversity Heritiage Library: progress and process
Biodiversity Heritiage Library: progress and processBiodiversity Heritiage Library: progress and process
Biodiversity Heritiage Library: progress and processPhil Cryer
 
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...VMware Tanzu
 
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...VMware Tanzu
 
Cloud Foundry and OpenStack – Marriage Made in Heaven !
Cloud Foundry and OpenStack – Marriage Made in Heaven !Cloud Foundry and OpenStack – Marriage Made in Heaven !
Cloud Foundry and OpenStack – Marriage Made in Heaven ! Animesh Singh
 
Open stack swift_essex_meetup_2012_06_21_judd_maltin
Open stack swift_essex_meetup_2012_06_21_judd_maltinOpen stack swift_essex_meetup_2012_06_21_judd_maltin
Open stack swift_essex_meetup_2012_06_21_judd_maltinKamesh Pemmaraju
 
Accesso ai dati con Azure Data Platform
Accesso ai dati con Azure Data PlatformAccesso ai dati con Azure Data Platform
Accesso ai dati con Azure Data PlatformLuca Di Fino
 

Similar to BHL hardware architecture - storage and clusters (20)

Desktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omicsDesktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omics
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
 
Storing and distributing data
Storing and distributing dataStoring and distributing data
Storing and distributing data
 
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardUsing Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
 
Clustered and distributed
 storage with
 commodity hardware 
and open source ...
Clustered and distributed
 storage with
 commodity hardware 
and open source ...Clustered and distributed
 storage with
 commodity hardware 
and open source ...
Clustered and distributed
 storage with
 commodity hardware 
and open source ...
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systems
 
Building A Scalable Open Source Storage Solution
Building A Scalable Open Source Storage SolutionBuilding A Scalable Open Source Storage Solution
Building A Scalable Open Source Storage Solution
 
Climb bath
Climb bathClimb bath
Climb bath
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & Museums
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?
 
IASSIT Kansa Presentation
IASSIT Kansa PresentationIASSIT Kansa Presentation
IASSIT Kansa Presentation
 
BHL in the Cloud: A Pilot Project with DuraCloud
BHL in the Cloud: A Pilot Project with DuraCloudBHL in the Cloud: A Pilot Project with DuraCloud
BHL in the Cloud: A Pilot Project with DuraCloud
 
The Reality of the Cloud: Implications of Cloud Computing for Mobile Library ...
The Reality of the Cloud: Implications of Cloud Computing for Mobile Library ...The Reality of the Cloud: Implications of Cloud Computing for Mobile Library ...
The Reality of the Cloud: Implications of Cloud Computing for Mobile Library ...
 
Biodiversity Heritiage Library: progress and process
Biodiversity Heritiage Library: progress and processBiodiversity Heritiage Library: progress and process
Biodiversity Heritiage Library: progress and process
 
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
 
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summi...
 
Cloud Foundry and OpenStack – Marriage Made in Heaven !
Cloud Foundry and OpenStack – Marriage Made in Heaven !Cloud Foundry and OpenStack – Marriage Made in Heaven !
Cloud Foundry and OpenStack – Marriage Made in Heaven !
 
Open stack swift_essex_meetup_2012_06_21_judd_maltin
Open stack swift_essex_meetup_2012_06_21_judd_maltinOpen stack swift_essex_meetup_2012_06_21_judd_maltin
Open stack swift_essex_meetup_2012_06_21_judd_maltin
 
Accesso ai dati con Azure Data Platform
Accesso ai dati con Azure Data PlatformAccesso ai dati con Azure Data Platform
Accesso ai dati con Azure Data Platform
 
Cavity Data
Cavity DataCavity Data
Cavity Data
 

More from Phil Cryer

Getting started with Mantl
Getting started with MantlGetting started with Mantl
Getting started with MantlPhil Cryer
 
Pets versus Cattle: servers evolved
Pets versus Cattle: servers evolvedPets versus Cattle: servers evolved
Pets versus Cattle: servers evolvedPhil Cryer
 
Moving towards unified logging
Moving towards unified loggingMoving towards unified logging
Moving towards unified loggingPhil Cryer
 
What if Petraeus Was a Hacker?
What if Petraeus Was a Hacker?What if Petraeus Was a Hacker?
What if Petraeus Was a Hacker?Phil Cryer
 
What if Petraeus was a hacker? Email privacy for the rest of us
What if Petraeus was a hacker? Email privacy for the rest of usWhat if Petraeus was a hacker? Email privacy for the rest of us
What if Petraeus was a hacker? Email privacy for the rest of usPhil Cryer
 
Online privacy concerns (and what we can do about it)
Online privacy concerns (and what we can do about it)Online privacy concerns (and what we can do about it)
Online privacy concerns (and what we can do about it)Phil Cryer
 
Online Privacy in the Year of the Dragon
Online Privacy in the Year of the DragonOnline Privacy in the Year of the Dragon
Online Privacy in the Year of the DragonPhil Cryer
 
Is your data secure? privacy and trust in the social web
Is your data secure?  privacy and trust in the social webIs your data secure?  privacy and trust in the social web
Is your data secure? privacy and trust in the social webPhil Cryer
 
Adoption of Persistent Identifiers for Biodiversity Informatics
Adoption of Persistent Identifiers for Biodiversity InformaticsAdoption of Persistent Identifiers for Biodiversity Informatics
Adoption of Persistent Identifiers for Biodiversity InformaticsPhil Cryer
 
Data hosting infrastructure for primary biodiversity data
Data hosting infrastructure for primary biodiversity dataData hosting infrastructure for primary biodiversity data
Data hosting infrastructure for primary biodiversity dataPhil Cryer
 
GBIF (Global Biodiversity Information Facility) Position Paper: Data Hosting ...
GBIF (Global Biodiversity Information Facility) Position Paper: Data Hosting ...GBIF (Global Biodiversity Information Facility) Position Paper: Data Hosting ...
GBIF (Global Biodiversity Information Facility) Position Paper: Data Hosting ...Phil Cryer
 
Taking your ball and going home
Taking your ball and going homeTaking your ball and going home
Taking your ball and going homePhil Cryer
 
Building Toward an Open and Extensible Autonomous Computing Platform Utilizi...
Building Toward an Open and Extensible  Autonomous Computing Platform Utilizi...Building Toward an Open and Extensible  Autonomous Computing Platform Utilizi...
Building Toward an Open and Extensible Autonomous Computing Platform Utilizi...Phil Cryer
 
Updates on the BHL Global Cluster
Updates on the BHL Global ClusterUpdates on the BHL Global Cluster
Updates on the BHL Global ClusterPhil Cryer
 
Biodiversity Heritage Library Articles Demo
Biodiversity Heritage Library Articles DemoBiodiversity Heritage Library Articles Demo
Biodiversity Heritage Library Articles DemoPhil Cryer
 
Using Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent ArchiveUsing Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent ArchivePhil Cryer
 

More from Phil Cryer (16)

Getting started with Mantl
Getting started with MantlGetting started with Mantl
Getting started with Mantl
 
Pets versus Cattle: servers evolved
Pets versus Cattle: servers evolvedPets versus Cattle: servers evolved
Pets versus Cattle: servers evolved
 
Moving towards unified logging
Moving towards unified loggingMoving towards unified logging
Moving towards unified logging
 
What if Petraeus Was a Hacker?
What if Petraeus Was a Hacker?What if Petraeus Was a Hacker?
What if Petraeus Was a Hacker?
 
What if Petraeus was a hacker? Email privacy for the rest of us
What if Petraeus was a hacker? Email privacy for the rest of usWhat if Petraeus was a hacker? Email privacy for the rest of us
What if Petraeus was a hacker? Email privacy for the rest of us
 
Online privacy concerns (and what we can do about it)
Online privacy concerns (and what we can do about it)Online privacy concerns (and what we can do about it)
Online privacy concerns (and what we can do about it)
 
Online Privacy in the Year of the Dragon
Online Privacy in the Year of the DragonOnline Privacy in the Year of the Dragon
Online Privacy in the Year of the Dragon
 
Is your data secure? privacy and trust in the social web
Is your data secure?  privacy and trust in the social webIs your data secure?  privacy and trust in the social web
Is your data secure? privacy and trust in the social web
 
Adoption of Persistent Identifiers for Biodiversity Informatics
Adoption of Persistent Identifiers for Biodiversity InformaticsAdoption of Persistent Identifiers for Biodiversity Informatics
Adoption of Persistent Identifiers for Biodiversity Informatics
 
Data hosting infrastructure for primary biodiversity data
Data hosting infrastructure for primary biodiversity dataData hosting infrastructure for primary biodiversity data
Data hosting infrastructure for primary biodiversity data
 
GBIF (Global Biodiversity Information Facility) Position Paper: Data Hosting ...
GBIF (Global Biodiversity Information Facility) Position Paper: Data Hosting ...GBIF (Global Biodiversity Information Facility) Position Paper: Data Hosting ...
GBIF (Global Biodiversity Information Facility) Position Paper: Data Hosting ...
 
Taking your ball and going home
Taking your ball and going homeTaking your ball and going home
Taking your ball and going home
 
Building Toward an Open and Extensible Autonomous Computing Platform Utilizi...
Building Toward an Open and Extensible  Autonomous Computing Platform Utilizi...Building Toward an Open and Extensible  Autonomous Computing Platform Utilizi...
Building Toward an Open and Extensible Autonomous Computing Platform Utilizi...
 
Updates on the BHL Global Cluster
Updates on the BHL Global ClusterUpdates on the BHL Global Cluster
Updates on the BHL Global Cluster
 
Biodiversity Heritage Library Articles Demo
Biodiversity Heritage Library Articles DemoBiodiversity Heritage Library Articles Demo
Biodiversity Heritage Library Articles Demo
 
Using Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent ArchiveUsing Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent Archive
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

BHL hardware architecture - storage and clusters

  • 1. January 2010 Beijing, China Phil Cryer open source systems architect and accidental tourist Tuesday, January 12, 2010
  • 2. > BHL hardware architecture • storage issues and solutions • hardware risks and configuration • cloud computing possibilities • code and communication Tuesday, January 12, 2010
  • 3. > Storage issues • more data being created and saved • expanding SANs can be expensive • storage has not kept up with Moore’s Law • backups are usually for disaster recovery, not redundancy or failover Tuesday, January 12, 2010
  • 4. > Storage solutions: distributed storage • write once, read anywhere • replication and fault tolerance • error correction • automatic redundancy • scalable horizontally Tuesday, January 12, 2010
  • 5. > Distributed storage - Options • hosted storage (cloud, Amazon S3) • self hosted with proprietary hardware and software (Sun Thumper) • self hosted with commodity hardware and open source distributed filesystem (GlusterFS, Lustre) Tuesday, January 12, 2010
  • 6. > Distributed storage - GlusterFS • GlusterFS: a cluster file-system capable of scaling to several petabytes* • open source software on commodity hardware • simple to install and manage • very customizable • offers seamless expansion *(1,024 terabytes = 1 petabyte) Tuesday, January 12, 2010
  • 7. > Distributed storage - Archival  • Fedora-commons is an open source repository (not a Linux distribution) • accounts for all changes, so it provides built-in version control • provides disaster recover • open standards to mesh with future file formats • provides open sharing services such as OAI-PMH Tuesday, January 12, 2010
  • 9. > Distributed storage - Mirrored data • now we have redundancy • in fact, multiple redundant copies • provides fault tolerance • offers load balancing • gives us future geographical distribution Tuesday, January 12, 2010
  • 10. > Distributed processing • more abilities available than just storing data • with distributed storage comes distributed processing • distributed processing means faster answers • faster answers mean new questions • lather, rinse, repeat Tuesday, January 12, 2010
  • 11. > Distributed processing • make existing data more useful • image and OCR re-processing • distributed web services • identifier resolution pools • map/reduce frameworks • generate new visualizations, text mining, NLP and ??? Tuesday, January 12, 2010
  • 12. Finder TaxonFinder TaxonFinder TaxonFinder TaxonFinder Taxon WebService WebService Load Balancer Cluster Node Cluster Site Request Tuesday, January 12, 2010
  • 13. > Hardware configuration: risks • computers crash • hard drives die • networks fail • natural disasters occur but... Tuesday, January 12, 2010
  • 14. ...so plan for it. Tuesday, January 12, 2010
  • 15. > Hardware configurations • six 5U sized servers hosted at MBL in Woods Hole • 8 Gigs RAM in each server • 24 / 1.5 TB drives in each server • 100 TB storage overall • fast connection, far greater bandwidth than ever before • mirror sets (3 and 3) will be stored in different locations but... Tuesday, January 12, 2010
  • 16. > Some assembly required (optional) • while our example uses new, faster commodity hardware... • it could run on any hardware that can run Linux • you could chain together old "out dated" computers • build your own cluster for next to nothing (host it in your basement) • solves some infrastructure funding issues and provides a working proof of concept • hardware vendor neutrality Tuesday, January 12, 2010
  • 17. > Our proof of concept • we ran a six box cluster to demonstrate GlusterFS clustered filesystem • ran Debian/GNU Linux • simulated hardware failures • synced data with a remote cluster (STL to MBL/WH) • ran map/reduce jobs (distributed computing) • defined procedures, configurations and build scripts Tuesday, January 12, 2010
  • 18. > Distributed storage - Projected costs $246,000 Graph from Backblaze (http://www.backblaze.com) Tuesday, January 12, 2010
  • 19. > Cloud computing • BHL is participating in a pilot for Duraspace with The New York Public Library • Duraspace would provide a link to cloud providers • pilot to show feasibility of hosting • testing use of image server, other services in the cloud • cloud could seed new clusters Tuesday, January 12, 2010
  • 20. > Code (63 6f 64 65) • our code and configurations are open source, hosted on Google Code http://code.google.com/p/bhl-bits/ phil.cryer@gmail.com | My favorites | Profile | Sign out bhl-bits Search projects Open Source code from the various Biodiversity Heritage Library (BHL) projects Project Home Downloads Wiki Issues Source Administer Summary | Updates | People This is the repository for code from various Starred (view starred projects) Biodiversity Heritage Library (BHL) projects, all of which are released under various Open Source licenses. Our current portal website runs on .NET, Code license: New BSD License connects to a MSSQL backend database, and utilizes an Apache Tomcat webapp fronted image Labels: bhl, linux, djatoka, imageviewer, viewer. Our new citation web site runs under Linux java, asp.net, c-sharp (Debian), Drupal, Apache, PHP and MySQL with help from HAProxy, Solr/Lucene, OpenVZ, KVM and other Links: Biodiversity Heritage Library software. We'll be releasing code, as well as the all important configuration files for applications we use Blogs: BHL Blog and support on our servers. We're open to any and BHL updates on Twitter all suggestions from the community, and of course accept any patches that improve our code. We Feeds: Project feeds appreciate your interest in our projects. Latest updates People details Project owners: Portal - portal is the .NET based code that Tuesday, January 12, 2010 our current portal site, Biodiversity phil.cryer, mlichtenberg@hotmail.com, powers
  • 21. > Communication • our Projects server is a portal for various projects and ideas http://projects.biodiversitylibrary.org/ Home Ten major natural history museum libraries, botanical libraries, and Latest projects research institutions have joined to form the Biodiversity Heritage Library Project. The group is developing a strategy and operational Data Hosting Centre Task Group (12/02/2009 02:24 PM) plan to digitize the published literature of biodiversity held in their Whilst an unprecedented volume of biodiversity data is respective collections. This literature will be available through a global “biodiversity commons.” currently being generated worldwide, it is perceived that significant amounts of data get lost or will be lost after The participating libraries have over two million volumes of project closure. To investigate the causes of such loss, and biodiversity literature collected over 200 years to support the work of recommend strategies as to how biodiversity data can be scientists, researchers, and students in their home institutions and throughout the world. rescued and archived, through ‘data hosting centres’ a Task Group on ‘Data Hosting Centres’ will be commissioned. The BHL will provide basic, important content for Consistant, updated Identity (09/21/2009 03:50 AM) immediate research and for An ongoing project to update BHL's identity, theme and logo multiple bioinformatics across BHL Portal and Citebank initiatives. For the first time in history, the core of our Clustered storage (08/25/2009 05:28 AM) natural history and herbaria library collections will be The project to build a reproducible clustered file-system available to a truly global distributed across multiple servers, utilizing GlusterFS, to audience. Web-based house all of BHL Global content. access to these collections will provide a substantial benefit to people living and working in the developing world -- whether scientists or Website feedback (06/15/2009 11:11 AM) policymakers. The Biodiversity Heritage Library Project strives to A project to accept feedback from users to cite.bhl.org establish a major corpus of digitized publications on the Web drawn from the historical biodiversity literature. This material will be MOBOT djatoka implementation (04/13/2009 04:30 PM) available for open access and responsible use as a part of a global Biodiversity Commons. We will work with the global taxonomic community, rights holders, and other interested parties to ensure that this legacy literature is available to all. The Biodiversity Heritage Library Project must be a multi-institutional project because no single natural history museum or botanical garden library holds the complete corpus of legacy literature, even within the individual sub-domains of taxonomy. However, taken together, the proposed consortium of collections Tuesday, January 12, 2010 represents a uniquely comprehensive assemblage of literature.
  • 22. > Communication • each project has its own wiki, allowing for detailed, how-to instructions http://projects.biodiversitylibrary.org/wiki/clusteredfilesystem/ INSTALL the following is a continuously updated doc and howto for configuring a BHL cluster node on a current Debian GNU/Linux installation. A breakdown of times involved in the physical building of nodes is available at the build times page System configuration 1) upgrade system to Squeeze (Lenny uses kernel 2.26, which doesn't fully support ext4) and enable contrib and non-free repos Copy the next 4 lines, and paste them in as 1: echo "deb http://ftp.us.debian.org/debian/ squeeze main contrib non-free deb-src http://ftp.us.debian.org/debian/ squeeze main deb http://security.debian.org/ squeeze/updates main deb-src http://security.debian.org/ squeeze/updates main" > /etc/apt/sources.list and update the system apt-get update aptitude safe-upgrade and now is a good a time as any to make sure the timezone data is all setup correctly (mine wasn't) dpkg-reconfigure tzdata 2) reboot to use latest kernel (if needed) /sbin/shutdown -r now 3) install needed utiltities and applications apt-get install build-essential flex bison screen nginx vim-nox git-core subversion saidar Tuesday, January 12, 2010
  • 23. > Code (63 6f 64 65) • our code and configurations are open source and hosted on Google Code • our projects server shares detailed instructions to get you get started • get involved • join our mailing-lists (bhl-tech, bhl-bits) • ask us questions • you can do it, and we will help Tuesday, January 12, 2010
  • 24. code http://code.google.com/p/bhl-bits projects http://projects.biodiversitylibrary.org questions phil.cryer@mobot.org Tuesday, January 12, 2010