Building Metadata Aggregation Services for Resource Discovery
Paul Walk
p.walk@ukoln.ac.uk
www.ukoln.ac.uk
UKOLN: a centre of expertise in digital information management
aggregating metadata
              2
why aggregate metadata?
• to address systems/network latency - a cache
 •   supporting resource-discovery

• for ‘Web Scale concentration’
 •   ‘gaming’ Google - raising ‘visibility’ of content

 •   network effects if user facing services also developed

• to showcase resources
• to create middleman business opportunities
• as infrastructure to support 3rd-party services
• as an approach to preservation
                                                              3
patterns
• harvest from network, aggregate and re-expose
 •   discovery.ac.uk, Europeana, RepUK

• collect from offline sources and make available in
  aggregate on the network
 •   Collections Trust (UK)

• harvest without re-exposing, build services on
  top of aggregation
 •   Google et al.

• expose as a ‘data dump’, or expose through an
  API
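
A minimal sketch of the first pattern (harvest, aggregate, re-expose), assuming the sources speak OAI-PMH and return simple Dublin Core. The sample response, the repository name and the `parse_records` helper are hypothetical; a real harvester would fetch each response over HTTP and follow resumption tokens.

```python
import json
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response from one source (illustrative data only).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:1</identifier>
        <datestamp>2011-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text, source):
    """Turn one harvested response into plain dicts, tagged with their source."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "source": source,
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    return records


# Aggregate: one list drawn from many sources (only one source shown here).
harvested = parse_records(SAMPLE_RESPONSE, source="repo.example.org")

# Re-expose: here simply as JSON, which could sit behind an API or in a dump.
exposed = json.dumps(harvested, indent=2)
```

Keeping the `source` on every record is a deliberate choice: it lets later services link back to, and attribute, the originating repository.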
                                                      4
the big question facing
data providers:

do you want to provide a
data service, or just data?

                              5
current work in the UK
               6
• a metadata ‘ecosystem’
 • aggregation is a major component
 • preparing resources for aggregation
 • http://www.discovery.ac.uk
                                         7
• support innovation
• develop some ‘business intelligence’
• develop infrastructure component for services
                                                  8
issues with aggregation
              9
distribution
• state management is a challenge! (deletions, changes)
• aggregation of aggregations is consequently non-trivial
  •   e.g. federated models

• linking?
  •   should records in an aggregation ever be the target of a link?
      Or, should such links point to the source?

  •   can/should we make aggregations into Google-friendly targets?

  •   if we succeed with SEO, are we undermining source
      repositories?

• ‘attribution stacking’
  (http://sciencecommons.org/projects/publishing/open-access-data-protocol/)
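
A rough sketch of the state-management problem, assuming each harvested record carries a source, an identifier, a datestamp and an optional deleted flag (OAI-PMH, for instance, can mark a record header as deleted). The dict layout and the `apply_harvest` name are assumptions made for illustration only.

```python
def apply_harvest(aggregate, batch):
    """Merge one harvest batch into the aggregate, keyed by (source, identifier)."""
    for rec in batch:
        key = (rec["source"], rec["identifier"])
        held = aggregate.get(key)
        # Skip anything not newer than what is already held. ISO 8601 dates
        # of the same granularity compare correctly as plain strings.
        if held is not None and rec["datestamp"] <= held["datestamp"]:
            continue
        deleted = bool(rec.get("deleted", False))
        # A deletion is stored as a tombstone rather than removed outright,
        # so that anything harvesting from this aggregation can see it too.
        aggregate[key] = {
            "datestamp": rec["datestamp"],
            "deleted": deleted,
            "title": None if deleted else rec.get("title"),
        }
    return aggregate


aggregate = {}
apply_harvest(aggregate, [
    {"source": "repo-a", "identifier": "oai:a:1",
     "datestamp": "2011-06-01", "title": "First version"},
])
apply_harvest(aggregate, [
    # The source later deletes the record...
    {"source": "repo-a", "identifier": "oai:a:1",
     "datestamp": "2011-06-10", "deleted": True},
    # ...and a stale update arriving afterwards must not resurrect it.
    {"source": "repo-a", "identifier": "oai:a:1",
     "datestamp": "2011-06-05", "title": "Stale update"},
])
```

The tombstone is what makes an aggregation of aggregations tractable at all: if the deletion were dropped silently, a downstream harvester would keep the record forever.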




                                                                                            10
openness and usability
• ‘open’ in danger of becoming synonymous with
  ‘permissively licensed’

• data can be ‘open’ and yet very difficult to use
 •   needs periodic review - right now SPARQL is a barrier to wide
     adoption

 •   remember all those SOAP interfaces....

 •   a well supported API might be more open than a completely
     freely available dump of gigabytes (or more) of data in the sense
     that it might allow open engagement from more people

• we need a richer understanding of openness

                                                                         11
in other words...


           be open, usefully


                               12
character encodings....
• a huge number of XML records from UK IRs (institutional repositories) are invalid due to character encoding issues
• there is a special place in hell for developers who ignore character encodings...
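
One way an aggregator might catch such records at harvest time, sketched under the assumption that ‘invalid’ here includes records whose bytes do not match the encoding they declare. The `check_record` helper and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_record(raw: bytes):
    """Return (ok, reason) for one harvested XML record supplied as raw bytes."""
    try:
        # Parsing bytes (not str) makes the parser honour the declared encoding.
        ET.fromstring(raw)
    except ET.ParseError as exc:
        return False, f"not well-formed: {exc}"
    return True, "ok"


body = "<title>Café culture</title>"

# Declared UTF-8 and really UTF-8: parses.
good = b'<?xml version="1.0" encoding="UTF-8"?>' + body.encode("utf-8")

# Declared UTF-8 but written out as Latin-1: the lone 0xE9 byte is not valid
# UTF-8, so the parser rejects the record.
bad = b'<?xml version="1.0" encoding="UTF-8"?>' + body.encode("latin-1")

# The same Latin-1 bytes with an honest declaration parse again.
honest = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body.encode("latin-1")

results = [check_record(r)[0] for r in (good, bad, honest)]
```

The check only establishes well-formedness; validating against a schema would be a separate step.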

                                http://www.flickr.com/photos/10661825@N07/

                                                                            13
“a distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable”
Leslie Lamport

are we creating a new version of this with data....?
                                                    14
shifting landscape
• Google was previously seen as in opposition to a rich
  metadata approach...
  •   recall versus precision

  •   Google’s abandonment of OAI-PMH

• but now...
  •   Google, Microsoft & Yahoo committed to improving precision
      through harvesting of Microdata

  •   schema.org and others bridging this divide

• so, is there still a need for other ‘concentrations’ or
  can we rely on the global search engines?


                                                                   15
good practice
           16
licensing!
• use explicit licenses
• this means requiring explicit licenses from sources
• if at all possible work with extremely open licenses
  such as CC0

• in data aggregation, especially when using a Linked Data
  approach, ‘share alike’ might be easier than ‘attribution’




                                                               17
“build for normal users, developers and machines”
Tom Coates
http://www.plasticbag.org/archives/2006/02/my_future_of_web_apps_slides/




                                                                                   18
developer-friendly formats
• XML has a lot going for it:
 •   very well supported with tools, libraries etc.

 •   well understood & often fits the info models we’re used to

• but it has some issues:
 •   validation is a pain and is very often ignored

 •   it’s verbose - it takes up a lot of bandwidth

• JSON has gained rapid adoption
 •   less verbose - good for simple client-side manipulation
     •   curl -D - -L -H "Accept: application/rdf+xml" "http://dx.doi.org/10.1126/science.1157784"

     •   curl -D - -L -H "Accept: application/json" "http://dx.doi.org/10.1126/science.1157784"
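
To give a rough feel for the verbosity point, the sketch below serialises one small, made-up record both ways and compares the byte counts. The record and its field names are hypothetical, and the gap will vary with the shape of real metadata.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical flat metadata record (illustrative values only).
record = {
    "title": "An example article",
    "creator": "A. Author",
    "date": "2011",
}

# XML: every field name appears twice, as an opening and a closing tag.
root = ET.Element("record")
for field, value in record.items():
    ET.SubElement(root, field).text = value
as_xml = ET.tostring(root, encoding="unicode").encode("utf-8")

# JSON: every field name appears once, as a key.
as_json = json.dumps(record).encode("utf-8")

xml_bytes, json_bytes = len(as_xml), len(as_json)

# Client-side use of the JSON form is a single call.
title = json.loads(as_json)["title"]
```

Neither serialisation here carries namespaces or a schema reference, which real XML records usually do, so this if anything understates the difference.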

                                                                                                  19
service (anti)patterns
• design your API to be developer-friendly
• be aware of what works, and of what appears to work but actually might not...
• share this understanding

[Diagram labels: end-user, UI, API, future 3rd-party dev, ‘some aggregated data of broad interest and potential usefulness’; legend: certainty, belief, speculation]
Paul Walk, An infrastructure service anti-pattern
http://blog.paulwalk.net/2009/12/07/an-infrastructure-service-anti-pattern/
                                                                                                         20
expect & enable users to filter - give them feeds (RSS/Atom)
http://www.flickr.com/photos/httpwwwflickrcompeoplenadar/3349883/ (CC BY-NC-ND 2.0)
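
A small sketch of what ‘give them feeds’ could look like: one Atom feed per subject, built from aggregated records. The record layout, the `build_feed` helper, the identifiers and the author name are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialise Atom as the default namespace


def _el(parent, name, text=None):
    """Add a namespaced Atom child element with optional text."""
    child = ET.SubElement(parent, f"{{{ATOM}}}{name}")
    child.text = text
    return child


def build_feed(records, subject, feed_id, updated):
    """Return an Atom feed (as a string) holding only records on one subject."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    _el(feed, "title", f"New records: {subject}")
    _el(feed, "id", feed_id)
    _el(feed, "updated", updated)
    _el(_el(feed, "author"), "name", "Example aggregation")
    for rec in records:
        if subject not in rec["subjects"]:
            continue  # the filtering happens here, once per feed
        entry = _el(feed, "entry")
        _el(entry, "title", rec["title"])
        _el(entry, "id", rec["id"])
        _el(entry, "updated", rec["updated"])
    return ET.tostring(feed, encoding="unicode")


records = [
    {"id": "urn:example:1", "title": "Medieval maps",
     "subjects": ["history"], "updated": "2011-06-20T10:00:00Z"},
    {"id": "urn:example:2", "title": "Protein folding",
     "subjects": ["biology"], "updated": "2011-06-21T09:30:00Z"},
]

feed_xml = build_feed(records, "history",
                      feed_id="urn:example:feed:history",
                      updated="2011-06-21T09:30:00Z")
```

Publishing many narrow feeds, rather than one large one, leaves the filtering to whatever feed reader the user already has.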




                                                                                            21
workshop tomorrow!
            22
tomorrow at 16:15
• (Thursday, 23rd June, 16:15-18:15)
• short presentations from UKOLN on LOCAH and
  RepUK, and from Edina on aggregating services

• open discussion on the way forward for metadata
  aggregation, addressing questions such as:
 •   is Linked Data the future for metadata aggregation services?

 •   do initiatives like Microdata & schema.org reduce the need for
     our investment in metadata aggregation services?

 •   does usability matter as much as ‘openness’?

• please join us, and feel free to bring your own
  questions & issues to discuss
                                                                      23
summing up in a sentence....
               24
we should use aggregation

 [applying a tool]

to balance the creation of opportunity

 [building infrastructure]

with the solving of problems

 [developing & providing services]


                                         25
thank you


            26
