The National Archives is the official archive and publisher for the UK government and for England and Wales. We are the guardians of some of the nation’s most iconic documents, dating back over 1,000 years.
We’re going to tell you the story of how we built Discovery, a story that spans the past 5 years.
Can do a live demo instead of using the slides below
Describe what we see
Read “What is Discovery” description to introduce the project
What can you do?
Search for records held by The National Archives and other UK archives (TNA catalogue, A2A, MDR and NRA)
Search and download records digitised by TNA and by departments, and born-digital records
Search for collections created by people, families, businesses, organisations and manors (NRA and MDR)
Search for archive contacts (ARCHON)
If the document can be downloaded, you would have a link here
If the document is classified, you will see it here, and may be able to make an FOI request
If the document is available for download, there will be a link here, along with the price if there is a charge
TNA records information was provided across many different systems and parts of the website
End users: too many tools to master, and every system has to be searched separately
Archival teams: too many systems, managed and supported by different teams, contributed to in different ways, and some with limited room for expansion and future development
Sector leadership and our support of the government policy on archives. We were given that responsibility just as we started working on Discovery
Increased digitisation and born-digital records: the volume of paper documents we digitise is limited (we cannot finance the digitisation of everything), but we have to handle every new born-digital document, and there are more and more of them
Legislative changes, e.g. managing FOI and Data Protection: we want documents to be open by default, and users to be able to make FOI requests when they are closed
Increasingly bleak financial outlook for the sector: like other organisations in the cultural field, archives receive less and less funding from the government. Many archives do not have the budget to build a website, and some may no longer be able to afford one after a budget cut.
Helped to prioritise tasks. If a task does not serve any of those goals, we probably do not need it
One of the main focuses of Discovery is that it needs to be future-proof, in the sense that it can handle the high volumes of digital records that will be coming our way over the next few years.
Pairs of servers to handle failures
Extra scaled-down instances hosted on cloud services to handle high peaks of traffic
Detection of DoS attacks by firewalls (CheckPoint IPS) + automated reaction (in development)
Flexible and future proof
3 tier architecture enables us to move tiers to cloud services if needed
Use of shards on databases and search engine indexes:
can be extended with extra servers
Can be easily transferred to the cloud services
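As a rough illustration of why sharding lets capacity grow by adding servers, here is a minimal Python sketch of hash-based routing. It is not how MongoDB or Solr distribute data internally; the function names, the record identifiers and the shard counts are all hypothetical.

```python
import hashlib


def shard_for(record_id: str, shard_count: int) -> int:
    """Map a record identifier to a shard index using a stable hash."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce it to a shard index.
    return int.from_bytes(digest[:8], "big") % shard_count


def distribution(record_ids, shard_count):
    """Count how many records land on each shard."""
    counts = [0] * shard_count
    for record_id in record_ids:
        counts[shard_for(record_id, shard_count)] += 1
    return counts


# Hypothetical record identifiers, for illustration only.
ids = [f"C{n}" for n in range(10000)]
print(distribution(ids, 2))  # load spread over two shards
print(distribution(ids, 3))  # the same records spread over three shards
```

With plain modulo hashing, adding a shard moves many records to a different server, which is why production systems manage the placement of data themselves. The sketch only shows that each server's share shrinks as servers are added.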
Platform highly constrained with (too) many physical firewalls
Meets government legislation on security
Federated single sign-on to TNA websites using the .NET membership provider
Transfer website for government staff
Record copying: get a copy of non-digitised records
Keep up with current technologies: a lot of innovation going on
From SOA to microservices
Several open source technologies (Apache Solr, WordPress, MongoDB, AngularJS, Akka.NET)
NRA: National Register of Archives
A2A: Access to Archives
ARCHON: contact directory of all archival repositories in the UK
MDR: Manorial Documents Register, very niche and historical, but we have a legal requirement to maintain it
TNA Catalogue: the archives we hold here
Gives a good overview of what Discovery is focusing on:
Several servers dedicated to search (Solr servers)
Lots of servers dedicated to metadata storage and, above all, images: GridFS
Does not include the filers that still hold most of the files (but we are progressively migrating them to GridFS)
Use this other quote instead for archival sector pro:
“The project has succeeded in integrating diverse data and databases from hundreds of archives to create a beautiful and intuitive new resource that provides a greatly enhanced platform for catalogue information. Discovery will enable archivists to promote their collections more effectively and is certain to attract new users to archives”
Feedback from a Senior Archives Services Manager, King’s College London
Use each quote to illustrate something:
1: as stated earlier
2: people pay for documents online; we make no profit from it, it only finances the digitisation of documents
3: we (TNA) are not the only ones to benefit from Discovery
4: this focus on user needs will be detailed later
Complete quote from King’s College:
The Discovery team have succeeded in building an attractive new search engine that will enable users for the first time to fully explore descriptions of the nation’s archival heritage, alongside records held by The National Archives. The project has succeeded in integrating diverse data and databases from hundreds of archives to create a beautiful and intuitive new resource that provides a greatly enhanced platform for catalogue information. Discovery will enable archivists to promote their collections more effectively and is certain to attract new users to archives
Access to Archives (A2A)
The biggest project of that scale worldwide at that time (2000 to 2008). It was about gathering the collections of 400 other archives (out of 2000 in the UK). Stopped in 2008 for lack of funding
We defined a common EAD schema by analysing their own schemas (most of them were using ISAD(G) but not EAD)
Very complicated: most of them were not technical, so we had to do the mapping ourselves, and the schemas varied enormously.
10m documents, with contents from 400 archives. It is now outdated, but still offers online visibility to many archives which did not even have a website
Schema designed through trial and error
Integrated A2A into Discovery in 2014 using the original data (does not include their latest updates but still provides them visibility)
Current ongoing project: providing a back office to external contributors so that they can update their documents directly in our database
Starting with the British Library, which is our most expert/technical partner
Working iteratively. We will start reaching out to other archives once we have this one live and have fixed most issues
We agreed with Axiell archival software that they will provide an “export to TNA EAD format” feature to our archives in the future, so that we get the latest version of their catalogues
Speak about the timeline at the end
First we built the Discovery website with our own contents
Then we integrated other archives’ contents
And now we are working on a back office to enable other archives to update their contents
> work step by step, get things working internally then integrate others
From ISAD(G), as used by other archives
To EAD TNA format designed in Access 2 Archives
To Discovery Information Asset format
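The chain of formats above can be sketched as two small mapping steps. This is only an illustration: the ISAD(G) element names and the EAD tags (`unitid`, `unittitle`, `unitdate`, `physdesc`) come from the published standards, but the `InformationAsset` fields, the function names and the sample record are hypothetical and do not describe the actual Discovery schema.

```python
from dataclasses import dataclass

# ISAD(G) identity-area elements mapped to their usual EAD counterparts.
ISAD_TO_EAD = {
    "reference_code": "unitid",
    "title": "unittitle",
    "dates": "unitdate",
    "extent": "physdesc",
}


def isad_to_ead(isad_record: dict) -> dict:
    """Rename the ISAD(G) elements that are present to EAD tag names."""
    return {
        ead_tag: isad_record[isad_name]
        for isad_name, ead_tag in ISAD_TO_EAD.items()
        if isad_name in isad_record
    }


@dataclass
class InformationAsset:
    """Hypothetical target record; the real Discovery format is not shown here."""
    reference: str
    title: str
    covering_dates: str
    held_by: str


def ead_to_asset(ead: dict, held_by: str) -> InformationAsset:
    """Build the target record, leaving missing fields empty."""
    return InformationAsset(
        reference=ead.get("unitid", ""),
        title=ead.get("unittitle", ""),
        covering_dates=ead.get("unitdate", ""),
        held_by=held_by,
    )


# Made-up sample description from a contributing archive.
isad_record = {
    "reference_code": "EX/1/2",
    "title": "Example estate papers",
    "dates": "1850-1900",
    "extent": "3 boxes",
}
ead = isad_to_ead(isad_record)
asset = ead_to_asset(ead, held_by="Example Archive")
print(asset)
```

Keeping the two steps separate mirrors the notes: contributors only have to reach the common EAD format, and the conversion to the internal format stays in one place.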
We did it through user-centred design. It means that we focus on users throughout the complete development of our products.
And it starts with user research: we need knowledge about our users first (and this is something we had already done before Discovery)
We defined several categories of users using the hiking analogy, according to how advanced they are.
Then we created personas for specific groups of people: a description of a fictional user, with what they know, what they do, and what they expect from us and from Discovery.
The purpose of personas is to create reliable and realistic representations of your key audience for reference within the project team and organisation wide. That way we’re clear that we’re not designing for us but for our user group. We updated the personas with staff from across the organisation in a series of interactive workshops.
When we look at the complete list of personas, we find again the list of users mentioned earlier:
People interested in genealogy
People completing administrative work
People passionate about history
Paid researchers from government, our staff, or businesses
User-centred design at TNA since 2008, before Discovery
When work started on Discovery, we already had a good amount of knowledge on our users
1 – Different forms of user research
We have carried out different forms of research throughout the project. We
sought feedback through an online exercise
sought feedback from visitors in our reading rooms showing them prototypes of the new designs
spent a day speaking to members of the public in a cafe to get feedback from non-users of the site
ran online surveys and used web analytics for more quantitative insights into how people use our website
ran one-to-one, hour-long sessions in London and Bristol with our users, showing them the new designs and recruiting to our personas to ensure we spoke to the right people. We observed and listened as the participants carried out tasks and gave their feedback.
We then took all this feedback, analysed it and made changes to finalise the new pages for release in beta.
Agile is a methodology that is used a lot in government, is recommended by the Government Digital Service, and is the norm at TNA
Describes how people interact
Describes how we deliver a project
Imagine you want to bake a big cake that represents your website: each slice is a feature, each layer a component of your software.
How do you implement that?
If you do it like a construction site, you’re going to build everything, one layer at a time. And you’re going to wait for everything to be built before you test it. If you missed something, misunderstood something, or maybe had it completely wrong, you’re going to find out after 6 months of development, and it is going to cost you a lot.
If you do it the agile way, you’re going to build one feature, maybe one sub-feature, at a time, and publish it immediately. So you can test it very quickly, fail very quickly if anything is wrong, and mend it very quickly, having wasted only 2 to 4 weeks’ worth of work.
Now we do that iteratively.
We go through iterations (sprints) of 1 to 4 weeks.
In each one we go through discover, design, develop and test phases. And we check with users in both the discover and the test phases, so twice within that very short iteration.
And then we iterate, and work on next steps
Concrete example on Discovery
Example of the discovery phase: we build a mockup, a fake website, and go in the reading room and get it tested by end users: we immediately know whether it is going to be useful to them, and we don’t risk wasting weeks of implementation to learn it
Example of the test phase.
We deliver most of our services directly in beta, a non-finalised version
We iterate not only to build new features, but sometimes to redo things, to improve existing features.
A good showcase is how our front page evolved from 2011 to 2014, as shown here.
Discovery is a search engine, and when we think of a good example of a search engine, we think of Google. Its user interface is minimal, super simplified. You don’t need to learn it to use it, you already know how, but you can still do complex searches if you need to and are more familiar with it.
Gathering all those collections from different sources is extremely hard, and so is making a simple UI on top of them. But this is what we are trying to achieve with Discovery.
The latest version has a very visual interface, so that end users can quickly scan it. We try to bring up what they really need first. Searches cover all collections by default, but users can tick a box to search only within TNA. We hid the menu bar behind that red button because few users are going to use it, etc.
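The "all collections by default, TNA only on request" behaviour could be expressed as an optional filter on a Solr query. The sketch below is an assumption about how such a switch might look, not a description of Discovery's code: `q` and `fq` are standard Solr query parameters, while the `held_by` field name and the `build_search_params` function are hypothetical.

```python
from urllib.parse import urlencode


def build_search_params(terms: str, tna_only: bool = False) -> dict:
    """Build Solr-style query parameters for a search box with one tick box."""
    params = {"q": terms}
    if tna_only:
        # Filter query: restrict results without changing the main query.
        # 'held_by' is a hypothetical index field.
        params["fq"] = 'held_by:"The National Archives"'
    return params


# Default search across every collection.
print(urlencode(build_search_params("domesday")))
# Same search with the "TNA only" box ticked.
print(urlencode(build_search_params("domesday", tna_only=True)))
```

Keeping the restriction in a separate filter leaves the default query as simple as possible, which matches the goal of an interface that needs no learning.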
Technical learnings first, then broader ones:
Use cutting-edge technologies: we adopted MongoDB in its early days. It was tough to manage for 9 months, with a lot of exchanges with Mongo, but it paid off in the end
Better performance than MySQL, a document database is great for representing archival documents, and it scales with sharding
We built a very good relationship with Mongo: they shaped their product for us and gave us discounts
Move from proprietary Autonomy to Solr, and from RedDot to WordPress, because of problems with supplier deliveries and their products’ life expectancy. And for money reasons
We waited for 5 years before working on expert contributions, to get our own house in order first (have a proper schema matching all our in-house collections)
Look at how we have been working with other archives in A2A and then in Experts Contribution: first we tried to get their contents ourselves, but it could not really work, and it stopped for lack of funding. Experts Contribution follows a different philosophy: we provide them with a platform and are going to guide/teach/help them to get their contents onto it, but they are going to do it mainly themselves.
Shiny vs non-shiny
Implementation of the back office AFTER the front end: would it have been better to implement both together?
> working on the front end first was good to show that it is useful and beneficial, but you must know that you will have to invest in the non-shiny to get all the benefits.
Flexible: being agile does not mean we have no plan. We plan a year ahead which features we want to bring. But we’re flexible with that plan: we reassess the priority of our features at each iteration. We may implement only a sub-feature of a big feature, or postpone another big feature. Agile is about setting a number of people to work for a specific period of time, not about trying to implement a specified set of features. We are going to implement what we think is best in the time we have, step by step. And we have the guarantee that we will provide something that works as a whole and suits our users’ needs.
Open to failure: happy to redo things if necessary (browse is on its 3rd iteration)
Key Benefits to TNA
Pull all the data + users together
> easier to get a holistic picture of everything, and to better understand who our users are, what they need, and which collections are most popular
Key benefits to archives
Allows smaller archives to serve their contents to a wider audience (6m used Discovery last year: no way a small archive could achieve that on its own)