Bits of charters: putting Carolingian charters into a database
INTRODUCTION
[Slide 2: project website] The Making of Charlemagne’s Europe project ran from 2012 to 2014 at King’s
College London and created a database framework for the storage and retrieval of prosopographical,
geographical and socio-economic data from early medieval charters. The project team also input
data from nearly 1000 charters into the database system to produce a corpus that could address a
wide range of research queries.
If anyone would like a demonstration of the database, I can give one briefly later. This talk is focusing
on the decision processes behind the scenes of the project, discussing some aspects of the database
design and the reasons for them. How do you decide the technological form and scope of a project?
How can types of data such as places be entered consistently and retrieved effectively? How do you
then link together basic units of people, places and possessions to provide useful representations of
activities in a charter? And finally, having put all the data into a database, how do you get it out
again? What I’m trying to show is some of the practical problems involved in taking unstructured
and often ambiguous medieval information and putting it in digital formats.
But I want to start with the most basic question: why this project? Digital humanities reflects the
interaction of two different fields: for a good project you need a balance between the technological
developments that make new digital approaches feasible and the research questions that scholars
want new insights into. And there’s also a third aspect, which shouldn’t be overlooked: what you can
get funding for. One of the big problems with a lot of digital humanities projects is that it’s hard to
get the long-term funding to allow them to show their full potential.
The Making of Charlemagne’s Europe project essentially arose from the coincidence of two aspects.
[Slide 3: previous DDH projects] One was that there is a long tradition of creating prosopographic
databases at King’s; the Department of Digital Humanities had more than 20 years of expertise in
that.
[Slide 4: MKCHEUR spreadsheet] The other was the large number of Carolingian charters which exist,
whose prosopographical and socio-economic data hasn’t been fully exploited. It made obvious sense
to combine these two factors and also to take advantage of developments in mapping software to
represent spatial information in new ways.
CHARTERS AND THEIR REPRESENTATION
I’ve mentioned charters several times, but what exactly are they? The short answer is medieval legal
documents, mostly concerned with the granting or selling of property. I’ve given you a handout
which is the text and translation of one quite long charter. I’m going to be referring to that a lot in
my examples.
But a longer answer is that there are different ways of understanding what a charter is and they
logically result in different kinds of digital projects. [Slide 5: understanding] There are three main
ways you can think of a charter: as a material object, as a text or as a source of information.
Firstly, you can think of a charter as a material object: a particular piece of parchment with words
and signs written on it. [Slide 6: Marburg] A digital project thinking of charters in that way, like the
Lichtbildarchiv älterer Originalurkunden, includes high-quality digital images of the charters,
combined with tools to allow the analysis of these images. A similar approach is being taken by the
new Models of Authority project being jointly developed by King’s College and Glasgow.1
The problem for such an approach with Carolingian charters is that we don’t have the original
manuscript for the vast majority of them; instead we only have copies from later in the Middle Ages
(or sometimes even just early modern transcripts). An approach focusing on material objects would
therefore mean we were unable to use a lot of charters from the corpus.
A second view of charters treats them as texts. [Slide 7: CDLM] One option for our project would
have been to set up a database containing the full texts of charters, together with some kind of XML
mark-up to allow us to pick out words and terms of particular interest: personal names, titles, verbs
indicating donation etc. This is the approach taken by projects such as
Codice diplomatico della Lombardia medievale. [Slide 8: CEI] There’s already an international group
working on standards for encoding charters.
The difficulty if we went down that route would have been the data we worked with. Carolingian
charters have relatively little structure to them, their spelling can be idiosyncratic to the point of
unintelligibility and they can also be maddeningly indirect in referring to people and places. [Slide 9:
Text on screen] For example, the charter I’ve given you refers to how Tassilo ‘had caused a
monastery to be newly built in honour of the Holy Saviour within our forest at the place called
Kremsmünster in the pagus of Traungau’ (in monasterium honore sancti Salvatoris infra waldo
nostro loco qui dicitur Chremisa in pago nuncupante Drungaoe novo opere construere fecisset).
What bit of that Latin text do you mark up as the ‘name’ of Kremsmünster monastery? [Slide 10:
references to Fater] And although Fater is the abbot of Kremsmünster, this fact is only ever
mentioned indirectly. So how do you mark up the text to show that Fater’s not just any old abbot,
but specifically the abbot of the monastery receiving the donation?
Recent scholarship on charters has tended to favour the textual approach to charters: it’s often
focused on individual charters or at the widest regional Urkundenlandschaften (charter landscapes).
But we didn’t want to treat charters as special snowflakes: we wanted to find a way to look across
the whole corpus from Catalonia to Austria and compare socio-economic data from across different
regions. But mark-up of texts wouldn’t allow us to do this easily: its structures aren’t suited to
finding all the unfree people mentioned under half a dozen different Latin terms or for finding all
female vendors.
To answer those kinds of questions, our project was therefore thinking of charters in a third way: as
sources from which information could be extracted and put into standardised formats for
comparison and large-scale analysis. And our tools for that were a large-scale relational database
combined with the ‘factoid model’.
1 http://www.digipal.eu/blog/new-digipal-project-models-of-authority/
[Slide 11: definition] So what’s the factoid model? It was developed by King’s for previous
prosopographical projects. And at its heart is the factoid, an assertion made by the project team that
a source says something about a person or a place. Such a model can be applied to anything from a
saint's life (the Vita Cuthberti states “Cuthbert healed a woman from a demon”) to a coin
(Fitzwilliam Museum coin 839 states "Charlemagne is king of the Franks").
How did we apply the concept of factoids to our specific project? We started off with some main
models for basic entities such as agents (people or institutions), places and sources; these contain
static information that doesn’t change between charters (information such as a person’s sex or the
geo-location of a particular place). We then linked these models together via factoids. Here’s what
this factoid model looks like in the specific case of our Charlemagne database [Slide 12: Factoid
diagram].
So factoids are statements which link together agents, places and possessions derived from a
specific charter. These factoids are of varying types, depending on the content that they need to
contain. For example, you need different data structures to input the statement "Fater is abbot of
Kremsmünster" (what we call an attribute and relationship factoid) than to input the statement
"Fater petitioned Charlemagne to confirm the property of the monastery of Kremsmünster" (what
we call a transaction factoid). Authority lists, meanwhile, are used to ensure that data is entered
consistently, e.g. that Fater is not described as "abbot" in one factoid and "abbas" in another.
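To make the idea concrete, here is a minimal sketch in Python of an attribute and relationship factoid checked against an authority list. It is not the project's actual schema: the class names, field names and the contents of the authority list are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical authority list: the only office terms the input form accepts.
OFFICE_AUTHORITY_LIST = {"abbot", "bishop", "king", "duke"}


@dataclass(frozen=True)
class Agent:
    """Static information about a person or institution."""
    name: str
    sex: Optional[str] = None  # None for institutions


@dataclass(frozen=True)
class AttributeFactoid:
    """Assertion that a charter says an agent holds an office at an institution."""
    charter: str
    agent: Agent
    office: str
    institution: Agent

    def __post_init__(self):
        # Reject terms outside the authority list so that entry stays consistent.
        if self.office not in OFFICE_AUTHORITY_LIST:
            raise ValueError(f"'{self.office}' is not in the authority list")


fater = Agent("Fater", "male")
kremsmuenster = Agent("Kremsmünster")
factoid = AttributeFactoid("DKAR 1:169", fater, "abbot", kremsmuenster)
```

Trying to record the same statement with the office "abbas" raises an error in this sketch, which is the kind of consistency an authority list is meant to enforce.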
This factoid method potentially allowed us to input a wide range of data into a single database. We
still had to determine a lot of big issues, however. What specific data from each charter did we want
to record? What structures and fields would we need for this data and how did we ensure data was
entered consistently? And how did we then allow end users to extract the data?
A lot of the first year of the project was spent iteratively developing the database and the inputting
standards, rather than being able to carry out much sustained inputting. We started in the most basic
way: just looking hard at the data. [Slide 13] Discussing how we dealt with people who were being
transacted, [Slide 14] collating examples of terms used to form the basis for authority lists. The
charter I’ve given you as a handout, DKAR 1:169, was one that we went through several times in
considerable detail, because it had the kind of complexity that we needed to be able to handle
within the system. It has 18 different places in it: I know because we analysed every single one of
them. It also contained a lot of complicated interactions between different agents and places; [Slide
15: diagram] here’s one of my early attempts at drawing a diagram of that. I’ll come back to that
diagram later on.
We were also thinking about the questions users might want to ask [Slide 16: Wendy Davies and
Mark Mersiowsky],2 considering the different research topics that historians had already explored
using charters. [Slide 17: questions list] We ended up with a long list of possible questions that the
database might be used to answer. Looking at them now at the end of the project, most of them are
too complicated to answer with the current version of the user interface. But it’s by thinking of key
research questions in that way that we could work out which data to record. For example, we
developed structures to record the relationships between places, an option which wasn’t available in
previous projects. I’ll talk about those a bit later.
2 Wendy Davies, Small worlds: the village community in early medieval Brittany (London, 1988); Mark
Mersiowsky, 'Y-a-t-il une influence des actes royaux sur les actes privés du IXe siècle?', in Marie-José Gasse-
Grandjean and Benoît-Michel Tock (eds.), Les actes comme expression du pouvoir au haut Moyen âge: actes de
la table ronde de Nancy, 26-27 novembre 1999, Atelier de recherches sur les textes médiévaux, 5 (Turnhout,
2003), pp. 139-78
Another part of developing the project was deciding its limits: some of the issues that we wouldn’t
realistically be able to tackle. The single most frequent complaint we have had concerning the
database is that we don’t include the full text of the charters (although we hope in the future to
include links to the full text available from other online sites). But getting such data into the system
consistently would have required a vast amount of extra work. I’ve given you the text of DKAR 1:169
on the handout. Here’s the first sentence of a text of a different edition of the same charter [Slide
18: Kremsmünster charter]. This single sentence has 6 differences from the text I’ve given you [click].
If we’d used project time to input the full text of the charters we probably wouldn’t have had time to
do anything else.
DATA STRUCTURES AND STANDARDS
There are always things you’re not able to achieve within one project. But even without the full text,
why did developing data structures and standards for the Charlemagne database take so long?
What I want to do now is to look in some detail at a couple of examples of the data structures we
created. Every database project has to do this kind of work, so although some of the details I’ll be
discussing are specific to early medieval charters, I hope you’ll find a lot of the concepts are more
widely relevant. Our underlying relational database is horrendously complex. The schema for it went
from this [slide 19] to this [slide 20] and it’s now even bigger. But because it’s a relational database,
you’re essentially building up this complexity from a large number of relatively small building blocks.
I’m going to be talking about two of these: places and transaction factoids, using examples mainly
from the charter I’ve given you on the handout.
Places
Thinking about places, two main aspects drove the structures we developed for recording them. One
was technological developments in digital mapping; we wanted to be able to represent data
patterns on a map on screen. The other was the data itself. [Slide 21] In charters, what we get isn’t
technically places, but place names. How you get from a name in a medieval charter to a spot on the
map turns out to require a lot of thought.
The first distinction we made in the database was between charter-specific information on place
names (which we recorded in individual factoids) and static information on a place (which was
recorded in the place model). [Slide 22] For example, the charter I’ve given you, DKAR 1:169, has two
different manuscripts and their spelling of places varies. The place name edited as ‘Bettinbah’ has a
variant spelling in footnote 11 of ‘Petenpach’. There’s another charter recording Tassilo’s original
grant which calls the same place ‘Petinpach’. In some charters you even get several different
spellings of a name (especially personal names) within a single text. And there’s also the
complication of Latin being an inflected language, so you can get names in different cases, like the
‘actum Wormacie’ in the last line of the charter.
One of our early decisions therefore was that we’d include fields to record the original text of place
names and that these original text boxes would be searchable. [Slide 23: place name search] If you
put ‘Wormacie’ into the search place names function of the database, you’ll correctly find ‘Worms’,
be told that it’s in Germany and even have it located on a map for you, so that you have what
you might call a ‘bottom-up’ search. If you’ve got just a name in a medieval charter you’ve got a
possible way of identifying it.
But behind that linking from a name instance to a map, there’s a lot of work going on in terms of
identification and standardisation. I’ll show this schematically with another place in the same
charter: the place called Liublinbah (towards the bottom of the first page of the handout). [Slide 24:
charter-specific information] We start off with what the charter tells us: Liublinbah is a ‘locus’, its role
in the charter is as the location of a possession being donated and it’s in the pagus of Drungao or
Traungau (pagi are Carolingian administrative regions, similar to Anglo-Saxon counties).
The first thing we have to do is produce a standardised medieval name for this place. Our main
reference source for such Latin names is Orbis Latinus [slide 25], but like the majority of minor
places in our charter, Liublinbah doesn’t appear in that. [Slide 26: standard medieval name] So
instead we choose one of the spellings and use that (adding an asterisk to show that it may need
further work at some point).
The next question is where this place is. We rely on the editors of the charter for this; we don’t try
and research details for ourselves, since we simply don’t have time. [slide 27: place identification] In
this case, the editor says that the settlement concerned is Leombach, a location in Austria, and his
identification is certain. [slide 28] Once we have this identification, we can use modern reference
tools to check the geo-location for Leombach and what part of Austria it’s in. [slide 29] From this, we
can create a place record for Leombach. [slide 30: map from DKAR 1:169] When we enter data
concerning the charter itself we then include charter-specific information such as the place’s role or
the place descriptor.
That may all seem rather long-winded, but this way of thinking about places gives us a lot of flexibility
for dealing with more complex cases. For example, we quite often get rivers or mountains
mentioned in charters; [slide 31: Ipfbach] in DKAR 1:169 there are the two rivers called Ipfbach
(Ipphas in the original). We can’t easily geo-locate rivers, but we can input the name data we do
have into the system and produce records for natural features in that way.
The more difficult situation is when there’s uncertainty about what place the medieval name refers
to. Sometimes, the editor will just give an approximate area. [Slide 32: Raotola] The editor of DKAR
1:169 thinks Raotola, where there are several vineyards, is somewhere on the Rodelbach, a tributary
of the Danube. We input the place as having a medieval place name but no modern one; even so, an
approximate location means we can record it within modern place hierarchies: it’s somewhere
within Upper Austria.
Alternatively the editor may suggest one or more possible modern locations for a particular
medieval place. For example, a charter from Mondsee discusses a donation of property at an
unknown place called Teginga. [Slide 33: schematic] Here is where we effectively carve up our
earlier schematic into two. We still have a medieval place on one side, but it is now being tentatively
matched to three different modern places, with varying levels of probability. [Slide 34] By recording
possible matches in this way, we can eventually generate a display for the users that shows the
possible options.
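The split between a medieval place and its candidate modern identifications could be sketched roughly as follows in Python. This is only an illustration: the class names, the certainty labels ("certain", "probable", "possible") and the placeholder candidates for Teginga are assumptions, since the talk does not name the three modern places or the project's actual probability scale.

```python
from dataclasses import dataclass, field

# Hypothetical certainty scale, ordered from most to least confident.
CERTAINTY_ORDER = {"certain": 0, "probable": 1, "possible": 2}


@dataclass
class ModernPlace:
    name: str
    region: str


@dataclass
class MedievalPlace:
    standard_name: str
    # Each entry pairs a modern place with the editor's level of certainty.
    candidates: list = field(default_factory=list)

    def add_candidate(self, place, certainty):
        if certainty not in CERTAINTY_ORDER:
            raise ValueError(f"unknown certainty level: {certainty}")
        self.candidates.append((place, certainty))

    def display_options(self):
        """List candidate identifications, most confident first."""
        ranked = sorted(self.candidates, key=lambda c: CERTAINTY_ORDER[c[1]])
        return [f"{place.name} ({certainty})" for place, certainty in ranked]


# A secure identification, as in DKAR 1:169.
liublinbah = MedievalPlace("Liublinbah*")
liublinbah.add_candidate(ModernPlace("Leombach", "Upper Austria"), "certain")

# An uncertain one: the candidate names below are placeholders, not real data.
teginga = MedievalPlace("Teginga*")
teginga.add_candidate(ModernPlace("Candidate B", "unknown"), "possible")
teginga.add_candidate(ModernPlace("Candidate A", "unknown"), "probable")
teginga.add_candidate(ModernPlace("Candidate C", "unknown"), "possible")
```

Keeping the candidates as separate rows rather than forcing a single identification is what lets a display show every option alongside its level of confidence.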
The final aspect of places I want to talk about is place relationship factoids. One of the things we
realised earlier on when looking at our place data is that we had two different sorts of place
hierarchy, where places are in larger units. Firstly, we had the modern hierarchies: Leombach is in
the Austrian state Upper Austria, which is in the modern country Austria. That was easy enough to
record. But what did we do about the other information we’re being given, that Leombach is in the
pagus of Traungau? How do we record medieval hierarchies?
Previous projects at King’s have been able to fudge this and meld together modern and medieval
hierarchies because they’ve been dealing with a British administrative system that’s extraordinarily
long-lasting. For example, the Prosopography of Anglo-Saxon England used the pre-1974 English
counties for their hierarchies, which are near enough to Anglo-Saxon counties to be workable. But
we didn’t know where Carolingian pagi were on the map. In fact, there’s even a scholarly argument
about whether pagi were flat areas on the ground at all or just scattered collections of
administrative rights. So to record the fact that Leombach was in the pagus of Traungau, which
might help researchers to understand more about pagi, we needed some additional structures.
What we ended up using is what’s called a place relationship factoid. [Slide 35] This is a
charter-specific assertion that “Charter X says Place 1 is in Place 2”, and also includes place descriptors for
the two places concerned. The original idea was that a user would be able to pull up a place record
for a medieval region and then be able to see all the places said to be within it (including
geo-locations where they’re known). We weren’t able to implement this fully, but even so these factoids
still provide useful information. And they also illustrate another important aspect of any digital
humanities project. If you think a piece of data might be useful, it’s much better to start recording it
early on, rather than having to go back later and recheck a large number of records.
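As a rough illustration of the “Charter X says Place 1 is in Place 2” assertion, the following Python sketch stores place relationship factoids and pulls up everything said to lie within a medieval region. The class, field and function names are hypothetical, and the two records are simply read off the statements about DKAR 1:169 quoted in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceRelationshipFactoid:
    """Charter-specific assertion: the charter says `place` is in `container`."""
    charter: str
    place: str
    place_descriptor: str
    container: str
    container_descriptor: str


FACTOIDS = [
    PlaceRelationshipFactoid("DKAR 1:169", "Leombach", "locus", "Traungau", "pagus"),
    PlaceRelationshipFactoid("DKAR 1:169", "Kremsmünster", "locus", "Traungau", "pagus"),
]


def places_said_to_be_in(region, factoids):
    """Map each place asserted to lie in `region` to the charters that say so."""
    result = {}
    for f in factoids:
        if f.container == region:
            result.setdefault(f.place, []).append(f.charter)
    return result
```

Because every row cites its charter, the lookup reports what the sources claim about a pagus without committing the database to any view of where its boundaries lay.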
Transaction factoids
As you can see, just designing the building blocks for the database system, like places, takes a lot of
thought if you’re going to be able to record them consistently. Our trickiest problem, however, was
working out how to record transactions, the actual business of the charters, within the database.
And it’s difficult because we’re trying to record dynamic rather than just static information. If you
look at this place relationship factoid, for example, the statement ‘charter X says Leombach is in
Traungau’ doesn’t alter throughout the charter. Similarly, when DKAR 1:169 refers to ‘Abbot Fater’,
which we’d record as the attribute and relationship factoid ‘Fater is abbot of Kremsmünster’, that’s a
fixed statement. There may be earlier and later charters in which Fater isn’t abbot of Kremsmünster,
but in this particular charter he always is.
In contrast, the main importance of a charter is that it’s changing things, typically someone granting
property to someone else. Things are different after the actions described in the charter than they
were before. But how do you record such a change in a relational database structure that doesn’t
allow for different states?
The first thing we did was simplify things by breaking down the activities in any charter into a
collection of different transactions (possessions flowing around) and events (all the other things
going on). [Slide 36] I showed you this diagram for the activities in DKAR 1:169 earlier on. It’s very
complex because it shows you almost all the activities (though in fact there are a few more even than
that). But if we start breaking these activities down, we can get rather more manageable units.
Firstly, there are three different events. [Slide 37] Tassilo founds the monastery of Kremsmünster
and [Slide 38] there are two examples of land clearance. [Slide 39] Activities like these are recorded
in event factoids, which give fairly basic information about agents, places and the type of activity
going on.
Once we’ve dealt with those, we’re left with the transaction information [Slide 40]. What we have in
this diagram is three separate transactions. [Slide 41] One is a record of what happened in the past:
Tassilo granted possessions to Kremsmünster. [Slide 42] The other two are Charlemagne’s actions in
the present. He confirms Kremsmünster’s possessions and he also grants to the men of Eberstalzell
the right to remain on land they’ve cleared illegally.
We need to break down the charter into these separate transactions to allow us to represent the
flows of possessions in a database structure. [Slide 43] So for example, Tassilo’s initial grant to
Kremsmünster can be represented in a series of tables that shows the agents involved, then the
details of the places mentioned, then the possession transferred and so on. We’ve frozen the activity
in a way that allows us to describe it within a relational database.
This is a simplified version of the structure we use to record transactions, but it still gives us quite a
lot of flexibility. [Slide 44: multiple donations] For example, it means that we can record a number of
donated possessions in the same record; we don’t need 22 different factoids for the 22 different
things Tassilo gave to Kremsmünster, which is a big relief. And if there are different locations or
terms and conditions for different possessions we can record those in one go.
But if we’re going to use a data structure like this, we need to make sure that we can interpret
everything unambiguously from the information we record in the table. [Slide 45] If you try and put
both Charlemagne’s confirmation to Kremsmünster and his grant to the men of Eberstalzell in the
same record, how do you keep track of who’s getting what possessions? You rapidly get a complete
mess. So we had to define a transaction as involving only two main agents (or agents working
together) and only one type of activity, so we didn’t combine a confirmation and a grant in the same
record.
We still had further difficulties to solve. One issue was that the neat distinction I’ve been making all
along between agents, places and possessions doesn’t actually work when you look closely at the
data. [Slide 46] For example, look at the 22 possessions that DKAR 1:169 mentions. As you’ll see,
among the things being granted [click] are entities that we regard as agents: churches, for example,
like that at Alburg, but also people being granted, like the craftsmen in Raotola. And as well as land
at particular places being given, there’s also a whole place being given: the villa of Alkoven [click].
We had to ensure that we could record agents and places as possessions; we also still had to record
them as agents and places in the normal way. People don’t cease to be people just because the Duke
of Bavaria treats them as objects for some purposes.
[Slide 47] As you may also have noticed, possessions no 16 and 18 on this list have an additional
complication: there’s more than one object involved. Tassilo’s giving 2 vineyards at Aschach and 3 at
Raotola, so we also needed to record quantities of objects being transferred.
A final issue was that not all the transactions we were interested in had identical flows of
possessions. [Slide 48: sale] In a sale, there are possessions going both ways: how do you record
that? [click] Our approach was to add another column to the table for possessions: if the return box
is marked, it means that these particular possessions are flowing in the opposite direction. [Slide 49]
In the online version of the database, we use an arrow to highlight this.
Unfortunately, however, DKAR 1:169 has an even more complicated twist. [Slide 50:
transaction] Let’s go back to Charlemagne’s interaction with the men of Eberstalzell. Charlemagne
grants to these men the right to remain on land they’ve cleared, but in return they have to do
service not to him, but to the monastery of Kremsmünster. This is no longer an arrangement between
two parties: a third party is now involved.
[Slide 51: diagram] After a lot of discussion, we eventually developed a data structure in which we
could record such arrangements, which we called third-party returns. Essentially this was a variant of
the method we’d already used for ordinary returns; we just needed to add another tick-box to the
input form.
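The way returns and third-party returns change the direction of a flow can be sketched in a few lines of Python. This is a simplified illustration, not the project's schema: the class and field names are invented, and where the input form used a tick-box, the sketch assumes a field naming the third party so that the resulting flow can be worked out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PossessionLine:
    description: str
    quantity: int = 1
    is_return: bool = False            # ticked when the possession flows back
    third_party: Optional[str] = None  # assumed field: who receives a third-party return


@dataclass
class Transaction:
    charter: str
    activity: str   # one type of activity per record
    granter: str
    recipient: str
    possessions: list = field(default_factory=list)

    def flows(self):
        """Return (from, to, description, quantity) for every possession line."""
        result = []
        for p in self.possessions:
            if not p.is_return:
                result.append((self.granter, self.recipient, p.description, p.quantity))
            else:
                # An ordinary return goes back to the granter;
                # a third-party return goes to the named third party instead.
                target = p.third_party or self.granter
                result.append((self.recipient, target, p.description, p.quantity))
        return result


grant = Transaction(
    charter="DKAR 1:169",
    activity="grant",
    granter="Charlemagne",
    recipient="men of Eberstalzell",
    possessions=[
        PossessionLine("right to remain on cleared land"),
        PossessionLine("service", is_return=True, third_party="Kremsmünster"),
    ],
)
```

The record still has only two main agents and one type of activity; the third party appears only on the possession line that flows to it.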
But this highlights a key issue when you’re designing data structures: there’s a trade-off between
how accurately you can represent your data and how complex your data structures need to be. It’s
always a temptation to develop a database that can deal with every possible data variant, but you
end up by making it more and more complicated and difficult to use. As it was, we had input screens
where you had to scroll across horizontally to see all the fields you had to input.
[Slide 52: shoes] Eventually there comes a point where you have to decide you’re not going to
change your data structures any more; you just have to fit the data, however imperfectly, into the
existing structures. The problem is knowing when you’ve got to that point. Do you design elaborate
data structures for the minority of cases that are as complex as the charter I’ve given you? The
problem comes in knowing how complex the average charter is before you’ve looked at it in detail.
Third-party flows turned out to appear only in about 3% of charters; possibly we could have dealt
with them in some other way, but at the time this seemed the most effective approach.
More generally, when designing structures and standards it’s very easy to make decisions that you
later realise were wrong. For example, we didn’t initially treat all churches as agents, but tried to
distinguish between more and less important churches. Towards the end of the project, we realised
this approach wasn’t working and had to go back and re-input some of the data.
So developing data structures and standards for the Charlemagne project was an odd mixture. We
had to combine detailed conceptual analysis of the platonic ideal of names, charters, transactions and
the like with the messy reality of the actual forms that such things take in practice. John Bradley and
Michele Pasin, members of our digital humanities team, once wrote a paper entitled ‘Structuring
that which cannot be structured’,3 and in a sense that’s what we’ve been trying to do. But it’s only if
you can input data into the database in a way that’s both structured and that retains as much of its
meaning as possible that you can get it out again in a useful form. That’s what I want to discuss
briefly in the final part of this talk.
GETTING DATA OUT
In some ways getting data out is harder than getting it in. Just sorting out the displays of factoids so
that they make sense to end-users is time-consuming and there are still aspects of that that could do
with being improved. But the biggest problem when we were designing the user interface was
providing an effective way for users to browse the database and find the particular information
they were looking for.
The main method we used was ‘faceted browsing’, which is increasingly used by many databases. To
explain how that works, I’ll show you a very simple example, not using charters. [Slide 53] Suppose
you have a database that contains information about coloured shapes. How do you find the object
with the particular shape and colour you want?
The traditional way is with a search box [Slide 54], but there are several problems with this. The first
is not knowing the right terms to use. [Slide 55] For example, you input the term ‘square’ and get
zero results. Why is that? Because the shapes in the database aren’t actually squares, even if they
look like it, but rectangles, so they’re all listed under that term. Similarly, suppose you’re interested
in particular sorts of triangle. [slide 56] You input the pair of terms ‘red’ and ‘triangle’. Again you get
zero results. But why is that? Does the database not contain triangles? Or has it classified them all
under some different term? Perhaps it thinks that colour isn’t red but scarlet? Searching a database
you’re not familiar with can be worryingly like making repeated stabs in the dark.
Faceted browsing, in contrast, provides an easier way to narrow down your search till you find
exactly what you’re looking for. [Slide 57] In this case, you might be given a choice of two filters to
browse by: by shape or by colour. Choose one of these [click] and you then get shown how many
examples there are of each colour [click]. It’s much simpler to find all the red objects you want [Slide
58].
But you’re not interested in all red objects, just red triangles. Here’s where you can combine filters.
[Slide 59] Choose the shape filter, and you’re shown what shapes the red objects have. So there
definitely aren’t any red triangles in the database, it’s not just that you’ve got your search terms
wrong. And if you want to, you can now clear the colour filter and use the shape filter to look for
triangles of any colour [Slide 60].
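The coloured shapes example is small enough to sketch in Python. The data set and function names below are invented for illustration (the slides' actual shapes are not reproduced here); the point is only to show how facet counts are computed over whatever filters are already applied.

```python
from collections import Counter

# Hypothetical contents of the coloured shapes database.
SHAPES = [
    {"shape": "rectangle", "colour": "red"},
    {"shape": "rectangle", "colour": "blue"},
    {"shape": "circle", "colour": "red"},
    {"shape": "triangle", "colour": "blue"},
    {"shape": "triangle", "colour": "green"},
]


def apply_filters(items, filters):
    """Keep only the items that match every selected facet value."""
    return [i for i in items if all(i[f] == v for f, v in filters.items())]


def facet_counts(items, facet, filters):
    """Count the values of one facet among the items left after filtering."""
    return Counter(i[facet] for i in apply_filters(items, filters))


# Browse by colour: the user sees every term in use, with counts.
by_colour = facet_counts(SHAPES, "colour", {})

# With the colour filter set to red, look at which shapes remain.
red_shapes = facet_counts(SHAPES, "shape", {"colour": "red"})

# Clear the colour filter and look for triangles of any colour.
triangles = apply_filters(SHAPES, {"shape": "triangle"})
```

In this made-up data set `red_shapes` has no entry for triangles, so the user can see that there are none rather than wondering whether the search term was wrong.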
Faceted browsing therefore offers users an easy way of narrowing down entries in a database to
find exactly what they want. You’re always kept aware of what facets you’ve used already and you
can remove some of them if you find you’re not getting any results. The same basic principles as
with the coloured shapes underlie the much more complicated faceted browsing in our database.
You can browse by charter, agent or place and gradually drill down to find what you want [Slide 61:
3 clicks].
3 Bradley, John, and Pasin, Michele, 'Structuring that which cannot be structured: a role for formal models in
representing aspects of Medieval Scotland', in Matthew Hammond (ed.), New perspectives on medieval
Scotland, 1093-1286, Woodbridge: The Boydell Press, 2013, pp. 203-14
Faceting makes things a lot easier for users, but it doesn’t solve all their search problems. Users of
our database can browse by either charters, agents or places, but they have to think about the
meaning of the filters they use in these different views to get the results they expect. Suppose
you’re browsing by charters. You can choose to filter the charters by various characteristics of the
agents they contain, so you could choose all those that include scribes [Slide 62] and then add in
women as an additional filter [Slide 63]. You end up with 177 charters, but that doesn’t mean that
there are 177 charters written by female scribes, but that there are 177 which include a scribe of
some sex and also a woman in some role. [Slide 64] In fact, if you search via agents, you find that so
far we haven’t found any scribes who are definitely female.
In addition, faceted browsing with a database like ours (unlike with the coloured shapes example)
needs a lot of behind-the-scenes work so that when users click on a filter they get the results they
expect. To demonstrate that, I want to talk about one of the most complicated filters we had to
develop, that linking agents and places. If you want to find all the agents connected with the place
Worms, for example, how do you define this connection in a meaningful way?
A simple-minded rule would say that all agents who appear in a charter which mentions place X
are connected with place X. [Slide 65] But as you’ll see from the example of DKAR 1:169 you get
some unsatisfactory results. The group of Slavs somewhere out in the wilds of Austria, for example,
may well not have known anything about Worms. Does it make sense to connect them to it? Equally,
if we happened to know the scribe of this charter (we don’t), he’d be sitting in Worms and the first
he may have heard of Leombach is when he’s asked to write a charter mentioning it. As for Tassilo,
by 791 when this charter was written, he’d been deposed by Charlemagne and was being held in a
monastery in Normandy. Connecting him to Worms because a charter he’d once given was
confirmed there seems tenuous at best.
What we had to do therefore was look at how agents are connected to places in the charter by the
roles they play. [Slide 66] You immediately get a much smaller but more meaningful set of
connections. So, for example, Tassilo as a granter is connected to the different places he donated
and so is Kremsmünster as the recipient. But the monastery is also connected to Worms, because it
(or at least Fater representing it) went to Worms to get a confirmation charter from Charlemagne.
The beekeepers of Raotola, meanwhile, are connected to Raotola, but not to any of the other places
mentioned in the charter. We don’t know if they ever went to some of the other properties of
Kremsmünster that are mentioned. And we don’t know exactly where the decania of Slavs were, so
they don’t get connected to anywhere.
We obviously couldn’t do this sort of detailed analysis of agent and place interconnections for every
charter. [Slide 67] Instead, we had to come up with rules (still very complicated) for how agent roles
and place roles are connected together. For example, anyone whose agent role makes them
responsible for the flows of possessions (like granters and recipients) should be linked to all places
which have the place role ‘location of possession’. In most cases (which we specify) they should also
be linked to any place with the place role ‘location of transaction’. Anyone whose agent role just
involves them being present at a transaction (like a petitioner) should be linked only to the place
with role ‘location of transaction’.
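As a rough illustration of how role-based rules like these might be expressed, here is a minimal Python sketch. It is not the project's actual implementation: the class names, the role sets and the `flow_roles_at_transaction` parameter are all assumptions made for this example, and the talk does not say which roles fall outside the "most cases" clause, so that choice is left to the caller.

```python
from dataclasses import dataclass

# Hypothetical role vocabularies; the project's real authority lists are not given in the talk.
FLOW_ROLES = {"granter", "recipient"}   # responsible for flows of possessions
PRESENT_ROLES = {"petitioner"}          # merely present at the transaction

POSSESSION = "location of possession"
TRANSACTION = "location of transaction"


@dataclass(frozen=True)
class AgentRole:
    agent: str
    role: str


@dataclass(frozen=True)
class PlaceRole:
    place: str
    role: str


def link_agents_to_places(agents, places, flow_roles_at_transaction=FLOW_ROLES):
    """Return the set of (agent, place) links implied by the role rules.

    flow_roles_at_transaction lists the flow roles that are also linked to the
    location of the transaction (an assumed stand-in for "in most cases").
    """
    links = set()
    for a in agents:
        for p in places:
            if a.role in FLOW_ROLES and p.role == POSSESSION:
                links.add((a.agent, p.place))
            elif a.role in flow_roles_at_transaction and p.role == TRANSACTION:
                links.add((a.agent, p.place))
            elif a.role in PRESENT_ROLES and p.role == TRANSACTION:
                links.add((a.agent, p.place))
    return links


if __name__ == "__main__":
    agents = [
        AgentRole("Tassilo", "granter"),
        AgentRole("Kremsmünster", "recipient"),
        AgentRole("Fater", "petitioner"),
    ]
    places = [
        PlaceRole("Leombach", POSSESSION),
        PlaceRole("Worms", TRANSACTION),
    ]
    # Excluding the granter from the transaction place is an illustrative choice only.
    for link in sorted(link_agents_to_places(agents, places, {"recipient"})):
        print(link)
```

Keeping the rules as data (sets of role names) rather than as hand-written cases per charter is what would let the same logic run across the whole corpus.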
The full rules just for this facet probably took us a month or more of discussion to work out. [Slide
68] Here’s an extract from the final results. This document gives the rules, but it also includes the
test data: examples that we could use to check the facets were working as we wanted. Doing this
testing was incredibly pernickety and tedious for both the historians and IT specialists [Slide 69]. But
it was the only way of checking that users would always get the expected results. Faceted browsing
is a very powerful tool for users, but it’s not easy to set up a database to be able to use it.
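To give a feel for what checking a facet against test data can look like, here is a small Python sketch. The expected pairs follow the DKAR 1:169 examples discussed in this talk, but the data layout, the function name and the sample facet output are assumptions for illustration, not the project's real test documents.

```python
# Expected agent-place links for one test charter, written down by hand.
# The pairs follow the DKAR 1:169 examples in the talk; the layout is an assumption.
EXPECTED = {
    ("Kremsmünster", "Worms"),
    ("beekeepers of Raotola", "Raotola"),
}


def compare_links(expected, actual):
    """Return the links the facet missed and the links it should not have produced."""
    missing = sorted(expected - actual)
    unexpected = sorted(actual - expected)
    return missing, unexpected


if __name__ == "__main__":
    # Hypothetical facet output containing one over-eager link.
    actual = EXPECTED | {("decania of Slavs", "Worms")}
    missing, unexpected = compare_links(EXPECTED, actual)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

Reporting both missing and unexpected links matters here, because the unsatisfactory results described earlier were mostly of the second kind: connections that the data technically allowed but that made no historical sense.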
CONCLUSIONS (5 min)
In this talk I’ve tried to give a feel for the practicalities of a medieval database project and to show
something of the messiness behind the neat facade. I want to end with five more general points
about combining digital technology and historical texts.
[Slide 70] [click] One of the most basic problems we had with the project didn’t involve the digital
aspect at all. It was simply understanding what some charters actually meant. What is the decania of
Slavs that DKAR 1:169 refers to? We decided it was probably some kind of administrative unit, but
the charter’s irritatingly vague and we’re still not quite sure.
[click] A second issue is how to avoid re-inventing the wheel with digital history projects. We did our
best to build on what previous projects had done, but it’s sometimes surprisingly difficult to find out
specific technical details of other projects. We’re therefore doing our best to preserve and
document the knowledge we’ve gained, through the website’s blog and also through presentations
like this.
[click] One of the other reasons projects tend to end up re-inventing the wheel is that different
historical periods produce not just different types of document, but different styles. The People of
Medieval Scotland project, for example, was a previous project based on charters, but these are far
more standardised documents in the twelfth century than in the eighth. Changing social practices
also make standard practices across databases very difficult to implement. POMS, dealing with
Scotland in the central Middle Ages, found it useful to record Gaelic and Latin names separately,
which wasn’t an issue for us. On the other hand, about half the people in medieval Scotland seem to
have been called either John, William or Robert, so the researchers on POMS didn’t have to spend so
much time arguing about whether Adalbert really is the same name as Odalpert.
[click] Fourthly, although I’ve focused on data structures in this talk, data input standards are also
very important for any historical project and they do have to be specified in incredible detail. To use
an analogy, you can’t have one inputter classifying shapes as rectangles while another sees them as
squares. Even the most basic data to be input has to be agreed.
Even so, problems of inconsistent inputting over time are inevitable, even if it’s just one person
doing that. We did our best to discuss and document the decisions we made, using wiki software
and a lot of Skype meetings, but we still had to do a lot of data clean-up towards the end of the
project.
[click] Which leads me to my final point about historical databases. They’re inevitably imperfect. It
doesn’t have to be quite at the level of garbage in, garbage out, but behind the clean facade of any
database project there tends to be some very messy data and a lot of compromises on database
design. But then history is messy and early medieval history is particularly so. The Charlemagne
database isn’t perfect, but despite all its imperfections, we hope it’ll still be a valuable tool for future
researchers.

Carolingian Charter Database Design

  • 1. 1 Bits of charters: putting Carolingiancharters intoa database INTRODUCTION [Slide 2:projectwebsite]The Makingof Charlemagne’sEurope project ranfrom2012-2014 at King’s College Londonand createdadatabase frameworkforthe storage and retrieval of prosopographical, geographical andsocio-economicdatafrom earlymedieval charters.The projectteamalsoinput data fromnearly1000 chartersintothe database systemtoproduce a corpusthat couldaddressa wide range of researchqueries. If anyone wouldlike a demonstration of the database,Icangive one brieflylater. Thistalk isfocusing on the decisionprocesses behindthe scenes of the project, discussingsome aspectsof the database designandthe reasonsforthem. How do youdecide the technological formandscope of a project? How can typesof data such as places be enteredconsistentlyandretrievedeffectively?How doyou thenlinktogetherbasicunitsof people,placesandpossessionstoprovide useful representationsof activitiesinacharter?And finally,havingputall the dataintoa database, how doyouget it out again?What I’m tryingto showissome of the practical problems involvedintakingunstructured and oftenambiguousmedieval informationandputtingitindigital formats. But I wantto start withthe most basicquestion:why thisproject?Digital humanitiesreflectsthe interactionof twodifferentfields:fora goodprojectyouneeda balance betweenthe technological developments thatmake new digital approachesfeasibleandthe researchquestionsthat scholars wantnew insightsinto.Andthere’salsoathirdaspect,whichshouldn’tbe overlooked:whatyoucan getfundingfor. One of the bigproblemswith alotof digital humanities projectsisthat’sithardto getthe long-termfundingtoallowthemtoshow theirfull potential. The Making of Charlemagne’sEurope projectessentiallyarose fromthe coincidence of twoaspects. [Slide 3:previousDDHprojects]One wasthatthere isa longtraditionof creating prosopographic databasesat King’s;the Departmentof Digital Humanitieshad more than20 yearsof expertisein that. 
[Slide 4:MKCHEUR spreadsheet]The otherwasthe large numberof Carolingiancharterswhichexist, whose prospographical and socio-economicdatahasn’tbeenfullyexploited.Itmade obvioussense to combine these two factorsandalsoto take advantage of developmentsinmappingsoftware to representspatial informationinnewways. CHARTERS AND THEIR REPRESENTATION I’ve mentioned chartersseveral times,butwhatexactly are they?The shortanswerismedievallegal documents,mostlyconcernedwiththe grantingorsellingof property.I’vegivenyouahandout whichisthe textand translationof one quite longcharter.I’mgoingto be referringtothat a lotin my examples. But a longeransweristhat there are differentwaysof understandingwhatacharter isand they logically resultindifferentkindsof digital projects. [Slide5:understanding] Thereare three main waysyou can thinkof a charter: as a material object,asatextor as a source of information.
  • 2. 2 Firstly,youcan thinkof a charter as a material object:aparticularpiece of parchmentwithwords and signswrittenonit. [Slide 6:Marburg] A digital projectthinkingof chartersinthat way,like the Lichtbildarchiv ältererOriginalurkunden includes high-qualitydigital imagesof the charters, combinedwithtoolstoallowthe analysisof these images.A similarapproachisbeingtakenby the newModelsof AuthorityprojectbeingjointlydevelopedbyKing’sCollege andGlasgow.1 The problemforsuch an approachwithCarolingianchartersisthatwe don’thave the original manuscriptforthe vast majorityof them;insteadwe onlyhave copiesfromlaterinthe Middle Ages (or sometimes evenjustearlymoderntranscripts). Anapproach focusingonmaterial objects would therefore meanwe were unabletouse a lotof charters fromthe corpus. A second viewof charterstreatsthemas texts. [Slide 7:CDLM] One optionforour projectwould have beentoset upa database containingthe full textsof charters, togetherwithsome kindof XML mark-upto allowusto pickout wordsand termsof particularinterest:personal names,titles,verbs indicatingdonationetc.Thisisthe approachtakenbyprojectssuch as Codice diplomaticodellaLombardiamedievale. [Slide 8:CEI]There’salreadyan international group workingonstandardsfor encodingcharters The difficulty if we wentdownthatroute wouldhave been the datawe workedwith.Carolingian chartershave relativelylittlestructure tothem,theirspellingcanbe idiosyncratictothe pointof unintelligibility andtheycanalsobe maddeninglyindirectinreferringtopeopleandplaces. [Slide 9: Texton screen] Forexample, the charterI’ve givenyoureferstohow Tassilo ‘hadcauseda monasterytobe newlybuiltinhonourof the HolySaviourwithinourforestatthe place called Kremsmünsterinthe pagusof Traungau’(inmonasteriumhonore sancti Salvatorisinfrawaldo nostroloco qui diciturChremisainpagonuncupante Drungaoe novoopere construerefecisset). 
What bitof that Latintextdo youmark up as the ‘name’of Kremsmünstermonastery?[Slide10: referencestoFater] AndalthoughFateris the abbotof Kremsmünster,thisfactisonlyever mentionedindirectly Sohowdoyoumark up the textto show that Fater’snotjustany oldabbot, but specificallythe abbotof the monastery receivingthe donation? Recentscholarshiponchartershastendedtofavourthe textual approachtocharters:it’soften focusedonindividualchartersorat the widestregional Urkundenlandschaften (charterlandscapes). But we didn’twantto treatcharters as special snowflakes:we wantedtofindawayto lookacross the whole corpusfromCataloniatoAustriaand compare socio-economicdatafromacross different regions.Butmark-upof textswouldn’t allow us todothiseasily:itsstructuresaren’tsuitedto findingall the unfree peoplementioned underhalf adozendifferentLatinterms orfor findingall female vendors. To answerthose kindof questions,ourprojectwas thereforethinkingof chartersina thirdway:as sourcesfromwhichinformation couldbe extractedand putintostandardisedformatsfor comparisonandlarge-scale analysis. Andourtoolsforthatwere a large-scale relational database combinedwiththe ‘factoidmodel’. 1 http://www.digipal.eu/blog/new-digipal-project-models-of-authority/
  • 3. 3 [Slide 11:definition] Sowhat’sthe factoidmodel?Itwas developedbyKing’sforprevious prosopographical projects. Andatitsheart isthe factoid, an assertion made bythe projectteamthat a source says somethingaboutapersonor a place. Such a model canbe appliedtoanythingfroma saint'slife (the VitaCuthberti states“Cuthberthealed awomanfroma demon”) toa coin (FitzwilliamMuseumcoin839 states "Charlemagne iskingof the Franks"). How didwe apply the conceptof factoidstoour specific project?We startedoff withsome main modelsforbasicentitiessuchasagents(peopleorinstitutions),placesandsources;these contain staticinformationthatdoesn’tchange betweencharters(informationsuchasa person’s sex orthe geo-locationof aparticularplace).We thenlinkedthese modelstogetherviafactoids.Here’swhat thisfactoidmodel lookslike inthe specificcase of our Charlemagne database[Slide 12:Factoid diagram] So factoidsare statementswhichlinktogether agents,placesandpossessionsderivedfroma specificcharter.These factoidsare of varying types,dependingonthe contentthattheyneedto contain.For example, youneeddifferent datastructures toinputthe statement"Faterisabbotof Kremsmünster"(whatwe call anattribute andrelationshipfactoid) thantoinputthe statement "FaterpetitionedCharlemagne to confirmthe propertyof the monasteryof Kremsmünster"(what we call a transactionfactoid).Authoritylists,meanwhile, are usedtoensure thatdata isentered consistently,e.g.that Faterisnot describedas"abbot"inone factoidand"abbas" inanother. Thisfactoidmethodpotentiallyallowed ustoinputa wide range of data intoa single database. We still hadto determine alotof bigissues,however.What specificdatafromeach charterdidwe want to record?What structuresand fields wouldwe needforthis dataandhow didwe ensure datawas enteredconsistently?Andhowdidwe then allow enduserstoextractthe data? A lotof the firstyearof the projectwasspent iteratively developingthe database andthe inputting standards, ratherthan beingable tocarry out much sustaininputting. 
We startedinthe mostbasic way:just lookinghardat the data. [Slide 13] Discussinghow we dealtwith people whowerebeing transacted,[Slide 14] collatingexamplesof termsusedtoformthe basisfor authoritylists.The charter I’ve givenyouasa handout, DKAR1:169 was one that we wentthrough several timesin considerable detail,because ithadthe kindof complexitythatwe neededtobe able tohandle withinthe system. Ithas18 differentplacesinit:Iknow because we analysedeverysingle one of them.It alsocontainedalotof complicatedinteractionsbetween differentagents andplaces;[Slide 15: diagram] here’sone of my early attemptsat drawinga diagramof that.I’ll come back to that diagramlateron. We were alsothinkingabout the questions usersmightwanttoask [Slide 16:WendyDaviesand Mark Mersiowsky]2 ,consideringthe differentresearchtopics thathistorians hadalready explored usingcharters.[Slide 17:questionslist] We endedupwithalonglistof possible questions thatthe 2 Wendy Davies,Small worlds: the village community in early medieval Brittany (London, 1988); Mark Mersiowsky,'Y-a-t-il une influencedes actes royaux sur les actes privés du IXe siècle?',in Marie-José• Gasse- Grandjean and Benoît-Michel Tock (eds.), Les actes comme expression du pouvoir au haut Moyen âge: actes de la table ronde de Nancy, 26-27 novembre 1999,Atelier de recherches sur les textes médiévaux, 5 (Turnhout, 2003),pp.139-78
  • 4. 4 database mightbe usedto answer.Lookingatthemnow at the endof the project, mostof themare too complicatedtoanswerwiththe currentversionof the userinterface.But it’sbythinkingof key researchquestions inthatwaythat we couldworkout which data to record. For example, we developedstructurestorecordthe relationshipsbetweenplaces, anoption whichwasn’tavailablein previousprojects.I’lltalkabout those abitlater. Anotherpartof developingthe projectwasdeciding itslimits:some of the issuesthatwe wouldn’t realistically be able totackle. The single mostfrequentcomplaintwe have hadconcerningthe database isthat we don’tinclude the full textof the charters(althoughwe hope inthe future to include linksto the full textavailablefromotheronlinesites).Butgettingsuchdataintothe system consistently wouldhave requiredavastamountof extrawork.I’ve givenyouthe textof DKAR 1:169 on the handout.Here’s the firstsentence of atextof a differenteditionof the same charter [Slide 18: Kremsmünstercharter].Thissingle sentence has 6differences fromthe textI’ve givenyou[click]. If we’dusedprojecttime toinputthe full textof databaseswe probablywouldn’thave ahdtime to do anythingelse. DATA STRUCTURES AND STANDARDS There are alwaysthingsyou’re notable to achieve withinone project.Butevenwithoutthe full text, whydiddevelopingdatastructuresandstandardsfor the Charlemagne database take solong? What I want todo now is to lookinsome detail ata couple of examplesof the datastructureswe created. Everydatabase projecthasto do thiskindof work,so althoughsome of the detailsI’ll be discussingare specifictoearlymedievalcharters,Ihope you’ll findalotof the conceptsare more widelyrelevant.Ourunderlyingrelational databaseishorrendouslycomplex.The schemaforitwent fromthis[slide 19] to this[slide20] andit’snow even bigger. Butbecause it’sarelational database, you’re essentiallybuildingupthiscomplexityfromalarge number of relativelysmall buildingblocks. 
I’mgoingto be talkingabouttwoof these:placesandtransactionfactoids,usingexamples mainly fromthe charterI’ve givenyou onthe handout. Places Thinkingaboutplaces,twomainaspectsdrove the structureswe developedfor recordingthem.One was technologicaldevelopmentsindigitalmapping;we wantedtobe able torepresentdata patternson a map onscreen.The otherwasthe data itself. [Slide21] In charters,whatwe get isn’t technicallyplaces,butplace names.How yougetfroma name ina medievalchartertoa spot on the map turnsout to require alot of thought. The firstdistinctionwe made inthe database wasbetweencharter-specificinformation onplace names(whichwe recordedin individualfactoids)andstaticinformationonaplace (whichwas recordedinthe place model). [Slide 22] Forexample, the charterI’ve givenyou,DKAR1:169 has two differentmanuscripts andtheirspellingof places varies. The place name editedas‘Bettinbah’ hasa variantspellinginfootnote11 of ‘Petenpach’. There’sanothercharterrecordingTassilo’soriginal grant whichcallsthe same place ‘Petinpach’.Insome chartersyou evengetseveral different spellingsof aname (especiallypersonalnames) within asingle text.Andthere’salsothe complicationof Latinbeinganinflectedlanguage,soyoucan get namesin differentcases,likethe ‘actum Wormacie’inthe lastline of the charter.
  • 5. 5 One of ourearlydecisions therefore wasthatwe’dincludefieldstorecordthe original textof place namesand thatthese original textboxes wouldbe searchable. [Slide23:place name search]. If you put ‘Wormacie’intothe searchplace namesfunctionsof the database,you’llcorrectlyfind‘Worms’, be toldthat it’sin Germanyand evenhave itlocatedona map foryou that,so that you have what youmightcall a ‘bottom-up’search.If you’ve gotjustaname in a medieval charteryou’ve gota possible wayof identifyingit. But behindthatlinkingfromaname instance toa map, there’salot of work goingonin termsof identificationandstandardisation. I’ll show this schematicallywith anotherplace inthe same charter: the place calledLiublinbah(towardsthe bottomof the firstpage of the handout).[Slide 24: charter-specificinformation]We startoff withwhatthe chartertellsus:Liublinbahisa‘locus’,itsrole inthe charteris as the locationof a possessionbeingdonatedandit’sinthe pagusof Drungaoor Traungau (pagiare Carolingianadministrative regions,similarto Anglo-Saxoncounties). The firstthingwe have to do is produce a standardisedmedieval name forthisplace.Ourmain reference source forsuch LatinnamesisOrbisLatinus[slide 25],but like the majorityof minor placesinour charter,Liublinbahdoesn’tappear inthat.[Slide 26:standardmedieval name] So insteadwe choose one of the spellingsanduse that(addinganasterisktoshow that it may need furtherworkat some point). The nextquestioniswhere thisplace is.We relyonthe editorsof the charterfor this;we don’ttry and researchdetailsforourselves,since we simplydon’thave time. [slide 27:place identification] In thiscase,the editorsaysthatthe settlementconcernedisLeombach,a locationinAustriaandhis identificationiscertain. 
[slide 28] Once we have thisidentification,we can use modernreference toolsto check the geo-locationforLeombachandwhatpart of Austriait’sin.[slide 29] Fromthis,we can create a place recordfor Leombach.[slide 30:map fromDKAR1:169] Whenwe enterdata concerningthe charteritself we thenincludecharter-specificinformationsuchas the place’s role or the place descriptor. That may all seemratherlongwinded, butthiswayof thinkingaboutplacesgivesusa lotof flexibility for dealingwithmore complex cases. Forexample,we quiteoftengetriversormountains mentionedincharters;[slide 31:Ipfbach]inDKAR1:169 there are the two riverscalledIpfbach (Ipphas inthe original).We can’teasilygeo-locate rivers,butwe caninputthe name data we do have intothe systemandproduce recordsfor natural featuresinthatway The more difficultsituationiswhenthere’suncertaintyaboutwhat place the medieval name refers to. Sometimes,the editorwill justgive anapproximate area.[Slide 32:Raotola]. The editorof DKAR 1:169 thinks Raotola,where there are several vineyards, issomewhere onthe Rodelbach,atributary of the Danube.We inputthe place as having a medieval place name,butnomodernone,butan approximate locationmeanswe canrecordit withinmodernplace hierarchies:it’ssomewhere withinUpperAustria Alternativelythe editormaysuggest one ormore possible modernlocationsforaparticular medieval place. Forexample,acharterfromMondsee discussesadonationof propertyatan unknownplace called Teginga.[Slide33:schematic] Here iswhere we effectively carve upour
earlier schematic into two. We still have a medieval place on one side, but it is now being tentatively matched to three different modern places, with varying levels of probability. [Slide 34] By recording possible matches in this way, we can eventually generate a display for the users that shows the possible options.

The final aspect of places I want to talk about is place relationship factoids. One of the things we realised earlier on when looking at our place data is that we had two different sorts of place hierarchy, where places are in larger units. Firstly, we had the modern hierarchies: Leombach is in the Austrian state Upper Austria, which is in the modern country Austria. That was easy enough to record. But what did we do about the other information we’re being given, that Leombach is in the pagus of Traungau? How do we record medieval hierarchies?

Previous projects at King’s have been able to fudge this and meld together modern and medieval hierarchies because they’ve been dealing with a British administrative system that’s extraordinarily long-lasting. For example, the Prosopography of Anglo-Saxon England used the pre-1974 English counties for their hierarchies, which are near enough to Anglo-Saxon counties to be workable. But we didn’t know where Carolingian pagi were on the map. In fact, there’s even a scholarly argument about whether pagi were flat areas on the ground at all or just scattered collections of administrative rights. So to record the fact that Leombach was in the pagus of Traungau, which might help researchers to understand more about pagi, we needed some additional structures. What we ended up using is what’s called a place relationship factoid.
[Slide 35] This is a charter-specific assertion that “Charter X says Place 1 is in Place 2”, and also includes place descriptors for the two places concerned. The original idea was that a user would be able to pull up a place record for a medieval region and then be able to see all the places said to be within it (including geo-locations where they’re known). We weren’t able to implement this fully, but even so these factoids still provide useful information. And they also illustrate another important aspect of any digital humanities project. If you think a piece of data might be useful, it’s much better to start recording it early on, rather than having to go back later and recheck a large number of records.

Transaction factoids

As you can see, just designing the building blocks for the database system, like places, takes a lot of thought if you’re going to be able to record them consistently. Our trickiest problem, however, was working out how to record transactions, the actual business of the charters, within the database. And it’s difficult because we’re trying to record dynamic rather than just static information. If you look at this place relationship factoid, for example, the statement ‘charter X says Leombach is in Traungau’ doesn’t alter throughout the charter. Similarly, when DKAR 1:169 refers to ‘Abbot Fater’, which we’d record as the attribute and relationship factoid ‘Fater is abbot of Kremsmünster’, that’s a fixed statement. There may be earlier and later charters in which Fater isn’t abbot of Kremsmünster, but in this particular charter he always is.

In contrast, the main importance of a charter is that it’s changing things, typically someone granting property to someone else. Things are different after the actions described in the charter than they were before. But how do you record such a change in a relational database structure that doesn’t allow for different states?
The first thing we did was simplify things by breaking down the activities in any charter into a collection of different transactions (possessions flowing around) and events (all the other things going on). [Slide 36] I showed you this diagram for the activities in DKAR 1:169 earlier on. It’s very complex because it shows you almost all the activities (though in fact there are a few more even than that). But if we start breaking these activities down, we can get rather more manageable units.

Firstly, there are three different events. [Slide 37] Tassilo founds the monastery of Kremsmünster and [Slide 38] there are two examples of land clearance. [Slide 39] Activities like these are recorded in event factoids, which give fairly basic information about agents, places and the type of activity going on.

Once we’ve dealt with those, we’re left with the transaction information [Slide 40]. What we have in this diagram is three separate transactions. [Slide 41] One is a record of what happened in the past: Tassilo granted possessions to Kremsmünster. [Slide 42] The other two are Charlemagne’s actions in the present. He confirms Kremsmünster’s possessions and he also grants to the men of Eberstalzell the right to remain on land they’ve cleared illegally.

We need to break down the charter into these separate transactions to allow us to represent the flows of possessions in a database structure. [Slide 43] So for example, Tassilo’s initial grant to Kremsmünster can be represented in a series of tables that shows the agents involved, then the details of the places mentioned, then the possession transferred and so on. We’ve frozen the activity in a way that allows us to describe it within a relational database.
This is a simplified version of the structure we use to record transactions, but it still gives us quite a lot of flexibility. [Slide 44: multiple donations] For example, it means that we can record a number of donated possessions in the same record; we don’t need 22 different factoids for the 22 different things Tassilo gave to Kremsmünster, which is a big relief. And if there are different locations or terms and conditions for different possessions we can record those in one go.

But if we’re going to use a data structure like this, we need to make sure that we can interpret everything unambiguously from the information we record in the table. [Slide 45] If you try and put both Charlemagne’s confirmation to Kremsmünster and his grant to the men of Eberstalzell in the same record, how do you keep track of who’s getting what possessions? You rapidly get a complete mess. So we had to define a transaction as involving only two main agents (or agents working together) and only one type of activity, so we didn’t combine a confirmation and a grant in the same record.

We still had further difficulties to solve. One issue was that the neat distinction I’ve been making all along between agents, places and possessions doesn’t actually work when you look closely at the data. [Slide 46] For example, look at the 22 possessions that DKAR 1:169 mentions. As you’ll see, among the things being granted [click] are entities that we regard as agents: churches, for example, like that at Alburg, but also people being granted, like the craftsmen in Raotola. And as well as land at particular places being given, there’s also a whole place being given: the villa of Alkoven [click]. We had to ensure that we could record agents and places as possessions; we also still had to record
them as agents and places in the normal way. People don’t cease to be people just because the Duke of Bavaria treats them as objects for some purposes.

[Slide 47] As you may also have noticed, possessions no. 16 and 18 on this list have an additional complication: there’s more than one object involved. Tassilo’s giving 2 vineyards at Aschach and 3 at Raotola, so we also needed to record quantities of objects being transferred.

A final issue was that not all the transactions we were interested in had identical flows of possessions. [Slide 48: sale] In a sale, there are possessions going both ways: how do you record that? [click] Our approach was to add another column to the table for possessions: if the return box is marked, it means that these particular possessions are flowing in the opposite direction. [Slide 49] In the online version of the database, we use an arrow to highlight this.

Unfortunately, however, DKAR 1:169 has an even more complicated twist. [Slide 50: transaction] Let’s go back to Charlemagne’s interaction with the men of Eberstalzell. Charlemagne grants to these men the right to remain on land they’ve cleared, but in return they have to do service not to him, but to the monastery of Kremsmünster. This is no longer an arrangement between two parties: a third party is now involved. [Slide 51: diagram] After a lot of discussion, we eventually developed a data structure in which we could record such arrangements, which we called third-party returns. Essentially this was a variant of the method we’d already used for ordinary returns; we just needed to add another tick-box to the input form.

But this highlights a key issue when you’re designing data structures: there’s a trade-off between how accurately you can represent your data and how complex your data structures need to be. It’s always a temptation to develop a database that can deal with every possible data variant, but you end up by making it more and more complicated and difficult to use. As it was, we had input screens where you had to scroll across horizontally to see all the fields you had to input.
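For readers who think more easily in code than in slides, the return and third-party return tick-boxes described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual schema: the class names, field names and the `flows` helper are all hypothetical, and only the agents and possessions come from DKAR 1:169 as described in this talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PossessionLine:
    """One row of a transaction's possessions table (hypothetical layout)."""
    description: str
    quantity: int = 1
    is_return: bool = False              # tick-box: flows back to the granter
    is_third_party_return: bool = False  # tick-box: flows on to a third party


@dataclass
class Transaction:
    """One type of activity between two main parties, as defined in the talk."""
    charter: str
    activity: str
    granter: str
    recipient: str
    third_party: Optional[str] = None
    possessions: List[PossessionLine] = field(default_factory=list)

    def flows(self) -> List[Tuple[str, str, int, str]]:
        """Work out (giver, receiver, quantity, description) for every row."""
        result = []
        for line in self.possessions:
            if line.is_third_party_return:
                if self.third_party is None:
                    raise ValueError("third-party return needs a third party")
                giver, receiver = self.recipient, self.third_party
            elif line.is_return:
                giver, receiver = self.recipient, self.granter
            else:
                giver, receiver = self.granter, self.recipient
            result.append((giver, receiver, line.quantity, line.description))
        return result


# Charlemagne's grant to the men of Eberstalzell, with service owed
# to Kremsmünster rather than to Charlemagne himself.
grant = Transaction(
    charter="DKAR 1:169",
    activity="grant",
    granter="Charlemagne",
    recipient="men of Eberstalzell",
    third_party="Kremsmünster",
    possessions=[
        PossessionLine("right to remain on cleared land"),
        PossessionLine("service", is_third_party_return=True),
    ],
)

for flow in grant.flows():
    print(flow)
```

The point of the sketch is that the direction of each flow is derived from two flags on the possession row, so an ordinary grant, a sale and a third-party return can all share one record shape, at the cost of extra columns on the input form.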
[Slide 52: shoes] Eventually there comes a point where you have to decide you’re not going to change your data structures any more; you just have to fit the data, however imperfectly, into the existing structures. The problem is knowing when you’ve got to that point. Do you design elaborate data structures for the minority of cases that are as complex as the charter I’ve given you? The problem comes in knowing how complex the average charter is before you’ve looked at it in detail. Third-party flows turned out to appear only in about 3% of charters; possibly we could have dealt with them in some other way, but at the time this seemed the most effective approach.

More generally, when designing structures and standards it’s very easy to make decisions that you later realise were wrong. For example, we didn’t initially treat all churches as agents, but tried to distinguish between more and less important churches. Towards the end of the project, we realised this approach wasn’t working and had to go back and re-input some of the data.

So developing data structures and standards for the Charlemagne project was an odd mixture. We had to combine detailed conceptual analysis of platonic ideals of names, charters, transactions and
the like with the messy reality of the actual forms that such things take in practice. John Bradley and Michele Pasin, members of our digital humanities team, once wrote a paper entitled ‘Structuring that which cannot be structured’ 3 and in a sense that’s what we’ve been trying to do. But it’s only if you can input data into the database in a way that’s both structured and that retains as much of its meaning as possible that you can get it out again in a useful form. That’s what I want to discuss briefly in the final part of this talk.

GETTING DATA OUT

In some ways getting data out is harder than getting it in. Just sorting out the displays of factoids so that they make sense to end-users is time-consuming and there are still aspects of that that could do with being improved. But the biggest problem when we were designing the user interface was providing an effective way for users to browse the database and find the particular information they were looking for.

The main method we used was ‘faceted browsing’, which is increasingly used by many databases. To explain how that works, I’ll show you a very simple example, not using charters. [Slide 53] Suppose you have a database that contains information about coloured shapes. How do you find the object with the particular shape and colour you want?

The traditional way is with a search box [Slide 54], but there are several problems with this. The first is not knowing the right terms to use. [Slide 55] For example, you input the term ‘square’ and get zero results. Why is that? Because the shapes in the database aren’t actually squares, even if they look like it, but rectangles, so they’re all listed under that term. Similarly, suppose you’re interested in particular sorts of triangle. [slide 56] You input the pair of terms ‘red’ and ‘triangle’. Again you get zero results. But why is that? Does the database not contain triangles? Or has it classified them all under some different term? Perhaps it thinks that colour isn’t red but scarlet? Searching a database you’re not familiar with can be worryingly like making repeated stabs in the dark.
Faceted browsing, in contrast, provides an easier way to narrow down your search till you find exactly what you’re looking for. [Slide 57] In this case, you might be given a choice of two filters to browse by: by shape or by colour. Choose one of these [click] and you then get shown how many examples there are of each colour [click]. It’s much simpler to find all the red objects you want [Slide 58].

But you’re not interested in all red objects, just red triangles. Here’s where you can combine filters. [Slide 59] Choose the shape filter, and you’re shown what shapes the red objects have. So there definitely aren’t any red triangles in the database; it’s not just that you’ve got your search terms wrong. And if you want to, you can now clear the colour filter and use the shape filter to look for triangles of any colour [Slide 60].

3 Bradley, John, and Pasin, Michele, ‘Structuring that which cannot be structured: a role for formal models in representing aspects of Medieval Scotland’, in Matthew Hammond (ed.), New perspectives on medieval Scotland, 1093-1286, Woodbridge: The Boydell Press, 2013, pp. 203-14.

Faceted browsing therefore offers users an easy way of narrowing down entries in a database to find exactly what they want. You’re always kept aware of what facets you’ve used already and you
can remove some of them if you find you’re not getting any results. The same basic principles as with the coloured shapes underlie the much more complicated faceted browsing in our database. You can browse by charter, agent or place and gradually drill down to find what you want [Slide 61, 3 clicks].

Faceting makes things a lot easier for users, but it doesn’t solve all their search problems. Users of our database can browse by either charters, agents or places, but they have to think about the meaning of the filters they use in these different views to get the results they expect. Suppose you’re browsing by charters. You can choose to filter the charters by various characteristics of the agents they contain, so you could choose all those that include scribes [Slide 62] and then add in women as an additional filter [Slide 63]. You end up with 177 charters, but that doesn’t mean that there are 177 charters written by female scribes; it means that there are 177 which include a scribe of some sex and also a woman in some role. [Slide 64] In fact, if you search via agents, you find that so far we haven’t found any scribes who are definitely female.

In addition, faceted browsing with a database like ours (unlike with the coloured shapes example) needs a lot of behind-the-scenes work so that when users click on a filter they get the results they expect. To demonstrate that, I want to talk about one of the most complicated filters we had to develop, that linking agents and places. If you want to find all the agents connected with the place Worms, for example, how do you define this connection in a meaningful way?

A simple-minded rule would say that all agents who appear in a charter which mentions place X are connected with place X. [Slide 65] But as you’ll see from the example of DKAR 1:169 you get some unsatisfactory results.
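To make the simple-minded rule concrete, here is a minimal sketch of it. The record layout and the function names are hypothetical and the lists are heavily cut down; only the agent and place names are taken from DKAR 1:169 as described in this talk.

```python
from itertools import product

# Hypothetical, much-simplified charter record: just the agents and
# places that the charter mentions, with no roles recorded at all.
charters = {
    "DKAR 1:169": {
        "agents": [
            "Charlemagne",
            "Tassilo",
            "Kremsmünster",
            "decania of Slavs",
            "beekeepers of Raotola",
        ],
        "places": ["Worms", "Leombach", "Raotola"],
    }
}


def naive_links(charters):
    """Link every agent in a charter to every place that charter mentions."""
    links = set()
    for record in charters.values():
        links.update(product(record["agents"], record["places"]))
    return links


def agents_connected_with(place, charters):
    """List the agents the simple-minded rule connects with a place."""
    return sorted(agent for agent, p in naive_links(charters) if p == place)


print(agents_connected_with("Worms", charters))
```

Because the rule pairs every agent with every place, even this cut-down record produces fifteen connections, and every agent in the charter comes back as connected with Worms.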
The group of Slavs somewhere out in the wilds of Austria, for example, may well not have known anything about Worms. Does it make sense to connect them to it? Equally, if we happened to know the scribe of this charter (we don’t), he’d be sitting in Worms and the first he may have heard of Leombach is when he’s asked to write a charter mentioning it. As for Tassilo, by 791 when this charter was written, he’d been deposed by Charlemagne and was being held in a monastery in Normandy. Connecting him to Worms because a charter he’d once given was confirmed there seems tenuous at best.

What we had to do therefore was look at how agents are connected to places in the charter by the roles they play. [Slide 66] You immediately get a much smaller but more meaningful set of connections. So, for example, Tassilo as a granter is connected to the different places he donated and so is Kremsmünster as the recipient. But the monastery is also connected to Worms, because it (or at least Fater representing it) went to Worms to get a confirmation charter from Charlemagne. The beekeepers of Raotola, meanwhile, are connected to Raotola, but not to any of the other places mentioned in the charter. We don’t know if they ever went to some of the other properties of Kremsmünster that are mentioned. And we don’t know exactly where the decania of Slavs were, so they don’t get connected to anywhere.

We obviously couldn’t do this sort of detailed analysis of agent and place interconnections for every charter. [Slide 67] Instead, we had to come up with rules (still very complicated) for how agent roles and place roles are connected together. For example, anyone whose agent role makes them responsible for the flows of possessions (like granters and recipients) should be linked to all places
which have the place role ‘location of possession’. In most cases (which we specify) they should also be linked to any place with the place role ‘location of transaction’. Anyone whose agent role just involves them being present at a transaction (like a petitioner) should be linked only to the place with role ‘location of transaction’.

The full rules just for this facet probably took us a month or more of discussion to work out. [Slide 68] Here’s an extract from the final results. This document gives the rules, but it also includes the test data: examples that we could use to check the facets were working as we wanted. Doing this testing was incredibly pernickety and tedious for both the historians and IT specialists [Slide 69]. But it was the only way of checking that users would always get the expected results. Faceted browsing is a very powerful tool for users, but it’s not easy to set up a database to be able to use it.

CONCLUSIONS (5 min)

In this talk I’ve tried to give a feel for the practicalities of a medieval database project and to show something of the messiness behind the neat facade. I want to end with five more general points about combining digital technology and historical texts. [Slide 70]

[click] One of the most basic problems we had with the project didn’t involve the digital aspect at all. It was simply understanding what some charters actually meant. What is the decania of Slavs that DKAR 1:169 refers to? We decided it was probably some kind of administrative unit, but the charter’s irritatingly vague and we’re still not quite sure.

[click] A second issue is how to avoid re-inventing the wheel with digital history projects. We did our best to build on what previous projects had done, but it’s sometimes surprisingly difficult to find out specific technical details of other projects. We’re therefore doing our best to preserve and document the knowledge we’ve gained, through the website’s blog and also through presentations like this.
[click] One of the other reasons projects tend to end up re-inventing the wheel is that different historical periods produce not just different types of document, but different styles. The People of Medieval Scotland project, for example, was a previous project based on charters, but these are far more standardised documents in the twelfth century than in the eighth. Changing social practices also make standard practices across databases very difficult to implement. POMS, dealing with Scotland in the central Middle Ages, found it useful to record Gaelic and Latin names separately, which wasn’t an issue for us. On the other hand, about half the people in medieval Scotland seem to have been called either John, William or Robert, so the researchers on POMS didn’t have to spend so much time arguing about whether Adalbert really is the same name as Odalpert.

[click] Fourthly, although I’ve focused on data structures in this talk, data input standards are also very important for any historical project and they do have to be specified in incredible detail. To use an analogy, you can’t have one inputter classifying shapes as rectangles while another sees them as squares. Even the most basic data to be input has to be agreed.

Even so, problems of inconsistent inputting over time are inevitable, even if it’s just one person doing that. We did our best to discuss and document the decisions we made, using wiki software
and a lot of Skype meetings, but we still had to do a lot of data clean-up towards the end of the project.

[click] Which leads me to my final point about historical databases. They’re inevitably imperfect. It doesn’t have to be quite at the level of garbage in, garbage out, but behind the clean facade of any database project there tends to be some very messy data and a lot of compromises on database design. But then history is messy and early medieval history is particularly so. The Charlemagne database isn’t perfect, but despite all its imperfections, we hope it’ll still be a valuable tool for future researchers.