The SHEBANQ project (half-way) as a use case in querying language resources. The corpus is the text of the Hebrew Bible with linguistic features, packaged in a special text database and converted to LAF.
Shebanq roma-2013-10-01
1. Data Archiving and Networked Services
SHEBANQ
Dirk Roorda - researcher @ DANS, TLA
System for HEBrew Text: ANnotations for Queries and Markup
TEI pre-conference workshop: Query
Roma – 2013-10-01
2. Overview
1. Context: text, data, research in Hebrew Bible
2. MdF database model, MQL query language
3. Sharing the research process
4. CLARIN-NL project: SHEBANQ
5. Towards new tools
3. 1 (of 5) Context
Text, data and research in the Hebrew Bible
4. VU Amsterdam
Eep Talstra Centre for Bible and Computer
text + linguistic features => database
database + research questions => publications
5. 2 (of 5) MdF and MQL
• MdF database model
• MQL query language
6. Monad Object Feature
1977-now: Eep Talstra et al. ECA, WIVU. Print reference (Google Books)
1988-1994: Crist-Jan Doedens. Text Databases – One Database Model and Several Retrieval Languages (Google Books reference)
2004: Ulrik Petersen. Emdros - a text database engine for analyzed or annotated text. COLING
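The monad/object/feature idea behind this slide can be illustrated with a minimal Python sketch. It is not the Emdros implementation: it only assumes that monads are integer positions in the text, that an object is a typed set of monads, and that features are name/value pairs. The class name `TextObject` and the feature value `Pred` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """A typed set of monads (integer text positions) carrying features."""
    otype: str                       # e.g. "word", "phrase", "clause"
    monads: frozenset                # the positions this object covers
    features: dict = field(default_factory=dict)

    def contains(self, other: "TextObject") -> bool:
        # An object is embedded in another if all of its monads lie inside it.
        return other.monads <= self.monads


# Toy data (made up): three words at monads 1..3, the first two form a phrase.
words = [
    TextObject("word", frozenset({1}), {"part_of_speech": "verb"}),
    TextObject("word", frozenset({2}), {"part_of_speech": "noun"}),
    TextObject("word", frozenset({3}), {"part_of_speech": "noun"}),
]
phrase = TextObject("phrase", frozenset({1, 2}), {"phrase_function": "Pred"})

# Words embedded in the phrase, found purely by monad-set inclusion.
inside = [w for w in words if phrase.contains(w)]
```

Because embedding is just set inclusion on monads, objects of different types can overlap or nest without a fixed tree structure.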
12. Leiden: international workshop
biblical scholarship
Desiderata:
new tool development
text transmission (variants)
linguistic analysis (features)
even combined!
a short history: 2012, Leiden Lorentz
16. Research Data Cycle
Text transmission, tradition, editorial processes
Free University, theology faculty, server department, WIVU project
NWO projects
religious communities
theol. scholars
enlightened lay people
scholarly-ibles.com
17. Research Data Cycle
Text transmission, tradition, editorial processes
Free University, theology faculty, server department, WIVU project
NWO projects
religious communities
theol. scholars
CLARIN
SHEBANQ
linguists
Wider public:
Annotation, Query Saving, via Linked Data
dig. hum
comp. hum
enlightened lay people
scholarly-ibles.com
Research Data Archiving
DANS
18. 3 (of 5) Sharing (cont'd)
Solution: Queries As Annotations
19. queries-as-annotations
model | query | example
body | query instruction |
SELECT ALL OBJECTS WHERE [Word
FOCUS part_of_speech = verb AND
lexeme = "שים"]
targets | query results in context |
ׁרֶשֲא ֙ןֶבֶ֨א ָה ֶתא ֤חַּקִּי ַו ֶרקֹּ֗ב ַּב ֜בֹקֲעַי ֨םֵּכְׁשַּי ַו
ֶןמֶׁ֖ש ֥קֹצִּי ַו ֑הָבֵּצַמ ּהָ֖תֹא ׂםֶשָּ֥י ַו ֔יוָתֹׁשֲאַֽרְמ ֣םָׂש
ּהָֽׁשֹאר ַלע
annotation | published query | qu123 (just an identifier)
metadata | researcher, date created, date last run, research question |
Janet Dyk, 2004-02-16, 2012-01-27
Can the verb שִׂים have a double object? - article in Foundations for Syriac Lexicography
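The queries-as-annotations model on this slide (body, targets, annotation identifier, metadata) could be captured in a record along the following lines. This is a sketch only: the class and field names are assumptions, not the SHEBANQ schema, and the targets are left empty because the slide does not list result positions.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class QueryMetadata:
    researcher: str
    date_created: date
    date_last_run: date
    research_question: str


@dataclass
class QueryAnnotation:
    identifier: str                                # published query id, e.g. "qu123"
    body: str                                      # the query instruction (MQL text)
    metadata: QueryMetadata
    targets: list = field(default_factory=list)    # query results (positions in the text)


# Values taken from the example column of the slide.
annotation = QueryAnnotation(
    identifier="qu123",
    body='SELECT ALL OBJECTS WHERE [Word FOCUS part_of_speech = verb AND lexeme = "שים"]',
    metadata=QueryMetadata(
        researcher="Janet Dyk",
        date_created=date(2004, 2, 16),
        date_last_run=date(2012, 1, 27),
        research_question="Can the verb שִׂים have a double object?",
    ),
)

# Serialise for sharing or archiving; dates are written as ISO strings.
as_json = json.dumps(asdict(annotation), default=str, ensure_ascii=False)
```

Keeping the query text, its results and its provenance in one record is what lets a saved query be cited and rerun like any other annotation on the text.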
43. select all objects where
[clause
[phrase phrase_function = Objc
[word FOCUS tense = infinitive_absolute]
]
]
Execute
Query executed
Passage
ּבְֵראׁשִ֖יתּבָָר֣אאֱֹלהִ֑יםאֵ֥תהַּׁשָמַ֖יִםוְאֵ֥ת
הָאֶָֽרץ׃
וַּיֹ֥אמֶרחִזְִקּיָ֖הּומָ֣האֹ֑ותּכִ֥יאֶעֱלֶ֖הּבֵ֥ית
יְהוָֽה׃
Controls
וַּיֹ֥אמֶרחִזְִקּיָ֖הּומָ֣האֹ֑ותּכִ֥יאֶעֱלֶ֖הּבֵ֥ית
יְהוָֽה׃
Gen 1:1
2Chron 3:4
Gen 1:1
ּבְֵראׁשִ֖יתּבָָר֣אאֱֹלהִ֑יםאֵ֥תהַּׁשָמַ֖יִםוְאֵ֥ת
הָאֶָֽרץ׃
וַּיֹ֥אמֶרחִזְִקּיָ֖הּומָ֣האֹ֑ותּכִ֥יאֶעֱלֶ֖הּבֵ֥ית
יְהוָֽה׃
Text
1Sam 12:4
Ex 23:2
Query results
Prev 1 2 3 4 5 6 ... 21 22 Next
313 results
Executing query ...
view in context
Save this query
Researcher Oliver Glanz
Date created 2013-08-25
Date last run 2013-08-25
Project Data and Tradition
Institute VU/Eep Talstra Centre for Bible and Computing
Reason irregular valency of ּבָָר֣א
Comments
needs to be combined with query on אֱֹלהִ֑ים
Save Publish Cancel
Name valency ּבָָר֣א
Edit Query
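The nested query on this slide asks for words that sit inside an object phrase that sits inside a clause. A rough Python sketch of that containment logic on toy data follows. It does not reproduce MQL semantics in full, and the helper `find`, the data layout and all object values except `Objc` and `infinitive_absolute` are hypothetical.

```python
# Toy objects (made up): each has a type, a set of monads, and optional features.
objects = [
    {"type": "clause", "monads": frozenset({1, 2, 3, 4})},
    {"type": "phrase", "monads": frozenset({1, 2}), "phrase_function": "Pred"},
    {"type": "phrase", "monads": frozenset({3, 4}), "phrase_function": "Objc"},
    {"type": "word", "monads": frozenset({1}), "tense": "perfect"},
    {"type": "word", "monads": frozenset({2}), "tense": "infinitive_absolute"},
    {"type": "word", "monads": frozenset({3}), "tense": "infinitive_absolute"},
    {"type": "word", "monads": frozenset({4})},
]


def find(otype, inside=None, **conditions):
    """Return objects of a type that satisfy all feature conditions and,
    if `inside` is given, are embedded in it (monad-set inclusion)."""
    return [
        o for o in objects
        if o["type"] == otype
        and all(o.get(name) == value for name, value in conditions.items())
        and (inside is None or o["monads"] <= inside["monads"])
    ]


# [clause [phrase phrase_function = Objc [word FOCUS tense = infinitive_absolute]]]
hits = [
    word
    for clause in find("clause")
    for phrase in find("phrase", inside=clause, phrase_function="Objc")
    for word in find("word", inside=phrase, tense="infinitive_absolute")
]
```

In this toy data only the word at monad 3 is a hit: the word at monad 2 has the right tense but lies in the `Pred` phrase, so the enclosing block filters it out.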
47. 5 (of 5) Towards new tools
• LAF tools
• or generic graph algorithms
• Emdros tools
• or generic database technology
• Linked Data tools
• or generic SPARQL queries
48. Side conditions
• development close to the researchers
• preferably in their own institutions
• decent performance
• within the scale of a laptop
• usable to researchers
• that is: non-programmers
• persistence in mind
• new results will be archived and re-enter the data cycle