HIT
      Humanities Integration Technology




Enhancing research in the Humanities through an
  integrated knowledge management system
Team
Collaboration

Constantino Malagón
  Associate professor of Computer Engineering
  Universidad Nebrija, Spain

Justo Hidalgo
  Vice-President, Denodo Technologies
  Co-Founder of 24symbols

Yonsoo Kim
  Assistant Professor of Spanish
  School of Languages & Cultures, Purdue University
Collaboration
Javier Polanco – Developer
  Undergraduate student at Nebrija University
  Now a computer engineer

Carlos Martínez – Web Designer
  Undergraduate student at Nebrija University


Eric Herrera – Website Testing
  Undergraduate student at Purdue University
HIT
Introduction and Objectives
Solution
Results
Conclusions
Future Work
Introduction
Our first idea: help researchers in the
Humanities
1. Medieval documents (MMEDIS.com)
     First: Transcription
     Then: Search, Access, Context
2. Finally, a web portal (HIT)
Medieval document




MSS 120
Author: Gilbertus Anglicus
Medieval documents
Automatic transcription
  Abbreviations in medieval medical
  documents

International Conference on Frontiers in
Handwriting Recognition (ICFHR 2012)
  Main peer-reviewed conference
Medieval documents
Search and access
  Hispanic Seminary:
  El Corpus de Textos Médicos Españoles:
  http://www.hispanicseminary.org/t&c/med/index.htm
  Keyword: “medicina”
Medieval documents
Context

   Author

   Dates

   Related research
HIT Web portal
Visualization tool

   Expand the document type to any published, digitized
   format, not just medieval texts

   Expand the type and number of sources, databases and
   repositories

   Expand the contextual information
Objectives
This implies extending our first idea into a more general
system

And a more flexible one
Flexible!
Difficult
But flexibility implies a greater degree of
difficulty
Solution:


 HIT
HIT
Repositories

Data access

Data integration

Visualization

User interaction
Repositories
Core – These sources provide the digitized documents



Contextual – These sources provide contextual
information
Repositories
Core
– JSTOR
– Project Muse
– MLA Bibliography
– Patrología Latina
Contextual
– Amazon
– Google Books
– Wikipedia
Access Types
API (Application Programming Interface)




Screen Scraping
Access
API (Application Programming Interface)
– These sources provide a set of rules and
  programmatic “doors” that let us interact with them
– Example: Amazon, Google Books



– Amazon, give me all info you have about the book
  with ISBN=“XXXX”
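As a sketch of what such a request can look like, the example below uses Google Books (the other API-based source named on this slide) instead of Amazon, whose Product Advertising API requires registered credentials. It assumes the public `volumes` endpoint and its `isbn:` search prefix; the function names and the subset of fields kept are illustrative choices, not part of HIT. Only `fetch_book_info` touches the network, so the URL builder and parser can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Google Books endpoint for volume searches.
GOOGLE_BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"


def isbn_query_url(isbn: str) -> str:
    """Build a volumes query restricted to one ISBN (the "isbn:" search prefix)."""
    query = urlencode({"q": f"isbn:{isbn}"})
    return f"{GOOGLE_BOOKS_VOLUMES}?{query}"


def parse_book_info(payload: dict) -> list[dict]:
    """Keep only the fields a contextual panel might show (an illustrative choice)."""
    books = []
    for item in payload.get("items", []):  # "items" is absent when nothing matches
        info = item.get("volumeInfo", {})
        books.append({
            "title": info.get("title"),
            "authors": info.get("authors", []),
            "published": info.get("publishedDate"),
        })
    return books


def fetch_book_info(isbn: str, timeout: float = 10.0) -> list[dict]:
    """'Give me all info you have about the book with this ISBN' (needs network access)."""
    with urlopen(isbn_query_url(isbn), timeout=timeout) as response:
        return parse_book_info(json.load(response))
```

Keeping the parser separate from the HTTP call means the mapping into HIT's own fields can be tested against saved responses, and swapped per source, without hitting the live service.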
Access
Screen Scraping
– We need to “scrape” the web page and create
  structure out of it
– Example: Wikipedia
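A minimal screen-scraping sketch using only Python's standard `html.parser`: it turns the rows of a Wikipedia-style infobox into label/value pairs, i.e. it creates structure out of a page that was written for human readers. It assumes the infobox is a `<table>` whose `class` contains `infobox`, that rows follow a `<th>label</th><td>value</td>` pattern, and that no tables are nested inside it. Real Wikipedia markup varies, so this is an illustration of the technique, not HIT's extractor.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect label/value pairs from a Wikipedia-style infobox table.

    Assumes rows shaped like <tr><th>Label</th><td>Value</td></tr> and no
    nested tables inside the infobox.
    """

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}
        self._in_infobox = False
        self._cell = None  # "th" or "td" while inside a cell
        self._label = ""
        self._value = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "infobox" in classes:
            self._in_infobox = True
        elif not self._in_infobox:
            return  # ignore everything outside the infobox
        elif tag == "tr":
            self._label, self._value = "", ""  # start a fresh row
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_endtag(self, tag):
        if not self._in_infobox:
            return
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr":
            label = " ".join(self._label.split())
            value = " ".join(self._value.split())
            if label and value:  # skip header-only or image-only rows
                self.fields[label] = value
        elif tag == "table":
            self._in_infobox = False

    def handle_data(self, data):
        # Text inside inline tags (<i>, <a>, ...) still belongs to the current cell.
        if self._cell == "th":
            self._label += data
        elif self._cell == "td":
            self._value += data
```

Usage: create an `InfoboxParser`, call `feed(html)` and `close()`, then read `fields`. Because the page layout is the only "contract", a scraper like this breaks silently when the markup changes, which is the main practical difference from API access.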
Data integration
Data virtualization
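Data virtualization means answering each query live from the sources and presenting the results through one common model, instead of first copying everything into a central store. The sketch below illustrates that idea under stated assumptions: `Record`, `SourceAdapter`, and `VirtualView` are hypothetical names, each source is wrapped by a search function plus a mapping into the shared model, and the concurrent fan-out with a thread pool is one possible design, not the API of Denodo's platform.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Record:
    """Common data model every source is mapped into (hypothetical fields)."""
    source: str
    title: str
    author: str


@dataclass
class SourceAdapter:
    """Wraps one repository: how to query it and how to map its raw results."""
    name: str
    search: Callable[[str], Iterable[dict]]
    to_record: Callable[[dict], Record]


class VirtualView:
    """Answers each query live from the sources; nothing is copied or stored."""

    def __init__(self, adapters: Iterable[SourceAdapter]) -> None:
        self.adapters = list(adapters)

    def _query_one(self, adapter: SourceAdapter, keyword: str) -> list[Record]:
        try:
            return [adapter.to_record(raw) for raw in adapter.search(keyword)]
        except Exception:
            # One unreachable or changed source should not break the whole view.
            return []

    def search(self, keyword: str) -> list[Record]:
        if not self.adapters:
            return []
        # Query all sources concurrently; results keep the adapters' order.
        with ThreadPoolExecutor(max_workers=len(self.adapters)) as pool:
            batches = list(pool.map(lambda a: self._query_one(a, keyword), self.adapters))
        return [record for batch in batches for record in batch]
```

Because each adapter owns its own mapping, adding a repository means writing one search function and one `to_record` function; the view and everything built on top of it stay unchanged.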
Data visualization
   Web app enabled for:
   Browsers: Internet Explorer (several versions), Google Chrome, Firefox
   Devices: desktop, mobile devices, tablets
Interaction
Like iGoogle
  Set of personalized panels
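An iGoogle-style portal can be modeled as an ordered list of panels per user, each bound to one source and one query. The sketch below is a hypothetical data model (`Panel`, `Dashboard`, and their fields are assumptions, not HIT's code) showing how adding, reordering, and removing panels might work.

```python
from dataclasses import dataclass, field


@dataclass
class Panel:
    """One personalized panel: which source feeds it and what it shows."""
    panel_id: str
    source: str
    query: str


@dataclass
class Dashboard:
    """A user's iGoogle-style page: panels kept in display order."""
    user: str
    panels: list[Panel] = field(default_factory=list)

    def add(self, panel: Panel) -> None:
        if any(p.panel_id == panel.panel_id for p in self.panels):
            raise ValueError(f"duplicate panel id: {panel.panel_id}")
        self.panels.append(panel)

    def move(self, panel_id: str, new_index: int) -> None:
        """Reorder a panel, e.g. after the user drags it to a new position."""
        for i, p in enumerate(self.panels):
            if p.panel_id == panel_id:
                self.panels.insert(new_index, self.panels.pop(i))
                return
        raise KeyError(panel_id)

    def remove(self, panel_id: str) -> None:
        self.panels = [p for p in self.panels if p.panel_id != panel_id]
```

Storing only the source and query (not the results) keeps each user's layout small; the panel contents are fetched fresh from the sources whenever the page is rendered.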
Architecture
Virtualization
Results
Conclusions
The proof of concept has shown:

- How to access heterogeneous, web-based
data sources
- How to integrate those data pieces in a single
data model
Conclusions
The proof of concept has shown:

- How to run searches across those
sources
- How to visualize this info in a meaningful,
useful way
Future work
Expansion of the list of repositories
Portal personalization
Adapt HIT to all kinds of devices
Addition of semantic capabilities
Thanks!

Editor's Notes

  1. HIT is an innovative management system that allows one to administer texts, analyze information, and collate images or other sources in a comprehensive web portal, custom-made especially for CLA faculty and students. With HIT, one can access and search, in a unified way, information held on the Internet in different public repositories in the fields of arts, humanities, and social sciences (such as Anthropology, Communication, English, Spanish, History, Philosophy, Political Science, Sociology, Visual and Performing Arts, etc.).
  2–4. This project was developed in collaboration with Constantino Malagón Luque, associate professor of Artificial Intelligence in the Department of Computer Science at Nebrija University (Madrid, Spain); Justo Hidalgo, Vice President of Product Management and Consulting at Denodo Technologies and co-founder of 24symbols; and myself, Yonsoo Kim, professor of Spanish. What does a professor of Spanish have to do with technology and science?
  5. Constantino and I co-founded a research team called MMEDIS (Medieval Medicine Documents Identification System), in which researchers from diverse disciplines work toward an automatic transcription program based on artificial intelligence. We have two essential reasons for carrying out the MMEDIS project. First, we aim to analyze how medicine shaped and affected the lives of medieval people. This interest stems from my research on Teresa de Cartagena, a converted Jewish nun who became deaf and wrote religious treatises. I intend to investigate the physical disabilities and diseases that inflicted pain on people in medieval Europe. Second, we plan to study and transcribe handwritten documents more efficiently than traditional paleographic transcription of manuscripts allows.
  6. The Compendium of Medicine, originally composed in Latin by Gilbertus Anglicus (Gilbert the Englishman), was a primary text of the medical revolution in thirteenth-century Europe. Composed mainly of medicinal recipes, it offered advice on diagnosis, medicinal preparation, and prognosis. In the fifteenth century it was translated into Middle English to accommodate a widening audience for learning and medical "secrets." Faye Marie Getz, for example, provides a critical edition of the Middle English text, with an extensive introduction to the learned, practical, and social components of medieval medicine and a summary of the text in modern English, in her book Healing and Society in Medieval England: A Middle English Translation of the Pharmaceutical Writings of Gilbertus Anglicus. With manuscripts of this type, once a text has been transcribed, people do not go back to the original, because of all the intensive work the manuscript requires.
  7. It is tedious work, and only specialists in paleography can read it. But the problem does not end there…
  8. We also need to decode all the abbreviations in medieval medical documents. We have submitted an article based on the study of the handwriting recognition process.
  9. Our transcription project will take some time to work efficiently. However, we realized that some websites on the Internet already offer manually transcribed manuscripts. For example, the Hispanic Seminary has published 55 texts online: Spanish Medical Texts [55 texts / 2,642,403 tokens], prepared by Francisco Gago Jover, Mª Teresa Herrera, and Mª Estela González de Fauve. All these texts are available, but you cannot access them if you don't know where to find them on the Internet.
  10. The significance and originality of the HIT project lie in showing that knowledge should be presented beyond two-dimensional formats such as paper (the encyclopedia) or keyword-search websites (Wikipedia, Google, Yahoo, etc.). Knowledge has to be obtainable through open-ended, ever-expanding exploration within the mashup, reaching its maximum complexity. The potential of this project is very broad, because its integrated knowledge system can be used for research or self-learning in any field. Instead of a mere input-output model, any search and reading will lead to contextualized and integrated learning.
  11. (DO NOT READ) The significance of this project. HIT can address the two major problems of contemporary digital humanities: an overload of useless information and a lack of textual context. The world of electronic communication is one of textual overabundance, in which the texts on offer go far beyond the reader's ability to take advantage of them. Researchers have often denounced the uselessness of the information overload on the web. Ideally, then, one should know where, why, and how to gather the most accurate and reliable texts on the Internet. This is precisely what HIT will do, by organizing and synthesizing data and texts and making all of them available in a single search. In the HIT project, I will research and select information available on the Internet and filter it so that only needed, trustworthy information remains. Furthermore, HIT will analyze not only external repositories but also internal ones, such as Purdue Library's databases and catalogs. The other problem facing current digital humanities is that texts, content, or information are usually provided without regard to their context.
Reading in front of a computer screen is generally a discontinuous process that seeks, using keywords or thematic headings, the fragment the reader wishes to find: an article in an electronic periodical, a passage in a book, or some information on a website. This is done without necessarily knowing the identity or coherence of the entire text from which the fragment was extracted. In a certain sense, one might say that in the digital world all textual entities are like databases that offer fragments, the reading of which in no way implies a perception of the work or body of works from which they came. This explains the confusion of the contemporary reader. When we search for a keyword, for example, the HIT platform will also make available, at the same time, the original source from which the fragment was extracted, including a location map, images, notes, and references. The HIT project will contribute to innovation in the humanities in three key ways: (a) in the user interface, by producing a means by which users can interact with this integrated knowledge, as can be seen below; (b) in allowing the integration of Purdue library databases (ComDisDome, Historical Abstracts with Full Text, ITER, JSTOR, MUSE, Patrologia Latina Database, etc.); and (c) in integrating valuable humanities content located in various external sources or repositories to produce original and valuable knowledge. As a consequence, with the integrated knowledge management system, the text itself is presented with its context, that is, with the humanistic knowledge that makes up the learning environment. Reading will consist of unfolding multiple and unique textual units onto the screen, units created in accordance with each reader's focus or interest.
  12. HIT can address the two major problems of contemporary digital humanities: overload of useless information and lack of textual context. The world of electronic communication is a world of textual overabundance in which the written texts that are offered go far beyond the reader ’ s ability to take advantage of them. Often, researchers have denounced the uselessness of the overload of information on the web. Thus, ideally, one should know where, why, and how she or he should gather the most accurate and reliable texts on the internet. This is precisely what HIT will do by organizing and synthesizing data and texts—all of them available in one single search. What we are going to demonstrate today is only a PROOF OF CONCEPT. However, our initial project was base on these concepts. We researched and selected information available on the internet and filter out only needed and trustful information. We did a survey with different professors from different field in order to find out about their most reliable websites. HIT analyze not only external repositories but also internal repositories, such as Purdue Library ’ s database and catalogs. The other problem facing current digital humanities is that texts, content or information are usually provided without taking into account its context. Reading in front of the computer screen is generally a discontinuous reading process that seeks, using keywords or thematic headings, the fragment that the reader wishes to find: an article in an electronic periodical, a passage in a book, or some information on a website. This is done without necessarily knowing the identity or coherence of the entire text from which the fragment was extracted. In a certain sense, one might say that in the digital world all textual entities are like databases that offer fragments, the reading of which in no way implies a perception of the work or the body of works from which they came. This explains the confusion of the contemporary reader. 
When we search for a keyword, the HIT platform, for example, will also make available the original source from which the fragment was extracted, including a location map, images, notes, and references. This is my idea: to integrate all of this information and make it flexibly available to everyone.
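The idea that a keyword search should return each fragment together with its original source can be sketched as a small data structure. This is a minimal Python sketch under stated assumptions: the class names `SourceDocument` and `Fragment`, the `search` function, and the sample fragment texts are hypothetical and do not describe HIT's actual implementation; the fields simply mirror the context items named above (location map, images, notes, references).

```python
from dataclasses import dataclass, field


@dataclass
class SourceDocument:
    """Hypothetical record for the work a fragment was extracted from."""
    title: str
    author: str
    location_map: str = ""                              # e.g. a link to a map image
    images: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)


@dataclass
class Fragment:
    """A searchable passage that always keeps a link to its source."""
    text: str
    source: SourceDocument


def search(fragments: list[Fragment], keyword: str) -> list[Fragment]:
    """Case-insensitive keyword search; each hit carries its source, so context is never lost."""
    needle = keyword.casefold()
    return [f for f in fragments if needle in f.text.casefold()]


# Usage with invented sample texts (MSS 120 and Gilbertus Anglicus come from the slides).
manuscript = SourceDocument(title="MSS 120", author="Gilbertus Anglicus")
corpus = [
    Fragment("sample passage about medicina", manuscript),
    Fragment("sample passage on another topic", manuscript),
]
hits = search(corpus, "Medicina")
for hit in hits:
    print(hit.text, "->", hit.source.title, "by", hit.source.author)
```

Because every `Fragment` holds a reference to its `SourceDocument`, the results screen can show the source's map, images, notes, and references next to the matching passage without a second lookup.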
  13. Constantino Malagón, Professor of Computer Engineering, Universidad Antonio de Nebrija, Spain. Justo Hidalgo, Vice-President, Product Management and Consulting at Denodo Technologies; Co-Founder of the 24symbols company. Both have worked hard to fulfill my request.
  14. The function and development of the HIT web portal
The HIT project will be built according to the architecture shown in the image below. I will explain its four layers, starting from the very bottom of the image.
Acquisition Layer: The Data Acquisition Layer, shown at the bottom of the figure, accesses the different data sources that provide early modern documents in digitized form and their transcriptions, plus any other useful internal or web-based external repositories. A critical asset of this component is that its web data extraction module can extract web data in a structured manner, thereby turning the web into a "virtual database."
Processing Layer: This layer makes combining, mashing up, and transforming data from heterogeneous databases and sources easier and more powerful. Specifically, the proposed architecture will be able to perform syntactic tasks (transformations and combinations based on the structure of the extracted content, such as unifying author names according to whether we want the form {surname, first_name} or {first_name surname}) and semantic tasks (transformations and combinations based on the meaning of the extracted content). From this layer on, Justo Hidalgo, from Denodo Technologies, will develop the software. The HIT interface will follow the most relevant industry standards, such as JDBC, ODBC, SOAP/WSDL, and REST, for both data access and publishing.
Categorization Layer: The categorization module, on top of the data combination layer, sorts information that was previously stored or is delivered in real time, and assigns each piece of information to a set of categories.
Final View: Finally, a basic presentation layer allows researchers to visualize the overall mashup and categorization results.
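The syntactic task described for the Processing Layer, unifying author names into either {surname, first_name} or {first_name surname}, can be illustrated with a short Python sketch. The function names and style labels are hypothetical, and this is not Denodo's API. The parsing rule is a deliberate simplification: when there is no comma, it assumes the last word is the surname, so names with compound surnames (common in Spanish) would need a richer rule.

```python
def split_author(name: str) -> tuple[str, str]:
    """Return (surname, first_name) from 'Surname, First' or 'First Surname'.

    Simplifying assumption: without a comma, the last word is the surname.
    """
    name = " ".join(name.split())                 # collapse stray whitespace
    if "," in name:
        surname, first = (part.strip() for part in name.split(",", 1))
    else:
        first, _, surname = name.rpartition(" ")
    return surname, first


def format_author(name: str, style: str = "surname_first") -> str:
    """Render a name as '{surname}, {first_name}' or '{first_name} {surname}'."""
    surname, first = split_author(name)
    if not first:                                 # single-word names have nothing to reorder
        return surname
    if style == "surname_first":
        return f"{surname}, {first}"
    if style == "first_surname":
        return f"{first} {surname}"
    raise ValueError(f"unknown style: {style!r}")


# Sources spelling names in different orders end up in one consistent form.
names = ["Malagón, Constantino", "Justo Hidalgo", "Kim, Yonsoo"]
print([format_author(n) for n in names])
print([format_author(n, "first_surname") for n in names])
```

Choosing one canonical form before combining sources is what lets records about the same author, coming from different repositories, be matched and merged.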
The platform is built as a series of components, following software engineering best practices that simplify the development and integration of all the resources. This is shown in the following image.
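One way to read this component-based design is as a chain of independent layer functions, each consuming the previous layer's output. The Python sketch below assumes simple dictionary records with a `text` field and uses hypothetical keyword rules (`CATEGORY_RULES`) for the categorization step; the real HIT categories and Denodo's processing engine are not specified in these notes, so this only illustrates the layering, not the actual platform.

```python
import re
from collections.abc import Callable, Iterable

Record = dict[str, object]
Layer = Callable[[list[Record]], list[Record]]

# Hypothetical keyword rules, for illustration only.
CATEGORY_RULES: dict[str, set[str]] = {
    "medicine": {"medicina", "remedy", "fever"},
    "astronomy": {"star", "planet"},
}


def acquire(sources: Iterable[Iterable[Record]]) -> list[Record]:
    """Acquisition layer: copy the records from every source into one list."""
    return [dict(record) for source in sources for record in source]


def tokenize(records: list[Record]) -> list[Record]:
    """Processing layer (a syntactic step): derive lowercase word tokens from 'text'."""
    for record in records:
        record["tokens"] = set(re.findall(r"\w+", str(record.get("text", "")).lower()))
    return records


def categorize(records: list[Record]) -> list[Record]:
    """Categorization layer: assign each record to every category whose keywords it contains."""
    for record in records:
        tokens = record["tokens"]
        record["categories"] = sorted(
            name for name, keywords in CATEGORY_RULES.items() if tokens & keywords
        )
    return records


def run_pipeline(sources: Iterable[Iterable[Record]], layers: list[Layer]) -> list[Record]:
    """Chain the components; each layer only sees the previous layer's output."""
    records = acquire(sources)
    for layer in layers:
        records = layer(records)
    return records


core = [{"text": "A remedy for fever"}]                              # hypothetical core source
context = [{"text": "The planet Mars"}, {"text": "Nothing here"}]    # hypothetical context source
for r in run_pipeline([core, context], [tokenize, categorize]):
    print(r["text"], r["categories"])                                # stands in for the Final View
```

Because each layer has the same signature, a component can be replaced (for example, a semantic categorizer instead of keyword rules) without touching the others.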
  15. In order to do that, we need:
- To extend the list of repositories. By repositories we mean two kinds of data sources:
  - Structured: for example, any database, which has tables, fields, records, and values. This includes any source from the Purdue Library. These are called core sources.
  - Unstructured or semi-structured: these include web pages and plain text files, for example, Wikipedia. These are called context sources because they provide contextual information about the author, the document, or whatever else we choose.
  We have the survey of databases frequently used by different faculty members at CLA. Our first step will be to establish the first categories for rating the sources on that list (see attached file): structured and unstructured.
- To extend the list of functionalities:
  - To develop the application for mobile devices: Android and Apple iOS.
  - To adapt the web design to the Purdue standards.
  - To make the results screen more interactive (as in iGoogle, users should be able to move the different panels and show or hide some of them). To do that, we have to develop the system with the latest web technologies, such as HTML5.
- To host the HIT mashup at Purdue University, under a domain name like http://cla.purdue.edu/hit
- To secure the system: users must authenticate with their own Purdue account (username and password) over secure protocols such as HTTPS.
- To develop a results-caching module, which will make the system faster.
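The caching module mentioned in this list could take many forms. A minimal Python sketch, assuming a time-to-live (TTL) policy keyed by the query string, looks like this. The class name `ResultCache`, the 60-second TTL in the usage example, and the injectable clock (used so expiry can be checked without waiting) are hypothetical choices, not part of the HIT design.

```python
import time
from collections.abc import Callable, Hashable


class ResultCache:
    """Minimal time-to-live cache for search results (hypothetical design)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[Hashable, tuple[float, object]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], object]) -> object:
        """Return a fresh cached value for key, or call compute() and store its result."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]                       # fresh hit: skip the slow multi-source query
        value = compute()                         # miss or stale: query the sources again
        self._entries[key] = (now, value)
        return value


# Usage: a fake clock makes expiry deterministic.
fake_now = [0.0]
cache = ResultCache(ttl_seconds=60, clock=lambda: fake_now[0])
query_count = 0


def slow_federated_search():
    """Stand-in for querying all repositories; counts how often it actually runs."""
    global query_count
    query_count += 1
    return ["hit 1", "hit 2"]


cache.get_or_compute("medicina", slow_federated_search)   # miss: queries the sources
cache.get_or_compute("medicina", slow_federated_search)   # hit: served from the cache
fake_now[0] = 61.0                                        # TTL has elapsed
cache.get_or_compute("medicina", slow_federated_search)   # stale: queries again
print(query_count)  # 2
```

A TTL keeps results reasonably current when the underlying repositories change, while repeated searches within the window avoid re-querying every source.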
  16. The HIT system is being developed jointly by some of the members of the MMEDIS team and the new HIT team members.