Semantic Web

Information Extraction from the WWW using Machine Learning Techniques
Lee McCluskey, Dept of Informatics
email: lee@hud.ac.uk

Motivation

Overview of Talk

Information Extraction from the WWW – WHY?

Information Extraction from the WWW – WHY? (continued)

Information Extraction from The Web
(approaches ranked on a scale from EASIER to HARDER)
“Natural Language Understanding”: take raw (English) text from a web page and turn it into some logic representing its meaning.

Information Extraction from The Web

WEB PAGES ======> WRAPPERS ======> STRUCTURED DATA

    BA    red   555  sue
    MSc   red   123  dave
    PhD   grey  345  bill
    BSc   blue  664  tom

Information Extraction

Example of Automated Extraction

Source: HTML

    <h1> Residential Housing </h1>
    <ul> House For Sale
    <li> location: Hebden Bridge
    <li> agent-phone: 01422 843222
    <li> listed-price: £350,000
    <li> comments: Bijou residence on the edge of this popular little town...
    </ul>
    <hr>
    <ul> House For Sale ... </ul>
    ...

======> wrapper ======>

Destination: XML

    <residential>
      <house>
        <location>
          <city> Hebden Bridge </city>
          <county> West Yorkshire </county>
          <country> UK </country>
        </location>
        <agent-phone> 01422 843222 </agent-phone>
        <listed-price> £350,000 </listed-price>
        <comments> Bijou residence on the edge of this popular little town... </comments>
      </house>
      ...
    </residential>

NB: XML + schema + recognised names
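To make the HTML-to-XML step concrete, here is a minimal hand-written wrapper for the house listing above, sketched in Python with only the standard library. It is an illustration under assumptions, not the wrapper from the talk: the function name `wrap` and the regular expressions are invented for this example, each `<ul>` block is assumed to be one listing, and `location` is kept as a single flat element because splitting it into city, county and country would need background knowledge that is not on the page.

```python
import re
import xml.etree.ElementTree as ET

# The HTML fragment from the slide (one listing only).
HTML = """<h1> Residential Housing </h1>
<ul> House For Sale
<li> location: Hebden Bridge
<li> agent-phone: 01422 843222
<li> listed-price: £350,000
<li> comments: Bijou residence on the edge of this popular little town...
</ul>
<hr>"""

# Hypothetical pattern: one "<li> name: value" item, value runs to the next tag.
ITEM = re.compile(r"<li>\s*([\w-]+):\s*([^<]*)")


def wrap(html: str) -> ET.Element:
    """Turn each <ul>...</ul> block into a <house> element with one child per item."""
    root = ET.Element("residential")
    for block in re.findall(r"<ul>(.*?)</ul>", html, flags=re.S):
        house = ET.SubElement(root, "house")
        for name, value in ITEM.findall(block):
            ET.SubElement(house, name).text = value.strip()
    return root


root = wrap(HTML)
print(ET.tostring(root, encoding="unicode"))
```

A wrapper like this is tied to one page layout: if the site changes its markup, the patterns have to be rewritten by hand, which is the cost that learning wrappers automatically aims to reduce.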

Information Extraction

Using ‘Rule Induction’ to learn wrappers for html pages

Rule Induction is an area of Machine Learning
[Diagram: taxonomy of machine learning. Node labels: Symbolic Learning, Sub-symbolic learning, Similarity-Based Learning, Explanation-Based Learning, Learning from Examples, Learning by Observation, Rule Induction, Neural Networks, Genetic Approaches]

Rule Induction from Examples

Actual IE Example: University of Southern California’s Info Sciences Institute (ISI)’s “Information agent”

Heracles’ Stalker inductive algorithm
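As a rough illustration of learning an extraction rule from labelled examples, the Python sketch below uses landmark-based rules in the spirit of the SkipTo rules described in the published STALKER work. It is a deliberately simplified sketch, not the STALKER algorithm itself: the names `skip_to`, `extract` and `induce` are invented here, the candidate rules are supplied by hand rather than generated and refined, and the second training page (including its phone number) is made up for the example.

```python
from typing import List, Optional, Tuple


def skip_to(text: str, landmarks: List[str]) -> Optional[int]:
    """Consume each landmark in order; return the index just past the last one."""
    pos = 0
    for mark in landmarks:
        found = text.find(mark, pos)
        if found == -1:
            return None
        pos = found + len(mark)
    return pos


def extract(text: str, start_rule: List[str], end_mark: str) -> Optional[str]:
    """Return the text between the end of start_rule and the next end_mark."""
    begin = skip_to(text, start_rule)
    if begin is None:
        return None
    end = text.find(end_mark, begin)
    if end == -1:
        return None
    return text[begin:end].strip()


def induce(examples: List[Tuple[str, str]],
           candidates: List[List[str]],
           end_mark: str) -> Optional[List[str]]:
    """Pick the first candidate rule that is correct on every labelled example."""
    for rule in candidates:
        if all(extract(page, rule, end_mark) == target for page, target in examples):
            return rule
    return None


# Labelled examples: (page fragment, value the user marked as the agent phone).
EXAMPLES = [
    ("<li> location: Hebden Bridge <li> agent-phone: 01422 843222 <li>",
     "01422 843222"),
    # Hypothetical second page with the fields in a different order.
    ("<li> agent-phone: 01484 000000 <li> location: Huddersfield <li>",
     "01484 000000"),
]

# Hand-supplied candidate start rules, from most general to most specific.
CANDIDATES = [["<li>"], ["<li>", ":"], ["agent-phone:"]]

learned = induce(EXAMPLES, CANDIDATES, end_mark="<li>")
print(learned)
```

The rule `["<li>", ":"]` happens to work on the second page but fails on the first, which is why a learner needs several labelled examples before it can settle on a rule that generalises.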

Example of training examples

Problems with Wrapper Induction

Summary

Extra Reading

Related Legal/Ethical/Professional/Methodological Issues
