Information Extraction from the Web - In today's web
1. In today's web
Information Extraction from the Web
Benjamin Habegger
University of Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205
Seminar on Information Extraction from the Web
ENSIAS, Rabat, Morocco - June 19, 2013
3. Where is the web today?
Web of humans
● Interlinked documents
● Social Web
● Web 2.0
● Crowd-sourcing
Web of machines
● REST / API
● Service Interaction
● Open Data
● Semantic Web
4. Somehow, we're creating two webs
Web of humans
● HTML
● JavaScript
● CSS
Web of Data
● RDF
● REST
● SPARQL
6. Open data still has some way to go
Data thrown on the web in its original format
● Few standardized formats
● Little standardized semantics
● It can be:
– An Excel or CSV file
– A REST service
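To make this concrete, here is a minimal Python sketch, assuming two hypothetical CSV exports of job offers (the data, column names and the `load` helper are all invented for illustration). A machine can parse the CSV syntax, but because the column names carry no shared meaning, aligning the two sources still needs a hand-written mapping; this is the gap that shared vocabularies aim to close.

```python
import csv
import io

# Two hypothetical open-data exports describing the same kind of record
# (job offers) with different delimiters and non-standardized column names.
EXPORT_A = "title,city,published\nDeveloper,Lyon,2013-06-01\n"
EXPORT_B = "poste;ville;date_publication\nAnalyste;Rabat;2013-06-03\n"

# Hypothetical hand-written mapping from each source's headers to shared field names.
MAPPINGS = {
    "a": {"title": "title", "city": "city", "published": "date"},
    "b": {"poste": "title", "ville": "city", "date_publication": "date"},
}


def load(text, delimiter, mapping):
    """Parse a CSV export and rename its columns to the shared field names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{mapping[key]: value for key, value in row.items()} for row in reader]


records = load(EXPORT_A, ",", MAPPINGS["a"]) + load(EXPORT_B, ";", MAPPINGS["b"])
for record in records:
    print(record)
```

Each new source needs its own mapping, which is why this approach scales poorly as the number of datasets grows.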
7. Still, Linked Open Data and the Semantic Web are emerging
● Vocabularies
– FOAF
– Dublin Core
– …
● Datasets
– DBpedia
– ...
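As an illustration of what such vocabularies give machines, here is a minimal Python sketch that describes this talk with Dublin Core and FOAF terms and prints the statements as N-Triples. The namespace URIs are the published FOAF and Dublin Core Elements ones; the `example.org` subject URIs and the helper functions are hypothetical, and the literal escaping is deliberately minimal.

```python
# Namespace URIs of two widely used vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical subject URIs; any stable identifier would do.
TALK = "http://example.org/talks/ensias-2013"
SPEAKER = "http://example.org/people/habegger"


def literal(text):
    """Quote a plain string as an N-Triples literal.

    Only backslashes and double quotes are escaped, which is enough for the
    values below (no newlines or control characters).
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Serialize one statement as a line of N-Triples."""
    return f"<{subject}> <{predicate}> {obj} ."


triples = [
    triple(TALK, DC + "title", literal("Information Extraction from the Web")),
    triple(TALK, DC + "creator", f"<{SPEAKER}>"),
    triple(TALK, DC + "date", literal("2013-06-19")),
    triple(SPEAKER, RDF_TYPE, f"<{FOAF}Person>"),
    triple(SPEAKER, FOAF + "name", literal("Benjamin Habegger")),
]
print("\n".join(triples))
```

Because every predicate is a shared URI rather than a local column name, any consumer that knows FOAF and Dublin Core can interpret these statements without a per-source mapping.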
8. But still, can't we dream a little?
Having (a little) smarter machines...
Shared web
Learning capabilities
9. Making our web robots smarter could even help improve our web...
What does the following query give you today?
“lyon informatique emploi”
12. There's still a long way to go... but information extraction from the web is a small step toward making machines smarter
13. And there are many people interested out there...
Freelancer.com search for web scraping
14. So where does information extraction from the web fit in?
Open Data
Linked Data
Semantic Web
Information Extraction
Machine Learning
Pattern Mining
Data Integration
Standardized Vocabularies
Web Scraping
15. And what is it about?
...
Data for humans
Data for machines
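The deck leaves the "how" for after the break, so the following is not the author's method, only a minimal Python sketch of the idea of turning data for humans into data for machines. It assumes a hypothetical job-listing page whose markup (`li.offer`, `span.title`, `span.city`) is invented for illustration, and uses a hand-written extractor built on the standard library's `html.parser`.

```python
from html.parser import HTMLParser

# Hypothetical job-listing markup: readable by humans, but the fields are only
# distinguished by presentation classes, not by any shared semantics.
PAGE = """
<ul>
  <li class="offer"><span class="title">Web developer</span> in <span class="city">Lyon</span></li>
  <li class="offer"><span class="title">Data analyst</span> in <span class="city">Rabat</span></li>
</ul>
"""


class OfferExtractor(HTMLParser):
    """Collect one record per <li class="offer">, reading its title and city spans."""

    def __init__(self):
        super().__init__()
        self.records = []
        self.field = None  # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "offer" in classes:
            self.records.append({})  # start a new record for each offer
        elif tag == "span" and self.records:
            for name in ("title", "city"):
                if name in classes:
                    self.field = name

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None  # text after the span is not part of a field

    def handle_data(self, data):
        if self.field is not None:
            self.records[-1][self.field] = data.strip()


extractor = OfferExtractor()
extractor.feed(PAGE)
print(extractor.records)
```

The extraction rules here are written by hand for one page layout and break as soon as the markup changes; making such rules easier to obtain and maintain is where the learning and pattern-mining topics listed earlier come in.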
16. How do we do that?
We'll see that after the break :)
http://www.slideshare.net/BenjaminHabegger/2013-06ensiasrabatiealg