3. Genesis
• A story of the Internet by
• Solving the most important problems
• Greatly influenced by one man…
4. Tim Berners‐Lee
“the World Wide Web is Berners-Lee's
alone. He designed it. He loosed it on the
world. And he more than anyone else has
fought to keep it open, nonproprietary and
free.”
Time Magazine, 1999
5. The Problem
• Where can I find the information?
“Our ineptitude in getting at the record is
largely caused by the artificiality of the
systems of indexing.”
The Atlantic Monthly, 1945
6. Archie, 1990
• Indexed file names and
• Returned results based on pattern matching
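To make the idea concrete, here is a minimal sketch of an index of file names queried by pattern matching. It is not Archie's actual implementation: the index contents, the host names and the `search` helper are made up for illustration, and shell-style wildcards are assumed as the pattern syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical index: file name -> hosts that carry the file (made-up data).
INDEX = {
    "readme.txt": ["ftp.example.org"],
    "gcc-1.37.tar.Z": ["ftp.example.edu"],
    "gcc-1.40.tar.Z": ["ftp.example.edu", "ftp.example.net"],
    "emacs-18.55.tar.Z": ["ftp.example.net"],
}


def search(pattern):
    """Return (file name, hosts) pairs whose name matches a shell-style pattern.

    Only file names are compared; the contents of the files are never read.
    The comparison is case-insensitive.
    """
    wanted = pattern.lower()
    return sorted(
        (name, hosts)
        for name, hosts in INDEX.items()
        if fnmatchcase(name.lower(), wanted)
    )


gcc_files = [name for name, _ in search("gcc-*")]
no_match = search("*.doc")
```

Since only names are indexed, a file whose name says nothing about its content cannot be found this way.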
8. Web1.0
• Means HTML
• Is born in 1991, with the help of
• Tim Berners‐Lee (TBL), who also founded
• WWW Consortium (W3C) at MIT, and also
• Created WWW Virtual Library – the 1st catalog
9. Yahoo Directory, 1994
• Vertical = categories... is like
• “Show me all the stuff and I’ll handle it”
• Manually indexed stuff, which was
• OK for starters, but…
• Websites quickly grew in number and
• Y! started charging money for one listing
• Increasingly more money...
10.
11. WebCrawler, 1994
• First SE to fully search text
• Bought by AOL, then
• Sold to Excite, which
• Excite went bankrupt and
• WebCrawler ends up bought by InfoSpace
12. Other “Search Engines”
• 1994, reaches 60mil pages in ’96
• 1995, bought by Overture, bought by Y!
• 1996, meta search, bought by Lycos
• 1997, bought by IAC/InterActiveCorp
• 1999, bought by Overture, meaning Y!
14. , 1998
• Open Directory Project
• Each listing is checked and certified by a
volunteer
• The main source for Google Directory
16. Web1.0 Problems
• SE couldn’t understand text, so
• They said “why don’t you implement some
meta tags (description & keywords) so we can
get a glimpse of what you’re saying”
• The relevancy of a page with respect to a
keyword was determined by a few factors, so
• It was very easy to abuse and spam, therefore
• Search results had poor quality
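A small sketch of why this was easy to abuse. It reads the description and keywords meta tags with Python's standard `html.parser` and scores a page by counting a keyword. The scoring rule (`naive_score`) and both sample pages are invented for illustration; no real engine's formula is implied.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the content of the description and keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("description", "keywords"):
            self.meta[name] = attributes.get("content") or ""


def naive_score(html, query):
    """Toy relevancy: how many times the query appears in the keywords tag."""
    parser = MetaTagParser()
    parser.feed(html)
    keywords = [k.strip().lower() for k in parser.meta.get("keywords", "").split(",")]
    return keywords.count(query.lower())


# Two made-up pages: an honest one and one that repeats the keyword.
honest = '<html><head><meta name="keywords" content="pizza, restaurant"></head></html>'
spammy = '<html><head><meta name="keywords" content="pizza, pizza, pizza, pizza"></head></html>'

honest_score = naive_score(honest, "pizza")
spammy_score = naive_score(spammy, "pizza")
```

Under such a rule the page that repeats a keyword always wins, whatever its visible content says.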
18. Web2.0
• Is coined by... Tim O’Reilly, yet
• TBL later said that “web2.0” is a stupid,
meaningless term and that he thought of it
first in ’96 anyway
19. Web2.0 means
• which grew apart because of
• PageRank (1998) invented by
• Larry & Sergey, who adapted the algo from
• An MIT professor who had developed
• A nasty mathematical formula for positioning
keywords in a 3d space model based on the
relevancy that one kw holds … whatever
20. PageRank actually means
• That a link is a vote and
• Not all links are created equal, so
• It matters who links to you
• Just like in our real life society
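The "link is a vote" idea can be sketched with the simplified textbook form of PageRank, computed by repeated iteration. This is not Google's production algorithm: the damping factor of 0.85, the iteration count and the tiny link graph below are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Made-up web: "x" and "y" each receive exactly one link,
# but "x" is linked from a page that many others link to.
links = {
    "p1": ["popular", "y"],
    "p2": ["popular"],
    "p3": ["popular"],
    "popular": ["x"],
}
rank = pagerank(links)
```

Here `x` and `y` have the same number of incoming links, yet `x` ranks higher because its single link comes from a page that itself collected many votes.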
21.
• Read the content of pages really well, just that
• Pages were crappy:
– Non‐standard coding
– Ugly tech (like applets)
– Senseless IA
• So Google said: “don’t do evil and try to nicely
format the info, according to W3C standards”
(remember TBL)
23. SEO
• Is a multitude of practices aimed at facilitating
the indexing of pages by search engines
• Evolves as the ranking algorithm changes, and
• Of course, the algorithm is kept secret.
25. SEO actually means
• An on‐going battle between bots & SEO guys
• Now 100+ factors influence ranking
• And I’d like to take the time to talk about each
one of them in the following…
27. My SEO Cheat Sheet
• Consider:
1. Page Titles
2. URLs (mod_rewrite)
3. Anchor Text
4. Website Architecture (IA)
5. Link Titles & Image Alt Text
6. Relevant content (text)
7. Sitemap.xml
8. Hosting
9. Freshness
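As an illustration of item 7, the sketch below builds a minimal Sitemap.xml with Python's standard library. The namespace is the one defined by the sitemaps.org protocol; the URLs, the dates and the `build_sitemap` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap document from (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Made-up pages with descriptive, readable URLs (item 2 of the list).
xml_text = build_sitemap([
    ("http://www.example.com/", "2008-02-01"),
    ("http://www.example.com/seo-cheat-sheet", "2008-02-10"),
])
```

The `lastmod` field is one way to tell a crawler that a page is fresh (item 9), although how much weight any engine gives it is not something the sketch can show.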
28. Resources
Matt Cutts Blog
Mihai’s SEO Cheat Sheet :D
30. Web2.0 Problems
• SE still don’t understand what the $#%@
you’re talking about
• Crawling a website’s interface to extract info is
almost insane
32. Web3.0
• Means semantic web
• Attention migrates from syntax/formatting to
semantics and
• Meta Data (data about the data) becomes...
34. Resource Description Framework
• A data model, commonly written as XML (RDF/XML)
• RDF = Subject + Predicate + Object
• S + P + O creates a Triple which
• Can describe almost anything in the universe
• Triples are connectable (eg: FOAF)
• RDFa = XHTML + RDF (W3C compliant)
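A rough sketch of the triple model using plain Python tuples rather than an RDF library. The `foaf:name` and `foaf:knows` properties come from the FOAF vocabulary; the `ex:` identifiers, the sample people and the `match` helper are made up for illustration.

```python
# FOAF vocabulary namespace; "name" and "knows" are real FOAF properties.
FOAF = "http://xmlns.com/foaf/0.1/"

# Each triple is (subject, predicate, object). The "ex:" names are hypothetical.
triples = {
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:alice", FOAF + "knows", "ex:bob"),
    ("ex:bob", FOAF + "name", "Bob"),
    ("ex:bob", FOAF + "knows", "ex:carol"),
}


def match(s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Triples connect because the object of one can be the subject of another.
known_by_alice = [o for _, _, o in match(s="ex:alice", p=FOAF + "knows")]
known_by_those = [
    o for person in known_by_alice for _, _, o in match(s=person, p=FOAF + "knows")
]
```

Following `knows` twice walks from Alice to Bob to Carol, which is the sense in which triples are connectable.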
37. SPARQL
• SPARQL Protocol and RDF Query Language
• Standardized on 15th Jan 08 (1 month ago) and
• Endorsed by?... TBL
“Trying to use the Semantic Web without
SPARQL is like trying to use a relational
database without SQL”
TBL
38. Potential
• With SPARQL you skip the presentation layer
• You can query ad‐hoc any API, so
• You don’t need to crawl in advance, therefore
• Information will be as fresh as it gets
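To show what skipping the presentation layer might look like, the sketch below prepares a SPARQL query and the HTTP GET URL that would carry it. Under the SPARQL Protocol the query text travels in a `query` parameter. The endpoint address is hypothetical and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; any server that speaks the SPARQL Protocol would do.
ENDPOINT = "http://example.org/sparql"

# Ask for names directly, instead of scraping them out of HTML pages.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
  ?person foaf:name ?name .
}
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The answer would come back as structured data computed at request time, which is why no crawl in advance is needed.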
39. And possibilities
• Query: “I can has pizza?”
• Returns:
– A friend of yours (XFN ‐ Facebook)
– has a colleague (FOAF ‐ LinkedIn) who
– said that they make good pizza (hReview ‐ Yelp) at
– a restaurant nearby (geo – Gmaps)
– Tip: U2 in concert today (hCalendar ‐ Upcoming)
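The chain above is essentially a join across data from several sources. Here is a toy sketch, assuming all of that data has already been gathered as triples. The predicate names are simplified stand-ins rather than real XFN, FOAF or hReview terms, and every person, place and number is made up.

```python
# Made-up triples, as if merged from several sites.
triples = [
    ("me", "friend", "ana"),             # social network (XFN-like)
    ("ana", "colleague", "dan"),         # professional network (FOAF-like)
    ("dan", "reviewed", "review1"),      # review site (hReview-like)
    ("review1", "about", "Pizzeria X"),
    ("review1", "rating", 5),
    ("Pizzeria X", "distance_km", 0.8),  # map data (geo)
    ("Pizzeria Y", "distance_km", 12.0),
]


def objects(subject, predicate):
    """All objects of the triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def recommend(person, max_km=2.0, min_rating=4):
    """Follow friend -> colleague -> review -> place, keeping good nearby places."""
    results = []
    for friend in objects(person, "friend"):
        for colleague in objects(friend, "colleague"):
            for review in objects(colleague, "reviewed"):
                ratings = objects(review, "rating")
                for place in objects(review, "about"):
                    distances = objects(place, "distance_km")
                    good = bool(ratings) and ratings[0] >= min_rating
                    near = bool(distances) and distances[0] <= max_km
                    if good and near:
                        results.append((place, friend, colleague))
    return results


answer = recommend("me")
```

Each hop uses data that lives on a different site, so the answer only exists once those sites expose their data in a connectable form.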
40. Perhaps now we can see
• Why Social Networking Communities are
worth so much, even though most of them
don’t have a revenue model
– Facebook
– LinkedIn
– Meebo
– Bebo
– Pipu...
• They/We are the databases of the future
41.
42. Thanks!
“Most of the right choices in SEO come from
asking: What’s the best thing for the user?”
Matt Cutts
Mihai Gheza
Creative Commons Attribution‐Noncommercial‐Share Alike 3.0 Unported License.