21. What is Beautiful Soup
not Beautiful Soap
• Python module
• HTML/XML parser
22. Beautiful Soup
<html>
<head>
<title>
page title
</title>
</head>
<body>
<p id="firstpara" align="center">
first paragraph
<b>
one
</b>
</p>
<p id="secondpara" align="blah">
second paragraph
<b>
two
</b>
</p>
</body>
</html>
23. Check urllib/urllib2 to see
how to open a URL in Python
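A minimal sketch of fetching a page, assuming Python 3, where the urllib/urllib2 functionality lives in `urllib.request`. The helper name `fetch_html`, the 10-second timeout, and the UTF-8 fallback are illustrative choices, not something the slides specify.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the body of `url` decoded to text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset declared by the response, falling back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Example call; the URL is a placeholder:
# page = fetch_html("https://example.com/")
```

The returned string can then be passed to the parser as `page`.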
from BeautifulSoup import BeautifulSoup
soup=BeautifulSoup(page)
soup.html.head
#<head><title>page title</title></head>
soup.head
#<head><title>page title</title></head>
soup.body.p
#<p id="firstpara" align="center">first paragraph<b>one</b></p>
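The same lookups can be sketched with the current `bs4` package, assuming it is installed (`pip install beautifulsoup4`) and using the standard-library `html.parser` backend. The sample document is written on one line here so that no whitespace text nodes appear between tags; the variable name `html_doc` is illustrative.

```python
from bs4 import BeautifulSoup

# Sample page from the previous slide, written without whitespace between tags.
html_doc = (
    '<html><head><title>page title</title></head><body>'
    '<p id="firstpara" align="center">first paragraph<b>one</b></p>'
    '<p id="secondpara" align="blah">second paragraph<b>two</b></p>'
    '</body></html>'
)

soup = BeautifulSoup(html_doc, "html.parser")

head_markup = str(soup.head)          # markup of the <head> element
title_text = soup.head.title.string   # text inside <title>
first_p = soup.body.p                 # first <p> under <body>
first_p_id = first_p["id"]            # attributes are read like dict keys
first_bold = first_p.b.string         # text inside the first <b>
```

Accessing a tag name as an attribute (`soup.head`, `soup.body.p`) returns the first matching tag at or below that point in the tree.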
24. (Cont.)
• parent (go to parent node)
soup.title.parent == soup.head
• next (go to next node)
soup.title.next == 'page title'
soup.title.next.next == soup.body
• previous (go to previous node)
soup.title.previous == soup.head
soup.body.p.b.previous == 'first paragraph'
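The navigation steps above can be checked with a short sketch, assuming the `bs4` package, where the slide's `next` and `previous` are spelled `next_element` and `previous_element`. The document is again written without whitespace between tags, since stray newlines would show up as extra text nodes in the traversal.

```python
from bs4 import BeautifulSoup

html_doc = (
    '<html><head><title>page title</title></head><body>'
    '<p id="firstpara" align="center">first paragraph<b>one</b></p>'
    '<p id="secondpara" align="blah">second paragraph<b>two</b></p>'
    '</body></html>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# parent: the tag that directly contains this node
title_parent = soup.title.parent

# next_element: the node parsed immediately after this one
after_title = soup.title.next_element          # the text inside <title>
after_title_text = after_title.next_element    # the <body> tag

# previous_element: the node parsed immediately before this one
before_title = soup.title.previous_element     # the <head> tag
before_first_p = soup.body.p.previous_element  # the <body> tag
before_first_b = soup.body.p.b.previous_element  # text preceding <b>
```

Note that these follow parse order rather than sibling order: the node before the first `<p>` is its enclosing `<body>`, while the text "first paragraph" is what precedes the `<b>` inside it.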