Python Web Interaction

Dev8D presentation showing my top 10 Python libraries for interacting with the web.

Published in: Technology

  1. Rob Sanderson
       rsanderson@lanl.gov, azaroth42@gmail.com, @azaroth42
       Digital Library Prototyping Team
       Los Alamos National Laboratory, USA
       http://www.flickr.com/photos/42311564@N00/2355590274/
       Python for Web Interaction, Rob Sanderson, Dev8D, Feb 24-27 2010, London
  2. Overview: Top 10 Libraries for Web Interaction
       • urllib
       • urllib2
       • urlparse
       • httplib
       • lxml
       • rdflib
       • json/simplejson
       • mod_python, mod_wsgi
       • bpython
  3. urllib
       >>> import urllib
       >>> urllib.quote('~azaroth/s?q=http://foo.com/')
       '%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/'
       >>> urllib.unquote('%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/')
       '~azaroth/s?q=http://foo.com/'
       >>> fh = urllib.urlopen('http://www.google.com/')
       >>> html = fh.read()
       >>> fh.close()
       >>> fh.getcode()
       200
       >>> fh.headers.dict['content-type']
       'text/html; charset=ISO-8859-1'
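The session above is Python 2. As a rough sketch of the same quoting round trip on Python 3, where `quote` and `unquote` live in `urllib.parse`: the `roundtrip` helper is a hypothetical name used only for illustration, and Python 3.7 and later leave `~` unescaped, so the encoded string differs slightly from the slide.

```python
from urllib.parse import quote, unquote


def roundtrip(text):
    """Percent-encode text, then decode it again (hypothetical helper)."""
    encoded = quote(text)  # '/' is kept as-is by default (safe='/')
    return encoded, unquote(encoded)


encoded, decoded = roundtrip('~azaroth/s?q=http://foo.com/')
print(encoded)
print(decoded)
```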
  4. urllib2
       >>> import urllib2
       >>> ph = urllib2.ProxyHandler(
       ...     {'http' : 'http://proxyout.lanl.gov:8080/'})
       >>> opener = urllib2.build_opener(ph)
       >>> urllib2.install_opener(opener)
       >>> # From now on, all requests will go through proxy
       >>> r = urllib2.Request('http://www.google.com/')
       >>> r.add_header('Referer', 'http://www.somewhere.net')
       >>> fh = urllib2.urlopen(r)
       >>> html = fh.read()
       >>> fh.close()
       >>> # fh is the same as urllib's for headers/status
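On Python 3 the `urllib2` pieces shown above moved to `urllib.request`. The sketch below builds the opener and the request without sending anything; the proxy address and the helper names `make_opener` and `make_request` are assumptions for illustration, not part of the talk.

```python
import urllib.request

# Hypothetical proxy address, used only for illustration.
PROXY_URL = 'http://proxy.example.org:8080/'


def make_opener(proxy_url):
    """Build an opener that routes http:// requests through proxy_url."""
    handler = urllib.request.ProxyHandler({'http': proxy_url})
    return urllib.request.build_opener(handler)


def make_request(url, referer):
    """Build a GET request carrying a Referer header."""
    req = urllib.request.Request(url)
    req.add_header('Referer', referer)
    return req


opener = make_opener(PROXY_URL)
req = make_request('http://www.example.org/', 'http://www.somewhere.net')
# opener.open(req) would send the request through the proxy;
# it is not called here so the sketch runs without network access.
```

Calling `opener.open(req)` directly avoids `install_opener`, which changes the process-wide default opener.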
  5. urlparse
       >>> import urlparse
       >>> pr = urlparse.urlparse(
       ...     'https://www.google.com/search?q=foo&bar=bz#frag')
       >>> pr.scheme
       'https'
       >>> pr.hostname
       'www.google.com'
       >>> pr.path
       '/search'
       >>> pr.query
       'q=foo&bar=bz'
       >>> pr.fragment
       'frag'
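A possible Python 3 version of the same parse, assuming the standard `urllib.parse` module. It also decodes the query string with `parse_qs`, which the slide does not show.

```python
from urllib.parse import urlparse, parse_qs

pr = urlparse('https://www.google.com/search?q=foo&bar=bz#frag')

# parse_qs maps each key to a list, because a key may repeat in a query.
params = parse_qs(pr.query)

print(pr.scheme, pr.hostname, pr.path, pr.fragment)
print(params)
```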
  6. httplib
       >>> import httplib
       >>> cxn = httplib.HTTPConnection('www.google.com')
       >>> hdrs = {'Accept' : 'application/rdf+xml'}
       >>> path = "/search?q=some+search+query"
       >>> cxn.request("HEAD", path, headers=hdrs)
       >>> resp = cxn.getresponse()
       >>> resp.status
       200
       >>> resp_hdrs = dict(resp.getheaders())
       >>> resp_hdrs['content-type']  # :(
       'text/html; charset=ISO-8859-1'
       >>> data = resp.read()  # empty string: a HEAD response has no body
       >>> cxn.close()
  7. lxml
       $ easy_install lxml
       >>> import StringIO
       >>> from lxml import etree
       >>> et = etree.XML('<a b="B"> A <c>C</c> </a>')
       >>> et.text
       ' A '
       >>> et.attrib['b']
       'B'
       >>> for elem in et.iterchildren():
       ...     print elem
       ...
       <Element c at 16d1ed0>
       >>> html = etree.parse(StringIO.StringIO("<html><p>hi"),
       ...                    parser=etree.HTMLParser())
       >>> html.xpath('/html/body/p')
       [<Element p at 16e00f0>]
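Where lxml is not installed, the standard library's `xml.etree.ElementTree` offers a similar element API for well-formed XML. The sketch below swaps it in for the first half of the slide; it is not lxml, and it has no equivalent of `etree.HTMLParser` for repairing broken HTML.

```python
import xml.etree.ElementTree as ET

# Same document as on the slide, parsed with the stdlib parser.
root = ET.fromstring('<a b="B"> A <c>C</c> </a>')

text = root.text                          # text before the first child
attr = root.attrib['b']                   # attribute lookup
child_tags = [child.tag for child in root]
child_text = root.find('c').text          # limited XPath subset via find()

print(repr(text), attr, child_tags, child_text)
```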
  8. rdflib
       $ easy_install rdflib
       >>> import rdflib as rdf
       >>> inp = rdf.URLInputSource(
       ...     'http://xmlns.com/foaf/spec/20100101.rdf')
       >>> inp2 = rdf.StringInputSource("<a> <b> <c> .")
       >>> graph = rdf.ConjunctiveGraph()
       >>> graph.parse(inp)
       >>> sparql = "SELECT ?l WHERE {?w rdfs:label ?l . }"
       >>> res = graph.query(sparql, initNs={'rdfs':rdf.RDFS.RDFSNS})
       >>> res.selected[0]
       rdf.Literal(u'Given name')
       >>> nt = graph.serialize(format='nt')
  9. json / simplejson
       >>> try: import simplejson as json
       ... except ImportError: import json
       ...
       >>> data = {'o' : (True, None, 1.0), "ints" : [1,2,3]}
       >>> json.dumps(data)
       '{"o": [true, null, 1.0], "ints": [1, 2, 3]}'
       >>> json.dumps(data, separators=(',', ':'))  # compact
       '{"o":[true,null,1.0],"ints":[1,2,3]}'
       >>> json.loads('[1,2,"foo",null]')
       [1, 2, u'foo', None]
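One detail worth seeing in the session above: the tuple is written as a JSON array, so it comes back as a list. A small round-trip sketch on Python 3, using only the standard `json` module:

```python
import json

data = {'o': (True, None, 1.0), 'ints': [1, 2, 3]}

# Compact separators drop the spaces after ',' and ':'.
compact = json.dumps(data, separators=(',', ':'))

# JSON has no tuple type: the tuple under 'o' is restored as a list.
restored = json.loads(compact)

print(compact)
print(restored)
```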
  10. mod_python, mod_wsgi
       import cgitb
       from mod_python import apache
       from mod_python.util import FieldStorage

       def handler(req):
           try:
               form = FieldStorage(req)  # dict-like object for query
               path = req.uri
               req.status = 200
               req.content_type = "text/plain"
               req.send_http_header()
               req.write(path)
           except:
               req.content_type = "text/html"
               cgitb.Hook(file=req).handle()
           return apache.OK
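The slide shows only the mod_python handler. A rough WSGI counterpart, as mod_wsgi would load it, might look like the sketch below. It is written for Python 3, echoes the request path like the handler above, and omits the `cgitb` error page. The `call_app` helper is a hypothetical test harness, not part of WSGI or mod_wsgi.

```python
def application(environ, start_response):
    """Minimal WSGI app: echo the request path as plain text."""
    path = environ.get('PATH_INFO', '/')
    body = path.encode('utf-8')  # WSGI response bodies are bytes
    headers = [('Content-Type', 'text/plain'),
               ('Content-Length', str(len(body)))]
    start_response('200 OK', headers)
    return [body]


def call_app(app, path):
    """Hypothetical harness: call a WSGI app directly, without a server."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = dict(headers)

    body = b''.join(app({'PATH_INFO': path}, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_app(application, '/hello')
print(status, headers['Content-Type'], body)
```

mod_wsgi looks for a callable named `application` in the script file by default, which is why the function carries that name.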
  11. bpython
       $ easy_install bpython
       $ bpython
