
How to make the fastest Router in Python

A router is one of the most important components of a web application framework,
and it is also one of the framework's performance bottlenecks.
In this session, I'll show you how to make a router much faster than before.


  1. How To Make The Fastest Router In Python
      Makoto Kuwata (https://github.com/kwatch/)
      PloneConference 2018 Tokyo
  2. Abstract
      ‣ TL;DR
        ‣ You can make a router much faster (up to 10x).
      ‣ Requirements
        ‣ Python 3
        ‣ Experience with a web application framework (Django, Flask, Plone, etc.)
      ‣ Sample code
        ‣ https://github.com/kwatch/router-sample/
        ‣ https://github.com/kwatch/keight/tree/python/ (framework)
  3. Table of Contents
      ‣ What is a Router?
      ‣ Linear Search
        ‣ Naive / Prefix String / Fixed Path Dictionary
      ‣ Regular Expression
        ‣ Naive / Smart / Optimized
      ‣ State Machine
      ‣ Conclusion
  4. What is a Router?
  5. What is a Router?
      ‣ A router is a component of a web application framework (WAF).
      ‣ The router determines the request handler from the request method and the request path.
      [Diagram: the client sends an HTTP request to the app server and receives an HTTP response. On the server side, the router inside the WSGI app determines which handler (A, B, or C) handles the request.]
  6. Request Handler Example
          class BooksAPI(RequestHandler):              # ← handler class

              with on.path('/api/books/'):             # ← URL path

                  @on('GET')                           # ← request method
                  def do_index(self):                  # ← handler func
                      return {"action": "index"}

                  @on('POST')                          # ← request method
                  def do_create(self):                 # ← handler func
                      return {"action": "create"}
              ....
  7. Request Handler Example
              ....
              with on.path('/api/books/{id:int}'):     # ← URL path

                  @on('GET')                           # ← request method
                  def do_show(self, id):               # ← handler func
                      return {"action": "show", "id": id}

                  @on('PUT')                           # ← request method
                  def do_update(self, id):             # ← handler func
                      return {"action": "update", "id": id}

                  @on('DELETE')                        # ← request method
                  def do_delete(self, id):             # ← handler func
                      return {"action": "delete", "id": id}
  8. Mapping Example: Request to Handler
          mapping_table = [
              ## method    path                  class      func
              ("GET"   , r"/api/books/"      , BooksAPI , do_index),
              ("POST"  , r"/api/books/"      , BooksAPI , do_create),
              ("GET"   , r"/api/books/(\d+)" , BooksAPI , do_show),
              ("PUT"   , r"/api/books/(\d+)" , BooksAPI , do_update),
              ("DELETE", r"/api/books/(\d+)" , BooksAPI , do_delete),
              ("GET"   , r"/api/orders/"     , OrdersAPI, do_index),
              ("POST"  , r"/api/orders/"     , OrdersAPI, do_create),
              ("GET"   , r"/api/orders/(\d+)", OrdersAPI, do_show),
              ("PUT"   , r"/api/orders/(\d+)", OrdersAPI, do_update),
              ("DELETE", r"/api/orders/(\d+)", OrdersAPI, do_delete),
              ....
          ]
  9. Mapping Example: Request to Handler
          mapping_list = [
              ## path                  class      {method: func}
              (r"/api/books/"      , BooksAPI , {"GET": do_index, "POST": do_create}),
              (r"/api/books/(\d+)" , BooksAPI , {"GET": do_show, "PUT": do_update, "DELETE": do_delete}),
              (r"/api/orders/"     , OrdersAPI, {"GET": do_index, "POST": do_create}),
              (r"/api/orders/(\d+)", OrdersAPI, {"GET": do_show, "PUT": do_update, "DELETE": do_delete}),
              ....
          ]
      Same information in a different format.
  10. Router Example
          >>> router = Router(mapping_list)
          >>> router.lookup("GET", "/api/books/")
          (BooksAPI, do_index, [])
          >>> router.lookup("GET", "/api/books/123")
          (BooksAPI, do_show, [123])
  11. Router Example
          ### 404 Not Found
          >>> router.lookup("GET", "/api/books/123/comments")
          (None, None, None)
          ### 405 Method Not Allowed
          >>> router.lookup("POST", "/api/books/123")
          (BooksAPI, None, [123])
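
The two failure cases above suggest how a caller could turn a lookup result into an HTTP status. The following is only a minimal sketch: `dispatch` and `StubRouter` are hypothetical names that do not appear in the talk, and the stub simply returns the tuples shown on the slides.

```python
from http import HTTPStatus


class BooksAPI:
    def do_show(self, id):
        return {"action": "show", "id": id}


class StubRouter:
    """Hypothetical stand-in that returns the tuples shown on the slides."""

    def lookup(self, req_meth, req_path):
        if req_path != "/api/books/123":
            return None, None, None          # unknown path
        func = {"GET": BooksAPI.do_show}.get(req_meth)
        return BooksAPI, func, [123]         # func is None for other methods


def dispatch(router, req_meth, req_path):
    """Map a lookup result to (status, body); an assumed calling convention."""
    klass, func, params = router.lookup(req_meth, req_path)
    if klass is None:
        return HTTPStatus.NOT_FOUND, None
    if func is None:
        return HTTPStatus.METHOD_NOT_ALLOWED, None
    return HTTPStatus.OK, func(klass(), *params)
```

A missing class signals 404 and a missing function signals 405, so a single lookup result is enough to distinguish the two errors.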
  12. Linear Search (Naive)
  13. Linear Search
          mapping = [
              (r"/api/books/"      , BooksAPI , {"GET": do_index, "POST": do_create}),
              (r"/api/books/(\d+)" , BooksAPI , {"GET": do_show, "PUT": do_update, "DELETE": do_delete}),
              (r"/api/orders/"     , OrdersAPI, {"GET": do_index, "POST": do_create}),
              (r"/api/orders/(\d+)", OrdersAPI, {"GET": do_show, "PUT": do_update, "DELETE": do_delete}),
              ....
          ]
  14. Router Class
          class LinearNaiveRouter(Router):

              def __init__(self, mapping):
                  self._mapping_list = [
                      (compile_path(path), klass, funcs)
                      for path, klass, funcs in mapping
                  ]

              def lookup(self, req_meth, req_path):
                  for rexp, klass, funcs in self._mapping_list:
                      m = rexp.match(req_path)
                      if m:
                          params = [ int(v) for v in m.groups() ]
                          func = funcs.get(req_meth)
                          return klass, func, params
                  return None, None, None
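
The slide does not show `compile_path()` or the `Router` base class, so the following is only a self-contained sketch of the same linear search. The `compile_path` helper here is an assumption (it handles only `{name:int}` parameters and does not escape other regex metacharacters), and `DummyAPI` is a placeholder handler class.

```python
import re


def compile_path(path):
    """Hypothetical helper: '/api/books/{id:int}' -> regex '^/api/books/(\\d+)$'."""
    pattern = re.sub(r"\{\w+:int\}", lambda m: r"(\d+)", path)
    return re.compile("^" + pattern + "$")


class DummyAPI:
    def do_index(self):
        return {"action": "index"}

    def do_show(self, id):
        return {"action": "show", "id": id}


class LinearNaiveRouter:
    def __init__(self, mapping):
        self._mapping_list = [
            (compile_path(path), klass, funcs) for path, klass, funcs in mapping
        ]

    def lookup(self, req_meth, req_path):
        # Try every compiled pattern in registration order.
        for rexp, klass, funcs in self._mapping_list:
            m = rexp.match(req_path)
            if m:
                params = [int(v) for v in m.groups()]
                return klass, funcs.get(req_meth), params
        return None, None, None


router = LinearNaiveRouter([
    ("/api/books/", DummyAPI, {"GET": DummyAPI.do_index}),
    ("/api/books/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
])
```

Every request pays for one `match()` call per entry registered ahead of the matching one, which is why position in the list matters so much in the benchmark.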
  15. Benchmark (Data)
          mapping_list = [
              (r'/api/aaa'         , DummyAPI, {"GET": ...}),
              (r'/api/aaa/{id:int}', DummyAPI, {"GET": ...}),
              (r'/api/bbb'         , DummyAPI, {"GET": ...}),
              (r'/api/bbb/{id:int}', DummyAPI, {"GET": ...}),
              ....
              (r'/api/yyy'         , DummyAPI, {"GET": ...}),
              (r'/api/yyy/{id:int}', DummyAPI, {"GET": ...}),
              (r'/api/zzz'         , DummyAPI, {"GET": ...}),
              (r'/api/zzz/{id:int}', DummyAPI, {"GET": ...}),
          ]
          ### Benchmark environment:
          ### AWS EC2 t3.nano, Ubuntu 18.04, Python 3.6.6
      See the sample code for details.
  16. Benchmark
      [Chart: seconds per 1M requests (axis 0 to 50 sec, shorter is faster) for /api/aaa, /api/aaa/{id}, /api/zzz, /api/zzz/{id}. Series: Linear Naive.]
      ‣ Very fast at the top of the list (/api/aaa, /api/aaa/{id}).
      ‣ Very slow at the bottom of the list (/api/zzz, /api/zzz/{id}).
  17. Pros & Cons
      ✓ Easy to understand and implement.
      ✗ Very slow when many mapping entries exist.
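
To reproduce the shape of this benchmark locally, something like the sketch below could be used. It is not the author's harness (that lives in the sample repository); it assumes the elided entries run from `aaa` to `zzz` with one letter per resource, and `compile_path`, `build_mapping`, `linear_lookup_factory`, and `bench` are hypothetical names.

```python
import re
import string
import timeit


def compile_path(path):
    """Hypothetical helper (not shown on the slides)."""
    pattern = re.sub(r"\{\w+:int\}", lambda m: r"(\d+)", path)
    return re.compile("^" + pattern + "$")


class DummyAPI:
    def do_index(self):
        return {"action": "index"}

    def do_show(self, id):
        return {"action": "show", "id": id}


def build_mapping():
    """Build /api/aaa, /api/aaa/{id:int}, ..., /api/zzz, /api/zzz/{id:int}."""
    mapping = []
    for c in string.ascii_lowercase:
        base = "/api/" + c * 3
        mapping.append((base, DummyAPI, {"GET": DummyAPI.do_index}))
        mapping.append((base + "/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}))
    return mapping


def linear_lookup_factory(mapping):
    """Return a naive linear-search lookup function over the mapping."""
    compiled = [(compile_path(p), k, f) for p, k, f in mapping]

    def lookup(req_meth, req_path):
        for rexp, klass, funcs in compiled:
            m = rexp.match(req_path)
            if m:
                return klass, funcs.get(req_meth), [int(v) for v in m.groups()]
        return None, None, None

    return lookup


def bench(lookup, req_path, number=10_000):
    """Seconds spent on `number` GET lookups of req_path."""
    return timeit.timeit(lambda: lookup("GET", req_path), number=number)


if __name__ == "__main__":
    lookup = linear_lookup_factory(build_mapping())
    for req_path in ("/api/aaa", "/api/aaa/123", "/api/zzz", "/api/zzz/123"):
        print("%-14s %.3f sec" % (req_path, bench(lookup, req_path)))
```

The default of 10,000 iterations keeps the run short; the slides report times for 1M requests, so absolute numbers will differ.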
  18. Linear Search (Prefix String)
  19. Prefix String
          mapping_list = [
              ## prefix     regexp            class      funcs
              ("/books"  , r"/books"       , BooksAPI , {"GET": ...}),
              ("/books/" , r"/books/(\d+)" , BooksAPI , {"GET": ...}),
              ("/orders" , r"/orders"      , OrdersAPI, {"GET": ...}),
              ("/orders/", r"/orders/(\d+)", OrdersAPI, {"GET": ...}),
          ]
          for prefix, rexp, klass, funcs in mapping_list:
              if not "/orders/123".startswith(prefix):   # much faster than rexp.match()
                  continue
              m = rexp.match("/orders/123")
              if m:
                  ...
      The first element of each tuple is the prefix string. Checking it first replaces an expensive operation (regexp matching) with a cheap one (startswith).
  20. Router Class
          def prefix_str(s):
              return s.split('{', 1)[0]

          class PrefixLinearRouter(Router):

              def __init__(self, mapping):
                  self._mapping_list = []
                  for path, klass, funcs in mapping:
                      prefix = prefix_str(path)
                      rexp = compile_path(path)
                      t = (prefix, rexp, klass, funcs)
                      self._mapping_list.append(t)
              ...
  21. Router Class
              ....
              def lookup(self, req_meth, req_path):
                  for prefix, rexp, klass, funcs in self._mapping_list:
                      if not req_path.startswith(prefix):   # much faster than rexp.match()
                          continue
                      m = rexp.match(req_path)
                      if m:
                          params = [ int(v) for v in m.groups() ]
                          func = funcs.get(req_meth)
                          return klass, func, params
                  return None, None, None
  22. Benchmark
      [Chart: seconds per 1M requests (axis 0 to 50 sec, shorter is faster) for /api/aaa, /api/aaa/{id}, /api/zzz, /api/zzz/{id}. Series: Linear Naive, Linear Prefix Str.]
      ‣ About twice as fast as the naive implementation.
  23. Pros & Cons
      ✓ Makes linear search faster.
      ✓ Easy to understand and implement.
      ✗ Still slow when many mapping entries exist.
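
The prefix check can be tried in isolation with a sketch like the one below. `prefix_str` follows the slide; `compile_path` and `DummyAPI` are hypothetical stand-ins, since the talk does not show them.

```python
import re


def compile_path(path):
    """Hypothetical helper (not shown on the slides)."""
    pattern = re.sub(r"\{\w+:int\}", lambda m: r"(\d+)", path)
    return re.compile("^" + pattern + "$")


def prefix_str(s):
    """Everything before the first path parameter, as on the slide."""
    return s.split("{", 1)[0]


class DummyAPI:
    def do_index(self):
        return {"action": "index"}

    def do_show(self, id):
        return {"action": "show", "id": id}


class PrefixLinearRouter:
    def __init__(self, mapping):
        self._mapping_list = [
            (prefix_str(path), compile_path(path), klass, funcs)
            for path, klass, funcs in mapping
        ]

    def lookup(self, req_meth, req_path):
        for prefix, rexp, klass, funcs in self._mapping_list:
            # Cheap string test first; run the regexp only on candidates.
            if not req_path.startswith(prefix):
                continue
            m = rexp.match(req_path)
            if m:
                params = [int(v) for v in m.groups()]
                return klass, funcs.get(req_meth), params
        return None, None, None


router = PrefixLinearRouter([
    ("/api/books", DummyAPI, {"GET": DummyAPI.do_index}),
    ("/api/books/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
    ("/api/orders/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
])
```

The search is still linear; only the per-entry cost drops, which matches the roughly 2x improvement reported on the benchmark slide.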
  24. Linear Search (Fixed Path Dictionary)
  25. Fixed Path Dictionary
          ## variable path (contains one or more path parameters)
          mapping_list = [
              # ("/books"  , r"/books"       , BooksAPI , {"GET": ...}),   # moved to mapping_dict
              ("/books/" , r"/books/(\d+)" , BooksAPI , {"GET": ...}),
              # ("/orders" , r"/orders"      , OrdersAPI, {"GET": ...}),   # moved to mapping_dict
              ("/orders/", r"/orders/(\d+)", OrdersAPI, {"GET": ...}),
          ]
          ## fixed path (contains no path parameters)
          mapping_dict = {
              r"/books" : (BooksAPI , {"GET": ...}, []),
              r"/orders": (OrdersAPI, {"GET": ...}, []),
          }
      Move fixed paths to a dict, using the fixed path as the key.
  26. Router Class
          class FixedLinearRouter(object):

              def __init__(self, mapping):
                  self._mapping_dict = {}
                  self._mapping_list = []
                  for path, klass, funcs in mapping:
                      if '{' not in path:
                          self._mapping_dict[path] = (klass, funcs, [])
                      else:
                          prefix = prefix_str(path)
                          rexp = compile_path(path)
                          t = (prefix, rexp, klass, funcs)
                          self._mapping_list.append(t)
              ....
  27. Router Class
              ....
              def lookup(self, req_meth, req_path):
                  t = self._mapping_dict.get(req_path)      # much faster than the for-loop
                  if t:
                      klass, funcs, params = t
                      return klass, funcs.get(req_meth), params
                  for prefix, rexp, klass, funcs in self._mapping_list:   # fewer entries to scan
                      if not req_path.startswith(prefix):
                          continue
                      m = rexp.match(req_path)
                      if m:
                          params = [ int(v) for v in m.groups() ]
                          func = funcs.get(req_meth)
                          return klass, func, params
                  return None, None, None
  28. Benchmark
      [Chart: seconds per 1M requests (axis 0 to 50 sec, shorter is faster) for /api/aaa, /api/aaa/{id}, /api/zzz, /api/zzz/{id}. Series: Linear Naive, Linear Prefix Str, Linear Fixed Path.]
      ‣ Super fast on fixed paths!
      ‣ Three times faster than the naive implementation.
  29. Pros & Cons
      ✓ Makes fixed path search much faster.
      ✓ Makes variable path search faster, because the number of entries is reduced.
      ✓ Easy to understand and implement.
      ✗ Still slow when many mapping entries exist.
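
The fixed-path idea reduces to a dict lookup in front of the linear scan. The sketch below is one way to write it, under the same assumptions as before: `compile_path` and `DummyAPI` are hypothetical, and the dict stores `(klass, funcs)` so that the request method is resolved at lookup time.

```python
import re


def compile_path(path):
    """Hypothetical helper (not shown on the slides)."""
    pattern = re.sub(r"\{\w+:int\}", lambda m: r"(\d+)", path)
    return re.compile("^" + pattern + "$")


class DummyAPI:
    def do_index(self):
        return {"action": "index"}

    def do_show(self, id):
        return {"action": "show", "id": id}


class FixedLinearRouter:
    def __init__(self, mapping):
        self._mapping_dict = {}   # fixed paths: exact string -> (klass, funcs)
        self._mapping_list = []   # variable paths: scanned linearly
        for path, klass, funcs in mapping:
            if "{" not in path:
                self._mapping_dict[path] = (klass, funcs)
            else:
                prefix = path.split("{", 1)[0]
                self._mapping_list.append((prefix, compile_path(path), klass, funcs))

    def lookup(self, req_meth, req_path):
        t = self._mapping_dict.get(req_path)   # one hash lookup, no regexp
        if t:
            klass, funcs = t
            return klass, funcs.get(req_meth), []
        for prefix, rexp, klass, funcs in self._mapping_list:
            if not req_path.startswith(prefix):
                continue
            m = rexp.match(req_path)
            if m:
                return klass, funcs.get(req_meth), [int(v) for v in m.groups()]
        return None, None, None


router = FixedLinearRouter([
    ("/api/books", DummyAPI, {"GET": DummyAPI.do_index}),
    ("/api/books/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
])
```

Fixed paths cost the same regardless of how many routes are registered, and the list that remains for variable paths is roughly half as long in the benchmark data.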
  30. Notice
      ‣ Don't use r"/api/v{version:int}",
        ‣ because then all API paths are regarded as variable paths.
      ‣ Instead, use r"/api/v1", r"/api/v2", ...
        ‣ in order to increase the number of fixed paths.
  31. Regular Expression (Naive)
  32. Concatenate Regular Expressions
          mapping_list = [
              (r"/api/books/(\d+)" , BooksAPI , {"GET": ...}),
              (r"/api/orders/(\d+)", OrdersAPI, {"GET": ...}),
              (r"/api/users/(\d+)" , UsersAPI , {"GET": ...}),
          ]
          arr = [
              r"(?P<_0>^/api/books/(\d+)$)",      # named groups
              r"(?P<_1>^/api/orders/(\d+)$)",
              r"(?P<_2>^/api/users/(\d+)$)",
          ]
          all_rexp = re.compile("|".join(arr))
  33. Matching
          m = all_rexp.match("/api/users/123")
          d = m.groupdict()   #=> {"_0": None,
                              #    "_1": None,
                              #    "_2": "/api/users/123"}
          for k, v in d.items():
              if v:
                  i = int(k[1:])   # ex: "_2" -> 2
                  break
          _, klass, funcs = mapping_list[i]
          arr = m.groups()    #=> (None, None, None, None,
                              #    "/api/users/123", "123")
          params = arr[5:6]   #=> ("123",)
  34. Router Class
          class NaiveRegexpRouter(Router):

              def __init__(self, mapping):
                  self._mapping_dict = {}
                  self._mapping_list = []
                  arr = []; i = 0
                  pos = 1   # index in m.groups() of this entry's first parameter
                  for path, klass, funcs in mapping:
                      if '{' not in path:
                          self._mapping_dict[path] = (klass, funcs, [])
                      else:
                          rexp = compile_path(path); pat = rexp.pattern
                          arr.append("(?P<_%s>%s)" % (i, pat))
                          t = (klass, funcs, pos, path.count('{'))
                          self._mapping_list.append(t)
                          i += 1; pos += 1 + path.count('{')
                  self._all_rexp = re.compile("|".join(arr))
  35. Router Class
              ....
              def lookup(self, req_meth, req_path):
                  t = self._mapping_dict.get(req_path)
                  if t:
                      klass, funcs, params = t
                      return klass, funcs.get(req_meth), params
                  m = self._all_rexp.match(req_path)
                  if m:
                      for k, v in m.groupdict().items():        # find index in list
                          if v:
                              i = int(k[1:])
                              break
                      klass, funcs, pos, nparams = self._mapping_list[i]
                      params = m.groups()[pos:pos+nparams]      # find param values
                      func = funcs.get(req_meth)
                      return klass, func, params
                  return None, None, None
  36. Benchmark
      [Chart: seconds per 1M requests (axis 0 to 50 sec, shorter is faster) for /api/aaa, /api/aaa/{id}, /api/zzz, /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive).]
      ‣ Slower than linear search :(
  37. Pros & Cons
      ✓ Nothing :(
      ✗ Slower than linear search.
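
For readers who want to see the named-group bookkeeping run end to end, here is a minimal sketch. It is an illustration rather than the author's code: `compile_path` and `DummyAPI` are hypothetical, and `pos` is defined as the index within `m.groups()` of an entry's first parameter, which sits one slot after that entry's named group.

```python
import re


def compile_path(path):
    """Hypothetical helper (not shown on the slides)."""
    pattern = re.sub(r"\{\w+:int\}", lambda m: r"(\d+)", path)
    return re.compile("^" + pattern + "$")


class DummyAPI:
    def do_show(self, id):
        return {"action": "show", "id": id}


class NaiveRegexpRouter:
    def __init__(self, mapping):
        self._mapping_dict = {}
        self._mapping_list = []
        arr = []
        pos = 1  # groups() index of the current entry's first parameter
        for path, klass, funcs in mapping:
            if "{" not in path:
                self._mapping_dict[path] = (klass, funcs)
                continue
            nparams = path.count("{")
            index = len(self._mapping_list)
            arr.append("(?P<_%d>%s)" % (index, compile_path(path).pattern))
            self._mapping_list.append((klass, funcs, pos, nparams))
            pos += 1 + nparams  # skip the next named group plus these params
        self._all_rexp = re.compile("|".join(arr))

    def lookup(self, req_meth, req_path):
        t = self._mapping_dict.get(req_path)
        if t:
            klass, funcs = t
            return klass, funcs.get(req_meth), []
        m = self._all_rexp.match(req_path)
        if m:
            # Scan the named groups to learn which alternative matched.
            for name, value in m.groupdict().items():
                if value is not None:
                    klass, funcs, pos, nparams = self._mapping_list[int(name[1:])]
                    params = [int(v) for v in m.groups()[pos:pos + nparams]]
                    return klass, funcs.get(req_meth), params
        return None, None, None


router = NaiveRegexpRouter([
    ("/api/books/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
    ("/api/orders/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
    ("/api/users/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
])
```

Building a dict of all named groups and looping over it on every request is the overhead the next section removes.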
  38. Notice
          $ python3 --version
          3.4.5
          $ python3
          >>> import re
          >>> arr = ['^/(\d+)$'] * 101
          >>> re.compile("|".join(arr))
            File "/opt/vs/python/3.4.5/lib/python3.4/sre_compile.py", line 579, in compile
              "sorry, but this version only supports 100 named groups"
          AssertionError: sorry, but this version only supports 100 named groups
      Python <= 3.4 limits the number of groups in a regular expression, and there is no workaround :(
  39. Regular Expression (Smart)
  40. Improved Regular Expression
          mapping_list = [
              (r"/api/books/(\d+)" , BooksAPI , {"GET": ...}),
              (r"/api/orders/(\d+)", OrdersAPI, {"GET": ...}),
              (r"/api/users/(\d+)" , UsersAPI , {"GET": ...}),
          ]
          arr = [
              r"^/api/books/(?:\d+)($)",       # no more named groups
              r"^/api/orders/(?:\d+)($)",
              r"^/api/users/(?:\d+)($)",
          ]
          all_rexp = re.compile("|".join(arr))
          m = all_rexp.match("/api/users/123")
          arr = m.groups()     #=> (None, None, "")   # a tuple is much lighter than a dict
          i = arr.index("")    #=> 2                  # index() is faster than a for-loop
          t = mapping_list[i]  #=> (r"/api/users/(\d+)",
                               #    UsersAPI, {"GET": ...})
  41. Router Class
          class SmartRegexpRouter(Router):

              def __init__(self, mapping):
                  self._mapping_dict = {}
                  self._mapping_list = []
                  arr = []
                  for path, klass, funcs in mapping:
                      if '{' not in path:
                          self._mapping_dict[path] = (klass, funcs, [])
                      else:
                          rexp = compile_path(path); pat = rexp.pattern
                          arr.append(pat.replace("(", "(?:")
                                        .replace("$", "($)"))
                          t = (rexp, klass, funcs)
                          self._mapping_list.append(t)
                  self._all_rexp = re.compile("|".join(arr))
  42. Router Class
              ...
              def lookup(self, req_meth, req_path):
                  t = self._mapping_dict.get(req_path)
                  if t:
                      klass, funcs, params = t
                      return klass, funcs.get(req_meth), params
                  m = self._all_rexp.match(req_path)            # matching to find index in list
                  if m:
                      i = m.groups().index("")
                      rexp, klass, funcs = self._mapping_list[i]
                      m2 = rexp.match(req_path)                 # matching to get param values
                      params = [ int(v) for v in m2.groups() ]
                      func = funcs.get(req_meth)
                      return klass, func, params
                  return None, None, None
  43. Benchmark
      [Chart: seconds per 1M requests (axis 0 to 50 sec, shorter is faster) for /api/aaa, /api/aaa/{id}, /api/zzz, /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive, Smart).]
      ‣ The difference between /api/aaa/{id} and /api/zzz/{id} is small.
  44. Pros & Cons
      ✓ Much faster than the previous approaches, especially when many mapping entries exist.
      ✗ Slower when the number of entries is small (due to the overhead of matching twice).
      ✗ A large regular expression may be difficult to debug.
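
The two-step matching can be sketched as follows. This is an illustrative version under stated assumptions: `compile_path` and `DummyAPI` are hypothetical, and the combined pattern is derived from each route's own pattern by making its groups non-capturing and capturing only the end anchor.

```python
import re


def compile_path(path):
    """Hypothetical helper (not shown on the slides)."""
    pattern = re.sub(r"\{\w+:int\}", lambda m: r"(\d+)", path)
    return re.compile("^" + pattern + "$")


class DummyAPI:
    def do_show(self, id):
        return {"action": "show", "id": id}


class SmartRegexpRouter:
    def __init__(self, mapping):
        self._mapping_dict = {}
        self._mapping_list = []
        arr = []
        for path, klass, funcs in mapping:
            if "{" not in path:
                self._mapping_dict[path] = (klass, funcs)
                continue
            rexp = compile_path(path)
            # "(\d+)" -> "(?:\d+)" and "$" -> "($)": one capture per route.
            arr.append(rexp.pattern.replace("(", "(?:").replace("$", "($)"))
            self._mapping_list.append((rexp, klass, funcs))
        self._all_rexp = re.compile("|".join(arr))

    def lookup(self, req_meth, req_path):
        t = self._mapping_dict.get(req_path)
        if t:
            klass, funcs = t
            return klass, funcs.get(req_meth), []
        m = self._all_rexp.match(req_path)          # 1st match: which route?
        if m and "" in m.groups():
            rexp, klass, funcs = self._mapping_list[m.groups().index("")]
            m2 = rexp.match(req_path)               # 2nd match: parameter values
            return klass, funcs.get(req_meth), [int(v) for v in m2.groups()]
        return None, None, None


router = SmartRegexpRouter([
    ("/api/books/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
    ("/api/orders/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
    ("/api/users/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
])
```

Only the route whose `($)` group matched contributes an empty string to `groups()`, so `index("")` gives the route number directly. The extra `"" in m.groups()` guard is an addition for the case where no variable paths are registered.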
  45. Regular Expression (Optimized)
  46. Optimize Regular Expression
          ## before
          arr = [r"^/api/books/(?:\d+)($)",
                 r"^/api/orders/(?:\d+)($)",
                 r"^/api/users/(?:\d+)($)"]
          all_rexp = re.compile("|".join(arr))

          ### after: the common prefix "^/api" is factored out
          arr = [r"^/api",
                 r"(?:",
                 "|".join([r"/books/(?:\d+)($)",
                           r"/orders/(?:\d+)($)",
                           r"/users/(?:\d+)($)"]),
                 r")?"]
          all_rexp = re.compile("".join(arr))
  47. Router Class
          class OptimizedRegexpRouter(Router):

              def __init__(self, mapping):
                  ## Code is too complicated to show here.
                  ## Please download sample code from github.
                  ## https://github.com/kwatch/router-sample/

              def lookup(self, req_meth, req_path):
                  ## nothing changed; same as previous section
  48. Benchmark
      [Chart: seconds per 1M requests (axis 0 to 50 sec, shorter is faster) for /api/aaa, /api/aaa/{id}, /api/zzz, /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive, Smart, Optimized).]
      ‣ A little faster on /api/zzz/{id}.
  49. Pros & Cons
      ✓ A little faster than the smart regular expression when a lot of variable paths exist.
      ✗ The performance benefit is very small (on Python).
      ✗ Rather difficult to implement and debug.
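
The real implementation is in the sample repository and is not reproduced here. As a rough illustration of the idea only, the sketch below factors out a shared literal prefix with `os.path.commonprefix`, which compares strings character by character. That shortcut is an assumption of this sketch: it is safe only when the shared prefix is plain literal text, and it handles a single level of prefix rather than a full trie.

```python
import re
from os.path import commonprefix

# Per-route patterns in the "smart" form: one capturing group, ($), per route.
patterns = [
    r"/api/books/(?:\d+)($)",
    r"/api/orders/(?:\d+)($)",
    r"/api/users/(?:\d+)($)",
]

# Before: every alternative repeats the "/api/" prefix.
flat = re.compile("|".join("^" + p for p in patterns))

# After: the shared prefix is matched once, then the alternatives are tried.
prefix = commonprefix(patterns)
suffixes = [p[len(prefix):] for p in patterns]
factored = re.compile("^" + prefix + "(?:" + "|".join(suffixes) + ")")


def find_index(rexp, req_path):
    """Return the index of the matching route, or None."""
    m = rexp.match(req_path)
    return m.groups().index("") if m else None
```

Both patterns report the same route index; the factored one simply lets the regexp engine reject paths that do not start with the shared prefix without trying each alternative.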
  50. State Machine
  51. State Machine
      [Diagram: from the start state, "api" leads to a non-accepting state. From there, "books" and "orders" lead to the accepting states for /api/books and /api/orders. From each of those, a segment matching /\d+/ (for example "123" or "456") leads to the accepting state for /api/books/{id:int} or /api/orders/{id:int}. Legend: start, not accepted, accepted.]
  52. State Machine: Definition
          path = "/api/books"
          transition = {
              "api": {
                  "books": {
                      None: (BooksAPI, {"GET": do_index, ...}),
                  },
              },
          }
          >>> transition["api"]["books"][None]
          (BooksAPI, {"GET": do_index, ...})
          >>> transition["api"]["users"][None]
          KeyError: 'users'
      Use None as the terminator (the mark of an accepted state).
  53. State Machine: Definition
          path = "/api/books/{id:int}"
          transition = {
              "api": {
                  "books": {
                      None: (BooksAPI, {"GET": do_index, ...}),
                      1: {
                          None: (BooksAPI, {"GET": do_show, ...}),
                      },
                  },
              },
          }
          >>> transition["api"]["books"][1][None]
          (BooksAPI, {"GET": do_show, ...})
      1 represents an int parameter, 2 represents a str parameter.
  54. State Machine: Transition
          def find(req_path):
              req_path = req_path.lstrip('/')   #ex: "/a/b/c" -> "a/b/c"
              items = req_path.split('/')       #ex: "a/b/c" -> ["a","b","c"]
              d = transition; params = []
              for s in items:
                  if s in d:   d = d[s]
                  elif 1 in d: d = d[1]; params.append(int(s))
                  elif 2 in d: d = d[2]; params.append(str(s))
                  else: return None
              if None not in d: return None
              klass, funcs = d[None]
              return klass, funcs, params

          >>> find("/api/books/123")
          (BooksAPI, {"GET": do_show, ...}, [123])
  55. Router Class
          class StateMachineRouter(Router):

              def __init__(self, mapping):
                  self._mapping_dict = {}
                  self._mapping_list = []
                  self._transition = {}
                  for path, klass, funcs in mapping:
                      if '{' not in path:
                          self._mapping_dict[path] = (klass, funcs, [])
                      else:
                          self._register(path, klass, funcs)
  56. Router Class
              ...
              PARAM_TYPES = {"int": 1, "str": 2}

              def _register(self, path, klass, funcs):
                  ptypes = self.PARAM_TYPES
                  d = self._transition
                  for s in path[1:].split('/'):
                      key = s
                      if s[0] == "{" and s[-1] == "}":
                          ## ex: "{id:int}" -> ("id", "int")
                          pname, ptype = s[1:-1].split(':', 1)
                          key = ptypes.get(ptype) or ptypes["str"]
                      d = d.setdefault(key, {})
                  d[None] = (klass, funcs)
  57. Router Class
              ...
              def lookup(self, req_meth, req_path):
                  t = self._mapping_dict.get(req_path)   # fixed paths registered in __init__
                  if t:
                      klass, funcs, params = t
                      return klass, funcs.get(req_meth), params
                  d = self._transition
                  params = []
                  for s in req_path[1:].split('/'):
                      if s in d:   d = d[s]
                      elif 1 in d: d = d[1]; params.append(int(s))
                      elif 2 in d: d = d[2]; params.append(str(s))
                      else: return None, None, None
                  if None in d:
                      klass, funcs = d[None]
                      func = funcs.get(req_meth)
                      return klass, func, params
                  return None, None, None
  58. Benchmark
      [Chart: seconds per 1M requests (axis 0 to 50 sec, shorter is faster) for /api/aaa, /api/aaa/{id}, /api/zzz, /api/zzz/{id}. Series: Linear (Naive, Prefix Str, Fixed Path), Regexp (Naive, Smart, Optimized), StateMachine.]
      ‣ /api/aaa/{id} and /api/zzz/{id} have the same performance.
  59. Benchmark (PyPy3.5)
      [Chart: same series and paths as the previous chart, measured on PyPy3.5, axis 0 to 50 sec.]
      ‣ Regular expressions are very slow in PyPy3.5.
      ‣ String operations are very fast because they are JIT friendly.
  60. Benchmark (PyPy3.5)
      [Chart: same data as the previous chart, zoomed to an axis of 0 to 5 sec.]
      ‣ StateMachine is the fastest method because it is regexp-free (= JIT friendly).
      ‣ The other fast method is a little slower than StateMachine because it contains a regexp.
  61. Pros & Cons
      ✓ Performance champion in the routing area.
      ✓ Much faster in PyPy3.5, due to being regexp-free. JIT friendly!
      ✗ Does not support complicated patterns.
      ✗ Requires some effort to support a URL path suffix (ex: /api/books/123.json).
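
Putting the state-machine slides together gives something like the sketch below. It follows the nested-dict design from the talk, with two additions that are assumptions of this sketch rather than part of the slides: fixed paths are served from the dict before the walk starts, and an int parameter is accepted only when the segment is numeric, so a path such as /api/books/abc yields "not found" instead of raising `ValueError`. `DummyAPI` is a placeholder handler class.

```python
class DummyAPI:
    def do_index(self):
        return {"action": "index"}

    def do_show(self, id):
        return {"action": "show", "id": id}


class StateMachineRouter:
    PARAM_TYPES = {"int": 1, "str": 2}   # transition keys for path parameters

    def __init__(self, mapping):
        self._mapping_dict = {}   # fixed paths
        self._transition = {}     # nested dicts: one level per path segment
        for path, klass, funcs in mapping:
            if "{" not in path:
                self._mapping_dict[path] = (klass, funcs)
            else:
                self._register(path, klass, funcs)

    def _register(self, path, klass, funcs):
        d = self._transition
        for s in path[1:].split("/"):
            key = s
            if s.startswith("{") and s.endswith("}"):
                # "{id:int}" -> type "int"; a missing or unknown type means str
                _name, _, ptype = s[1:-1].partition(":")
                key = self.PARAM_TYPES.get(ptype, self.PARAM_TYPES["str"])
            d = d.setdefault(key, {})
        d[None] = (klass, funcs)   # None marks an accepting state

    def lookup(self, req_meth, req_path):
        t = self._mapping_dict.get(req_path)
        if t:
            klass, funcs = t
            return klass, funcs.get(req_meth), []
        d = self._transition
        params = []
        for s in req_path[1:].split("/"):
            if s in d:
                d = d[s]
            elif 1 in d and s.isdecimal():
                d = d[1]
                params.append(int(s))
            elif 2 in d and s:
                d = d[2]
                params.append(s)
            else:
                return None, None, None
        if None in d:
            klass, funcs = d[None]
            return klass, funcs.get(req_meth), params
        return None, None, None


router = StateMachineRouter([
    ("/api/books", DummyAPI, {"GET": DummyAPI.do_index}),
    ("/api/books/{id:int}", DummyAPI, {"GET": DummyAPI.do_show}),
    ("/api/users/{name:str}", DummyAPI, {"GET": DummyAPI.do_show}),
])
```

The cost of a lookup depends on the number of path segments, not on the number of registered routes, which is consistent with /api/aaa/{id} and /api/zzz/{id} performing the same in the benchmark.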
  62. Conclusion
  63. Conclusion
      ‣ Linear search is slow.
        ‣ Prefix strings and a fixed path dict make it faster.
      ‣ Regular expressions are very fast.
        ‣ Do your best to avoid named groups (named captures).
      ‣ State machine is the fastest method in Python.
        ‣ Especially in PyPy3, due to being regexp-free (= JIT friendly).
  64. One More Thing
  65. My Products
      ‣ Benchmarker.py: Awesome benchmarking utility.
        https://pythonhosted.org/Benchmarker/
      ‣ Oktest.py: New generation of testing framework.
        https://pythonhosted.org/Oktest/
      ‣ PyTenjin: Super fast and feature-rich template engine.
        https://www.kuwata-lab.com/tenjin/pytenjin-users-guide.html
        https://bit.ly/tenjinpy_slide (presentation)
  66. Thank You
