The document discusses various approaches for dealing with blocking code within the asyncio event loop:
1. Check if a compatible asyncio library exists
2. Check if a REST API is available to avoid blocking
3. Check if there is a simple text or binary protocol that can be implemented without blocking
4. Check if there is an existing synchronous Python client that can be adapted
5. Use a thread pool executor to run blocking code in separate threads to avoid blocking the event loop
Because of OS limitations on asynchronous filesystem access, the document recommends a thread pool executor for filesystem operations; the aiofiles library provides an asynchronous file wrapper that uses threads in the background. For CPU-intensive tasks, it shows a process pool executor instead.
HOW TO DEAL WITH BLOCKING CODE WITHIN ASYNCIO EVENT LOOP
1. HOW TO DEAL WITH
BLOCKING CODE WITHIN
ASYNCIO EVENT LOOP
Nikolay Novik
http://github.com/jettify
2. I AM ...
Software Engineer at DataRobot Ukraine
Github: http://github.com/jettify
My Projects:
database clients:
aiomysql, aioodbc, aiogibson
web and etc:
aiohttp_debugtoolbar, aiobotocore,
aiohttp_mako, aiohttp_sse, aiogearman,
aiomysql_replication
3. POLL
YOU AND ASYNCIO:
1. I am using asyncio extensively
2. I am using Twisted, Tornado, gevent etc. extensively
3. I think async programming is kinda cool
4. ASYNCIO
The asyncio project was officially launched with the release
of Python 3.4 in March 2014.
Bare: almost no library
One year later, asyncio has a strong community writing
libraries on top of it.
But what do you do when the available libraries are synchronous
and can block the event loop?
5. RULES OF ASYNC CLUB
RULE #1
You do not block event loop
RULE #2
You never block event loop
6. BLOCKING CALLS IN THIRD PARTY
LIBRARIES
Network IO
API wrappers
Database clients
Message queues
FileSystem IO
CPU
7. DEBUGGING BLOCKING CALLS TIP
Set environment variable PYTHONASYNCIODEBUG=1

    import asyncio
    import time

    loop = asyncio.get_event_loop()
    loop.slow_callback_duration = 0.01

    async def sleeper():
        time.sleep(0.1)  # we block here

    loop.run_until_complete(sleeper())

Output:

    Executing <Task finished coro=<sleeper() done, defined at
    code/debug_example.py:9> result=None created at
    /usr/local/lib/python3.5/asyncio/base_events.py:323>
    took 0.102 seconds
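The same check can be switched on from code instead of the environment variable. The sketch below is only an illustration: find_slow_callbacks and ListHandler are hypothetical helper names, and it assumes a Python version where asyncio.run() accepts debug=True. It collects the "Executing ... took ..." warnings that the asyncio logger emits in debug mode.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Hypothetical helper: keeps formatted log messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def sleeper():
    time.sleep(0.1)  # blocks the event loop on purpose


def find_slow_callbacks(coro_fn, threshold=0.01):
    """Run coro_fn() in debug mode and return the slow-callback warnings."""
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)

    async def main():
        # Same knob as in the slide, set on the running loop.
        asyncio.get_running_loop().slow_callback_duration = threshold
        await coro_fn()

    try:
        asyncio.run(main(), debug=True)
    finally:
        logger.removeHandler(handler)
    return [m for m in handler.messages if m.startswith("Executing")]


if __name__ == "__main__":
    for message in find_slow_callbacks(sleeper):
        print(message)
```

Running it prints one warning for the step that called time.sleep(); the exact text depends on the Python version.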
8. APPROACH #1 IS THERE ANY
SUITABLE LIBRARY?
Search for an asyncio-compatible library on:
1. google ~ 98k results
2. pypi ~200 packages
3. asyncio wiki page: https://github.com/python/asyncio/wiki/ThirdParty
4. aio-libs: https://github.com/aio-libs
9. THIRD PARTY LIBRARIES PRO TIP
Read the (f*g) source code of your libraries! Example of
python code from OneDrive SDK:

    @asyncio.coroutine
    def get_async(self):
        """Sends the GET request using an asyncio coroutine
        ....
        """
        future = self._client._loop.run_in_executor(None,
                                                    self.get)
        collection_response = yield from future
        return collection_response

Most of the time you want to do HTTP requests using the event
loop, not a thread pool.
10. APPROACH #2 IS REST API AVAILABLE?
Most hipster databases use a REST API as their primary access
method, so it is easy to implement the required subset of the
API. Examples:
DynamoDB
Neo4j
Elasticsearch
HBase
HDFS
CouchDB
Riak
VoltDB
InfluxDB
ArangoDB
11. REST CLIENT TIP
aiohttp.ClientSession is your friend
Connection pooling helps to save on expensive connection
creation. (PS: check out the new aiohttp 0.18.x release)

    import asyncio
    import aiohttp

    # carry the loop Luke!
    loop = asyncio.get_event_loop()

    async def go():
        session = aiohttp.ClientSession(loop=loop)
        async with session.get('http://python.org') as resp:
            data = await resp.text()
            print(data)
        session.close()

    loop.run_until_complete(go())
12. APPROACH #3 IS THERE SIMPLE TEXT
OR BINARY PROTOCOL?
Examples of databases and message queues with a simple text or
binary protocol:
redis
memcached
couchbase
gearman
beanstalkd
disque
Do not be afraid to get your hands dirty.
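As an illustration of how small such a protocol can be, here is a minimal sketch of a request encoder, assuming the Redis request format (RESP: an array of bulk strings). The function name encode_command is a hypothetical helper chosen for this example, not taken from any particular client.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings.

    Hypothetical helper for illustration; accepts bytes or str arguments.
    """
    parts = [b"*%d\r\n" % len(args)]           # array header: number of items
    for arg in args:
        if isinstance(arg, str):
            arg = arg.encode("utf-8")
        # bulk string: $<length>\r\n<data>\r\n
        parts.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_command(b"GET", "key"))
```

The resulting bytes can be passed straight to writer.write() on an asyncio stream, so no blocking socket code is involved.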
14. PROTOCOL PIPELINING
Most binary protocols support pipelining
More info: http://tailhook.github.io/request-pipelining-presentation/presentation/index.html
15. Example: Simple pipelined binary protocol implementation
See aioredis for reference implementation.

    def execute(self):
        cmd = encode_command(b'GET', 'key')
        self.writer.write(cmd)
        fut = asyncio.Future(loop=self._loop)
        self._queue.append(fut)  # replies arrive in request order
        return fut

    async def reader_task(self):
        while True:
            # header: 2-byte code, 1-byte encoding, 4-byte payload size
            header = await self.reader.readexactly(2 + 1 + 4)
            unpacked = struct.unpack('<HBI', header)
            code, gb_encoding, resp_size = unpacked
            # wait and read payload
            payload = await self.reader.readexactly(resp_size)
            # resolve the oldest pending future (FIFO); assumes
            # self._queue is a collections.deque
            future = self._queue.popleft()
            future.set_result(payload)
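To see the pipelining idea end to end, the sketch below runs a toy server and a pipelined client in one process. Everything here is an assumption made for illustration: the framing (a 4-byte little-endian length prefix), the upper-casing server, and the names PipelinedClient, handle and demo. It is not the aioredis implementation.

```python
import asyncio
import struct
from collections import deque

HEADER = struct.Struct("<I")  # assumed framing: 4-byte payload length


async def handle(reader, writer):
    """Toy server: replies to each frame with the payload upper-cased."""
    try:
        while True:
            (size,) = HEADER.unpack(await reader.readexactly(HEADER.size))
            payload = await reader.readexactly(size)
            writer.write(HEADER.pack(len(payload)) + payload.upper())
            await writer.drain()
    except asyncio.IncompleteReadError:
        pass  # client disconnected
    finally:
        writer.close()


class PipelinedClient:
    def __init__(self, reader, writer):
        self.reader, self.writer = reader, writer
        self._waiters = deque()  # futures in request order
        self._reader_task = asyncio.ensure_future(self._read_replies())

    def execute(self, payload):
        # Write immediately, without waiting for earlier replies.
        self.writer.write(HEADER.pack(len(payload)) + payload)
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        return fut

    async def _read_replies(self):
        while True:
            (size,) = HEADER.unpack(await self.reader.readexactly(HEADER.size))
            payload = await self.reader.readexactly(size)
            # Replies come back in order, so resolve the oldest future.
            self._waiters.popleft().set_result(payload)

    async def close(self):
        self._reader_task.cancel()
        try:
            await self._reader_task
        except asyncio.CancelledError:
            pass
        self.writer.close()
        await self.writer.wait_closed()


async def demo():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    client = PipelinedClient(reader, writer)
    # Three requests are sent back to back before any reply is read.
    futures = [client.execute(word) for word in (b"one", b"two", b"three")]
    replies = await asyncio.gather(*futures)
    await client.close()
    server.close()
    return replies


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The important design choice is the FIFO queue of futures: because the server answers in request order, the reader task never needs to match replies to requests by id.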
16. APPROACH #4 IS SYNC PYTHON CLIENT AVAILABLE?
In good sync database clients, IO is decoupled from the protocol
parsers, so why not just rewrite the IO part?
1. Locate socket.recv()
2. Replace it with await reader.read()
3. Make the function a coroutine with async def
4. Call this function with await
5. Make the parent functions coroutines too and call them with await
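The steps above can be sketched on a tiny made-up client. The line-based reply format and the names read_reply, read_reply_async and parse_reply are assumptions for illustration only; the point is that the parser stays untouched and only the IO call changes.

```python
import asyncio
import socket


def parse_reply(raw):
    """Protocol parser: pure function, shared by both versions."""
    return raw.rstrip(b"\r\n").decode()


def read_reply(sock):
    """Original blocking version."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1024)          # step 1: the blocking call
        if not chunk:
            break
        buf += chunk
    return parse_reply(buf)


async def read_reply_async(reader):      # step 3: async def
    """Converted version: same logic, non-blocking IO."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = await reader.read(1024)  # step 2: await reader.read()
        if not chunk:
            break
        buf += chunk
    return parse_reply(buf)


def demo_sync():
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"+OK\r\n")
        return read_reply(b)


async def demo_async():
    # A StreamReader fed by hand stands in for a real connection.
    reader = asyncio.StreamReader()
    reader.feed_data(b"+OK\r\n")
    reader.feed_eof()
    return await read_reply_async(reader)  # step 4: call with await


if __name__ == "__main__":
    print(demo_sync(), asyncio.run(demo_async()))
```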
17. APPROACH #5 IS THERE UNIVERSAL SOLUTION TO ALL PROBLEMS?
Yes. Make every blocking call in a separate thread

    import asyncio
    from concurrent.futures import ThreadPoolExecutor
    from pyodbc import connect

    loop = asyncio.get_event_loop()
    executor = ThreadPoolExecutor(max_workers=4)

    async def test_example():
        dsn = 'Driver=SQLite;Database=sqlite.db'
        conn = await loop.run_in_executor(executor, connect, dsn)
        cursor = await loop.run_in_executor(executor, conn.cursor)
        # cursor.execute blocks too, so it also goes to the executor
        await loop.run_in_executor(executor, cursor.execute,
                                   'SELECT 42;')

    loop.run_until_complete(test_example())
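For readers without an ODBC driver at hand, the same approach can be tried with the standard library only. This is a sketch that swaps pyodbc for sqlite3 (an assumption made so the example runs anywhere); query_sync and query are hypothetical names. The whole blocking query runs in one worker call because sqlite3 connections are, by default, tied to the thread that created them.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def query_sync(sql):
    """Blocking helper: connect, execute and fetch on a single thread."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


async def query(executor, sql):
    loop = asyncio.get_running_loop()
    # The event loop stays free while the worker thread runs the query.
    return await loop.run_in_executor(executor, query_sync, sql)


async def main():
    with ThreadPoolExecutor(max_workers=4) as executor:
        return await query(executor, "SELECT 42;")


if __name__ == "__main__":
    print(asyncio.run(main()))
```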
18. BUT HOW DO I KNOW WHICH METHOD TO CALL IN THREAD?
For python code:

    requests.get()

For Cython:

    with nogil:
        [code to be executed with the GIL released]

For C extension:

    Py_BEGIN_ALLOW_THREADS
    ret = SQLDriverConnect(hdbc, 0, szConnect, SQL_NTS,
                           0, 0, 0, SQL_DRIVER_NOPROMPT);
    Py_END_ALLOW_THREADS
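Calls that release the GIL while they wait are the ones that benefit from a thread. A rough way to observe this is sketched below, using time.sleep() as a stand-in for any blocking call that releases the GIL; blocking_call and run_concurrently are hypothetical names, and the timing is only indicative.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(seconds):
    time.sleep(seconds)  # releases the GIL while waiting
    return seconds


async def run_concurrently(n=4, seconds=0.2):
    """Run n blocking calls in n threads; return results and elapsed time."""
    loop = asyncio.get_running_loop()
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=n) as executor:
        jobs = [loop.run_in_executor(executor, blocking_call, seconds)
                for _ in range(n)]
        results = await asyncio.gather(*jobs)
    return results, time.monotonic() - started


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_concurrently())
    print(results, round(elapsed, 2))
```

If the calls overlap, the elapsed time stays close to one call's duration rather than the sum of all four.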
19. WHAT ABOUT FILESYSTEM IO?
asyncio does not support asynchronous operations on the
filesystem due to OS limitations.
The only good way to use files asynchronously is with thread
pools.
20. aiofiles library workaround
In the background, aiofiles uses a ThreadPoolExecutor for
blocking calls.

    import asyncio
    import aiofiles

    loop = asyncio.get_event_loop()

    async def go():
        f = await aiofiles.open('filename', mode='r')
        try:
            data = await f.read()
        finally:
            await f.close()
        print(data)

    loop.run_until_complete(go())
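The underlying idea can be sketched with the standard library alone: hand the blocking open/read to the loop's default executor. This is not how aiofiles is implemented in detail, just an illustration of the technique; read_file and demo are hypothetical names.

```python
import asyncio
import os
import tempfile


def _read_file(path):
    """Blocking read, meant to run on a worker thread."""
    with open(path, "r") as f:
        return f.read()


async def read_file(path):
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, _read_file, path)


async def demo():
    fd, path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, "w") as f:
            f.write("hello from a worker thread")
        return await read_file(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```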
21. WHAT ABOUT CPU INTENSIVE TASK?
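
    import asyncio
    import math
    from concurrent.futures import ProcessPoolExecutor

    def is_prime(n):
        if n < 2: return False
        if n == 2: return True
        if n % 2 == 0: return False
        sqrt_n = int(math.floor(math.sqrt(n)))
        for i in range(3, sqrt_n + 1, 2):
            if n % i == 0: return False
        return True

    async def go(loop, executor):
        n = 112272535095293
        # run the CPU-bound check in a separate process
        result = await loop.run_in_executor(executor, is_prime, n)
        print(result)

    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        executor = ProcessPoolExecutor(max_workers=3)
        loop.run_until_complete(go(loop, executor))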