Companies adopting a new language evaluate several aspects, such as:
* performance
* integration with existing ecosystem
* productivity
* use cases for the language
In this presentation, we'll reflect on these points by sharing the experience of integrating Python at Terra.
Python For Large Company?
1. Python In Large Companies? Sébastien Tandel sebastien.tandel@corp.terra.com.br sebastien.tandel@gmail.com
2. Plan: About Terra; The 7 steps (Prototype, Define the Goals, Integration, Some Libs, Prove It Works, Evangelize, Next Steps); Conclusions
3. About Terra : Web Portal. Largest Latin American web portal; located in 18 countries; 1000s of servers. Brazil: ~7M unique visitors / day, ~70M pageviews / day
5. About Terra : Email Platform. I’m part of the email team. Some stats: +10M mailboxes; +30M inbound emails per day; +30M outbound emails per day; avg: 300 mail/s, peak: 600 mail/s. Systems: the main systems are SMTP, LMTP, POP, IMAP, Webmail; total of +30 systems to design/develop/maintain. Main languages used: C / C++
6.
7. No official “scripting” language (Python, Perl or other). Why? From what I hear: performance; integration with other systems (& legacy); costs / benefits?; buzzword fear; labor market
8. Flash Python Overview. Python is … interpreted; dynamically typed; really concise; multi-paradigm: procedural, OO, functional. Exceptions: helpful for robustness and debugging (no strace ;)). Garbage collector: don’t worry about allocation/free
10. Step 1 : Prototype. A buggy system was rewritten as a prototype in Python. Surprise! It worked a lot better than its C cousin. The prototype is now in production! I spread the word about this rewrite around me. Some technical people liked the idea. One was not so enthusiastic … my manager. Cons: no integration with homemade systems; just one example
11. Step 1 : Prototype. Introducing new ideas is a long and tough road
12. Step 2 : Define the Goals. Performance-critical systems: postfix, lmtp, imap / pop
28. Step 3 : Integration. Python could be used with every system, but how can I interface with the homemade (legacy) systems?
29. Step 3 : Integration. Various ways to create Python bindings: Python C API: the “hard” way
30. Step 3 : Integration. Various ways to create Python bindings: Python C API: the “hard” way; swig: the lazy way, but it won’t create a Pythonic API for you
31. Step 3 : Integration. Various ways to create Python bindings: Python C API: the “hard” way; swig: the lazy way; ctypes: the stupidly easy way
    from ctypes import cdll
    libc = cdll.LoadLibrary("libc.so.6")
    # mkdir(2) takes a byte-string path and a permission mode
    libc.mkdir(b"python-mkdir-test", 0o755)
32. Step 3 : Integration. Various ways to create Python bindings: Python C API: the “hard” way; swig: the lazy way; ctypes: the stupidly easy way; Cython: write Python, compile with gcc
33. Step 3 : Integration. Wrote bindings to interface with all major internal systems (thanks to ctypes), with a Pythonic API!
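The slides do not show what a "Pythonic API" on top of ctypes looks like, so here is a minimal sketch of the idea, reusing the libc mkdir call from the ctypes slide. The wrapper name make_directory is hypothetical, and the sketch assumes a Unix system where mode_t fits in an unsigned int; it is not the Terra binding code.

```python
import ctypes
import ctypes.util
import os

# Load the C library; find_library("c") avoids hard-coding "libc.so.6".
# If it returns None, CDLL(None) still exposes libc symbols on Linux.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Declare the C signature: int mkdir(const char *path, mode_t mode).
# Assumption: mode_t fits in an unsigned int (true on Linux).
_libc.mkdir.argtypes = [ctypes.c_char_p, ctypes.c_uint]
_libc.mkdir.restype = ctypes.c_int


def make_directory(path, mode=0o755):
    """Hypothetical Pythonic wrapper: takes a str, raises OSError on failure."""
    if _libc.mkdir(os.fsencode(path), mode) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
```

The caller never sees a raw return code or a byte string: failures surface as ordinary Python exceptions, which is the kind of convenience swig-generated wrappers do not provide on their own.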
38. Step 4 : Some Libs: Master / Slave. Master responsible for: forking the slaves; reading a “list” of tasks; distributing the tasks to the slaves. Slave responsible for: executing the task; returning the execution status to the master. Key characteristics: slave death detection; handling of unhandled exceptions (+ hook); master <-> slave protocol allows temporary error codes; timeout of the tasks
39. Step 4 : Some Libs: Master / Slave. One neat characteristic: the system might hit a bug in production with minimal impact. If an unhandled exception occurs, only one slave dies; it is detected and the master will fork a new one (if needed). The lib handles the exception. Default behavior: prints to console. User-defined (callback): e.g. write the stack trace to a file! Cherry on the cake: getting specific production data about the faulty task
40.
    from robustpools.process_pool import master_task_list
    from robustpools.process_pool import slave_task_list

    m_config = {'INFINITE_LOOP': 0}

    # A task wraps one element of the task list and exposes its id
    class list_task(object):
        def __init__(self, list, num, timeout_validity=600):
            self.__num = num

        def _id(self):
            return self.__num
        id = property(_id)

    # A slave executes one task and returns (status code, message)
    class list_slave(slave_task_list):
        def __init__(self):
            super(list_slave, self).__init__(list_task)

        def run(self, task):
            print(task.id)
            return 0, "ok"

    list = range(10)
    m = master_task_list(list, num_slave=5, slave_class=list_slave, config=m_config)
    m.start()
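The internals of robustpools are not shown in the slides. As a rough illustration of the master / slave idea only, the following sketch uses the standard multiprocessing module: a slave that hits an unhandled exception reports the stack trace and dies, and the master forks a replacement. All names (run_master, _slave_loop) are hypothetical. The sketch assumes a Unix system (fork), picklable task results, and omits timeouts and detection of hard deaths such as a killed process.

```python
import multiprocessing
import traceback

# "fork" mirrors the wording of the slides and is Unix-only (an assumption here).
_ctx = multiprocessing.get_context("fork")


def _slave_loop(tasks, results, run):
    """Run tasks until the None sentinel arrives; die on an unhandled exception."""
    for task in iter(tasks.get, None):
        try:
            results.put((task, "ok", run(task)))
        except Exception:
            # Hook point: send the stack trace to the master, then let this slave die.
            results.put((task, "crashed", traceback.format_exc()))
            return


def run_master(task_list, run, num_slaves=2):
    """Distribute tasks to forked slaves and replace any slave that crashes."""
    tasks, results = _ctx.Queue(), _ctx.Queue()

    def fork_slave():
        proc = _ctx.Process(target=_slave_loop, args=(tasks, results, run))
        proc.start()
        return proc

    slaves = [fork_slave() for _ in range(num_slaves)]
    task_list = list(task_list)
    for task in task_list:
        tasks.put(task)

    report = {}
    for _ in task_list:  # every task yields exactly one result
        task, status, detail = results.get()
        report[task] = (status, detail)
        if status == "crashed":
            slaves.append(fork_slave())  # keep the pool at full strength

    for _ in range(num_slaves):  # exactly num_slaves slaves are alive here
        tasks.put(None)
    for proc in slaves:
        proc.join()
    return report
```

A faulty task then costs one slave, not the whole system, and the report keeps the stack trace of the task that failed.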
41. Step 4 : Some Libs: TCP Sockets Pool. Manages connections to a pool of servers: sends in a round-robin/priority way to each server; detects connection errors; retries to connect; the number of retries is limited => afterwards the server is marked as dead; retries again later with exponential backoff
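The pool library itself is not shown, so the following is only a sketch of the behaviour listed on the slide: round-robin selection, a limited number of retries, then marking a server as dead and retrying later with exponential backoff. The class name SocketPool and its parameters are hypothetical, and the priority mode is left out. The connect function and the clock are injectable so the logic can be exercised without a network.

```python
import socket
import time


class _Server:
    def __init__(self, address):
        self.address = address
        self.failures = 0    # consecutive failed connection attempts
        self.retry_at = 0.0  # earliest clock value at which we may try again


class SocketPool:
    """Hypothetical round-robin pool with retry limit and exponential backoff."""

    def __init__(self, addresses, max_retries=3, base_delay=1.0,
                 connect=socket.create_connection, clock=time.monotonic):
        self._servers = [_Server(address) for address in addresses]
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._connect = connect
        self._clock = clock
        self._next = 0

    def get_connection(self):
        """Return (address, connection) from the next reachable server."""
        now = self._clock()
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if now < server.retry_at:
                continue  # marked as dead and the backoff has not elapsed
            try:
                conn = self._connect(server.address)
            except OSError:
                server.failures += 1
                if server.failures >= self._max_retries:
                    # Dead: wait base_delay, then 2x, 4x, ... on further failures
                    exponent = server.failures - self._max_retries
                    server.retry_at = now + self._base_delay * 2 ** exponent
                continue
            server.failures = 0
            server.retry_at = 0.0
            return server.address, conn
        raise ConnectionError("no server in the pool is reachable")
```

With the defaults, addresses are (host, port) tuples as accepted by socket.create_connection.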
43. Step 5 : Prove It Works. Prove = collect data … How? Write integrated systems using the bindings and libs of the previous steps. Show it works: performance; productivity
44. Step 5 : Prove It Works. Performance, one obvious thought: C/C++. PINCS: Performance is not C, Stupid!
45. Step 5 : Prove It Works: Performance. Some of the rewrites work faster than their C/C++ cousins. Why? OS / system limits; libs (legacy); algorithms; software architecture; infrastructure
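To make the "algorithms" item concrete, here is a small illustration that is not taken from the talk: two ways of removing duplicate recipient addresses. Both return the same result, but the first rescans a list for every address (quadratic), while the second uses a set (linear). On large inputs that difference can outweigh the raw speed gap between languages. The function names are hypothetical.

```python
def unique_quadratic(addresses):
    """O(n^2): every membership test scans the whole list."""
    seen = []
    for address in addresses:
        if address not in seen:
            seen.append(address)
    return seen


def unique_linear(addresses):
    """O(n): membership test on a set is a hash lookup."""
    seen = set()
    result = []
    for address in addresses:
        if address not in seen:
            seen.add(address)
            result.append(address)
    return result
```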
46.
47. Step 5 : Prove It Works: Productivity. http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprt2_advances2003.pdf
48. Step 5 : Prove It Works: Productivity. http://www.ohloh.net
49. Step 5 : Prove It Works: Productivity. http://www.ohloh.net
50. Step 5 : Prove It Works: Productivity. http://www.ohloh.net
52. Step 5 : Prove It Works: Productivity. Some existing C/C++ systems were rewritten in Python. The original C/C++ versions total ~20.000 LOC. In Python, 4-6x less code! The previous numbers do not seem to lie
53. Step 5 : Prove It Works: Productivity. Oh, parsing an email? Any idea in C/C++?
54.
55.
56.
57. Content types of parts
    from email import message_from_file

    def get_mail(filename):
        # filename is the path to a saved email message
        with open(filename, "r") as fh:
            return message_from_file(fh)

    mail = get_mail(filename)
    for part in mail.walk():
        print(part.get_content_type())
61. Any idea in C/C++?
    from email import message_from_file

    def get_mail(filename):
        # filename is the path to a saved email message
        with open(filename, "r") as fh:
            return message_from_file(fh)

    mail = get_mail(filename)
    for part in mail.walk():
        print(part.get_content_type())
65. Python libs are just that simple! … and there are a lot!
    from email import message_from_file

    def get_mail(filename):
        # filename is the path to a saved email message
        with open(filename, "r") as fh:
            return message_from_file(fh)

    mail = get_mail(filename)
    for part in mail.walk():
        print(part.get_content_type())
    print(mail["From"])
    print(mail["Subject"])
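The slide code reads a message from a file that is not included. For a version that can be run as-is, the sketch below parses a small hand-written multipart message from a string with the same standard email package. The message content and addresses are made up for illustration.

```python
from email import message_from_string

# A made-up multipart message with a plain-text and an HTML part
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Subject: Hello
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

plain body
--XYZ
Content-Type: text/html; charset="us-ascii"

<p>html body</p>
--XYZ--
"""

mail = message_from_string(RAW_MESSAGE)

# walk() visits the container first, then each part in order
content_types = [part.get_content_type() for part in mail.walk()]
print(content_types)
print(mail["From"])
print(mail["Subject"])
```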
66. Step 5 : Prove It Works: Performance (Again?). For an equivalent architecture (libs, algorithm, infrastructure), C performs better than Python! Python Is Not C, Stupid!
67. Step 5 : Prove It Works: Performance (Again?). Bottleneck discovered! PINCS!: think about architecture first!
73. Cython: write Python, obtain a gcc-compiled lib. Psyco: JIT for Python; just an additional module import in your code; 2 – 100x faster than normal Python; requires a bit more memory
81. Step 6 : Evangelize. Once you have stopped and looked at what has been accomplished … show it, evangelize!
82. Step 6 : Evangelize. Because introducing a “new technology” is not just about teaching something to users. You’ve got to play the role of evangelist! Innovators (3.5%): new stuff? They’re in!
93. Step 6 : Evangelize. During work, I constantly spoke (a lot) to others. Presentation on Python made for all: present to a large audience what has been done; open discussion. Poster summarizing what has been done. Wiki page documenting Python stuff. Specific mailing list related to Python
94. Step 6 : Evangelize. A lot of work and a slow process, but I won some allies. Some technical people are convinced that Python is useful. Some managers are convinced that Python could be a good thing for Terra. Starting evaluation in some specific cases
96. Step 7 : Next Steps. Proven that Python could be useful in some cases. Don’t forget my Grail! The journey is not over … I’m lobbying to start using Python for web development. And again, I made a prototype
97.
98. Step 7 : Next Steps. Login: the auth module already exists. It is easy to tell Django that authentication is required:
    from django.contrib.auth.decorators import login_required

    @login_required
    def list_abook(request, username):
        …
login_required is a Python decorator
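For readers new to decorators, here is a framework-free sketch of the mechanism. It is not Django's implementation: require_login, LoginRequired and the dict-based request are hypothetical stand-ins, and Django's real login_required redirects to a login page instead of raising.

```python
import functools


class LoginRequired(Exception):
    """Hypothetical error raised when an anonymous request hits a protected view."""


def require_login(view):
    """Wrap a view so it only runs when the request carries a user."""
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        if not request.get("user"):
            raise LoginRequired("authentication required")
        return view(request, *args, **kwargs)
    return wrapper


@require_login
def list_abook(request, username):
    return "address book of %s" % username
```

The check lives in one place, and protecting another view is a one-line change.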
99. Step 7 : Next Steps. Caching information (memcache, db, file, …). 4 levels. Per site: one config line. Per view: one Python decorator:
    from django.views.decorators.cache import cache_page

    @cache_page(60 * 15)
    def list_abook(request, username):
        …
In templates: maybe better to leave this one out! Low-level cache access:
    from django.core.cache import cache

    cache.get(id)
    cache.set(id, value, timeout)
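To show what a per-view cache decorator does conceptually, here is a small stand-alone sketch. It is not Django's cache_page: cache_for is a hypothetical name, results are kept in an in-process dict keyed by the call arguments, and the clock is injectable for testing.

```python
import functools
import time


def cache_for(seconds, clock=time.monotonic):
    """Hypothetical decorator: reuse a result for `seconds` before recomputing."""
    def decorator(func):
        store = {}  # args -> (expiry time, cached value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now < hit[0]:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            store[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator
```

Django's version keys on the request and stores entries in the configured backend (memcache, database, file, …), but the time-bounded reuse is the same idea.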
100. Step 7 : Next Steps. Address book web service: retrieve the address book of one user; add an account; add an entry to the address book of a user; view all the address book entries; output in HTML, JSON and CSV. < 100 LOC; 2 hours (w/o knowing the framework); not one line of SQL, just useful code
101. Conclusions. One year and a half … and evangelization is not done yet! Email team: several systems have been written in Python and work really well … even with Terra’s high load! The web project should start right now. People are starting to use/learn it inside the company. Some teams are starting to evaluate Python. Some Terra employees are here at this conference!
C / C++ / Java: a lot of projects … definitely the reference languages. C#: going up, but still only a few projects. Ruby: not well established .. yet? Only for web??? Python: already has a lot of projects and has the lowest LOC per project