Friday, July 22, 2011

select, poll and epoll, a twisted story

Some time back I had to write a network server that needed to support ~50K concurrent clients on a single box. Server-client communication used a proprietary protocol on top of TCP, with JSON/NetString as the messaging format. Clients exchanged periodic keep-alives, which the server used to check their health. The server also exposed a REST interface for other modules to communicate with clients. As most of the operations were IO-bound (socket/db), we decided to use Python/Twisted to implement the server.

During load tests we found that the server could handle only 1024 clients, after which connections failed. We increased the per-process limit on open files from 1024 to 100000 (ulimit -n 100000), but the connections still failed at 1024.
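Before blaming the server code, it helps to confirm which limit the process actually sees. A minimal sketch using Python's standard resource module (Unix only); the helper name open_file_limits is just an illustration:

```python
import resource

def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = open_file_limits()
# The soft limit is the one enforced; `ulimit -n` in the launching shell sets it.
print("soft limit:", soft, "hard limit:", hard)
```

If the soft limit printed from inside the server process is already 100000 and connections still stop at 1024, the bottleneck is somewhere other than the open-files limit.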

select limitation
select fails beyond 1024 fds because FD_SETSIZE caps the descriptor set at 1024. Refer to blog post for details. Twisted's default reactor seems to be based on select. As a natural progression, poll was tried next to overcome the max open fd issue.
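The limit is easy to see from Python without opening a thousand sockets. A rough sketch, assuming CPython on Linux, where select.select rejects any descriptor number at or above FD_SETSIZE with a ValueError before it even reaches the kernel; the helper name select_accepts_fd is made up for this example:

```python
import select

def select_accepts_fd(fd):
    """Return True if select() is willing to watch this descriptor number."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # CPython refuses descriptor numbers >= FD_SETSIZE
        return False
    except OSError:
        # The number is in range, it just is not an open descriptor
        return True
    return True

for fd in (0, 1023, 1024):
    print(fd, select_accepts_fd(fd))
```

Raising ulimit does not change this, because the cap is compiled into the fd_set type rather than configured per process.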

poll limitation
poll solves the max fd issue. But as the number of concurrent clients grew, performance dropped drastically. poll takes the full list of fds on every call and scans it in O(n), so each call gets slower as the number of fds increases.
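To make the cost concrete, here is a small sketch with the standard select.poll (Unix only). The helper ready_fds_with_poll and the 50 idle connections are arbitrary choices for illustration; only one connection has data, yet every registered descriptor is checked on each call:

```python
import select
import socket

def ready_fds_with_poll(socks, timeout_ms=100):
    """Register every socket with poll() and return the fds that are readable."""
    poller = select.poll()
    for s in socks:
        poller.register(s, select.POLLIN)
    # Each poll() call passes the full registered set to the kernel,
    # which checks every descriptor, ready or not.
    return [fd for fd, event in poller.poll(timeout_ms) if event & select.POLLIN]

# 50 idle connections plus one that has data waiting
pairs = [socket.socketpair() for _ in range(51)]
active_fd = pairs[-1][1].fileno()
pairs[-1][0].sendall(b"ping")

readable = ready_fds_with_poll([b for _, b in pairs])
print(readable == [active_fd])

for a, b in pairs:
    a.close()
    b.close()
```

With a few dozen sockets the scan is invisible; with tens of thousands of mostly idle clients it is repeated on every loop iteration.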

epoll
The epoll reactor solved both problems and gave awesome performance. libevent is another library that can use epoll as its backend on Linux.
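Switching reactors in Twisted is a small change, as long as it happens before anything imports twisted.internet.reactor. A minimal sketch, assuming Twisted is installed on Linux; the Echo protocol, the function names and port 8000 are placeholders, not the original server:

```python
def install_epoll_reactor():
    """Select Twisted's epoll reactor and return it.

    Must run before any other module imports twisted.internet.reactor,
    which is why the imports live inside the function.
    """
    from twisted.internet import epollreactor
    epollreactor.install()
    from twisted.internet import reactor
    return reactor

def run_echo_server(port=8000):
    """Start a toy echo server on the epoll reactor (blocks until stopped)."""
    reactor = install_epoll_reactor()
    from twisted.internet import protocol

    class Echo(protocol.Protocol):
        def dataReceived(self, data):
            # Send back whatever the client sent
            self.transport.write(data)

    factory = protocol.Factory()
    factory.protocol = Echo
    reactor.listenTCP(port, factory)
    reactor.run()
```

Calling run_echo_server() starts the loop. The poll reactor can be tried the same way by swapping in twisted.internet.pollreactor.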

Async frameworks
So next time I will not waste time on select/poll based approaches if the expected number of concurrent connections is above 1K. The following are some event-loop based frameworks where this applies.

  • Eventlet (python)
  • Gevent (python), similar to Eventlet; it uses libevent, which is built on top of epoll.
  • C++ ACE 
  • Java Netty
  • Ruby Eventmachine
