Friday, July 22, 2011

select, poll and epoll, a twisted story

Some time back I had to write a network server that needed to support ~50K concurrent clients on a single box. Server-client communication used a proprietary protocol on top of TCP, with JSON/NetString as the messaging format. Clients exchanged periodic keep-alives which the server used to check health state. The server also exposed a REST interface for other modules to communicate with clients. As most of the operations were IO bound (socket/db) we decided to use python/twisted to implement the server.

On performing load tests we found that the server was able to handle only 1024 clients, after which connections started failing. We increased the per-process max open files limit from 1024 to 100000 (ulimit -n 100000) and still the connections failed at 1024.

select limitation
select fails beyond 1024 fds as FD_SETSIZE caps the fd set at 1024. Refer to blog post for details. Twisted's default reactor is based on select. As a natural progression, poll was tried next to overcome the max open fd issue.

poll limitation
poll solves the max fd issue. But as the number of concurrent clients increased, performance dropped drastically. The poll implementation does O(n) work internally on every call, so performance degrades as the number of fds increases.

The epoll reactor solved both problems and gave awesome performance. libevent is another library built on top of epoll.
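The difference is visible even with the stdlib's epoll binding (Linux only). A minimal sketch of the register-then-wait pattern, using a socketpair in place of real client connections:

```python
import select
import socket

# Register fds once; the kernel maintains the interest set, so the cost of
# each wait does not grow with the number of idle fds (unlike poll's O(n) scan).
a, b = socket.socketpair()
ep = select.epoll()
ep.register(b.fileno(), select.EPOLLIN)

a.send(b'ping')
events = ep.poll(1)  # blocks until b is readable; returns [(fd, eventmask)]
data = None
for fd, mask in events:
    if mask & select.EPOLLIN:
        data = b.recv(4)

ep.close()
a.close()
b.close()
```

With thousands of mostly idle keep-alive connections, only the few ready fds are returned per wait, which is exactly the workload described above.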

Async frameworks
So next time I will not waste time with select/poll based approaches if the number of concurrent connections expected is above 1K. Following are some of the event-loop based frameworks where this is applicable.

  • Eventlet (python)
  • Gevent (python) is similar to eventlet; it uses libevent, which is built on top of epoll.
  • C++ ACE 
  • Java Netty
  • Ruby Eventmachine

Friday, July 1, 2011

Python, Ruby and OOPs

I have found quite a few blogs and articles describing python as not being completely object oriented and ruby as a better alternative in this respect. Of course all entities in python are objects. But instead of providing methods on objects to solve all problems, python provides a mixture of builtin functions, module functions and object methods. I like python's approach and feel that it is the right one. Can all problems be modeled and solved elegantly using objects and methods? I don't think so.

Following is from an interview with Alexander Stepanov, the creator of the C++ STL, where he mentions:

I find OOP methodologically wrong. It starts with classes. It is as if mathematicians would start with axioms. You do not start with axioms - you start with proofs. Only when you have found a bunch of related proofs, can you come up with axioms. You end with axioms. The same thing is true in programming: you have to start with interesting algorithms. Only when you understand them well, can you come up with an interface that will let them work.

Operations in STL algorithms are a good set of examples where independent functions make sense over object methods. Following are some examples where python and ruby take different approaches.

zip: a built-in function provided by python which takes iterables and returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables.
zip([1,2], [3,4], [4,5]) = [(1, 3, 4), (2, 4, 5)]

product: a module (itertools) function which computes the cartesian product of the input sequences.
list(product([1,2], [2,3], [4,5])) = [(1,2,4), (1,2,5), (1,3,4), (1,3,5), (2,2,4), (2,2,5), (2,3,4), (2,3,5)]

Ruby provides zip and product as methods on arrays through the Enumerable mixin.
[1,2].zip([3,4], [4,5]) = [[1, 3, 4], [2, 4, 5]]
[1,2].product([3,4],[5,6]) = [[1,3,5],[1,3,6],[1,4,5],[1,4,6],  [2,3,5],[2,3,6],[2,4,5],[2,4,6]]

Ultimately both solve the problem. But what is the meaning of [1,2].zip([3,4], [4,5]) or [1,2].product([3,4],[5,6])? product is defined as the cartesian product of a set of items where no item has a special meaning. But with the OOP representation the first item is special and has a different meaning: it provides the functionality to combine the others.

Python and anonymous blocks

I use Python, Java and C++ for writing server software. Python is always the first choice and the other two are typically used for performance reasons. One feature which all these languages miss is anonymous blocks. Java provides anonymous classes as a workaround (it looks like Java 7-8 will provide closures).

As indentation is used to represent blocks in python, supporting anonymous blocks is difficult or impossible without significantly altering the syntax. Initially I thought of it as a significant drawback. On closer inspection of the use cases I have encountered, my views changed. Isn't python providing good alternatives for most of the use cases?

I analyzed some of the use cases where ruby uses anonymous blocks. Except for DSLs (Domain Specific Languages) I have not found use cases where anonymous blocks provide a better solution. Overall I feel the current constructs and alternatives provided by python are good enough for more than 90% of my use cases. Also I have a personal preference for separate syntax (comprehensions, with) for semantically different use cases over a single solution (anonymous code blocks) for all.

Following are some of the use cases where ruby uses anonymous blocks and python provides alternatives.

Sorting
Sorting is a common use case where the api expects a dynamic code block to compare elements. Anonymous blocks can express throwaway comparators cleanly. But I like python's solution to this. Python's sorting api takes a key function instead of a comparator. Key functions return the field, or tuple of fields, on which the objects have to be sorted. Python knows how to compare tuples and hence simplifies the problem.
  • sort on age - sorted(users, key=lambda u : u.age)
  • sort on name, age - sorted(users, key=lambda u : (, u.age))
Select, Map on collections
  • Select a set of objects based on some condition.
  • Apply a function/conversion on elements of a collection.
As long as the conditions for selection and the conversion functions for mapping are simple expressions, python's list/set/dict/generator comprehensions seem much more elegant than ruby solutions using anonymous blocks. In fact most use cases need only simple conditions.

Select a set of users whose age > 25:  
        [u for u in users if u.age > 25]
Given a list of numbers select the set of unique numbers < 25: 
        {n for n in numbers if n <25}
Select a set of users whose age > 25 as a map keyed on emailid
        {u.emailid: u for u in users if u.age > 25}
Select emailid of users from a map of emailid to user where age >25
        [emailid for emailid, u in users.items() if u.age > 25]
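The comprehensions above run as-is given some sample data (the User record and field values here are assumptions for illustration):

```python
from collections import namedtuple

User = namedtuple('User', ['emailid', 'age'])
users = [User('', 30), User('', 22)]

older = [u for u in users if u.age > 25]                # select
by_email = {u.emailid: u for u in users if u.age > 25}  # select into a map

numbers = [10, 30, 30, 20]
unique_small = {n for n in numbers if n < 25}           # unique select
```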

If the condition for the selection criteria becomes complex, an inner function has to be defined.

def condition(u):
    if u.age < 25:
        return ...
    elif u.age < 40:
        return ...
    return ...

[u for u in users if condition(u)]
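A runnable version of the pattern, with an assumed rule filled in for the elided returns (the age brackets are made up; only the structure matters):

```python
# Inner function keeps the comprehension readable when the
# selection logic outgrows a single expression.
def condition(age):
    if age < 25:
        return False
    elif age < 40:
        return True
    return False

ages = [20, 30, 45, 35]
selected = [a for a in ages if condition(a)]
```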

In ruby it could be done as below, which definitely is elegant.

users.select do |u|
    if u.age > 25
        ...
    end
end

But how often do we have such use cases?

Locks
      with lock:
          ...

File handles
      with open('/var/log/httpd/access.log') as f:
           for line in f:
               ...

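The same pattern runs end to end against a temporary file (tempfile stands in here for the log path above):

```python
import os
import tempfile

# Write a small file, then read it back line by line; the with
# statement closes the handle even if the body raises.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'w') as f:
    f.write('a\nb\n')

lines = []
with open(path) as f:
    for line in f:
        lines.append(line.strip())
os.remove(path)
```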
Network connection handles and pools
      with connmanager.handle() as handle:
          ...

Retry-able connection handles and pools
The most intuitive way to solve this use case is to have a context manager which internally retries on failure.

with connmgr.handle(retry=2) as h:
    ...

The connection manager retries with a new handle if the current one fails. This is typically seen with connection pools, where a TCP connection might have broken without a clean socket close. A context manager which performs retry-able execution may be expressed as:

from contextlib import contextmanager

@contextmanager
def handle(retry=2):
    for _ in range(retry):
        try:
            yield newhandle()
            break
        except ConnectionError, e:
            continue

But this is incorrect: a contextmanager generator must yield exactly once, so the context is invalid after the first iteration of the loop. More than a limitation, this seems to be a constraint added by the language designers to avoid hidden flow control. The last section of this link explains the details. The solution to this is

for handle in connmgr.handle(retry=2):
    with handle:
        ...
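A self-contained sketch of this loop; ConnError, Handle and handles are hypothetical stand-ins for a real pool, with the first handle deliberately broken to exercise the retry:

```python
class ConnError(Exception):
    pass

class Handle(object):
    # Stand-in for a pooled connection handle; `broken` simulates a TCP
    # connection that died without a clean close.
    def __init__(self, broken):
        self.broken = broken
    def __enter__(self):
        return self
    def __exit__(self, *exc_info):
        return False  # do not swallow exceptions
    def send(self, msg):
        if self.broken:
            raise ConnError('stale connection')
        return len(msg)

def handles(retry=2):
    # Plain generator: yields up to `retry` fresh handles, each usable
    # exactly once as a context manager.
    broken = [True, False]
    for i in range(retry):
        yield Handle(broken[i])

sent = False
for h in handles(retry=2):
    try:
        with h:
            h.send('ping')
            sent = True
            break
    except ConnError:
        continue  # get the next (fresh) handle
```

Note the caller owns the retry decision: the generator only hands out fresh handles, which is what keeps each `with` context valid.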

Now let's analyze some use cases where python doesn't have an equivalently elegant construct. Defining throwaway functions seems to be the only option. The common property of these use cases is that the code block is executed in a different context than the one to which it is passed.

Callbacks
Typical examples include code blocks invoked on timer expiry, reception of packets, trigger of events etc. Twisted is a python network framework which heavily uses callbacks. Node.js and EventMachine are equivalent frameworks in javascript and ruby respectively, where anonymous code blocks provide an elegant way to express callbacks.

I had to use callbacks very often. However I found that most of the time callback code is complex and it is better to express it with a properly named function. Even with anonymous blocks, too many callbacks in eventmachine/node.js can make code complex and difficult to follow. It's better to avoid callbacks if possible. As in the case of sorting, python has a much cleaner way to express networking code using eventlet or gevent. I switched to eventlet and avoided the usage of callbacks.

Thread/Process runnables
Thread and Process APIs take a function which will be executed in their context. Anonymous blocks are pretty useful here. But if the runnable has significant logic, defining a separate function is not bad either.
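A minimal sketch with the threading module: the target is a named function rather than an anonymous block.

```python
import threading

results = []

def worker(n):
    # Runs in the spawned thread's context.
    results.append(n * n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```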

Python, json and garbage collection

We have a webapp which exposes a REST interface. JSON is used as the data format for most of the apis. 98% of those JSON messages were less than 500 KB, but 2% of them can go above 100 MB. It was observed that after processing one such 100 MB JSON message, the process memory went up to 500 MB and stayed there. It never came down, even after running the webapp for hours and processing small JSON messages. An interesting observation is that memory remained at 500 MB even after processing multiple 100 MB JSON messages. On analyzing the problem it was found that json.loads is the culprit. Calling gc.collect does release the memory, and for now that seems to be the only solution.

The memory is not held in any caches or in python's internal memory allocator, as the explicit call to gc.collect releases it. It seems the gc threshold was never reached and as a result garbage collection never kicked in. But it seems strange that the threshold was never reached even after running the webapp for hours.

Test code

The test code shown below simulates the situation. If the call to json.loads is omitted then the issue is not observed. The GC counts printed after the invocation of jsontest are (150, 9, 0), which indicates that the gc threshold (700, 10, 10) was not met.

import json
import time
import gc
from itertools import count


def keygen(size):
    for i in count(1):
        s = str(i)
        yield '0' * (size - len(s)) + str(s)

def jsontest(num):
    keys = keygen(20)
    kvjson = json.dumps(dict((, '0' * 200) for i in range(num)))
    kvpairs = json.loads(kvjson)
    del kvpairs # Not required. Just to see if it makes any difference                            
    print 'load completed'

print gc.get_count()
jsontest(500000)  # roughly a 100 MB json message
print gc.get_count()

while 1:
    time.sleep(1)  # keep the process alive to observe memory
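Why an explicit gc.collect releases memory that stays resident otherwise can be seen on a small scale: reference counting alone cannot free cycles, and the cyclic collector only runs when allocation counters cross the thresholds. A sketch forcing a collection on a reference cycle:

```python
import gc

class Node(object):
    pass

a = Node()
b = Node()
a.other = b
b.other = a   # a and b now form a reference cycle

del a
del b          # refcounts never reach zero because of the cycle
freed = gc.collect()  # returns the number of unreachable objects found
```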