Posts

Showing posts from 2011

select, poll and epoll, a twisted story

Some time back I had to write a network server which needed to support ~50K concurrent clients on a single box. Server-client communication used a proprietary protocol on top of TCP, with JSON/NetString as the messaging format. Clients sent periodic keep-alives, which the server used to check their health. The server also exposed a REST interface for other modules to communicate with clients. As most of the operations were IO-bound (socket/db), we decided to use Python/Twisted to implement the server. During load tests we found that the server was able to handle only 1024 clients, after which connections failed. We increased the per-process limit on open files from 1024 to 100000 (ulimit -n 100000), but connections still failed at 1024. Select limitation: select fails beyond 1024 fds because FD_SETSIZE defaults to 1024. Refer to this blog post for details. Twisted's default reactor seems to be based on select. As a natural progression, poll was tried nex...
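The two fixes described above (raising the open-file limit and moving off select) can be sketched in plain Python. This is a minimal sketch, not the post's server: it uses the standard `resource` module as the in-process equivalent of `ulimit -n`, and swaps in the `selectors` module (added in Python 3.4, after this post), whose `DefaultSelector` picks epoll on Linux and kqueue on BSD/macOS. The function names `raise_nofile_limit` and `wait_readable` are hypothetical. In Twisted itself, the epoll reactor is chosen by calling `from twisted.internet import epollreactor; epollreactor.install()` before `twisted.internet.reactor` is imported anywhere.

```python
import resource
import selectors
import socket


def raise_nofile_limit(wanted):
    """Raise the soft RLIMIT_NOFILE towards `wanted` (like `ulimit -n`).

    Hypothetical helper. Without privileges the soft limit cannot exceed
    the hard limit, so the target is clamped to it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited; never lower it
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


def wait_readable(socks, timeout=1.0):
    """Return the sockets that become readable within `timeout` seconds.

    Unlike select.select(), epoll/kqueue have no FD_SETSIZE cap, so this
    keeps working with fd numbers above 1024 once the rlimit allows them.
    """
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    try:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        return [key.fileobj for key, _events in sel.select(timeout)]
    finally:
        sel.close()


if __name__ == "__main__":
    print("soft nofile limit:", raise_nofile_limit(100000))
    a, b = socket.socketpair()
    b.sendall(b"keep-alive")
    print(wait_readable([a]) == [a])  # a has data waiting
    a.close()
    b.close()
```

Raising the limit alone is not enough, which matches what the post observed: select still rejects fds numbered 1024 or above, so the event mechanism has to change as well.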

Python, Ruby and OOPs

I have found quite a few blogs and articles claiming that Python is not completely object oriented and that Ruby is a better alternative in this respect. Of course, all entities in Python are objects. But instead of providing methods on objects to solve every problem, Python provides a mixture of builtin functions, module functions and object methods. I like Python's approach and feel that it is the right one. Can all problems be modeled and solved elegantly using objects and methods? I don't think so. The following is from an interview with Alexander Stepanov, the creator of the C++ STL: I find OOP methodologically wrong. It starts with classes. It is as if mathematicians would start with axioms. You do not start with axioms - you start with proofs. Only when you have found a bunch of related proofs, can you come up with axioms. You end with axioms. The same thing is true in programming: you have to start with interesting algorithms...
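To illustrate the builtin-function side of this mixture: builtins such as `len()`, `sorted()` and `max()` are written once and work with any object that implements the relevant protocol methods (`__len__`, `__iter__`), so the algorithm is not tied to a class hierarchy. This is a minimal sketch; the `Playlist` class is hypothetical.

```python
class Playlist:
    """Hypothetical container: it only implements protocol methods and
    inherits nothing; the algorithms live in Python's builtin functions."""

    def __init__(self, *tracks):
        self._tracks = list(tracks)

    def __len__(self):
        return len(self._tracks)

    def __iter__(self):
        return iter(self._tracks)


p = Playlist("b", "c", "a")
print(len(p))            # 3
print(sorted(p))         # ['a', 'b', 'c']
print(max(p))            # c
print(sorted("python"))  # the same builtin on a str: ['h', 'n', 'o', 'p', 't', 'y']
```

The same `sorted()` serves a user-defined class and a `str` with no shared base class, which is close in spirit to Stepanov's point about starting from algorithms rather than classes.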

Python and anonymous blocks

I use Python, Java and C++ for writing server software. Python is always the first choice, and the other two are typically used for performance reasons. One feature that all these languages lack is anonymous blocks. Java provides anonymous classes (it looks like Java 7-8 will provide closures) to work around this. As indentation is used to delimit blocks in Python, supporting anonymous blocks is difficult or impossible without significantly altering the syntax. Initially I thought of it as a significant drawback. On closer inspection of the use cases I have encountered, my views changed. Doesn't Python provide good alternatives for most of these use cases? I analyzed some of the cases where Ruby uses anonymous blocks. Except for DSLs (Domain Specific Languages), I have not found use cases where anonymous blocks provide a better solution. Overall I feel the current constructs or alternatives provided by Python are good enough for...
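A few common places where Ruby passes an anonymous block, and the Python constructs that usually stand in for them, can be sketched as follows. This is an illustrative sketch, not a mapping taken from the post: a `with` statement plus `contextlib.contextmanager` for setup/teardown around a body of code, a `lambda` for one-line key functions, a comprehension for map-style blocks, and a named nested function for multi-statement callbacks. The names `timed`, `on_each` and `record` are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log):
    """Wrap a body of code with setup/teardown, a role Ruby often gives to
    `do ... end` blocks. Hypothetical helper: appends (label, seconds) to log."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.append((label, time.perf_counter() - start))


log = []
with timed("sum", log):
    total = sum(range(1000))

# One-expression "blocks": a lambda as the key function, a comprehension for map.
words = ["banana", "fig", "apple"]
by_last_letter = sorted(words, key=lambda w: w[-1])  # ['banana', 'apple', 'fig']
squares = [n * n for n in range(5)]                  # [0, 1, 4, 9, 16]


def on_each(items, callback):
    """Hypothetical iterator helper that calls `callback` once per item."""
    for item in items:
        callback(item)


# Multi-statement "block": define a named function right where it is needed.
seen = []


def record(item):
    seen.append(item.upper())


on_each(words, record)  # seen == ['BANANA', 'FIG', 'APPLE']
```

The one thing these alternatives do not give is an unnamed multi-statement body passed inline, which is exactly what DSL-style code tends to rely on.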

Python, json and garbage collection

We have a webapp which exposes a REST interface. JSON is used as the data format for most of the APIs. 98% of those JSON messages were smaller than 500 KB, but 2% of them could exceed 100 MB. We observed that after processing one such 100 MB JSON message, the process memory went up to 500 MB and stayed there. It never came down, even after running the webapp for hours and processing only small JSON messages. Interestingly, memory remained at 500 MB even after processing multiple 100 MB JSON messages. On analyzing the problem, we found that 'json.loads' is the culprit. Calling gc.collect does release the memory, and for now that seems to be the only solution. The memory is not held in any caches or in Python's internal memory allocator, since an explicit call to gc.collect releases it. It seems the gc threshold was never reached, and as a result garbage collection never kicked in. But it seems strange that threshold was ...
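The workaround described above (an explicit `gc.collect()` after a large message) can be scoped so that only big payloads pay for a full collection. This is a minimal sketch under assumptions: `handle_request`, the `handler` callback and the 10 MB `LARGE_PAYLOAD_BYTES` cutoff are hypothetical, and the collection runs after the parsed object has been dropped rather than while it is still referenced. For context, the thresholds reported by `gc.get_threshold()` count allocations of gc-tracked objects, not bytes.

```python
import gc
import json

# Hypothetical cutoff: bodies above this size trigger an explicit collection.
LARGE_PAYLOAD_BYTES = 10 * 1024 * 1024  # 10 MB


def handle_request(body, handler, large_threshold=LARGE_PAYLOAD_BYTES):
    """Parse a JSON body, pass it to `handler`, then force a full collection
    if the body was large, mirroring the post's gc.collect() workaround."""
    data = json.loads(body)
    try:
        return handler(data)
    finally:
        del data  # drop our reference before collecting
        if len(body) > large_threshold:
            gc.collect()  # full collection of all generations


print(gc.get_threshold())  # current (gen0, gen1, gen2) thresholds
print(handle_request(b'{"values": [1, 2, 3]}', lambda d: sum(d["values"])))
```

Keeping the explicit collection behind a size check avoids paying for a full collection on the 98% of requests that are small.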