Friday, July 1, 2011

Python and anonymous blocks

I use Python, Java and C++ for writing server software. Python is always the first choice; the other two are typically used for performance reasons. One feature that all these languages lack is anonymous blocks. Java provides anonymous classes as a workaround (and it looks like Java 7 or 8 will provide closures).

Since indentation is used to represent blocks in Python, supporting anonymous blocks is difficult or impossible without significantly altering the syntax. Initially I considered this a significant drawback. On closer inspection of the use cases I have encountered, my view changed: doesn't Python provide good alternatives for most of them?

I analyzed some of the cases where Ruby uses anonymous blocks. Except for DSLs (Domain Specific Languages), I have not found use cases where anonymous blocks provide a better solution. Overall I feel the constructs and alternatives that Python provides are good enough for more than 90% of my use cases. I also have a personal preference for separate syntax (comprehensions, with) for semantically different use cases over a single solution (anonymous code blocks) for all of them.

The following are some of the use cases where Ruby uses anonymous blocks and Python provides alternatives.

Sorting is a common use case where the API expects a dynamic code block to compare elements. Anonymous blocks can express throwaway comparators cleanly, but I like Python's solution better. Python's sorting API takes a key function instead of a comparator. A key function returns the value to sort on, typically a single field or a tuple of fields. Python already knows how to compare tuples element by element, which simplifies the problem.
  • sort on age - sorted(users, key=lambda u: u.age)
  • sort on name, age - sorted(users, key=lambda u: (u.name, u.age))
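A small runnable sketch of the key-function approach. The User record and the sample data are made up for illustration; only sorted() and its key argument come from Python itself. The last line shows one way to get mixed sort directions out of a tuple key, which is where a comparator would otherwise be tempting.

```python
from collections import namedtuple

# Hypothetical record type, used only for this illustration.
User = namedtuple("User", ["name", "age"])

users = [
    User("carol", 31),
    User("alice", 31),
    User("bob", 24),
]

# Sort on age. sorted() is stable, so carol stays ahead of alice.
by_age = sorted(users, key=lambda u: u.age)

# Sort on name, then age. Tuples are compared element by element.
by_name_age = sorted(users, key=lambda u: (u.name, u.age))

# Mixed directions: oldest first, ties broken by name ascending.
oldest_first = sorted(users, key=lambda u: (-u.age, u.name))
```

Negating a field only works for numbers; for other types, sorting twice (least significant key first) relies on the same stability guarantee.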
Select, Map on collections
  • Select a set of objects based on some condition.
  • Apply a function/conversion on elements of a collection.
As long as the selection conditions and the mapping functions are simple expressions, Python's list/set/dict comprehensions and generator expressions seem much more elegant than Ruby solutions using anonymous blocks. In fact, most use cases need only simple conditions.

Select the users whose age > 25:
        [u for u in users if u.age > 25]
Given a list of numbers, select the set of unique numbers < 25:
        {n for n in numbers if n < 25}
Select the users whose age > 25 as a map keyed on emailid:
        {u.emailid: u for u in users if u.age > 25}
Select the emailids of users from a map of emailid to user where age > 25:
        [emailid for emailid, u in users.items() if u.age > 25]
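The same four selections as a runnable sketch, plus a generator expression. The User record, the sample addresses and the variable names are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record type and sample data, for illustration only.
User = namedtuple("User", ["emailid", "age"])

users = [
    User("a@example.com", 30),
    User("b@example.com", 22),
    User("c@example.com", 41),
]
numbers = [3, 30, 3, 7, 25, 7]

# List comprehension: users older than 25.
older = [u for u in users if u.age > 25]

# Set comprehension: unique numbers below 25.
unique_small = {n for n in numbers if n < 25}

# Dict comprehension: the same users, keyed on emailid.
older_by_email = {u.emailid: u for u in users if u.age > 25}

# Selecting keys from a map of emailid to user.
users_by_email = {u.emailid: u for u in users}
older_emails = [e for e, u in users_by_email.items() if u.age > 25]

# Generator expression: nothing is materialized before sum() consumes it.
total_age = sum(u.age for u in users if u.age > 25)
```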

If the selection condition becomes complex, an inner function has to be defined.

def condition(u):
    if u.age < 25:
        return ...
    elif u.age < 40:
        return ...
    else:
        return ...

[u for u in users if condition(u)]

In Ruby it could be done as below, which is definitely elegant.

      do |u|
          if u.age > 25

But how often do we have such use cases?

      with lock:

File handles
      with open('/var/log/httpd/access.log') as f:
           for line in f:

Network connection handles and pools
      with connmanager.handle() as handle:
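A minimal sketch of how a pool's handle() could be written so that it works with the with statement. The ConnManager class, its _free list and the string "connections" are made up for illustration; only contextlib.contextmanager is a real library call.

```python
from contextlib import contextmanager


class ConnManager(object):
    """Hypothetical pool; stands in for the post's connmanager."""

    def __init__(self, handles):
        self._free = list(handles)

    @contextmanager
    def handle(self):
        h = self._free.pop()          # acquire a handle from the pool
        try:
            yield h                   # the body of the with block runs here
        finally:
            self._free.append(h)      # always release, even on error


connmanager = ConnManager(["conn-1", "conn-2"])

with connmanager.handle() as handle:
    in_use = handle
    free_inside = len(connmanager._free)

free_after = len(connmanager._free)
```

The acquire/release pairing lives in one place, and callers cannot forget the release, which is what an anonymous block would have been used for in Ruby.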

Retry-able connection handles and pools
The most intuitive way to solve this use case is to have a context manager which internally retries on failure.

with connmgr.handle(retry=2) as h:

The connection manager retries with a new handle if the current one fails. This is typically needed with connection pools, where a TCP connection might have broken without a clean socket close. A context manager which performs retry-able execution might be expressed as:

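from contextlib import contextmanager

@contextmanager
def handle(retry=2):
    for _ in range(retry):
        try:
            yield newhandle
        except ConnectionError as e:
            pass  # intent: go around the loop and retry with a new handle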

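But this is incorrect. A generator-based context manager must yield exactly once, so the context is invalid after the first iteration of the loop; a with statement also has no way to run its body a second time. More than a limitation, this seems to be a constraint added by the language designers to avoid hidden flow control. The last section of this link explains the details. The solution to this is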

for handle in connmgr.handle(retry=2):
    with handle:
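One way the for/with combination could be implemented, as a sketch rather than the actual connection manager: a generator hands out one small context manager per attempt, and each attempt swallows a ConnectionError unless it is the last one. The Attempt class, the new_conn parameter and the send function are hypothetical names introduced for this example.

```python
class Attempt(object):
    """One try at using a connection. Hypothetical helper."""

    def __init__(self, conn, last):
        self.conn = conn
        self.last = last
        self.ok = False

    def __enter__(self):
        return self.conn

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.ok = True
            return False
        # Swallow a connection failure unless this was the final attempt.
        return issubclass(exc_type, ConnectionError) and not self.last


def handle(new_conn, retry=2):
    """Yield up to `retry` attempts, stopping after the first success."""
    for i in range(retry):
        attempt = Attempt(new_conn(), last=(i == retry - 1))
        yield attempt
        if attempt.ok:
            return


# Usage: the first connection is stale, the second one works.
conns = iter(["stale", "fresh"])
log = []

def send(conn):
    if conn == "stale":
        raise ConnectionError("broken pipe")
    return "sent via " + conn

for attempt in handle(lambda: next(conns), retry=2):
    with attempt as conn:
        log.append(conn)
        result = send(conn)
```

The loop is what re-runs the body, so the flow control stays visible at the call site, and each with block is entered and exited exactly once.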

Now let's analyze some use cases where Python doesn't have an equally elegant construct. Defining throwaway functions seems to be the only option. The common property of these use cases is that the code block is executed in a different context from the one in which it is passed.

Typical examples include code blocks invoked on timer expiry, reception of packets, triggering of events, etc. Twisted is a Python network framework which heavily uses callbacks. Node.js and EventMachine are equivalent frameworks in JavaScript and Ruby respectively, where anonymous code blocks provide an elegant way to express callbacks.

I had to use callbacks very often. However, I found that most of the time the callback code is complex and is better expressed as a properly named function. Even with anonymous blocks, too many callbacks in EventMachine/Node.js can make code complex and difficult to follow. It's better to avoid callbacks if possible. As in the case of sorting, Python has a much cleaner way to express networking code, using Eventlet or gevent. I switched to Eventlet and avoided callbacks.

Thread/Process runnables
The Thread and Process APIs take a function which will be executed in their context. Anonymous blocks are pretty useful here, but if the runnable has significant logic, defining a separate function is not bad either.
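A short sketch of both styles with threading.Thread. The count_words function and the sample data are made up for the example; a lambda covers the one-line runnable, and a named function carries the one with real logic.

```python
import threading

results = []

# Small runnable: a lambda is enough.
t1 = threading.Thread(target=lambda: results.append("short task"))

# Runnable with real logic: a named function reads better.
def count_words(lines, out):
    total = 0
    for line in lines:
        total += len(line.split())
    out.append(total)

t2 = threading.Thread(target=count_words, args=(["a b c", "d e"], results))

for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()
```

The order of the two entries in results depends on thread scheduling, so it should not be relied on.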
