Python 3 Porting Fun Redux

Written by Barry Warsaw in technology on Fri 06 January 2012. Tags: python3

My last post on Python 3 porting got some really great responses, and I've learned a lot from the feedback I've seen. I'm here to rather briefly outline a few additional tips and tricks that folks have sent me and that I've learned by doing other ports since then. Please keep them coming, either in the blog comments or to me via email. Or better yet, blog about your experiences yourself and I'll link to them from here.

One of the big lessons I'm trying to adopt is to support Python 3 in pure-Python code with a single code base. Specifically, I'm trying to avoid using 2to3 as much as possible. While I think 2to3 is an excellent tool that can make it easier to get started supporting both Python 2 and Python 3 from a single branch of code, it does have some disadvantages. The biggest problem with 2to3 is that it's slow; it can take a long time to slog through your Python code, which can be a significant impediment to your development velocity. Another 2to3 problem is that it doesn't always play nicely with other development tools, such as python setup.py test and virtualenv, and you occasionally have to write additional custom fixers for conversions that 2to3 doesn't handle.

Given that almost all the code I'm writing these days targets Python 2.6 as the minimal supported Python 2 version, 2to3 may just be unnecessary. With my dbus-python port to Python 3, and with my own flufl packages, I'm experimenting with ignoring 2to3 and trying to write one code base for all of Python 2.6, 2.7, and 3.2. My colleague Michael Foord has been pretty successful with this approach going back all the way to Python 2.4, so 2.6 as a minimum should be no problem! C extensions are pretty easy because you have the C preprocessor to help you. But it turns out that it's usually not too difficult in pure-Python either. I've done this in my latest release of the flufl.bounce package, and intend to eliminate 2to3 in my other flufl packages soon too.

The first thing I've done is add print_function to the __future__ import in all my modules. Previously, I was only importing unicode_literals and absolute_import. But doctests tend to use a lot of print statements, so switching to the print() function explicitly removes one big 2to3 conversion. Aside from having to unlearn decades of print statement muscle memory, the print() function is actually rather nice. So my module template now looks like this (with the copyright comment block omitted):

from __future__ import absolute_import, print_function, unicode_literals
__metaclass__ = type
__all__ = [
    ]

Speaking of doctests, you really want them to have the same set of future imports as all your other code. I'll talk more about how my own packages set up doctests later, but for now, it's useful to know that I create a doctest.DocFileSuite for every doctest file in my package. Each suite gets a setup() function (passed as the setUp argument), and Python's testing framework calls it at the appropriate time, passing in a testobj parameter. This argument has a globs attribute which serves as the module globals for the doctest. All you need to do to enable the future imports in your doctests is something like this:

def setup(testobj):
    try:
        testobj.globs['absolute_import'] = absolute_import
        testobj.globs['print_function'] = print_function
        testobj.globs['unicode_literals'] = unicode_literals
    except NameError:
        pass

The try-except really is only necessary if you keep using 2to3, since that tool will remove the future imports from all the modules it processes. The future imports still exist in Python 3 of course, since future imports are never, ever removed. So if you ditch 2to3, you can get rid of the try-except too.

In the latest release of flufl.bounce, I changed the API so that the detected email addresses are all explicitly bytes objects in Python 3 (and 8-bit strings in Python 2). This caused some problems with my doctests because printing a bytes object in Python 3 gives different output than printing an 8-bit string in Python 2. When you print the object in Python 2, you get just the contents of the string, but when you print it in Python 3, you get the repr, b'' prefix and all:

% python
Python 2.7.2+ (default, Dec 18 2011, 17:30:39)
[GCC 4.6.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print b'foo'
foo

% python3
Python 3.2.2+ (default, Dec 19 2011, 12:03:32)
[GCC 4.6.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print(b'foo')
b'foo'

This means your doctest cannot be written to easily support both versions of Python when bytes/8-bit strings are used. I use the following helper to get around this:

def print_bytes(obj):
    if bytes is not str:
        obj = repr(obj)[2:-1]
    print(obj)

Remember that in Python 2, bytes is just an alias for str so this code only gets invoked in Python 3.

Another fun bytes/8-bit-string issue is that in Python 3, bytes objects have no .format() method. So if you're doing something like b'foo {0}'.format(obj) this will work in Python 2, but fail in Python 3. The best I've come up with for this is to use concatenation instead, or do the format using unicodes and then encode them to their bytes object (but then you have the additional fun of choosing an appropriate encoding!).
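
Both workarounds can be sketched roughly like this. The format_bytes() helper is a hypothetical name for illustration, and its ASCII default is only an assumption; pick whatever encoding actually fits your data.

```python
def format_bytes(template, *args, **kwargs):
    """Format a text template, then encode the result to bytes.

    Hypothetical helper. The 'encoding' keyword defaults to ASCII,
    which is an assumption you should revisit for your own data.
    """
    encoding = kwargs.pop('encoding', 'ascii')
    return template.format(*args, **kwargs).encode(encoding)


# Workaround 1: plain concatenation of bytes objects.
greeting = b'foo ' + b'bar'

# Workaround 2: do the formatting as text, then encode.
address = format_bytes('{0}@{1}', 'anne', 'example.com')
```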

Did you know that the re module can scan either unicodes or bytes in Python 3? The switch is made by passing in either a bytes pattern or a str pattern, and then passing in the matching type of object to parse. But if you use the r'' prefix (i.e. raw strings) for saner handling of backslashes, there's a wrinkle when you want to parse bytes. Python 3.2 does not accept the rb'' prefix, so it looks like you can have either raw string literals or bytes literals but not both. The order matters though: the br'' spelling is accepted by Python 2.6, 2.7, and 3.x, so put the b first and you keep your raw strings without suffering the pain of backslash proliferation.
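
Here's a small sketch of the two flavors side by side. The pattern and the sample inputs are made up for illustration. Note that the bytes pattern is spelled br'' with the b first, which Python 2.6, 2.7, and 3.x all accept.

```python
import re

# A str pattern scans str (unicode) input.
text_pattern = re.compile(r'(\d+)-(\w+)')

# A bytes pattern scans bytes input; the b comes before the r.
bytes_pattern = re.compile(br'(\d+)-(\w+)')

text_match = text_pattern.match('42-spam')
bytes_match = bytes_pattern.match(b'42-spam')
```

The groups come back in the same type as the pattern, and in Python 3 handing a bytes object to a str pattern (or vice versa) raises a TypeError.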

Some of the code I was porting was using itertools.izip_longest(), but this doesn't exist in Python 3. Instead you have itertools.zip_longest(). You'll have to do a conditional import (i.e. try-except) around this to get the right version.
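
The conditional import can look something like this; binding the Python 2 function to the Python 3 name means the rest of the module only ever uses zip_longest(). The sample call is just for illustration.

```python
try:
    # Python 3 spelling.
    from itertools import zip_longest
except ImportError:
    # Python 2 spelling, bound to the Python 3 name.
    from itertools import izip_longest as zip_longest

# Shorter inputs are padded with the fill value.
pairs = list(zip_longest('ab', 'xyz', fillvalue='-'))
```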

Do you use zope.interface? You'll be interested to know that the syntax we've long been accustomed to for declaring that a class implements an interface does not work in Python 3. For example:

from zope.interface import Interface, implements

class MyInterface(Interface):
    pass

class MyClass:
    implements(MyInterface)

This is because the stack hacking that implements() uses doesn't work in Python 3. Fortunately, the latest version of zope.interface has a new class decorator that you can use instead. This works in Python 2.6 and 2.7 too, so change your code to use this:

from zope.interface import Interface, implementer

class MyInterface(Interface):
    pass

@implementer(MyInterface)
class MyClass:
    pass

I kind of like the use of class decorators better anyway.

Here's a tricky one. Did you know that Python 2 provides some codecs for doing interesting conversions such as Caesar rotation (i.e. rot13)? Thus, you can do things like:

>>> 'foo'.encode('rot-13')
'sbb'

This doesn't work in Python 3 though, because even though certain str-to-str codecs like rot-13 still exist, the str.encode() interface requires that the codec return a bytes object. In order to use str-to-str codecs in both Python 2 and Python 3, you'll have to pop the hood and use a lower-level API, getting and calling the codec directly:

>>> from codecs import getencoder
>>> encoder = getencoder('rot-13')
>>> rot13string = encoder(mystring)[0]

You have to get the zeroth-element from the return value of the encoder because of the codecs API. A bit ugly, but it works in both versions of Python.
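
If you use this in more than one place, it may be worth wrapping in a tiny function. This is only a sketch, and rot13() is a hypothetical helper name, not something from the standard library.

```python
from codecs import getencoder

# Look the codec up once, at import time.
_rot13_encoder = getencoder('rot-13')


def rot13(text):
    # The encoder returns an (output, length_consumed) tuple;
    # only the output is interesting here.
    return _rot13_encoder(text)[0]
```

Since rot13 is its own inverse, calling rot13() twice gives you back the original string.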

That's all for now. Happy porting!

