Written by Barry Warsaw in technology on Fri 22 June 2012. Tags: i18n, mailman, python, python3, ubuntu,
Recently, as part of our push to ship only Python 3 on the Ubuntu 12.10 desktop, I've helped several projects update their internationalization (i18n) support. I've seen lots of instances of suboptimal Python 2 i18n code, which leads to liberal sprinkling of cargo culted .decode() and .encode() calls simply to avoid the dreaded UnicodeError s. These get worse when the application or library is ported to Python 3 because then even the workarounds aren't enough to prevent nasty failures in non-ASCII environments (i.e. the non-English speaking world majority :).
Let's be honest though, the problem is not because these developers are crappy coders! In fact, far from it, the folks I've talked with are really really smart, experienced Pythonistas. The fundamental problem is Python 2's 8-bit string type which doubles as a bytes type, and the terrible API of the built-in Python 2 gettext module, which does its utmost to sabotage your Python 2 i18n programs. I take considerable blame for the latter, since I wrote the original version of that module. At the time, I really didn't understand unicodes (this is probably also evident in the mess I made of the email package). Oh, to really have access to Guido's time machine.
The good news is that we now know how to do i18n right, especially in a bilingual Python 2/3 world, and the Python 3 gettext module fixes the most egregious problems in the Python 2 version. Hopefully this article does some measure of making up for my past sins.
Stop right here and go watch Ned Batchelder's talk from PyCon 2012 entitled Pragmatic Unicode, or How Do I Stop the Pain? It's the single best description of the background and effective use of Unicode in Python you'll ever see. Ned does a brilliant job of …