Wednesday, June 22, 2011

PEP 382 sprint summary

So, yesterday (June 21, 2011), six talented and motivated Python hackers from the Washington DC area met at Panera Bread in downtown Silver Spring, Maryland to sprint on PEP 382. This is a Python Enhancement Proposal to introduce a better way of handling namespace packages, and our intent is to get this feature landed in Python 3.3. Here then is a summary, from my own spotty notes and memory, of how the sprint went.

First, just a brief outline of what the PEP does. For more details please read the PEP itself, or join the newly resurrected import-sig for more discussions. The PEP has two main purposes. First, it fixes the problem of which package owns a namespace's __init__.py file, e.g. zope/__init__.py for all the Zope packages. In essence, it eliminates the need for these files by introducing a new variant of .pth files to define a namespace package. Thus, the zope.interfaces package would own zope/zope-interfaces.pth and the zope.components package would own zope/zope-components.pth.  The presence of either .pth file is enough to define the namespace package.  There's no ambiguity or collision with these files the way there is for zope/__init__.py.  This aspect will be very beneficial for Debian and Ubuntu.
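To make the layout concrete, here is a rough sketch of the discovery rule as described above: a directory contributes to the namespace if it contains at least one .pth file. This is only an illustration, not the code from the feature branch and not part of any Python release; the names find_portions and demo, and the dist_a/dist_b/dist_c directories, are made up for the example.

```python
import os
import tempfile


def find_portions(namespace, search_path):
    """Return the directories on search_path that contribute to namespace.

    Assumed rule for this sketch: <entry>/<namespace> counts as a portion
    if it holds at least one *.pth file.  No __init__.py is required.
    """
    portions = []
    for entry in search_path:
        candidate = os.path.join(entry, namespace)
        if not os.path.isdir(candidate):
            continue
        if any(name.endswith(".pth") for name in os.listdir(candidate)):
            portions.append(candidate)
    return portions


def demo():
    """Build three fake install roots; only the first two carry a marker."""
    base = tempfile.mkdtemp()
    markers = [
        ("dist_a", "zope-interfaces.pth"),
        ("dist_b", "zope-components.pth"),
        ("dist_c", None),  # a zope/ directory with no .pth file at all
    ]
    roots = []
    for dist, marker in markers:
        pkg_dir = os.path.join(base, dist, "zope")
        os.makedirs(pkg_dir)
        if marker is not None:
            # Each distribution owns its own uniquely named marker file.
            open(os.path.join(pkg_dir, marker), "w").close()
        roots.append(os.path.join(base, dist))
    return roots, find_portions("zope", roots)
```

Because each distribution ships a differently named marker, two packages installed into the same directory never fight over a single file, which is the property that matters to distro packagers.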

Second, the PEP defines the one official way of defining namespace packages, rather than the multitude of ad-hoc ways currently in use.  With the pre-PEP 382 way, it was easy to get the details subtly wrong, and unless all subpackages cooperated correctly, the packages would be broken.  Now, all you do is put a * in the .pth file and you're done.
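For comparison, one of the existing ad-hoc approaches is the pkgutil.extend_path idiom from the standard library. The sketch below builds two throwaway install roots that each ship a portion of the same namespace and shows that both portions become importable. The names demo_ns, dist_a, dist_b, build_portion, and demo are invented for the example. Note that both roots have to carry an identical __init__.py, which is exactly the file-ownership collision the PEP wants to get rid of.

```python
import importlib
import os
import sys
import tempfile

# Every portion must ship this same boilerplate in its __init__.py.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def build_portion(root, namespace, module, body):
    """Create <root>/<namespace>/__init__.py and <root>/<namespace>/<module>.py."""
    pkg_dir = os.path.join(root, namespace)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


def demo():
    base = tempfile.mkdtemp()
    root_a = os.path.join(base, "dist_a")
    root_b = os.path.join(base, "dist_b")
    build_portion(root_a, "demo_ns", "alpha", "NAME = 'alpha'\n")
    build_portion(root_b, "demo_ns", "beta", "NAME = 'beta'\n")
    sys.path[:0] = [root_a, root_b]
    try:
        importlib.invalidate_caches()
        # demo_ns/__init__.py from root_a runs and extends __path__ so
        # that the demo_ns directory under root_b is searched as well.
        alpha = importlib.import_module("demo_ns.alpha")
        beta = importlib.import_module("demo_ns.beta")
        return alpha.NAME, beta.NAME
    finally:
        sys.path.remove(root_a)
        sys.path.remove(root_b)
```

If either portion forgets the extend_path line, or uses a different namespace mechanism, whichever __init__.py is found first wins and the other portion's modules may not be importable.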

Sounds easy, right?  Well, Python's import machinery is pretty complex, and there are actually two parallel implementations of it in Python 3.3, so gaining traction on this PEP has been a hard slog.  Not only that, but the PEP has implications for all the packaging tools out there, and changes the API requirements for PEP 302 loaders.  It doesn't help that import.c (the primary implementation of the import machinery) has loads of crud that predates PEP 302.

On the plus side, Martin von Loewis (the PEP author) is one of the smartest Python developers around, and he's done a very good first cut of an implementation in his feature branch, so there's a great place to start.

Eric Smith (who is the 382 BDFOP, or benevolent dictator for one PEP), Jason Coombs, and I met once before to sprint on PEP 382, and we came away with more questions than answers.  Eric, Jason, and I live near each other so it's really great to meet up with people for some face-to-face hacking.  This time, we made a wider announcement, on social media and the BACON-PIG mailing list, and were joined by three other local Python developers.  The PSF graciously agreed to sponsor us, and while we couldn't get our first, second, third, or fourth choices of venues, we did manage to score some prime real-estate and free wifi at Panera.

So, what did we accomplish?  Both a lot, and a little.  Despite working from about 4pm until closing, we didn't commit much more than a few bug fixes (e.g. an uninitialized variable that was crashing the tests on Fedora), a build fix for Windows, and a few other minor things.  However, we did come away with a much better understanding of the existing code, and a plan of action to continue the work online.  All the gory details are in the wiki page that I created.


One very important thing we did was to review the existing test suite for coverage of the PEP specifications.  We identified a number of holes in the existing test suite, and we'll work on adding tests for these.  We also recognized that importlib (the pure-Python re-implementation of the import machinery) wasn't covered at all in the existing PEP 382 tests, so Michael worked on that.  Not surprisingly, once that was enabled, the tests failed, since importlib has not yet been modified to support PEP 382.


We also came up with a number of questions where we think the PEP needs clarification.  We'll start discussion about these on the relevant mailing lists.


Finally, Eric brought up a very interesting proposal.  We all observed how difficult it is to make progress on this PEP, and Eric commented on how there's a lot of historical cruft in import.c, much of which predates PEP 302.  That PEP defines an API for extending the import machinery with new loaders and finders.  Eric proposed that we could simplify import.c by removing all the bits that could be re-implemented as PEP 302 loaders, specifically the import-from-filesystem stuff.  The other neat thing is that the loaders could probably be implemented in pure Python without much of a performance hit, since we surmise that the stat calls dominate. If that's true, then we'd be able to refactor importlib to share a lot of code with the built-in C import machinery.  This could greatly simplify import.c so that it contains just the PEP 302 machinery, with some bootstrapping code.  It may even be possible to move most of the PEP 382 implementation into the loaders.  At the sprint we did a quick experiment with zipping up the standard library and it looked promising, so Eric's going to take a crack at this.
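To give a flavor of what "a loader in pure Python" means, here is a minimal sketch of a combined finder and loader that serves modules out of an in-memory dictionary. It is not Eric's proposal and not a filesystem loader; the names DictImporter, import_from_dict, and sprint_demo_mod are invented for the example. It is also written against the find_spec/exec_module hooks found in later Python 3 releases, rather than the find_module/load_module pair that PEP 302 originally specified.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one object, backed by a {name: source} dict."""

    def __init__(self, sources):
        self._sources = dict(sources)

    def find_spec(self, fullname, path=None, target=None):
        # Finder half: claim only the modules we hold source for.
        if fullname not in self._sources:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        # Returning None asks the import system for a default module object.
        return None

    def exec_module(self, module):
        # Loader half: compile the stored source and run it in the module.
        source = self._sources[module.__name__]
        code = compile(source, "<dict:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)


def import_from_dict(name, source):
    """Import `name` from `source` via a temporarily installed importer."""
    importer = DictImporter({name: source})
    sys.meta_path.insert(0, importer)
    try:
        return importlib.import_module(name)
    finally:
        sys.meta_path.remove(importer)
```

For example, import_from_dict("sprint_demo_mod", "ANSWER = 6 * 7\n") returns a module whose ANSWER attribute is 42. A filesystem loader would follow the same shape, with the dictionary lookups replaced by the stat and read calls we suspect dominate the cost.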


That, briefly, is what we accomplished at the sprint.  I hope we'll continue the enthusiasm online, and if you want to join us, please do subscribe to the import-sig!