We Fear Change

Resource management in Python 3.3, or contextlib.ExitStack FTW!

Written by Barry Warsaw in technology on Fri 10 May 2013. Tags: python, python3, ubuntu,

I'm writing a bunch of new code these days for Ubuntu Touch's Image Based Upgrade system. Think of it essentially as Ubuntu Touch's version of upgrading the phone/tablet (affectionately called phablet) operating system in a bulk way rather than piecemeal apt-get s the way you do it on a traditional Ubuntu desktop or server. One of the key differences is that a phone has to detour through a reboot in order to apply an upgrade since its Ubuntu root file system is mounted read-only during the user session.

Anyway, those details aren't the focus of this article. Instead, just realize that because it's a pile of new code, and because we want to rid ourselves of Python 2, at least on the phablet image if not everywhere else in Ubuntu, I am prototyping all this in Python 3, and specifically 3.3. This means that I can use all the latest and greatest cool stuff in the most recent stable Python release. And man, is there a lot of cool stuff!

One module in particular that I'm especially fond of is contextlib. Context managers are objects implementing the protocol behind the with statement, and they are typically used to guarantee that some resource is cleaned up properly, even in the event of error conditions. When you see code like this:

with open(somefile) as fp:
    data = fp.read()

you are invoking a context manager. Python was clever enough to make file objects support the context manager protocol so that you never have to explicitly close the file; that happens automatically when the with statement completes, regardless of whether the code inside the with statement succeeds or raises an exception.

It's also very easy to define your own context managers to properly handle other kinds of resources. I won't go into too much detail here, because this is all well-established; the with statement has been, er, with us since Python 2.5.

You may be familiar with the contextlib module because of the @contextmanager decorator it provides. This makes it trivial to define a new context manager without having to deal with all the intricacies of the protocol. For example, here's how you would implement a context manager that temporarily changes the current working directory:

import os
from contextlib import contextmanager

@contextmanager
def chdir(dir):
    cwd = os.getcwd()
    try:
        os.chdir(dir)
        yield
    finally:
        os.chdir(cwd)

In this example, the yield cedes control back to the body of the with statement, and when that completes, the code after the yield is executed. Because the yield is wrapped inside a try/finally, it is guaranteed that the original working directory is restored. You would use this code like so:

with chdir('/tmp'):
    print(os.getcwd())

So far, so good, but this is nothing revolutionary. Python 3.3 brings additional awesomeness to contextlib by way of the new ExitStack class.

The documentation for ExitStack is a bit dense, and even the examples didn't originally make it clear to me how amazing this new API is. In my opinion, this is so powerful, it changes completely the way you think about deploying safe code.

So what is an ExitStack? One way to think about it is as an extensible context manager. It's used in with statements just like any other context manager:

from contextlib import ExitStack
with ExitStack() as stack:
    # do some magical stuff

Just like any other context manager, the ExitStack 's "exit" code is guaranteed to be run at the end of the with statement. It's the programmable extensibility of the ExitStack where the cool stuff happens.

The first interesting method of an ExitStack you might use is the callback() method. Let's say for example that in your with statement, you are creating a temporary directory and you want to make sure that temporary directory gets deleted when the with statement exits. You could do something like this:

import shutil, tempfile
with ExitStack() as stack:
    tempdir = tempfile.mkdtemp()
    stack.callback(shutil.rmtree, tempdir)

Now, when the with statement completes, it calls all of its callbacks, which includes removing the temporary directory.

So, what's the big deal? Let's say you're actually creating three temporary directories and any of those calls could fail. To guarantee that all successfully created directories are deleted at the end of the with statement, regardless of whether an exception occurred in the middle, you could do this:

with ExitStack() as stack:
    tempdirs = []
    for i in range(3):
        tempdir = tempfile.mkdtemp()
        stack.callback(shutil.rmtree, tempdir)
        tempdirs.append(tempdir)
    # Do something with the tempdirs

If you knew statically that you wanted three temporary directories, you could set this up with nested with statements, or a single with statement containing multiple backslash-separated targets, but that gets unwieldy very quickly. And besides, that's impossible if you only know the number of directories you need dynamically at run time. On the other hand, the ExitStack makes it easy to guarantee everything gets cleaned up and there are no leaks.

That's powerful enough, but it's not all you can do! Another very useful method is enter_context().

Let's say that you are opening a bunch of files and you want the following behavior: if all of the files open successfully, you want to do something with them, but if any of them fail to open, you want to make sure that the ones that did get open are guaranteed to get closed. Using ExitStack.enter_context() you can write code like this:

files = []
with ExitStack() as stack:
    for filename in filenames:
        # Open the file and automatically add its context manager to the
        # stack. enter_context() returns the passed in context manager,
        # i.e. the file object.
        fp = stack.enter_context(open(filename))
        files.append(fp)
    # Capture the close method, but do not call it yet.
    close_all_files = stack.pop_all().close

(Note that the contextlib documentation contains a more efficient, but denser way of writing the same thing.)

So what's going on here? First, the open(filename) does what it always does of course, it opens the file and returns a file object, which is also a context manager. However, instead of using that file object in a with statement, we add it to the ExitStack by passing it to the enter_context() method. For convenience, this method returns the passed in object.

So what happens if one of the open() calls fail before the loop completes? The with statement will exit as normal and the ExitStack will exit all the context managers it knows about. In other words, all the files that were successfully opened will get closed. Thus, in an error condition, you will be left with no open files and no leaked file descriptors, etc.

What happens if the loop completes and all files got opened successfully? Ah, that's where the next bit of goodness comes into play: the ExitStack 's pop_all() method.

pop_all() creates a new ExitStack, and populates it from the original ExitStack, removing all the context managers from the original ExitStack. So, after stack.pop_all() completes, the original ExitStack, i.e. the one used in the with statement, is now empty. When the with statement exits, the original ExitStack contains no context managers so none of the files are closed.

Well, then, how do you close all the files once you're done with them? That's the last bit of magic. ExitStack s have a .close() method which unwinds all the registered context managers and callbacks and invokes their exit functionality. So, after you're finally done with all the files and you want to clean everything up, you would just do:

close_all_files()

And that's it.

Hopefully that all makes sense. I know it took a while to sink in for me, but now that it has, it's clear the enormous power this gives you. You can write much safer code, in the sense that it's easier to ensure much better guarantees that your resources are cleaned up at the right time.

The real power comes when you have many different disparate resources to clean up for a particular operation. For example, in the test suite for the Image Based Upgrader, I have a test where I need to create a temporary directory and start an HTTP server in a thread. Roughly, my code looks like this:

@classmethod
def setUpClass(cls):
    cls._cleaner = ExitStack()
    try:
        cls._serverdir = tempfile.mkdtemp()
        cls._cleaner.callback(shutil.rmtree, cls._serverdir)
        # ...
        cls._stop = make_http_server(cls._serverdir)
        cls._cleaner.callback(cls._stop)
    except:
        cls._cleaner.pop_all().close()
        raise

@classmethod
def tearDownClass(cls):
    cls._cleaner.close()

Notice there's no with statement there at all. :) This is because the resources must remain open until tearDownClass() is called, unless some exception occurs during the setUpClass(). If that happens, the bare except will ensure that all the context managers are properly closed, leaving the original ExitStack empty. (The bare except is acceptable here because the exception is re-raised after the resources are cleaned up.) Even though the exception will prevent the tearDownClass() from being called, it's still safe to do so in case it is called for some odd reason, because the original ExitStack is empty.

But if no exception occurs, the original ExitStack will contain all the context managers that need to be closed, and calling .close() on it in the tearDownClass() does exactly that.

I have one more example from my recent code. Here, I need to create a GPG context (the details are unimportant), and then use that context to verify the detached signature of a file. If the signature matches, then everything's good, but if not, then I want to raise an exception and throw away both the data file and the signature (i.e. .asc) file. Here's the code:

with ExitStack() as stack:
    ctx = stack.enter_context(Context(pubkey_path))
    if not ctx.verify(asc_path, channels_path):
        # The signature did not verify, so arrange for the .json and .asc
        # files to be removed before we raise the exception.
        stack.callback(os.remove, channels_path)
        stack.callback(os.remove, asc_path)
        raise FileNotFoundError

Here we create the GPG context, which itself is a context manager, but instead of using it in a with statement, we add it to the ExitStack. Then we verify the detached signature (asc_path) of a data file (channels_path), and only arrange to remove those files if the verification fails. When the FileNotFoundError is raised, the ExitStack in the with statement unwinds, removing both files and closing the GPG context. Of course, if the signature matches, only the GPG context is closed -- the channels_path and asc_path files are not removed.

You can see how an ExitStack actually functions as a fairly generic resource manager!

To me, this revolutionizes the management of external resources. The new ExitStack object, and the methods and semantics it exposes, make it so much easier to manage those resources, guaranteeing that they get cleaned up at the right time, once and only once, regardless of whether errors occur or not.

ExitStack takes the already powerful concept of context managers and turns it up to 11. There's more you can do, and it's worth spending some time reading the contextlib documentation in Python 3.3, especially the examples and recipes.

As I mentioned on Twitter, it's features like this that make using Python 2 seem downright barbaric.

Resource management in Python 3.3, or contextlib.ExitStack FTW!

Written by Barry Warsaw in technology on Fri 10 May 2013. Tags: python, python3, ubuntu,

Comments