Wednesday, November 21, 2012

UDS Update #1 - OAuth

For UDS-R for Raring (i.e. Ubuntu 13.04) in Copenhagen, I sponsored three blueprints.  These blueprints represent most of the work I will be doing for the next 6 months, as we're well on our way to the next LTS, Ubuntu 14.04.

I'll provide some updates to the other blueprints later, but for now, I want to talk about OAuth and Python 3.  OAuth is a protocol which allows you to programmatically interact with certain website APIs, in an authenticated manner, without having to provide your website password.  Essentially, it allows you to generate an authorization token which you can use instead, and it allows you to manage and share these tokens with applications, so that you can revoke them if you want, or decide how and which applications to trust to act on your behalf.

A good example of a site that uses OAuth is Launchpad, but many other sites also support OAuth, such as Twitter and Facebook.

There are actually two versions of OAuth out there.  OAuth version 1 is definitely the more prevelent, since it has been around for years, is relatively simple (at least on the client side), and enshrined in RFC 5849.  There are tons of libraries available that support OAuth v1, in a multitude of languages, with Python being no exception.

OAuth v2 is much less common, since it is currently only a draft specification, and has had its share of design-by-committee controversy.  Still, some sites such as Facebook do require OAuth v2.

One of the very earliest Python libraries to support OAuth v1, on both the client and server side, was python-oauth (I'll use the Debian package names in this post), and on the Ubuntu desktop, you'll find lots of scripts and libraries that use python-oauth.  There are major problems with this library though, and I highly recommend not using it.  The biggest problems are that the code is abandoned by its upstream maintainer (it hasn't be updated on PyPI since 2009), and it is not Python 3 compatible.  Because the OAuth v2 draft came after this library was abandoned, it provides no support for the successor specification.

For this reason, one of the blueprints I sponsored was specifically to survey the alternatives available for Python programmers, and make a decision about which one we would officially endorse for Ubuntu.  By "official endorsement" I mean promote the library to other Python programmers (hence this post!) and to port all of our desktop scripts from python-oauth to the agreed upon library.

After some discussion, it was unanimous by the attendees of the UDS session (both in-person and remotely), to choose the python-oauthlib as our preferred library.

python-oauthlib has a lot going for it.  It's Python 3 compatible, has an active upstream maintainer, supports both RFC 5849 for v1, and closely follows the draft for v2.  It's a well-tested, solid library, and it is available in Ubuntu for both Python 2 and Python 3.  Probably the only negative is that the library does not provide any support for the server side.  This is not a major problem for our immediate plans, since there aren't any server applications on the Ubuntu desktop requiring OAuth.  Eventually, yes, we'll need server side support, but we can punt on that recommendation for now.

Another cool thing about python-oauthlib is that it has been adopted by the python-requests library, meaning, if you want to use a modern replacement for the urllib2/httplib2 circus which supports OAuth out of the box, you can just use python-requests, provide the appropriate parameters, and you get request signing for free.

So, as you'll see from the blueprint, there are several bugs linked to packages which need porting to python-oauthlib for Ubuntu 13.04, and I am actively working on them, though contributions, as always, are welcome!  I thought I'd include a little bit of code to show you how you might port from python-oauth to python-oauthlib.  We'll stick with OAuth v1 in this discussion.

The first thing to recognize is that python-oauth uses different, older terminology that predates the RFC.  Thus, you'll see references to a token key and token secret, as well as a consumer key and consumer secret.  In the RFC, and in python-oauthlib, these terms are client key, client secret, resource owner key, and resource owner secret respectively.  After you get over that hump, the rest pretty much falls into place.  As an example, here is a code snippet from the piston-mini-client library which used the old python-oauth library:

class OAuthAuthorizer(object):
    """Authenticate to OAuth protected APIs."""
    def __init__(self, token_key, token_secret, consumer_key, consumer_secret,
                 oauth_realm="OAuth"):
        """Initialize a ``OAuthAuthorizer``.

        ``token_key``, ``token_secret``, ``consumer_key`` and
        ``consumer_secret`` are required for signing OAuth requests.  The
        ``oauth_realm`` to use is optional.
        """
        self.token_key = token_key
        self.token_secret = token_secret
        self.consumer_key = consumer_key
        self.consumer_secret = consumer_secret
        self.oauth_realm = oauth_realm

    def sign_request(self, url, method, body, headers):
        """Sign a request with OAuth credentials."""
        # Import oauth here so that you don't need it if you're not going
        # to use it.  Plan B: move this out into a separate oauth module.
        from oauth.oauth import (OAuthRequest, OAuthConsumer, OAuthToken,
                                 OAuthSignatureMethod_PLAINTEXT)
        consumer = OAuthConsumer(self.consumer_key, self.consumer_secret)
        token = OAuthToken(self.token_key, self.token_secret)
        oauth_request = OAuthRequest.from_consumer_and_token(
            consumer, token, http_url=url)
        oauth_request.sign_request(OAuthSignatureMethod_PLAINTEXT(),
                                   consumer, token)
        headers.update(oauth_request.to_header(self.oauth_realm))


The constructor is pretty simple, and it uses the old OAuth terminology.  The key thing to notice is the way the old API required you to create a consumer, a token, and then a request object, then ask the request object to sign the request.  On top of all the other disadvantages, this isn't a very convenient API.  Let's look at the snippet after conversion to python-oauthlib.

class OAuthAuthorizer(object):
    """Authenticate to OAuth protected APIs."""
    def __init__(self, token_key, token_secret, consumer_key, consumer_secret,
                 oauth_realm="OAuth"):
        """Initialize a ``OAuthAuthorizer``.

        ``token_key``, ``token_secret``, ``consumer_key`` and
        ``consumer_secret`` are required for signing OAuth requests.  The
        ``oauth_realm`` to use is optional.
        """
        # 2012-11-19 BAW: python-oauthlib requires unicodes for its tokens and
        # secrets.  Assume utf-8 values.
        # https://github.com/idan/oauthlib/issues/68
        self.token_key = _unicodeify(token_key)
        self.token_secret = _unicodeify(token_secret)
        self.consumer_key = _unicodeify(consumer_key)
        self.consumer_secret = _unicodeify(consumer_secret)
        self.oauth_realm = oauth_realm

    def sign_request(self, url, method, body, headers):
        """Sign a request with OAuth credentials."""
        # 2012-11-19 BAW: In order to preserve API backward compatibility,
        # convert empty string body to None.  The old python-oauth library
        # would treat the empty string as "no body", but python-oauthlib
        # requires None.
        if not body:
            body = None
        # Import oauthlib here so that you don't need it if you're not going
        # to use it.  Plan B: move this out into a separate oauth module.
        from oauthlib.oauth1 import Client, SIGNATURE_PLAINTEXT
        oauth_client = Client(self.consumer_key, self.consumer_secret,
                              self.token_key, self.token_secret,
                              signature_method=SIGNATURE_PLAINTEXT,
                              realm=self.oauth_realm)
        uri, signed_headers, body = oauth_client.sign(
            url, method, body, headers)
        headers.update(signed_headers)


See how much nicer this is?  You need only create a client object, essentially using all the same bits of information.  Then you ask the client to sign the request, and update the request headers with the signature.  Much easier.

Two important things to note.  If you are doing an HTTP GET, there is no request body, and thus no request content which needs to contribute to the signature.  In python-oauth, you could specify an empty body by using either None or the empty string.  piston-mini-client uses the latter, and this is embodied in its public API.  python-oauthlib however, treats the empty string as a body being present, so it would require the Content-Type header to be set even for an HTTP GET which has no content (i.e. no body).  This is why the replacement code checks for an empty string being passed in (actually, any false-ish value), and coerces that to None.

The second issue is that python-oauthlib requires the keys and secrets to be Unicode objects; they cannot be bytes objects.  In code ported straight from Python 2 however, these values are usually 8-bit strings, and so become bytes objects in Python 3.  python-oauthlib will raise a ValueError during signing if any of these are bytes objects.  Thus the use of the _unicodeify() function to decode these values to unicodes.

def _unicodeify(s):
    if isinstance(s, bytes):
        return s.decode('utf-8')
    return s


The above works in both Python 2 and Python 3.  Of course, we don't know for sure that the bytes values are UTF-8, but it's the only sane encoding to expect, and if a client of piston-mini-client were to be so insane as to use an incompatible encoding (US-ASCII is fine because it's compatible with UTF-8), it would be up to the client to just pass in unicodes in the first place.  At the time of this writing, this is under active discussion with upstream, but for now, it's not too difficult to work around.

Anyway, I hope this helps, and I encourage you to help increase the popularity of python-oauthlib on the Cheeseshop, so that we can one day finally kill off the long defunct python-oauth library.

2 comments:

  1. OAuth 2 is not a draft anymore, for more than a month:
    http://dickhardt.org/2012/10/oauth-2-0/

    It has not one but two RFCs: 6749 and 6750.

    If you have an option, go with OAuth 2.

    (the captcha is atrocious)

    ReplyDelete
  2. @mariuss, thanks for the pointers to the rfcs. Here is the upstream github issue tracking library work to conform to the rfcs:

    https://github.com/idan/oauthlib/issues/75

    ReplyDelete