cat /dev/brain - Adding Python 3 Compatibility to an Existing Codebase

Recently, I found some new energy and decided to help SQLObject add compatibility with Python 3. If you look at the Python 3 Wall of Superpowers, you will notice that SQLObject is currently listed as Python 2 only (indicated by the lock symbol). While I was talking about this with some friends on IRC, a friend of one of them asked me to write up a detailed post about how to migrate a project to Python 3.

To provide some context: I am vehemently opposed to projects that maintain two separate codebases for Python 2.x and 3.x compatibility. Flake8, PyFlakes, and pep8 all support Python 2.5, 2.6, 2.7, 3.2, 3.3, and 3.4. When 3.5 comes out next year, we'll also support that. Requests, Requests-Toolbelt, github3.py, betamax, and most of my other projects support all of the above except for 2.5. SQLObject recently dropped support for 2.5 in version 2.0.

My current plan for adding Python 3 compatibility to SQLObject is to first add support for Python 3.4. Quite frankly, this is the easiest version to target first. After you have it working, you can add support for 3.3, and even 3.2 if you so desire.

Too Long; Didn't Read

To do this you need:

A continuous integration system (like Jenkins or Travis CI)

This needs to run on every push to the project and preferably for each pull request made to the project.
Tests

If you don't have tests for your project, write them before you start porting. This is a good time to start writing in the subset of Python shared by 2 and 3, so that you don't have to port your tests too.
Use PyFlakes or Flake8

If you dislike PEP 8, or disagree with the pep8 tool on some other level, you can use PyFlakes on its own (Flake8 runs both PyFlakes and pep8 by default). Install PyFlakes in virtual environments for both Python 2 and Python 3. It analyzes your code using the running interpreter's AST and will catch problems like syntax errors.
Make the code run, then make it pass the tests

Once the codebase can run on Python 3, you'll probably find some problems. You may run into unicode (or bytes) issues or division problems. Either way, first make sure the interpreter can actually execute the code, then fix it.
Take advantage of __future__.
Don't be afraid to use a library like six

Continuous Integration

This will be much easier if you have continuous integration configured for your project. You'll need a service that can run your tests on more than one version of the Python interpreter. To an extent, this rules out services like Drone IO, which limit you to one version of Python. We want to add Python 3 compatibility, not port the codebase to work only on Python 3.

Pull Requests (Merge Requests, or whatever you call them)

When someone contributes to your project, you want to make sure their changes do not break compatibility with Python 2 or Python 3. Ideally, you want to find this out before you accept their contribution, so your continuous integration system should also run on every pull request.

Tests

These are crucial to the whole point of setting up CI. Your tests help ensure that you don't break too much while you add Python 3 compatibility to your codebase. If you don't already have tests, write them first. You should also set up continuous integration and your tox configuration now, but don't start adding Python 3 compatibility yet.
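As a rough illustration of writing tests in the shared subset of Python 2 and 3, here is a minimal unittest sketch. The normalize_name function is hypothetical and stands in for whatever your project actually does; the point is that nothing in the file depends on one major version, so the same file can run unchanged in every tox environment with any unittest-compatible runner.

```python
import unittest


def normalize_name(name):
    """Hypothetical function under test: collapse runs of whitespace and
    title-case each word."""
    return " ".join(name.split()).title()


class NormalizeNameTest(unittest.TestCase):
    # Only syntax and APIs shared by Python 2.6+ and 3.x are used here, so the
    # same file runs under every interpreter listed in tox.ini.

    def test_collapses_whitespace(self):
        self.assertEqual(normalize_name("  ada   lovelace "), "Ada Lovelace")

    def test_single_word(self):
        self.assertEqual(normalize_name("grace"), "Grace")
```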

If you've never written tests before, search Google. It has far better answers for you than I do. (Or you can wait long enough for me to blog about it.)

Add Flake8 (or just PyFlakes)

PyFlakes (and consequently Flake8) can be installed on Python 3.4. When you install it on a specific version of Python, it uses that version's ast module, so it finds mostly the same problems on every version. The difference is that syntax which is only valid on the other major version (such as the Python 2 print statement) is reported as an error. By fixing these errors, you move your code towards being runnable by the Python 3 interpreter.
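To see why the interpreter version matters, here is a small sketch that uses only the standard library ast module (not PyFlakes itself, just the parser it builds on). The helper name parses_on_this_interpreter is made up for illustration; it reports whether a snippet is valid syntax for whichever Python runs it.

```python
import ast


def parses_on_this_interpreter(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses_on_this_interpreter('print("hello")'))  # True on Python 2 and 3
print(parses_on_this_interpreter('print "hello"'))   # True on Python 2, False on Python 3
```

This is why it pays to run Flake8 under both a Python 2 and a Python 3 environment: each one only knows its own grammar.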

I tend to use tox so that Flake8 runs as a separate job on Travis CI. My tox.ini typically looks like:

[tox]
envlist = py2{6,7},py3{2,3,4},pypy,{py27,py34}-flake8

# ...

[testenv:py27-flake8]
basepython = python2.7
deps =
    flake8
commands = flake8 {posargs}

[testenv:py34-flake8]
basepython = python3.4
deps =
    flake8
commands = flake8 {posargs}

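And my .travis.yml looks something like:

language: python
# Quote the requirement so the shell doesn't treat ">" as a redirect.
install: pip install "tox>=1.8"
# tox also reads TOXENV on its own; -e just makes the choice explicit.
script: tox -e $TOXENV
env:
  - TOXENV=py26
  - TOXENV=py27
  - TOXENV=py32
  - TOXENV=py33
  - TOXENV=py34
  - TOXENV=py27-flake8
  - TOXENV=py34-flake8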

I tend to allow certain jobs to fail while working on adding compatibility. You can read how to do that in Travis CI's documentation.

Make the code run, then make it correct

Once you can get your tests to run on Python 3, your code probably runs on Python 3 too. If your tests aren't failing, then congratulations! You're a unicorn! In all likelihood, you now have to go and correct your codebase, your tests, or both. Work through them slowly. When you find a failing test, try adding more tests around it so that your fix doesn't introduce unintended behaviour.
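One classic source of tests that pass on Python 2 but fail on Python 3 (or vice versa) is division: on Python 2, / between two ints floors the result, while on Python 3 it returns a float. A minimal sketch of the usual fix, assuming hypothetical helpers average and full_pages, is to opt into true division everywhere and write // wherever you actually want floor division:

```python
from __future__ import division  # makes / true division on Python 2 as well


def average(values):
    """Hypothetical helper: arithmetic mean of a list of ints.
    Without the __future__ import, Python 2 would return 1 for [1, 2]."""
    return sum(values) / len(values)


def full_pages(total_items, per_page):
    """Hypothetical helper: number of completely filled pages.
    // is floor division on every version, so the intent is explicit."""
    return total_items // per_page


print(average([1, 2]))   # 1.5
print(full_pages(7, 2))  # 3
```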

Take advantage of `__future__`

If you aren't already familiar with it, the __future__ module lets you opt in to several Python 3 behaviours on Python 2. For example, Python 3 no longer allows implicit relative imports: every import must be either absolute or explicitly relative.

For example, SQLObject's sqlobject/__init__.py file currently looks like:

"""SQLObject"""
from __version__ import version, version_info

from col import *
from index import *
from joins import *
from main import *
from sqlbuilder import AND, OR, NOT, IN, LIKE, RLIKE, DESC, CONTAINSSTRING, const, func
from styles import *
from dbconnection import connectionForURI
import dberrors

Each of those lines is an implicit relative import, which Python 3 no longer allows. Rewritten with explicit relative imports (supported since Python 2.5), it would look something like:

"""SQLObject"""
from .__version__ import version, version_info

from .col import *
from .index import *
from .joins import *
from .main import *
from .sqlbuilder import AND, OR, NOT, IN, LIKE, RLIKE, DESC, CONTAINSSTRING, const, func
from .styles import *
from .dbconnection import connectionForURI
from . import dberrors
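If you want to see the difference for yourself, here is a self-contained sketch (standard library only, Python 2.7 or 3.x). It builds two throwaway packages in a temporary directory; the package names demo_explicit and demo_implicit and the submodule _porting_demo_helpers are all hypothetical. One package's __init__.py uses an explicit relative import, the other an implicit one.

```python
import importlib
import os
import shutil
import sys
import tempfile


def import_temp_package(name, init_source):
    """Build a throwaway package `name` containing _porting_demo_helpers.py and
    an __init__.py with the given source, then import it.
    Raises ImportError if __init__.py's imports cannot be resolved."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, name)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "_porting_demo_helpers.py"), "w") as f:
        f.write("def greet():\n    return 'hello'\n")
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(init_source)
    sys.path.insert(0, root)
    try:
        if hasattr(importlib, "invalidate_caches"):  # Python 3.3+
            importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(root)
        shutil.rmtree(root)


# Explicit relative import: works on Python 2.5+ and Python 3.
explicit = import_temp_package(
    "demo_explicit", "from ._porting_demo_helpers import greet\n")
print(explicit.greet())  # hello

# Implicit relative import: works on Python 2, raises ImportError on Python 3.
try:
    import_temp_package("demo_implicit",
                        "from _porting_demo_helpers import greet\n")
    print("implicit relative import worked (Python 2)")
except ImportError:
    print("implicit relative import failed (Python 3)")
```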

To guard against this regressing, or against new contributions reintroducing implicit relative imports, you can enforce the same rule on Python 2 by adding:

from __future__ import absolute_import

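A __future__ import must come before any other statement in the module (only the module docstring, comments, blank lines, and other __future__ imports may precede it), so put it at the very top of every module that imports anything.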
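The placement rule is enforced by the compiler itself, so you can check it without running anything. In this sketch the helper future_import_is_valid and the two sample module sources are made up for illustration; compile() only compiles the source, it does not execute the imports.

```python
def future_import_is_valid(source):
    """Return True if `source` compiles, i.e. its __future__ import is placed legally."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


GOOD = (
    '"""Module docstring."""\n'
    "# A comment is fine here too.\n"
    "from __future__ import absolute_import\n"
    "import os\n"
)
BAD = (
    "import os\n"
    "from __future__ import absolute_import\n"
)

print(future_import_is_valid(GOOD))  # True
print(future_import_is_valid(BAD))   # False
```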

__future__ also provides print_function, which turns print into a function instead of a statement on Python 2. And let's be honest: the print function is way better than the old print statement.
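A short sketch of what the function form buys you, runnable on Python 2.6+ and 3.x. The Collector class is a hypothetical stand-in for any file-like object (a log file, sys.stderr, and so on); print() only needs it to have a write() method.

```python
from __future__ import print_function  # must be at the top of the module


class Collector(object):
    """Minimal file-like object that records what print() writes to it."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)


out = Collector()
# sep, end, and file are keyword arguments; the old statement had no real
# equivalent for sep or end.
print("a", "b", "c", sep=", ", end="!\n", file=out)

# Because print is an ordinary function now, it can be passed around like one.
emit = print
emit("done", file=out)

print(repr("".join(out.parts)))  # 'a, b, c!\ndone\n'
```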

Finally, the other big issue is usually unicode. On Python 2, __future__ lets you get Python 3's behaviour for string literals. If you do:

from __future__ import unicode_literals

a = "Some string"

Then a is bound to a unicode (text) string on both Python 2 and Python 3. If you need a byte string, use the b prefix:

a = b"Some bytes"
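To confirm which type each literal produces on whichever interpreter runs it, here is a small sketch. The text_type name is our own (six offers a similar six.text_type), and the noqa comment is only there because unicode is undefined on Python 3, which Flake8 would otherwise flag.

```python
from __future__ import unicode_literals

import sys

# Pick the text type for the running interpreter.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (builtin that only exists on Python 2)

a = "Some string"  # text on Python 2 and Python 3 thanks to unicode_literals
b = b"Some bytes"  # bytes on both (bytes is an alias of str on Python 2.6+)

assert isinstance(a, text_type)
assert isinstance(b, bytes)
# Converting between text and bytes is always an explicit encode/decode.
assert a.encode("utf-8") == b"Some string"
assert b.decode("ascii") == "Some bytes"
```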

In short, after module docstrings and comments, the first imports in your files should be (as necessary):

from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
# or
from __future__ import (absolute_import, print_function, unicode_literals)

Use `six`

The six library by Benjamin Peterson is an excellent way to work around Python 2 and 3 compatibility problems. If you don't want all of your string literals to be unicode and would rather mark text explicitly, while still supporting Python 3.2 (which, unlike Python 2 and 3.3+, rejects the u"" prefix), you'll love six.u:

import six

my_unicode = six.u("Some text")
my_bytes = b"Some bytes"
my_native_string = "A string"  # ASCII/bytes on Py2, Unicode on Py3

Do you rely heavily on xrange, or on httplib, urllib2, urlparse, or any of the other standard library modules that were renamed in Python 3? Good news: you can keep importing them from six.moves. For example:

from six.moves import httplib
# httplib will be httplib on Py2 and http.client on Py3

# The equivalent:
try:
    import httplib
except ImportError:
    import http.client as httplib

The former is much better, especially when you have a number of these to work with.
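For comparison, here is roughly what the hand-written version looks like once several names are involved. This is a sketch of a hypothetical compat.py module, not part of six, and it covers only three of the many names six.moves maps for you:

```python
# compat.py (hypothetical module name): a hand-written equivalent of a few
# six.moves imports. One try/except picks the names for the running version.
try:
    # Python 2 names
    import httplib as http_client
    from urlparse import urlparse
    range_ = xrange  # noqa: F821 (builtin that only exists on Python 2)
except ImportError:
    # Python 3 names
    import http.client as http_client
    from urllib.parse import urlparse
    range_ = range

print(http_client.OK)                                     # 200
print(urlparse("http://example.com/a?b=1").netloc)        # example.com
print(list(range_(3)))                                    # [0, 1, 2]
```

Every new rename means another line in each branch, and getting one wrong only shows up on the other interpreter, which is exactly the bookkeeping six.moves takes off your hands.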

Conclusion

I don't have a very good conclusion for this post. There are probably tips and steps that I'm missing, but really it is just a bit of hard work and patience. You'll get there eventually. It isn't easy, but I think the tips above make it easier. I really wish I had known all of this when I made chardet compatible with both Python 2 and Python 3 in a single codebase.

If you're interested in helping SQLObject, we currently have some low-hanging fruit you can help us with. Make sure you comment to indicate which checklist item you're working on. We're still trying to make the codebase run on Python 3; after that, we'll be able to focus on making it correct.

I hope this is helpful and somewhat insightful.