Test the speed of your code interactively in IPython

So you’ve come up with a couple of ways to solve a problem in code. But how do you decide which way is the best? One criterion to decide on is to use the one that makes the most sense to you. Another criterion is to use the version that is fastest. Here’s how to quickly determine which way is fastest using the interactive interpreter IPython.

Recently I posted a couple of different ways to sort one list by another list. But how to decide which one is faster?

I like using the timeit magic function in IPython to time Python code. It takes, as its argument, a single Python expression. timeit then returns the time it took the CPU to run that code. Problem is, timeit takes a single expression, but those sorting methods are multi-line statements.

No matter, we just wrap them in a function, and call the function as a single expression. Here are the functions I’ll be testing:

import numpy
def method1(x,y):
    z = zip(x,y)
    z.sort()
    return zip(*z)

def method2(x,y):
    inds = numpy.argsort(x)
    return numpy.take(y,inds)

def method3(x,y):
    xa = numpy.array(x)
    ya = numpy.array(y)
    inds = xa.argsort()
    return ya[inds]

OK, we need some lists to use for the test sorting. How about the ones used from the previously mentioned post:

people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]

And now, to test each method. If it’s a really fast bit of code, timeit will run it many times (here, 100,000 or 10,000 times) to get a good estimate of how long it takes.

timeit method1(ages,people)  # 100000 loops, best of 3: 3.6 µs per loop
timeit method2(ages,people)  # 10000 loops, best of 3: 63.7 µs per loop
timeit method3(ages,people)  # 10000 loops, best of 3: 36.8 µs per loop

These results suggest that the first method is an order of magnitude faster. But wait a minute, method3() converts the input lists into arrays . . . I bet that takes some time. How fast does it run if the data are already in an array? Time for another function. This one expects its arguments to be arrays already.

def method3a(x,y):
    inds = x.argsort()
    return y[inds]

And let’s convert ages and people into arrays ahead of time so we can use method3a:

array_ages = numpy.array(ages)
array_people = numpy.array(people)

The other methods still ought to run when the data are arrays. Here are the results on my machine for all the methods so far:

timeit method1(array_ages, array_people)   # 100000 loops, best of 3: 6.29 µs per loop
timeit method2(array_ages, array_people)   # 100000 loops, best of 3: 6.88 µs per loop
timeit method3(array_ages, array_people)   # 100000 loops, best of 3: 5.12 µs per loop
timeit method3a(array_ages, array_people)  # 100000 loops, best of 3: 4.02 µs per loop

So it looks like if the data are already in an array form, method3a looks like the fastest.

Let’s see if the results are consistent with a larger dataset, where you might actually perceive a difference in speed.

#The test arrays
xa = numpy.random.random(100000)
ya = numpy.random.random(100000)

# The test arrays converted into test lists
xlist = xa.tolist()
ylist = ya.tolist()

# Test the speed of sorting arrays
timeit method1(xa,ya)  # 10 loops, best of 3: 443 ms per loop
timeit method2(xa,ya)  # 10 loops, best of 3: 20.6 ms per loop
timeit method3(xa,ya)  # 10 loops, best of 3: 18.9 ms per loop
timeit method3a(xa,ya) # 10 loops, best of 3: 19.1 ms per loop

# Test the speed of sorting lists
timeit method1(xlist,ylist)  # 10 loops, best of 3: 391 ms per loop
timeit method2(xlist,ylist)  # 10 loops, best of 3: 51.3 ms per loop
timeit method3(xlist,ylist)  # 10 loops, best of 3: 55.1 ms per loop
# (can't test method3a since it won't accept lists)

Interesting. When your data is already in an array, method3 (which converts x and y into arrays) is actually faster than method3a (which assumes they are already arrays)! Not by much, but I wouldn’t have expected that two extra lines of code would actually make it faster.

The final results show that the answer depends on what form your data are in:

  • For small lists, use method1.
  • For large lists, use method2.
  • For small arrays, use method3a.
  • For large arrays, use method3.

Tags: , ,

2 Responses to “Test the speed of your code interactively in IPython”

  1. Profiling Python functions with IPython’s timeit | OMGWTF GAMES !!1! Says:

    [...] at scienceoss.com theres a nice post about profiling functions using “timeit” within IPython. Essentially, in IPython, you can [...]

  2. xenocoder Says:

    Excellent tutorial! Now I see where the timeit function came from in that example that you used. Really clever and I am sure that this will be helpful in the future!

Leave a Reply