So you’ve come up with a couple of ways to solve a problem in code. But how do you decide which way is the best? One criterion to decide on is to use the one that makes the most sense to you. Another criterion is to use the version that is fastest. Here’s how to quickly determine which way is fastest using the interactive interpreter IPython.
Recently I posted a couple of different ways to sort one list by another list. But how to decide which one is faster?
I like using the timeit magic function in IPython to time Python code. It takes, as its argument, a single Python expression. timeit then returns the time it took the CPU to run that code. Problem is, timeit takes a single expression, but those sorting methods are multi-line statements.
No matter, we just wrap them in a function, and call the function as a single expression. Here are the functions I’ll be testing:
import numpy def method1(x,y): z = zip(x,y) z.sort() return zip(*z) def method2(x,y): inds = numpy.argsort(x) return numpy.take(y,inds) def method3(x,y): xa = numpy.array(x) ya = numpy.array(y) inds = xa.argsort() return ya[inds]
OK, we need some lists to use for the test sorting. How about the ones used from the previously mentioned post:
people = ['Jim', 'Pam', 'Micheal', 'Dwight'] ages = [27, 25, 4, 9]
And now, to test each method. If it’s a really fast bit of code, timeit will run it many times (here, 100,000 or 10,000 times) to get a good estimate of how long it takes.
timeit method1(ages,people) # 100000 loops, best of 3: 3.6 µs per loop timeit method2(ages,people) # 10000 loops, best of 3: 63.7 µs per loop timeit method3(ages,people) # 10000 loops, best of 3: 36.8 µs per loop
These results suggest that the first method is an order of magnitude faster. But wait a minute, method3() converts the input lists into arrays . . . I bet that takes some time. How fast does it run if the data are already in an array? Time for another function. This one expects its arguments to be arrays already.
def method3a(x,y): inds = x.argsort() return y[inds]
And let’s convert ages and people into arrays ahead of time so we can use method3a:
array_ages = numpy.array(ages) array_people = numpy.array(people)
The other methods still ought to run when the data are arrays. Here are the results on my machine for all the methods so far:
timeit method1(array_ages, array_people) # 100000 loops, best of 3: 6.29 µs per loop timeit method2(array_ages, array_people) # 100000 loops, best of 3: 6.88 µs per loop timeit method3(array_ages, array_people) # 100000 loops, best of 3: 5.12 µs per loop timeit method3a(array_ages, array_people) # 100000 loops, best of 3: 4.02 µs per loop
So it looks like if the data are already in an array form, method3a looks like the fastest.
Let’s see if the results are consistent with a larger dataset, where you might actually perceive a difference in speed.
#The test arrays xa = numpy.random.random(100000) ya = numpy.random.random(100000) # The test arrays converted into test lists xlist = xa.tolist() ylist = ya.tolist() # Test the speed of sorting arrays timeit method1(xa,ya) # 10 loops, best of 3: 443 ms per loop timeit method2(xa,ya) # 10 loops, best of 3: 20.6 ms per loop timeit method3(xa,ya) # 10 loops, best of 3: 18.9 ms per loop timeit method3a(xa,ya) # 10 loops, best of 3: 19.1 ms per loop # Test the speed of sorting lists timeit method1(xlist,ylist) # 10 loops, best of 3: 391 ms per loop timeit method2(xlist,ylist) # 10 loops, best of 3: 51.3 ms per loop timeit method3(xlist,ylist) # 10 loops, best of 3: 55.1 ms per loop # (can't test method3a since it won't accept lists)
Interesting. When your data is already in an array, method3 (which converts x and y into arrays) is actually faster than method3a (which assumes they are already arrays)! Not by much, but I wouldn’t have expected that two extra lines of code would actually make it faster.
The final results show that the answer depends on what form your data are in:
- For small lists, use method1.
- For large lists, use method2.
- For small arrays, use method3a.
- For large arrays, use method3.