<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>scienceoss.com &#187; efficiency</title>
	<atom:link href="http://scienceoss.com/tags/efficiency/feed/" rel="self" type="application/rss+xml" />
	<link>http://scienceoss.com</link>
	<description>useful tidbits for using open source software in science</description>
	<lastBuildDate>Wed, 26 May 2010 03:34:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Test the speed of your code interactively in IPython</title>
		<link>http://scienceoss.com/test-the-speed-of-your-code-interactively-in-ipython/</link>
		<comments>http://scienceoss.com/test-the-speed-of-your-code-interactively-in-ipython/#comments</comments>
		<pubDate>Sun, 13 Apr 2008 16:49:54 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[IPython]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[efficiency]]></category>
		<category><![CDATA[speed]]></category>

		<guid isPermaLink="false">http://scienceoss.com/test-the-speed-of-your-code-interactively-in-ipython/</guid>
		<description><![CDATA[So you&#8217;ve come up with a couple of ways to solve a problem in code. But how do you decide which way is the best? One criterion to decide on is to use the one that makes the most sense to you. Another criterion is to use the version that is fastest. Here&#8217;s how to [...]]]></description>
			<content:encoded><![CDATA[<p>So you&#8217;ve come up with a couple of ways to solve a problem in code.  But how do you decide which way is best?  One criterion is clarity: use the version that makes the most sense to you.  Another is speed: use the version that runs fastest.  Here&#8217;s how to quickly find out which version is fastest using the interactive interpreter IPython.<span id="more-109"></span></p>
<p>Recently I posted a couple of <a href="http://scienceoss.com/sort-one-list-by-another-list/">different ways to sort one list by another list</a>.  But how do you decide which one is faster?</p>
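<p>I like using the <span class="c">timeit</span> magic function in IPython to time Python code.  It takes a single line of Python code as its argument, runs it repeatedly, and reports how long one run takes.  In IPython it is written <span class="c">%timeit</span>; with automagic turned on (the default), the <span class="c">%</span> can be omitted, as in the examples below.  The problem is that <span class="c">timeit</span> takes a single line, but those sorting methods are multi-line blocks of code.</p>
<p>No matter: we just wrap each one in a function and time a single call to that function.  Here are the functions I&#8217;ll be testing (note that <span class="c">method1</span> returns both sorted sequences, while the NumPy-based versions return only the reordered <span class="c">y</span>):</p>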
<pre class="brush: python; title: ; notranslate">
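import numpy

def method1(x, y):
    # Pair up the values, sort the pairs by x, then unzip them again.
    # Returns both sequences: (sorted x, y reordered to match).
    z = sorted(zip(x, y))
    return list(zip(*z))

def method2(x, y):
    # Indices that would sort x, used to pick elements of y in that order
    inds = numpy.argsort(x)
    return numpy.take(y, inds)

def method3(x, y):
    # Convert both inputs to arrays, then sort y by x with fancy indexing
    xa = numpy.array(x)
    ya = numpy.array(y)
    inds = xa.argsort()
    return ya[inds]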
</pre>
<p>OK, we need some lists to use for the test sorting.  How about the ones from the previously mentioned post:</p>
<pre class="brush: python; title: ; notranslate">
people = ['Jim', 'Pam', 'Michael', 'Dwight']
ages = [27, 25, 4, 9]
</pre>
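<p>Before timing anything, it is worth a quick check that the methods agree on the answer, since a fast method that returns the wrong order is no use.  The sketch below assumes Python 3 with NumPy installed and restates the three functions so that it runs on its own; the names <span class="c">r1</span>, <span class="c">r2</span> and <span class="c">r3</span> are just for illustration.</p>
<pre class="brush: python; title: ; notranslate">
import numpy

def method1(x, y):
    # Sort the (x, y) pairs by x, then unzip into two tuples
    z = sorted(zip(x, y))
    return list(zip(*z))

def method2(x, y):
    # Reorder y using the indices that would sort x
    return numpy.take(y, numpy.argsort(x))

def method3(x, y):
    # Convert to arrays, then sort y by x with fancy indexing
    xa = numpy.array(x)
    ya = numpy.array(y)
    return ya[xa.argsort()]

ages = [27, 25, 4, 9]
people = ['Jim', 'Pam', 'Michael', 'Dwight']

# method1 returns (sorted_x, sorted_y); keep only the y part for comparison
r1 = list(method1(ages, people)[1])
r2 = method2(ages, people).tolist()
r3 = method3(ages, people).tolist()

# True here; with tied ages the methods may order the ties differently
print(r1 == r2 == r3)
</pre>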
<p>And now, to test each method.  For a really fast bit of code, <span class="c">timeit</span> picks the number of loops automatically and runs the code many times (here, 100,000 or 10,000 loops) to get a reliable estimate of how long a single call takes.</p>
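<p>Outside IPython, roughly the same measurement can be made with the standard-library <span class="c">timeit</span> module.  The sketch below approximates the procedure rather than reproducing the exact IPython implementation: <span class="c">Timer.autorange()</span> (Python 3.6 or later) keeps increasing the loop count until one batch takes at least 0.2 seconds, that batch is repeated three times, and the fastest batch is divided by the loop count to give a &#8220;best of 3&#8221; figure.  Newer IPython versions report a mean and standard deviation over several runs instead of a best-of-3 value.</p>
<pre class="brush: python; title: ; notranslate">
import timeit

def method1(x, y):
    # Same idea as method1 above: sort the pairs, then unzip
    z = sorted(zip(x, y))
    return list(zip(*z))

ages = [27, 25, 4, 9]
people = ['Jim', 'Pam', 'Michael', 'Dwight']

# Let Timer choose a loop count so one batch takes at least 0.2 s,
# then run that batch 3 times and keep the fastest one.
timer = timeit.Timer(lambda: method1(ages, people))
number, _ = timer.autorange()
best = min(timer.repeat(repeat=3, number=number)) / number
print(f"{number} loops, best of 3: {best * 1e6:.2f} µs per loop")
</pre>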
<pre class="brush: python; title: ; notranslate">timeit method1(ages,people)  # 100000 loops, best of 3: 3.6 µs per loop
timeit method2(ages,people)  # 10000 loops, best of 3: 63.7 µs per loop
timeit method3(ages,people)  # 10000 loops, best of 3: 36.8 µs per loop</pre>
<p>These results suggest that, for these short lists, the first method is roughly an order of magnitude faster than the other two.  But wait a minute: method3() converts the input lists into arrays, and I bet that takes some time.  How fast does it run if the data are already in an array?  Time for another function, one that expects its arguments to already be arrays.</p>
<pre class="brush: python; title: ; notranslate">def method3a(x, y):
    # Assumes x and y are already NumPy arrays (lists have no .argsort())
    inds = x.argsort()
    return y[inds]</pre>
<p>And let&#8217;s convert ages and people into arrays ahead of time so we can use <span class="c">method3a</span>:</p>
<pre class="brush: python; title: ; notranslate">array_ages = numpy.array(ages)
array_people = numpy.array(people)
</pre>
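<p>It may help to look at what <span class="c">method3a</span> computes with these arrays.  <span class="c">argsort</span> returns the positions that would put the ages in ascending order, and indexing <span class="c">array_people</span> with those positions reorders the names to match.  A small sketch, assuming the same four names and ages:</p>
<pre class="brush: python; title: ; notranslate">
import numpy

array_ages = numpy.array([27, 25, 4, 9])
array_people = numpy.array(['Jim', 'Pam', 'Michael', 'Dwight'])

# Position of the smallest age first, largest last
inds = array_ages.argsort()
print(inds)                 # [2 3 1 0]

# Fancy indexing: take array_people[2], [3], [1], [0] in that order
print(array_people[inds])   # ['Michael' 'Dwight' 'Pam' 'Jim']
</pre>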
<p>The other methods still ought to run when the data are arrays.  Here are the results on my machine for all the methods so far:</p>
<pre class="brush: python; title: ; notranslate">timeit method1(array_ages, array_people)   # 100000 loops, best of 3: 6.29 µs per loop
timeit method2(array_ages, array_people)   # 100000 loops, best of 3: 6.88 µs per loop
timeit method3(array_ages, array_people)   # 100000 loops, best of 3: 5.12 µs per loop
timeit method3a(array_ages, array_people)  # 100000 loops, best of 3: 4.02 µs per loop</pre>
<p>So if the data are already in array form, <span class="c">method3a</span> is the fastest.</p>
<p>Let&#8217;s see if the results are consistent with a larger dataset, where you might actually perceive a difference in speed.</p>
<pre class="brush: python; title: ; notranslate"># The test arrays
xa = numpy.random.random(100000)
ya = numpy.random.random(100000)

# The test arrays converted into test lists
xlist = xa.tolist()
ylist = ya.tolist()

# Test the speed of sorting arrays
timeit method1(xa,ya)  # 10 loops, best of 3: 443 ms per loop
timeit method2(xa,ya)  # 10 loops, best of 3: 20.6 ms per loop
timeit method3(xa,ya)  # 10 loops, best of 3: 18.9 ms per loop
timeit method3a(xa,ya) # 10 loops, best of 3: 19.1 ms per loop

# Test the speed of sorting lists
timeit method1(xlist,ylist)  # 10 loops, best of 3: 391 ms per loop
timeit method2(xlist,ylist)  # 10 loops, best of 3: 51.3 ms per loop
timeit method3(xlist,ylist)  # 10 loops, best of 3: 55.1 ms per loop
# (can't test method3a since it won't accept lists)</pre>
<p>Interesting.  When your data are already in arrays, <span class="c">method3</span> (which converts x and y into arrays) comes out slightly faster than <span class="c">method3a</span> (which assumes they already are arrays): 18.9 ms versus 19.1 ms.  That is a difference of about 1%, so it may well be measurement noise, but I certainly would not have expected two extra lines of code to make it any faster.</p>
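<p>One detail makes this result even more surprising: <span class="c">numpy.array</span> makes a copy by default, even when its input is already an array, so on array inputs <span class="c">method3</span> does strictly more work than <span class="c">method3a</span>.  <span class="c">numpy.asarray</span> skips that copy when given an existing array.  The sketch below defines <span class="c">method3b</span>, a hypothetical variant that is not part of the original comparison, which accepts lists like <span class="c">method3</span> but passes arrays through untouched like <span class="c">method3a</span>.</p>
<pre class="brush: python; title: ; notranslate">
import numpy

def method3b(x, y):
    # Hypothetical variant: asarray converts lists, but returns an
    # existing ndarray unchanged (no copy), unlike numpy.array
    xa = numpy.asarray(x)
    ya = numpy.asarray(y)
    return ya[xa.argsort()]

a = numpy.random.random(10)
print(numpy.array(a) is a)    # False: numpy.array made a copy
print(numpy.asarray(a) is a)  # True: same object, no copy

# Works on lists and arrays alike
print(method3b([3, 1, 2], ['c', 'a', 'b']).tolist())  # ['a', 'b', 'c']
</pre>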
<p>The final results show that the answer depends on what form your data are in:</p>
<ul>
<li>For small lists, use <span class="c">method1</span>.</li>
<li>For large lists, use <span class="c">method2</span>.</li>
<li>For small arrays, use <span class="c">method3a</span>.</li>
<li>For large arrays, use <span class="c">method3</span>, although <span class="c">method3a</span> is essentially tied with it.</li>
</ul>
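<p>Since the best choice flips with the size of the data, it can be useful to find where the crossover falls on your own machine.  Below is a minimal sketch that does this outside IPython with the standard-library <span class="c">timeit</span> module.  The list sizes, the loop and repeat counts, and the helper name <span class="c">best_time</span> are arbitrary choices made for illustration, and only the two list-friendly methods (<span class="c">method1</span> and <span class="c">method2</span>) are compared to keep it short.</p>
<pre class="brush: python; title: ; notranslate">
import timeit
import numpy

def method1(x, y):
    z = sorted(zip(x, y))
    return list(zip(*z))

def method2(x, y):
    return numpy.take(y, numpy.argsort(x))

def best_time(func, x, y, number=20, repeat=3):
    # Hypothetical helper: best-of-`repeat` time per call, in seconds
    times = timeit.repeat(lambda: func(x, y), repeat=repeat, number=number)
    return min(times) / number

results = {}
for n in (10, 100, 1000, 10000):
    # Plain Python lists, as in the list case above
    x = numpy.random.random(n).tolist()
    y = numpy.random.random(n).tolist()
    results[n] = (best_time(method1, x, y), best_time(method2, x, y))
    print(f"n={n:6d}  method1: {results[n][0] * 1e6:9.1f} µs"
          f"  method2: {results[n][1] * 1e6:9.1f} µs")
</pre>
<p>Other methods can be added to the comparison by passing them to <span class="c">best_time</span> in the same way.</p>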
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/test-the-speed-of-your-code-interactively-in-ipython/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

