<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>scienceoss.com</title>
	<atom:link href="http://scienceoss.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://scienceoss.com</link>
	<description>useful tidbits for using open source software in science</description>
	<pubDate>Wed, 30 Jul 2008 19:38:30 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
	<language>en</language>
			<item>
		<title>RPy: statistics in R from Python</title>
		<link>http://scienceoss.com/rpy-statistics-in-r-from-python/</link>
		<comments>http://scienceoss.com/rpy-statistics-in-r-from-python/#comments</comments>
		<pubDate>Sat, 26 Jul 2008 03:37:51 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[R]]></category>

		<category><![CDATA[matplotlib]]></category>

		<category><![CDATA[ssh]]></category>

		<category><![CDATA[linear regression]]></category>

		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=4</guid>
		<description><![CDATA[R is a free, open source statistics package written by statisticians, for statisticians.  Python on the other hand lacks a comprehensive statistics package.  RPy allows you to combine the power of Python with the power of R for an unbeatable combination in data analysis.
Note that in order to use R from Python, you [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.r-project.org/">R</a> is a free, open source statistics package written by statisticians, for statisticians.  Python on the other hand lacks a comprehensive statistics package.  <a href="http://rpy.sourceforge.net/">RPy</a> allows you to combine the power of Python with the power of R for an unbeatable combination in data analysis.</p>
<p>Note that in order to use R from Python, you need to know a little of both . . . so the learning curve can be steep.  You also need to have a feel for what would be easy in R and what would be easy in Python.</p>
<p>There are some detailed examples below if you want to skip right to &#8216;em.</p>
<p>I use Python for most tasks, but when I need high-powered stats, I embed R code in my Python scripts to perform the analysis.</p>
<p>Disclaimer: I figured all of this stuff out by trial and error.  The RPy documentation, while complete, was difficult for me to make sense of when I was learning.  If there&#8217;s a better way to do things, please let me know!  For the details that I don&#8217;t cover here, check the <a href="http://rpy.sourceforge.net/rpy/doc/rpy_html/index.html">online documentation</a>.</p>
<h3>Why use R?</h3>
<p>You&#8217;ll need R if you want to do any sort of sophisticated (or even not-so-sophisticated) statistical analysis.  There are no solid statistics libraries that I&#8217;ve come across for Python . . . but maybe that&#8217;s because R is the best possible statistics library there could be.  </p>
<p>Be warned however that accessing  R from Python can get tricky at times.  I&#8217;ve tried to outline some of what I&#8217;ve learned here to make it easier for others.</p>
<p>Why use RPy instead of writing files out to R, then using R scripts to deal with it?  I did this for a little while and found that it was too much work to maintain two separate code bases . . . one for Python, then one for R.  If I changed anything in the output of a Python script, I&#8217;d have to fire up R and open my R scripts to modify and debug them.  I&#8217;ve found that using RPy lets me put all my code in one spot, resulting in fewer bugs and less maintenance.  </p>
<h3>R and Python are separate . . .</h3>
<p>I found that the easiest way to think about this is to think about doing things &#8220;inside R&#8221; or &#8220;inside Python&#8221;.  Things that are to be done inside R are typically wrapped in a string (a Python string).  For example, this creates a variable inside R called <span class="c">x</span> with a value of 5.</p>
<pre name="code" class="python">
from rpy import *
r(&#039;x=5&#039;)
</pre>
<p>Assuming this was typed into a fresh Python session, Python has no idea about the existence of the variable <span class="c">x</span>!  It works in reverse, too: R has no idea about what&#8217;s in the Python namespace.  So you can do this in Python:</p>
<pre name="code" class="python">
x = &quot;I&#039;m a Python string&quot;
</pre>
<p>and the variable x inside R is still the same:</p>
<pre name="code" class="python">
r(&#039;print(x)&#039;)  # still 5
</pre>
<h3>. . . but they can talk to each other</h3>
<p>RPy does some automatic conversions:</p>
<pre name="code" class="python">
x_from_R = r(&#039;x&#039;)  # 5
</pre>
<p>What happened here is that RPy looked at what <span class="c">x</span> was inside R, saw that it was a single number, and returned that number to Python, which assigned it to the Python variable <span class="c">x_from_R</span>.  So that&#8217;s how you get data from R to Python: by sending a string (the variable name you want to retrieve in R) to the <span class="c">r</span> object.</p>
<p>At first you might think this is how you send data from Python to R:</p>
<pre name="code" class="python">
r(&#039;x_from_python&#039;) = x
#SyntaxError: can&#039;t assign to function call
</pre>
<p>Nope.  Turns out you have to use the <span class="c">r.assign()</span> function to do that:</p>
<pre name="code" class="python">
r.assign(&#039;x_from_python&#039;, x)
r(&#039;print(x_from_python)&#039;)  # &quot;I&#039;m a Python string&quot;
</pre>
<p>So that&#8217;s how you get data from Python to R: by using the <span class="c">r.assign()</span> function, first giving the name of the variable you want to be assigned in R followed by the Python object to be sent to R.</p>
<h3>Other data types</h3>
<p>OK, so you can get numbers back from R.  And as you can imagine, strings work the same way.  But what about more complex data types?  This <a href="http://rpy.sourceforge.net/rpy/doc/rpy_html/Basic-conversion.html#Basic-conversion">list of conversions</a> tells you which R objects will be converted into which Python objects.  It&#8217;s pretty intuitive: a string becomes a string, a list becomes a list, etc.</p>
<p>But then there are things like data frames in R, which have row names and column names.</p>
<p>It&#8217;s not on that list linked above, but an R data frame is converted to a Python dictionary.  For example, the Motor Trend car data set, which comes standard in R, is a data frame.</p>
<pre name="code" class="python">
from rpy import *
r(&#039;print(head(mtcars))&#039;) # print just the first 6 lines.  Note the variable names.

# Returns:
#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
</pre>
<p>Now send the whole thing to Python and check the keys of the dictionary that is created:</p>
<pre name="code" class="python">
mt = r(&#039;mtcars&#039;)
mt.keys()
</pre>
<p>Note that the keys are the same as the variable names in the dataframe.</p>
<p>Just like you get a Python dictionary from a dataframe, you can send a dictionary to R:</p>
<pre name="code" class="python">
r.assign(&#039;df&#039;, dict(a=1, b=2, c=3))
r(&#039;print(df)&#039;)
r(&#039;names(df)&#039;)
</pre>
<p>You may have to convert it into a dataframe once it is inside R, though:</p>
<pre name="code" class="python">
r(&#039;df = data.frame(df)&#039;)
</pre>
<h3>R functions</h3>
<p>So far, with the exception of <span class="c">r.assign()</span>, we&#8217;ve just been sending strings to the <span class="c">r</span> object.  But the <span class="c">r</span> object also has methods.  Unfortunately, you can&#8217;t see them all using IPython&#8217;s introspection.  Personally I find that I don&#8217;t use this functionality that much (I use <span class="c">r.assign()</span> to get the data into R and then operate on it in there), but here it is for completeness.</p>
<p>There is a trick here.  Remember, before we were sending a string to the <span class="c">r</span> object and it was executing the code inside R:</p>
<pre name="code" class="python">
r(&#039;x=5&#039;)
</pre>
<p>But when you use a method of the <span class="c">r</span> object, you pass it raw Python objects.  For example, you can plot a Python list in R using the <span class="c">plot()</span> method of the <span class="c">r</span> object:</p>
<pre name="code" class="python">
x = [1,2,3]
r.plot(x)
</pre>
<p>There are some slight name changes though.  R tends to use a &#8220;.&#8221; as a spacer in function names, like &#8220;_&#8221; tends to be used in Python.  The &#8220;.&#8221; however is special in Python, so in method names of the <span class="c">r</span> object, &#8220;.&#8221; is converted to &#8220;_&#8221;.  For example, R&#8217;s <span class="c">t.test()</span> function becomes <span class="c">r.t_test()</span>. </p>
<p>These methods of the <span class="c">r</span> object are what Python sees, so that&#8217;s why their names have to be changed.  On the other hand, you call an R function by its true name when you send the <span class="c">r</span> object a string, like we were doing before.  So both of these refer to the same underlying t-test function in R:</p>
<pre name="code" class="python">
r.t_test
r(&#039;t.test&#039;)
</pre>
<p>This next one is tricky.  First, since <span class="c">print</span> is a reserved word in Python 2, it needs to have a slightly different name when you want to use the version in R.  So an underscore is added to the end.  Second, what&#8217;s in the parentheses is a Python string.  So all that will get printed is the string, &#8216;x&#8217; . . . not 5, or &#8220;I&#8217;m a Python string&#8221; or anything else.</p>
<pre name="code" class="python">
r.print_(&#039;x&#039;) # &#039;x&#039;
</pre>
<p>In practice though, if I want to print something I&#8217;ll either use Python&#8217;s <span class="c">print</span> or if I want to print something from R, I&#8217;ll do this:</p>
<pre name="code" class="python">
r(&#039;print(x)&#039;)  # prints 5
</pre>
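<p>To recap the naming rules from this section, here is a minimal sketch of the translation in plain Python.  The function name <span class="c">python_name_to_r</span> is made up for illustration, and it only covers the two rules described above (a trailing underscore is dropped, and the remaining underscores become dots); it is not RPy&#8217;s actual implementation, which may handle more cases.</p>

```python
def python_name_to_r(name):
    """Translate a Python-side method name into the R function name.

    Hypothetical helper: covers only the two rules described in the text.
    """
    # A trailing underscore only exists to dodge Python reserved words
    # such as print, so drop it first.
    if name.endswith('_'):
        name = name[:-1]
    # The remaining underscores stand in for R's "." spacer.
    return name.replace('_', '.')


for py_name in ['t_test', 'print_', 'plot']:
    print('%s -> %s' % (py_name, python_name_to_r(py_name)))
```

<p>So <span class="c">r.t_test</span> maps to <span class="c">t.test</span> and <span class="c">r.print_</span> maps to <span class="c">print</span>, while a name with no underscores is left alone.</p>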
<h3>Plotting examples</h3>
<p>Here are a couple of examples of creating a plot.  In each case a plot is created of the list 1,2,3.  These are trivial examples, but they illustrate different ways of getting data to and from R.</p>
<h4>Option 1: Do everything in R</h4>
<p>You can execute arbitrary R commands by sending them as a string to the <span class="c">r</span> object.  Here, everything is done in R: a vector is created and plotted.  In this example, the variable <span class="c">y</span> is never seen by Python.</p>
<pre name="code" class="python">
from rpy import *
r(&quot;&quot;&quot;
y = c(1,2,3)
plot(y)
&quot;&quot;&quot;)
</pre>
<p>Note that you can send many R commands in a multi-line string.</p>
<h4>Option 2: Use a method of the <span class="c">r</span> object</h4>
<p>Here, we start with a Python list, and then send it as the argument to the <span class="c">r.plot()</span> method.</p>
<pre name="code" class="python">
from  rpy import *
y = [1,2,3]
r.plot(y)
</pre>
<h4>Option 3: Get a list from R and plot it with matplotlib in Python</h4>
<p>This is trivial because you don&#8217;t gain anything from making a list in R instead of Python, but it shows that you can send data both ways.</p>
<pre name="code" class="python">
from rpy import *
import pylab as p
y = r(&#039;c(1,2,3)&#039;)
p.plot(y)
p.show()
</pre>
<h4>Option 4: Use <span class="c">r.assign()</span> to get data to R, then call it inside R</h4>
<p>I tend to use this method a lot with large data sets. The idea is to pass the data into R once, then you can use it from inside R.  The trick is to use the <span class="c">r.assign()</span> method.</p>
<pre name="code" class="python">
from rpy import *
y = [1,2,3]
r.assign(&#039;Y&#039;, y)
r(&#039;plot(Y)&#039;)
</pre>
<h3>Getting help on R functions</h3>
<p>Use the <span class="c">r.help()</span> function.   For example, to view the help on anova:</p>
<pre name="code" class="python">
r.help(&#039;anova&#039;)  # pass the name as a string; Python has no variable called anova
</pre>
<p>This displays the help on screen; it doesn&#8217;t return a string.</p>
<h3>Non-trivial examples</h3>
<p>Plotting and printing things are not what you&#8217;d want to use R and RPy for.  Instead, you&#8217;d want to use them for things that you can&#8217;t do in available packages for Python.  </p>
<p>Here are some examples where R can really fill in the gaps in Python&#8217;s statistical functionality.  Anything you can do in R, you can do from Python.  Given the wide variety of packages available for R, this is some stupendous power at your fingertips.  Now to learn how to wield it!</p>
<h4>Linear models in R</h4>
<p>Say I have a Python script already up and running, and it returns some data . . . and I want to know if the slope of the regression of one variable on another is significant.  I haven&#8217;t found any statistics libraries for Python, but in R this kind of functionality comes standard, in the function <span class="c">lm()</span>.</p>
<p>Viewing the help for <span class="c">lm()</span>, you can see that it takes a model specification, like &#8220;y~x&#8221; which means &#8220;y on x&#8221;.  Now, the components of this model specification, y and x, can either refer to variables in the R workspace (which is separate from Python, remember) or they can be variables in a dataframe which is supplied in an optional argument to <span class="c">lm()</span>.</p>
<p>So there are three steps: first we need to figure out how to send the data to R, then perform the linear regression (which should be trivial), and finally get the results back out.</p>
<p>First, let&#8217;s set up some test data in Python:</p>
<pre name="code" class="python">
import numpy as npy
x = npy.arange(10)
y = npy.arange(10) + npy.random.standard_normal(x.shape)
</pre>
<p>Now send it to R:</p>
<pre name="code" class="python">
r.assign(&#039;x&#039;,x)
r.assign(&#039;y&#039;,y)
</pre>
<p>(exercise for the reader: instead of assigning x and y individually, how would you get them into R as a dataframe?)</p>
<p>In R, run the linear model and save it as a variable in R.  Here, I&#8217;m simultaneously saving it as a Python dictionary (sneaky!)</p>
<pre name="code" class="python">
LM = r(&#039;linear_model = lm(y~x)&#039;)
</pre>
<p>OK, here&#8217;s where it takes a little exploring.  The dictionary you get back may take some navigating.  Looking at it for a little bit, you might notice the &#8216;coefficients&#8217; key of the dictionary LM, which in turn has two more keys: &#8216;(Intercept)&#8217; and &#8216;x&#8217;.</p>
<pre class="prettyprint"><code class="code">{'assign': [0, 1],
 'call': &lt;Robj object at 0xb7d3e790&gt;,
 'coefficients': {'(Intercept)': 0.28490682478866736,
                  'x': 0.86209804871669171},
 'df.residual': 8,
 'effects': array([-13.16882479,   7.83039439,   1.22245056,   0.18398967,
         0.51108108,   0.8141431 ,  -0.45120018,  -1.1985602 ,
         1.54636612,   0.51341949]),
 'fitted.values': array([ 0.28490682,  1.14700487,  2.00910292,  2.87120097,  3.73329902,
        4.59539707,  5.45749512,  6.31959317,  7.18169121,  8.04378926]),
 'model': {'x': array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]),
           'y': array([-0.64212347,  1.39389811,  3.06676323,  2.84957073,  3.99793052,
        5.12226093,  4.67818603,  4.7520944 ,  8.3182891 ,  8.10661086])},
 'qr': {'pivot': [1, 2],
        'qr': array([[ -3.16227766, -14.23024947],
       [  0.31622777,   9.08295106],
       [  0.31622777,   0.15621147],
       [  0.31622777,   0.0461151 ],
       [  0.31622777,  -0.06398128],
       [  0.31622777,  -0.17407766],
       [  0.31622777,  -0.28417403],
       [  0.31622777,  -0.39427041],
       [  0.31622777,  -0.50436679],
       [  0.31622777,  -0.61446316]]),
        'qraux': [1.316227766016838, 1.2663078500948464],
        'rank': 2,
        'tol': 9.9999999999999995e-08},
 'rank': 2,
 'residuals': array([-0.92703029,  0.24689324,  1.05766031, -0.02163025,  0.2646315 ,
        0.52686386, -0.77930909, -1.56749877,  1.13659789,  0.0628216 ]),
 'terms': &lt;Robj object at 0xb7d3e780&gt;,
 'xlevels': {}}</code></pre>
<p>So if all we were after were the slope and intercept, then </p>
<pre name="code" class="python">
slope = LM[&#039;coefficients&#039;][&#039;x&#039;]
intercept = LM[&#039;coefficients&#039;][&#039;(Intercept)&#039;]
</pre>
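<p>As a sanity check on those two numbers, the slope and intercept of a straight-line fit can be reproduced by hand with the textbook least-squares formulas.  This sketch is plain Python and uses the x and y values printed under the &#8216;model&#8217; key above; the function name <span class="c">least_squares_line</span> is my own, and this is not how <span class="c">lm()</span> computes the fit internally (R uses a QR decomposition, as the &#8216;qr&#8217; key hints).</p>

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data copied from the 'model' entry of the dictionary shown above
x = [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]
y = [-0.64212347, 1.39389811, 3.06676323, 2.84957073, 3.99793052,
     5.12226093, 4.67818603, 4.7520944, 8.3182891, 8.10661086]

slope, intercept = least_squares_line(x, y)
print('slope = %.4f, intercept = %.4f' % (slope, intercept))
```

<p>The result should agree with the coefficients R reported (about 0.8621 and 0.2849), up to rounding in the printed data.</p>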
<p>But what about a P-value for the slope?  It&#8217;s nowhere to be seen in that dictionary.  Turns out, you need the <span class="c">summary()</span> function in R, and it takes as its input a linear model (among other possible inputs, but here we&#8217;re just using a linear model).  So save it in R (just in case) and simultaneously save it in Python:</p>
<pre name="code" class="python">
summary = r(&#039;LM_summary = summary(linear_model)&#039;)
</pre>
<p>Hmm.  </p>
<pre class="prettyprint"><code class="code">{'adj.r.squared': 0.88847497651170382,
 'aliased': {'(Intercept)': False, 'x': False},
 'call': &lt;Robj object at 0xb7d3e770&gt;,
 'coefficients': array([[  2.84906825e-01,   5.39776217e-01,   5.27823968e-01,
          6.11943659e-01],
       [  8.62098049e-01,   1.01109349e-01,   8.52639301e+00,
          2.75251311e-05]]),
 'cov.unscaled': array([[ 0.34545455, -0.05454545],
       [-0.05454545,  0.01212121]]),
 'df': [2, 8, 2],
 'fstatistic': {'dendf': 8.0, 'numdf': 1.0, 'value': 72.699377758431851},
 'r.squared': 0.90086664578818121,
 'residuals': array([-0.92703029,  0.24689324,  1.05766031, -0.02163025,  0.2646315 ,
        0.52686386, -0.77930909, -1.56749877,  1.13659789,  0.0628216 ]),
 'sigma': 0.9183712712215929,
 'terms': &lt;Robj object at 0xb7d3e7c0&gt;}</code></pre>
<p>There&#8217;s the r-squared and adjusted r-squared,</p>
<pre name="code" class="python">
R_squared = summary[&#039;r.squared&#039;]
adj_R_squared = summary[&#039;adj.r.squared&#039;]
</pre>
<p>but no P value.  What gives?  Turns out Python can&#8217;t convert everything perfectly, and a little more exploration is in order.  Try printing the summary from R:</p>
<pre name="code" class="python">
r(&#039;print(LM_summary)&#039;)
</pre>
<p>Well, that makes more sense, and you can see the P value for the slope is 2.75E-5.  But how to extract it from Python?</p>
<pre class="prettyprint"><code class="code">Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.5675 -0.5899  0.1549  0.4613  1.1366 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.2849     0.5398   0.528    0.612
x             0.8621     0.1011   8.526 2.75e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.9184 on 8 degrees of freedom
Multiple R-squared: 0.9009,	Adjusted R-squared: 0.8885
F-statistic:  72.7 on 1 and 8 DF,  p-value: 2.753e-05</code></pre>
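<p>The trick is to match output from the summary printout in R with the dictionary returned to Python.  Here, the key &#8216;coefficients&#8217; in the summary dictionary holds the same table of numbers as the &#8220;Coefficients&#8221; section of the printout, minus the row and column labels.  The P value for the slope is in the 2nd row, 4th column; since Python counts from zero, that is index <span class="c">[1,3]</span> (index <span class="c">[1,2]</span> would give the t value instead):</p>
<pre name="code" class="python">
P = summary[&#039;coefficients&#039;][1,3]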
</pre>
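<p>One way to double-check which column holds which number is to label the array with the column headings from R&#8217;s printout and recompute the t value, which is just the estimate divided by its standard error.  In this sketch a nested list stands in for the array RPy returned (the values are copied from the summary dictionary above), and the names <span class="c">rows</span>, <span class="c">columns</span> and <span class="c">table</span> are mine, not something RPy provides.</p>

```python
# Values copied from summary['coefficients'] in the dictionary above
coefficients = [
    [2.84906825e-01, 5.39776217e-01, 5.27823968e-01, 6.11943659e-01],
    [8.62098049e-01, 1.01109349e-01, 8.52639301e+00, 2.75251311e-05],
]

# Row and column labels taken from R's printed "Coefficients:" table
rows = ['(Intercept)', 'x']
columns = ['Estimate', 'Std. Error', 't value', 'Pr(>|t|)']

# Build a {row: {column: value}} lookup so nothing depends on bare indices
table = {}
for row_name, row in zip(rows, coefficients):
    table[row_name] = dict(zip(columns, row))

# The t value should equal estimate / standard error
t_check = table['x']['Estimate'] / table['x']['Std. Error']
print('t value from table: %.3f, recomputed: %.3f'
      % (table['x']['t value'], t_check))

# The P value for the slope sits in the 2nd row, 4th column: index [1][3]
P = table['x']['Pr(>|t|)']
print('P value for the slope: %.3g' % P)
```

<p>If the recomputed t value matches the third column, then the fourth column must be the P value, which agrees with the printout from R.</p>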
<p>Whew, and there you have it.  It takes some digging around to get what you need, but since I&#8217;ve done the work for you, you can now do linear regressions from Python.  All together it looks like this (it can be wrapped in a function or class for your own reuse):</p>
<pre name="code" class="python">
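from rpy import *
import numpy as npy
# Test data: a noisy straight line
x = npy.arange(10)
y = npy.arange(10) + npy.random.standard_normal(x.shape)
r.assign(&#039;x&#039;, x)
r.assign(&#039;y&#039;, y)
LM = r(&#039;linear_model = lm(y~x)&#039;)
summary = r(&#039;LM_summary = summary(linear_model)&#039;)
slope = LM[&#039;coefficients&#039;][&#039;x&#039;]
intercept = LM[&#039;coefficients&#039;][&#039;(Intercept)&#039;]
P = summary[&#039;coefficients&#039;][1,3]  # 2nd row (x), 4th column (P value)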
</pre>
<h4>Redundancy analysis</h4>
<p>OK, say you have this data set to perform redundancy analysis (RDA) on.  First, you need the package <a href="http://vegan.r-forge.r-project.org/">vegan</a> installed, which is fantastic for multivariate stats.  It&#8217;s probably best to fire up R proper (from a command line, or the GUI if you have it in Windows or OSX) and run</p>
<pre class="prettyprint"><code class="code">install.packages("vegan", dep=T)</code></pre>
<p>Here&#8217;s a heavily commented script, <a href='http://scienceoss.com/wp-content/uploads/2008/07/rpy-demo.py'>rpy-demo.py</a>, that will:</p>
<ul>
<li>load and format the data included in the script</li>
<li>send the data to R</li>
<li>perform an RDA in R</li>
<li>plot the ordination</li>
<li>save the ordination as a PNG</li>
<li>print the variance explained by constrained and unconstrained axes as well as each RDA axis.</li>
</ul>
<p>If you have RPy installed and the vegan package installed, you should be able to just run this Python script.</p>
<p>Often-run analyses that you need R for can be wrapped in a class or module to encapsulate your data analysis needs, so you don&#8217;t need to clutter your code with it. Once things are set up that way, it would be as easy as</p>
<pre name="code" class="python">
from myRstuff import lm, rda
results = lm(x,y)
ordination = rda(data)
</pre>
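<p>As a rough idea of what such a module might contain, here is a sketch of an <span class="c">lm()</span> wrapper built from the steps in the linear model example.  The module name <span class="c">myRstuff</span> is just a placeholder, and the optional <span class="c">r</span> argument is my own addition (not part of RPy) so that the function can be tried out with a stand-in object when R is not installed; by default it uses RPy&#8217;s own <span class="c">r</span> object.</p>

```python
def lm(x, y, r=None):
    """Regress y on x in R and return the slope, intercept, and slope P value.

    Sketch only: `r` defaults to RPy's r object, but any object with the
    same call and assign() behaviour can be passed in instead.
    """
    if r is None:
        from rpy import r  # imported lazily so the module loads without R

    # Send the data to R
    r.assign('x', x)
    r.assign('y', y)

    # Fit the model and summarize it inside R; both come back as dictionaries
    model = r('linear_model = lm(y~x)')
    summary = r('LM_summary = summary(linear_model)')

    coefficients = model['coefficients']
    return {
        'slope': coefficients['x'],
        'intercept': coefficients['(Intercept)'],
        # 2nd row (x), 4th column (P value) of the coefficients table
        'P': summary['coefficients'][1][3],
    }
```

<p>With something like this saved in a module, <span class="c">results = lm(x, y)</span> gives back a small dictionary, and the R details stay out of the rest of your code.</p>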
<p>For much, much more see the <a href="http://rpy.sourceforge.net/rpy/doc/rpy_html/index.html">online documentation</a> for RPy, but hopefully I gave you enough to at least get started.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/rpy-statistics-in-r-from-python/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Polar bar plot in Python</title>
		<link>http://scienceoss.com/polar-bar-plot-in-python/</link>
		<comments>http://scienceoss.com/polar-bar-plot-in-python/#comments</comments>
		<pubDate>Sun, 20 Jul 2008 14:43:06 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[matplotlib]]></category>

		<category><![CDATA[plotting]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=10</guid>
		<description><![CDATA[Here&#8217;s how to create a polar bar plot in matplotlib.

The trick is just to specify that you want polar coordinates when you create the axis.  Then create a bar plot as normal.

from matplotlib.pyplot import figure, show
from math import pi

fig = figure()
ax = fig.add_subplot(111, polar=True)
x = [30,60,90,120,150,180]
x = [i*pi/180 for i in x]  # [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s how to create a polar bar plot in matplotlib.</p>
<p><br style="clear:both;"></p>
<p>The trick is just to specify that you want polar coordinates when you create the axis.  Then create a bar plot as normal.</p>
<pre class = "prettyprint"><code class = "code">
from matplotlib.pyplot import figure, show
from math import pi

fig = figure()
ax = fig.add_subplot(111, polar=True)
x = [30,60,90,120,150,180]
x = [i*pi/180 for i in x]  # convert to radians

ax.bar(x,[1,2,3,4,5,6], width=0.4)
show()
</code></pre>
<p><a href="http://scienceoss.com/wp-content/uploads/2008/07/polar-bar-plot-simple.png"><img src="http://scienceoss.com/wp-content/uploads/2008/07/polar-bar-plot-simple-300x225.png" alt="" title="polar-bar-plot-simple" width="300" height="225" class="alignnone size-medium wp-image-136" /></a></p>
<p>Note that in the above example the &#8220;right&#8221; or &#8220;clockwise-most&#8221; edge of each bar is lined up with the specified x value.  You can change this by subtracting <span class="c">width / 2</span> from each of the x values to center the bars on the x-values, like this:</p>
<pre class = "prettyprint"><code class = "code">
from matplotlib.pyplot import figure, show
from math import pi

width = 0.4  # width of the bars (in radians)

fig = figure()
ax = fig.add_subplot(111, polar=True)
x = [30,60,90,120,150,180]

# Convert to radians and subtract half the width
# of a bar to center it.
x = [i*pi/180 - width/2 for i in x]
ax.bar(x,[1,2,3,4,5,6], width=width)
show()
</code></pre>
<p><a href="http://scienceoss.com/wp-content/uploads/2008/07/polar-bar-plot-simple-centered.png"><img src="http://scienceoss.com/wp-content/uploads/2008/07/polar-bar-plot-simple-centered-300x225.png" alt="" title="polar-bar-plot-simple-centered" width="300" height="225" class="alignnone size-medium wp-image-137" /></a></p>
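<p>The centering arithmetic is easy to check without drawing anything.  This sketch assumes, as in the example above, that each bar starts at the x value it is given and extends <span class="c">width</span> radians counterclockwise; the helper name <span class="c">centered_starts</span> is made up for illustration.</p>

```python
from math import radians


def centered_starts(angles_deg, width):
    """Return the start angle (in radians) for each bar so that a bar of
    the given width (in radians) is centered on the requested angle."""
    return [radians(a) - width / 2.0 for a in angles_deg]


width = 0.4
angles = [30, 60, 90, 120, 150, 180]
starts = centered_starts(angles, width)

# The midpoint of each bar (start + half the width) should land back
# on the requested angle.
for angle, start in zip(angles, starts):
    midpoint = start + width / 2.0
    print('%3d degrees: bar spans %.3f to %.3f rad, midpoint %.3f rad'
          % (angle, start, start + width, midpoint))
```

<p>Each midpoint printed should equal the requested angle converted to radians.</p>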
<h3>Get funky . . . </h3>
<p>The following is slightly modified from the matplotlib examples:</p>
<pre class = "prettyprint"><code class = "code">
import numpy as npy
import matplotlib.cm as cm
from matplotlib.pyplot import figure, show, rc

# force square figure and square axes (looks better for polar, IMHO)
fig = figure(figsize=(8,8))
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8], polar=True)

N = 20
theta = npy.arange(0.0, 2*npy.pi, 2*npy.pi/N)  # evenly spaced angles
radii = 10*npy.random.rand(N)  # random bar heights
width = npy.pi/4*npy.random.rand(N) # random widths

# Create the bar plot
bars = ax.bar(theta, radii, width=width, bottom=0.0)

# Step through bars (a list of Rectangle objects) and
# change color based on its height and set its alpha transparency
# to 0.5

for r,bar in zip(radii, bars):
    bar.set_facecolor( cm.jet(r/10.))
    bar.set_alpha(0.5)

show()
</code></pre>
<p><a href="http://scienceoss.com/wp-content/uploads/2008/07/polar-bar-plot-alpha.png"><img src="http://scienceoss.com/wp-content/uploads/2008/07/polar-bar-plot-alpha-300x300.png" alt="" title="polar-bar-plot-alpha" width="300" height="300" class="alignnone size-medium wp-image-135" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/polar-bar-plot-in-python/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Record screencasts, convert to Flash, and embed on your site</title>
		<link>http://scienceoss.com/record-screencasts-convert-to-flash-and-embed-on-your-site/</link>
		<comments>http://scienceoss.com/record-screencasts-convert-to-flash-and-embed-on-your-site/#comments</comments>
		<pubDate>Sat, 03 May 2008 18:30:41 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[utilities]]></category>

		<category><![CDATA[convert]]></category>

		<category><![CDATA[embed]]></category>

		<category><![CDATA[ffmpeg]]></category>

		<category><![CDATA[flash]]></category>

		<category><![CDATA[flv]]></category>

		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=124</guid>
		<description><![CDATA[Here&#8217;s a step-by-step tutorial of creating a screencast, converting it into an .flv file, and uploading it to your site with an embedded Flash player.
The tools

ffmpeg
    (On Ubuntu: sudo apt-get install ffmpeg)
gtk-recordmydesktop
    (On Ubuntu: sudo apt-get install gtk-recordmydesktop)
Jeroen Wijering&#8217;s embedded Flash player wizard (brilliant!)

The process
1. Record a screencast
Fire up [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a step-by-step tutorial of creating a screencast, converting it into an .flv file, and uploading it to your site with an embedded Flash player.<span id="more-124"></span></p>
<h3>The tools</h3>
<ul>
<li><a href="http://ffmpeg.mplayerhq.hu/">ffmpeg</a>
<p>    (On Ubuntu: <span class="c">sudo apt-get install ffmpeg</span>)</p></li>
<li><a href="http://recordmydesktop.iovar.org/about.php">gtk-recordmydesktop</a>
<p>    (On Ubuntu: <span class="c">sudo apt-get install gtk-recordmydesktop</span>)</p></li>
<li>Jeroen Wijering&#8217;s <a href="http://www.jeroenwijering.com/?page=wizard">embedded Flash player wizard</a> (brilliant!)</li>
</ul>
<h3>The process</h3>
<h4>1. Record a screencast</h4>
<p>Fire up gtk-recordmydesktop.  The screen looks like this:<br />
<a href='http://scienceoss.com/wp-content/uploads/2008/04/screenshot-recordmydesktop.png'><img src="http://scienceoss.com/wp-content/uploads/2008/04/screenshot-recordmydesktop-300x137.png" alt="" title="gtk-recordmydesktop screenshot" width="300" height="137" class="alignnone size-medium wp-image-125" /></a></p>
<p>Optionally press &#8220;Select Window&#8221; and click on a window to select it for recording.</p>
<p>Press record, and do your thing.</p>
<p>By default, it saves the file as <span class="c">out.ogg</span> in your home directory.  (more info on the <a href="http://en.wikipedia.org/wiki/Ogg">Ogg</a> format).  </p>
<h4>2. Convert the <span class="c">.ogg</span> to <span class="c">.flv</span></h4>
<p>You need to convert this into the Flash <span class="c">.flv</span> format.  The easiest way is to use <span class="c">ffmpeg</span>:</p>
<pre class="prettyprint"><code class="code">ffmpeg -i out.ogg out.flv</code></pre>
<p>or, optionally resize so that the output file will be 320&#215;250:</p>
<pre class="prettyprint"><code class="code">ffmpeg -i out.ogg -s 320x250 out.flv</code></pre>
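<p>If you convert screencasts often, the same command can be driven from a Python script.  This is only a sketch: the helper name <span class="c">ffmpeg_command</span> is made up, it uses nothing beyond the <span class="c">-i</span> and <span class="c">-s</span> options shown above, and it assumes <span class="c">ffmpeg</span> is on your path when you actually run the command.</p>

```python
import subprocess


def ffmpeg_command(source, target, size=None):
    """Build the ffmpeg argument list; size is an optional (width, height)."""
    command = ['ffmpeg', '-i', source]
    if size is not None:
        command += ['-s', '%dx%d' % size]
    command.append(target)
    return command


command = ffmpeg_command('out.ogg', 'out.flv', size=(320, 250))
print(' '.join(command))

# Uncomment to actually run the conversion (requires ffmpeg to be installed):
# subprocess.call(command)
```

<p>Building the command as a list rather than one long string avoids trouble with file names that contain spaces.</p>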
<h4>3. Upload the .flv to your site</h4>
<p>Upload the new <span class="c">.flv</span> to your site (and remember where you put it).</p>
<h4>4. Use the embedded Flash player wizard</h4>
<p>Then go to Jeroen Wijering&#8217;s <a href="http://www.jeroenwijering.com/?page=wizard">embedded Flash player wizard</a>.  Paste the link to your newly uploaded <span class="c">.flv</span>.</p>
<p>Double check that the file shows up in the preview of the wizard.  When you&#8217;re happy with it, copy the source code shown in the wizard and paste it in your site.</p>
<h4>A note on embedding in WordPress</h4>
<p>Copying and pasting the code into the edit window for a post did not work properly at first in WordPress (even in the non-visual editor).  The source code showed up as plain text.  To fix this, I had to delete all the newlines so that the <code>embed</code> tag was all on one line.</p>
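<p>Deleting the newlines by hand gets tedious, so here is a small sketch that does it in Python.  The function name <span class="c">one_line</span> and the sample snippet are made up for illustration (the angle brackets are left out of the sample); the real code comes from the wizard.  Note that it collapses every run of whitespace to a single space, which is slightly more than just removing newlines.</p>

```python
def one_line(snippet):
    """Collapse all runs of whitespace, including newlines, to single spaces."""
    return ' '.join(snippet.split())


# A made-up, multi-line stand-in for the code the wizard gives you
snippet = """embed src="player.swf"
    width="320"
    height="250"
"""

print(one_line(snippet))
```

<p>Paste the single-line result into the post editor in place of the original multi-line code.</p>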
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/record-screencasts-convert-to-flash-and-embed-on-your-site/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Interactive subplots: make all x-axes move together</title>
		<link>http://scienceoss.com/interactive-subplots-make-all-x-axes-move-together/</link>
		<comments>http://scienceoss.com/interactive-subplots-make-all-x-axes-move-together/#comments</comments>
		<pubDate>Sat, 03 May 2008 18:27:25 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[matplotlib]]></category>

		<category><![CDATA[plotting]]></category>

		<category><![CDATA[axes]]></category>

		<category><![CDATA[interactive plotting]]></category>

		<category><![CDATA[subplot]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=123</guid>
		<description><![CDATA[It&#8217;s very easy to make subplots that share an x-axis, so that when you pan and zoom on one axis, the others automatically pan and zoom as well.  The key to this functionality is the sharex keyword argument, which is used when creating an axis.  Here&#8217;s some example code and a video of [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s very easy to make subplots that share an x-axis, so that when you pan and zoom on one axis, the others automatically pan and zoom as well.  The key to this functionality is the <span class="c">sharex</span> keyword argument, which is used when creating an axis.  Here&#8217;s some example code and a video of the resulting interaction.<span id="more-123"></span></p>
<pre class="prettyprint"><code class="code">
from pylab import figure, show, arange, sin, exp, pi

# Create some data: one x, three different y
x = arange(0.0, 20.0, 0.01)
y1 = sin(2*pi*x)
y2 = exp(-x)
y3 = y1*y2

# Create a figure and add three subplots.
fig = figure()
ax1 = fig.add_subplot(311)
ax2 = fig.add_subplot(312, sharex=ax1)  # share ax1's xaxis
ax3 = fig.add_subplot(313, sharex=ax1)  # share ax1's xaxis

# Plot
ax1.plot(x,y1)
ax2.plot(x,y2)
ax3.plot(x,y3)

# Show the figure.
show()</code></pre>
<p>Here&#8217;s a video of what the interaction is like with this figure (matplotlib automatically adds the pan, zoom, home, etc. buttons to all figures):</p>
<p><embed src="http://www.jeroenwijering.com/embed/player.swf" width="320" height="250" allowscriptaccess="always" allowfullscreen="true" flashvars="height=250&#038;width=320&#038;file=http://scienceoss.com/wp-content/uploads/2008/04/out.flv&#038;searchbar=false"/></p>
<p>Notice that the y-axis remained independent in each of the three subplots.  As you&#8217;d expect, the <span class="c">add_subplot()</span> method accepts a <span class="c">sharey</span> keyword argument as well.  You can even pass both <span class="c">sharex</span> and <span class="c">sharey</span> . . . this is most useful when two subplots show data with the same units.</p>
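<p>As a rough sketch of that last point, the snippet below shares both axes between two subplots.  It assumes a reasonably recent matplotlib in which <span class="c">pyplot.subplots()</span> is available (it is not the 2008-era API used above), and the function name and the data are made up for illustration.</p>

```python
import numpy as np
import matplotlib.pyplot as plt


def make_shared_axes():
    """Build two stacked subplots that share both their x- and y-axes."""
    x = np.arange(0.0, 20.0, 0.01)
    # sharex=True / sharey=True link every subplot in the grid together.
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, sharey=True)
    ax1.plot(x, np.sin(2 * np.pi * x))
    ax2.plot(x, np.cos(2 * np.pi * x))
    return fig, ax1, ax2


fig, ax1, ax2 = make_shared_axes()
# Changing the limits on one subplot changes them on the other as well.
ax1.set_xlim(2, 5)
ax1.set_ylim(-0.5, 0.5)
```

<p>Call <span class="c">plt.show()</span> afterwards to try the panning and zooming interactively.</p>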
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/interactive-subplots-make-all-x-axes-move-together/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Calculate sunrise and sunset with PyEphem</title>
		<link>http://scienceoss.com/calculate-sunrise-and-sunset-with-pyephem/</link>
		<comments>http://scienceoss.com/calculate-sunrise-and-sunset-with-pyephem/#comments</comments>
		<pubDate>Sat, 26 Apr 2008 03:14:22 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[Python modules]]></category>

		<category><![CDATA[pyephem]]></category>

		<category><![CDATA[sunrise]]></category>

		<category><![CDATA[sunset]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=121</guid>
		<description><![CDATA[PyEphem (from the Greek word ephemeris) is the way to calculate the positions of all sorts of astronomical bodies in Python. 
I used it recently to calculate a year&#8217;s worth of sunrise and sunset times.  Using something as advanced as PyEphem for something this astronomically simple might be overkill, but it works well.  [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://rhodesmill.org/pyephem/">PyEphem</a> (from the Greek word <a href="http://http://en.wikipedia.org/wiki/Ephemeris">ephemeris</a>) is the way to calculate the positions of all sorts of astronomical bodies in Python. <span id="more-121"></span></p>
<p>I used it recently to calculate a year&#8217;s worth of sunrise and sunset times.  Using something as advanced as PyEphem for something this astronomically simple might be overkill, but it works well.  Here&#8217;s how I did it:</p>
<pre class="prettyprint"><code class="code">import ephem
import datetime

obs = ephem.Observer()
obs.lat = '38.8'
obs.long= '-75.2'

start_date = datetime.datetime(2008, 1, 1)
end_date = datetime.datetime(2008, 12, 31)
td = datetime.timedelta(days=1)

sun = ephem.Sun()

sunrises = []
sunsets = []
dates = []

date = start_date
while date <= end_date:
    dates.append(date)
    obs.date = date  # PyEphem interprets this datetime as UTC

    rise_time = obs.next_rising(sun).datetime()
    sunrises.append(rise_time)

    set_time = obs.next_setting(sun).datetime()
    sunsets.append(set_time)

    date += td  # advance one day only after recording this one

</code></pre>
<p>To plot day length in hours over the course of a year, first run the above code.  Then (assuming you have <a href="http://matplotlib.sourceforge.net/">matplotlib</a>):</p>
<pre class="prettyprint"><code class="code">from pylab import *
daylens = []
for i in range(len(sunrises)):
    timediff = sunsets[i] - sunrises[i]
    hours = timediff.seconds / 60. / 60.  # to get it in hours
    daylens.append(hours)

plot(dates, daylens)

# if you have an older version of matplotlib, you may need
# to convert dates into numbers before plotting:
# dates = [date2num(i) for i in dates]

xlabel('Date')
ylabel('Hours')
title('Day length in 2008')
show()</code></pre>
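<p>One detail of the day-length calculation is worth a closer look.  Because the observer&#8217;s date is set to midnight UTC, the next sunset can come before the next sunrise at some longitudes and seasons, which makes the difference negative.  The sketch below uses only the standard library and invented times (not PyEphem output) to show why <span class="c">timediff.seconds</span> still yields a sensible day length in that case.  The helper name <span class="c">day_length_hours</span> is hypothetical.</p>

```python
import datetime


def day_length_hours(sunrise, sunset):
    """Day length in hours, using the same .seconds approach as the loop above."""
    timediff = sunset - sunrise
    # timedelta normalizes itself so that 0 <= .seconds < 86400; a negative
    # difference is stored as days=-1 plus a positive number of seconds.
    return timediff.seconds / 60. / 60.


# Hypothetical winter day: the next sunrise comes before the next sunset.
winter = day_length_hours(datetime.datetime(2008, 1, 1, 12, 15),
                          datetime.datetime(2008, 1, 1, 21, 50))

# Hypothetical summer day: the next sunset (00:30 UTC) comes before the
# next sunrise (09:40 UTC), so the raw difference is negative.
summer = day_length_hours(datetime.datetime(2008, 6, 21, 9, 40),
                          datetime.datetime(2008, 6, 21, 0, 30))

print(round(winter, 2), round(summer, 2))
```

<p>Note that <span class="c">timediff.total_seconds()</span> would return the negative value instead, so swapping it in would change the plot.</p>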
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/calculate-sunrise-and-sunset-with-pyephem/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Use Sphinx for documentation</title>
		<link>http://scienceoss.com/use-sphinx-for-documentation/</link>
		<comments>http://scienceoss.com/use-sphinx-for-documentation/#comments</comments>
		<pubDate>Sat, 26 Apr 2008 02:45:20 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[utilities]]></category>

		<category><![CDATA[documentation]]></category>

		<category><![CDATA[sphinx]]></category>

		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=117</guid>
		<description><![CDATA[I&#8217;ve been doing quite a bit of code documentation lately, and I decided to try and figure out the best tool to use.  I found it.  It&#8217;s called Sphinx, and you can see what the documentation looks like by checking out the documentation for Python itself (v. 2.6 and 3.0).
Here&#8217;s how to get [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been doing quite a bit of code documentation lately, and I decided to try and figure out the best tool to use.  I found it.  It&#8217;s called <a href="http://sphinx.pocoo.org/">Sphinx</a>, and you can see what the documentation looks like by checking out the documentation for Python itself (v. <a href="http://docs.python.org/dev/">2.6</a> and <a href="http://docs.python.org/dev/3.0/">3.0</a>).<br />
Here&#8217;s how to get started using Sphinx. <span id="more-117"></span><br />
Sphinx is the name of the program that takes your plain text files and converts them into hyperlinked, nicely formatted HTML documents.  It knows what to link and how to format based on simple formatting commands in the text files (specifically, it uses the <a href="http://docutils.sourceforge.net/rst.html">reStructuredText (reST)</a> markup language).</p>
<h3>Installation</h3>
<p>It is really, really easy.</p>
<p>If you have easy_install on your machine, running</p>
<pre class="prettyprint"><code class="code">easy_install sphinx</code></pre>
<p> from the command line will do it.</p>
<p>If you don&#8217;t have easy_install yet, <a href="http://peak.telecommunity.com/DevCenter/EasyInstall#installing-easy-install">download and install it</a>, then run <span class="c">easy_install sphinx</span> from the command line. </p>
<h3>Run <span class="c">sphinx-quickstart</span> to automatically set up a directory structure</h3>
<p>Navigate to the directory that you want to generate documentation for.  Run <span class="c">sphinx-quickstart</span> from the command line.  You will get a series of questions that lead you through the process of setting up the directories and some files that Sphinx needs.  Most questions have defaults that you can just accept no prob.  When it comes to version name, you can just make something up.</p>
<p>If you accepted all the defaults (you conformist, you!) there are now a couple of new files and directories:</p>
<pre class="prettyprint"><code class="code">New directory contents:
/.build
/.templates
/.static
index.rst
conf.py</code></pre>
<p>In a moment we&#8217;ll look at index.rst and conf.py.  But first . . .</p>
<h3>Create some content</h3>
<p>Time to make some content.  All content is created in plain text files.  Create a new text file in the same directory as <span class="c">index.rst</span> and <span class="c">conf.py</span>.  Call it <span class="c">chapter1.rst</span> or something . . . what&#8217;s important though is that you give it the &#8220;.rst&#8221; extension (if you chose the default option for &#8220;Source file suffix&#8221; when you ran <span class="c">sphinx-quickstart</span>, otherwise the extension must be whatever you chose in that step). Sphinx will recognize files that should be included based on their extension.  Since I tend to have random, non-documentation .txt files lying around which Sphinx will ask me about, I prefer to use .rst instead of .txt.</p>
<p>Inside <span class="c">chapter1.rst</span>, type some stuff.  You could be documenting the code that&#8217;s in the current directory, for example.  Here are some formatting elements you can try for now.  The Sphinx site has <a href="http://sphinx.pocoo.org/rest.html">lots more info on formatting</a>.</p>
<pre class="prettyprint"><code class="code">This is a header
================
Some text, *italic text*, **bold text**

* bulleted list.  There needs to be a space right after the "*"
* item 2

.. note::
    This is a note.

Here's some Python code:

>>> for i in range(10):
...     print i</code></pre>
<p>Once Sphinx is set up, it will read in the plain text above and convert it into some nicely formatted HTML.  The results of the above plain text will look something like this:<br />
<a href='http://scienceoss.com/wp-content/uploads/2008/04/documentation-preview.png'><img src="http://scienceoss.com/wp-content/uploads/2008/04/documentation-preview-300x161.png" alt="" title="documentation preview" width="300" height="161" class="aligncenter size-medium wp-image-120" /></a></p>
<p>But first you have to run it through Sphinx in order for it to look that way, and before you can do that you have to tell Sphinx to include this document in its processing.</p>
<h3>Tell <span class="c">index.rst</span> what files to include</h3>
<p>Now we have to edit <span class="c">index.rst </span>to tell Sphinx that chapter1.rst should be included in the documentation.  By default, <span class="c">index.rst</span> looks like this:</p>
<h4>Change this . . .</h4>
<pre class="prettyprint"><code class="code">
Welcome to my project's documentation!
======================================

Contents:

.. toctree::
   :maxdepth: 2

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
</code></pre>
<h4>To this . . .</h4>
<p>(Just add the <span class="c">chapter1.rst</span> line)</p>
<pre class="prettyprint"><code class="code">Welcome to my project's documentation!
======================================

Contents:

.. toctree::
    :maxdepth: 2

    chapter1.rst

Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
</code></pre>
<p>The indentation of the chapter1.rst line is important.  So is the blank line above, and the blank line below.</p>
<p>BE CAREFUL . . . this burned me the first time, and took a while to figure out.  See the :maxdepth: 2 line?  <strong>It&#8217;s only indented 3 spaces.</strong>  It doesn&#8217;t have to be 3 spaces, it just happens to be in this auto-generated file.  If you have your text editor set for 4 spaces, or anything other than 3 spaces, when you hit tab on another line your indentation will be inconsistent, resulting in errors.</p>
<p>Usually I just add a space to make the :maxdepth: 2 line a 4-space indentation.</p>
<h3>Build the documentation</h3>
<p>Back at the command line, make a new directory to hold your documentation.  I&#8217;m going to call mine <span class="c">doc</span>.</p>
<pre class="prettyprint"><code class="code">mkdir doc</code></pre>
<p>Then, in the same directory as <span class="c">index.rst</span>, run</p>
<pre class="prettyprint"><code class="code">sphinx-build . doc</code></pre>
<p>You&#8217;ll get some output that tells you what it&#8217;s doing.  </p>
<h3>View the result</h3>
<p>Now you can open up doc/index.html to view your newly generated documentation.  It looks something like this:<br />
<a href='http://scienceoss.com/wp-content/uploads/2008/04/documentation0.png'><img src="http://scienceoss.com/wp-content/uploads/2008/04/documentation0-300x132.png" alt="" title="first look at documentation" width="300" height="132" class="aligncenter size-medium wp-image-118" /></a></p>
<p>Clicking on the &#8220;This is a header&#8221; link shows you the content you added in chapter1.rst, and it looks something like this:<br />
<a href='http://scienceoss.com/wp-content/uploads/2008/04/documentation1.png'><img src="http://scienceoss.com/wp-content/uploads/2008/04/documentation1-300x157.png" alt="" title="first look at documentation, next page" width="300" height="157" class="aligncenter size-medium wp-image-119" /></a></p>
<h3>So what&#8217;s the big deal?</h3>
<p>The above example could have been done with a little work in Microsoft Word.  Where Sphinx really shines, though, is when you use it to document Python code.</p>
<p>Sphinx can auto-document by reading in a module, then displaying the docstrings of objects in that source code.  It hyperlinks modules, classes, methods, attributes, etc.  It can even take the code you write as a tutorial and use it as doctests, which test your code to ensure that it is correct.  With a little more setup, you can generate LaTeX (and then PDF files) of your code, along with the good-looking syntax highlighting.  (see upcoming posts for how to do all of this)</p>
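<p>To give a feel for what a docstring that doubles as a doctest looks like, here is a minimal sketch that uses only Python&#8217;s standard <span class="c">doctest</span> module.  The function <span class="c">day_length_hours</span> is a made-up example, and how Sphinx itself picks up such docstrings is not shown here.</p>

```python
import doctest


def day_length_hours(sunrise_hour, sunset_hour):
    """Return the number of daylight hours.

    The example below is both documentation and a test:

    >>> day_length_hours(6.5, 18.0)
    11.5
    """
    return sunset_hour - sunrise_hour


# Parse the examples out of the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(day_length_hours.__doc__,
                          {"day_length_hours": day_length_hours},
                          "day_length_hours", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.failed, "failed out of", results.attempted)
```

<p>In an ordinary module you would more likely run <span class="c">python -m doctest yourmodule.py</span> or call <span class="c">doctest.testmod()</span>; the explicit parser and runner are used here only to keep the sketch self-contained.</p>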
<p>I feel that other documentation tools, like <a href="http://epydoc.sourceforge.net/">epydoc</a>, have too much of an auto-generated look, and it&#8217;s too easy to include extraneous content.  In order to include tutorial info in epydoc documents, you have to add text to the beginning of a module.  While epydoc has some nice formatting (allowing you to tag text as parameters or return values), it tends to look ugly&#8211;sometimes to the point of unreadability.</p>
<p>I like Sphinx because it allows you to insert auto-generated documentation when you need it but tends to focus on high-quality, handwritten, tutorial-style content which I think is the most effective kind of documentation.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/use-sphinx-for-documentation/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Insert content into TiddlyWikis with this Python script</title>
		<link>http://scienceoss.com/insert-content-into-tiddlywikis-with-this-python-script/</link>
		<comments>http://scienceoss.com/insert-content-into-tiddlywikis-with-this-python-script/#comments</comments>
		<pubDate>Thu, 17 Apr 2008 22:44:57 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[TiddlyWiki]]></category>

		<category><![CDATA[image]]></category>

		<category><![CDATA[script]]></category>

		<guid isPermaLink="false">http://scienceoss.com/insert-content-into-tiddlywikis-with-this-python-script/</guid>
		<description><![CDATA[I&#8217;ve been generating many figures, and I want to be able to find them again and browse them easily.  Organizing them on disk just isn&#8217;t cutting it.  My solution for now is to use a local TiddlyWiki as the glue for my figures, since I can embed figures in tiddlers (the microcontent entries [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been generating many figures, and I want to be able to find them again and browse them easily.  Organizing them on disk just isn&#8217;t cutting it.  My solution for now is to use a local <a href="http://tiddlywiki.com">TiddlyWiki</a> as the glue for my figures, since I can embed figures in tiddlers (the microcontent entries that are the bread and butter of TiddlyWikis), and tag and search those entries.  Bonus: I can zip everything up and send TiddlyWiki + images to my advisor so  he can browse and search them as well.</p>
<p>Try this Python script, <a href='http://scienceoss.com/wp-content/uploads/2008/04/addtiddler.py' title='addtiddler.py'>addtiddler.py</a>, to insert tiddlers into an existing TiddlyWiki.  You can optionally specify an image name (relative to the output file, see the documentation in the source code) to be embedded.  You can use this script from the command line using options, or import it into another script.</p>
<p>I tried to add lots of comments so you can modify it for your own needs.  Let me know if you find bugs so I can fix them.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/insert-content-into-tiddlywikis-with-this-python-script/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Advanced sorting: sorting by key</title>
		<link>http://scienceoss.com/advanced-sorting-sorting-by-key/</link>
		<comments>http://scienceoss.com/advanced-sorting-sorting-by-key/#comments</comments>
		<pubDate>Mon, 14 Apr 2008 18:30:29 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[key]]></category>

		<category><![CDATA[sort]]></category>

		<guid isPermaLink="false">http://scienceoss.com/advanced-sorting-sorting-by-key/</guid>
		<description><![CDATA[The sort() method of list objects in Python is quite flexible.  By default, it sorts on the first thing in each item of the list, which is exactly what you would expect.  For example, a list of strings is sorted by the first letter of each string.    What if you [...]]]></description>
			<content:encoded><![CDATA[<p>The <span class="c">sort()</span> method of list objects in Python is quite flexible.  By default, it compares whole items, starting with the first thing in each one, which is exactly what you would expect.  For example, a list of strings is sorted alphabetically, starting with the first letter of each string.    What if you wanted to sort by the second letter of each string?  Or sort a list of people&#8217;s names by last name?<span id="more-110"></span></p>
<p>By default, the <span class="c">key</span> to sort by is the item itself: for strings that means the first letter matters most, and for a list of sequences it means the first item in each sequence.  But Python allows you to specify any key that you want, using the <span class="c">key</span> parameter of the <span class="c">sort()</span> method.  The <span class="c">key</span> is a function that takes one item and returns the value to sort on.  </p>
<p>The way it works is this:  <span class="c">sort</span> runs every item in the list through the function.  Whatever the function returns is used to sort, overriding the default of using the first thing.  </p>
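<p>One way to picture this is the old &#8220;decorate, sort, undecorate&#8221; idiom.  The sketch below is only an illustration of the idea, not how Python implements <span class="c">sort()</span> internally, and <span class="c">sort_by_key</span> is a hypothetical helper.</p>

```python
def sort_by_key(items, key):
    """Return a new list of items ordered by key(item)."""
    # Decorate: pair every item with its key.  The index breaks ties, so
    # equal keys keep their original order and items are never compared.
    decorated = [(key(item), index, item) for index, item in enumerate(items)]
    # Sort the (key, index, item) tuples; the key is compared first.
    decorated.sort()
    # Undecorate: throw the keys and indices away again.
    return [item for _, _, item in decorated]


words = ['orange', 'banana', 'apple']
print(sort_by_key(words, key=len))
```

<p>The result should match what <span class="c">sorted(words, key=len)</span> gives.</p>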
<h3>Example 1</h3>
<p>So to sort a list of strings by the second letter instead of the first, we simply need a function that returns the second letter of a string.  Here&#8217;s such a function, and how to use it as a key to <span class="c">sort</span>.  Note that <span class="c">key=secondletter</span>, NOT <span class="c">key=secondletter()</span>.  We&#8217;re passing a reference to <span class="c">secondletter</span>, not trying to call it.</p>
<pre class="prettyprint"><code class="code">def secondletter(x):
    return x[1]

mylist = ['orange', 'banana', 'apple']

mylist.sort(key=secondletter)

# ['banana', 'apple', 'orange']
</code></pre>
<p>By the way, for simple one-liner functions like this, we could have used the <a href="http://docs.python.org/tut/node6.html#SECTION006750000000000000000">lambda syntax</a> instead of defining the <span class="c">secondletter</span> function:</p>
<pre class="prettyprint"><code class="code">mylist = ['orange', 'banana', 'apple']
mylist.sort(key=lambda x: x[1])</code></pre>
<h3>Example 2</h3>
<p>OK, while it&#8217;s a good first example, sorting on the second letter isn&#8217;t terribly useful.  How about sorting a list of people&#8217;s names by their last name?  We simply need a function to return the last name, and use the name of that function as the sort key.</p>
<pre class="prettyprint"><code class="code">def lastname(x):
    firstname, lastname = x.split()
    return lastname

presidents = ['Abraham Lincoln', 'George Washington',
              'Benjamin Harrison', 'Millard Fillmore']

presidents.sort(key=lastname)

#['Millard Fillmore',
# 'Benjamin Harrison',
# 'Abraham Lincoln',
# 'George Washington']</code></pre>
<p>Of course, in practice you would have to be careful with this . . . if there&#8217;s a middle name in there, then it would break the <span class="c">lastname</span> function.  This one works better:</p>
<pre class="prettyprint"><code class="code">def lastname2(x):
    return x.split()[-1]

not_all_presidents = ['Abraham Lincoln', 'George Washington', 'Benjamin Harrison',
                     'Millard Fillmore', 'Prince', 'Madonna', 'Arthur C. Clarke']

not_all_presidents.sort(key=lastname2)

#['Arthur C. Clarke',
# 'Millard Fillmore',
# 'Benjamin Harrison',
# 'Abraham Lincoln',
# 'Madonna',
# 'Prince',
# 'George Washington']</code></pre>
<p>&#8230;but if you have names like &#8216;King George III&#8217;, you&#8217;ll have to fix the function to deal with them.</p>
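<p>Here is one possible way to handle trailing suffixes, as a sketch only.  The <span class="c">SUFFIXES</span> set and the <span class="c">lastname3</span> function are hypothetical, and the rule (drop a known suffix, then take the last remaining word) is just one assumption about what &#8220;dealing with them&#8221; should mean.</p>

```python
# Hypothetical list of suffixes to ignore; extend it for your own data.
SUFFIXES = {'Jr.', 'Sr.', 'II', 'III', 'IV'}


def lastname3(x):
    """Return the last word of a name, skipping any trailing suffix."""
    parts = x.split()
    # Drop suffixes from the end, but always keep at least one word.
    while len(parts) > 1 and parts[-1] in SUFFIXES:
        parts.pop()
    return parts[-1]


names = ['Abraham Lincoln', 'Martin Luther King Jr.',
         'King George III', 'Madonna']
names.sort(key=lastname3)
print(names)
```

<p>With this rule &#8216;King George III&#8217; sorts under &#8216;George&#8217;, which may or may not be what you want.</p>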
<h3>Example 3</h3>
<p>How about sorting a list of stocks by their maximum closing price for this week?  (Use <span class="c">reverse=True</span> so that highest are listed first)</p>
<pre class="prettyprint"><code class="code">stocks = [ [56, 94, 13, 90, 91], [33, 76, 22, 34, 105], [25, 28, 29, 30, 35] ]
stocks.sort(key=max, reverse=True)

# [[33, 76, 22, 34, 105], [56, 94, 13, 90, 91], [25, 28, 29, 30, 35]]
</code></pre>
<h3>Example 4</h3>
<p>You can get tricky&#8230;because <span class="c">sort()</span> changes a list in-place, the key function can sort each stock&#8217;s prices (highest first) as a side effect, while the outer sort orders the stocks by their max.</p>
<pre class="prettyprint"><code class="code">def mymax(x):
    x.sort(reverse=True)
    return x[0]

stocks = [ [56, 94, 13, 90, 91], [33, 76, 22, 34, 105], [25, 28, 29, 30, 35] ]
stocks2 = [s[:] for s in stocks]  # copy the inner lists too, since mymax() sorts them in place

stocks2.sort(key=mymax, reverse=True)

#[[105, 76, 34, 33, 22], [94, 91, 90, 56, 13], [35, 30, 29, 28, 25]]</code></pre>
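<p>A key function with side effects can be surprising to whoever reads the code later.  As an alternative sketch, the same result can be built in two explicit steps with <span class="c">sorted()</span>, which returns new lists and leaves the original data alone.  The variable name <span class="c">by_price</span> is made up for this example.</p>

```python
stocks = [[56, 94, 13, 90, 91], [33, 76, 22, 34, 105], [25, 28, 29, 30, 35]]

# Step 1: sort each stock's prices, highest first, into new lists.
by_price = [sorted(prices, reverse=True) for prices in stocks]

# Step 2: order the stocks by their maximum price, highest first.
stocks2 = sorted(by_price, key=max, reverse=True)

print(stocks2)
print(stocks)  # the original nested lists are untouched
```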
<h3>Example 5</h3>
<p>Or sort a list by absolute value instead of paying attention to negative signs:</p>
<pre class="prettyprint"><code class="code">deviations = [10, -34, -5, 90, -87]
deviations.sort(key=abs)
# [-5, 10, -34, -87, 90]
</code></pre>
<p>As you can see, specifying the sort key can be pretty useful if you know it&#8217;s there.  Coming up with these examples really helped me see where this technique would be useful.  You can find more info on sorting on the <a href="http://wiki.python.org/moin/HowTo/Sorting">Python wiki</a>, and <a href="http://xahlee.org/perl-python/sort_list.html">comparisons between Python and Perl sorting</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/advanced-sorting-sorting-by-key/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Test the speed of your code interactively in IPython</title>
		<link>http://scienceoss.com/test-the-speed-of-your-code-interactively-in-ipython/</link>
		<comments>http://scienceoss.com/test-the-speed-of-your-code-interactively-in-ipython/#comments</comments>
		<pubDate>Sun, 13 Apr 2008 16:49:54 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[IPython]]></category>

		<category><![CDATA[Python]]></category>

		<category><![CDATA[algorithm]]></category>

		<category><![CDATA[efficiency]]></category>

		<category><![CDATA[speed]]></category>

		<guid isPermaLink="false">http://scienceoss.com/test-the-speed-of-your-code-interactively-in-ipython/</guid>
		<description><![CDATA[So you&#8217;ve come up with a couple of ways to solve a problem in code.  But how do you decide which way is the best?  One criterion to decide on is to use the one that makes the most sense to you.  Another criterion is to use the version that is fastest. [...]]]></description>
			<content:encoded><![CDATA[<p>So you&#8217;ve come up with a couple of ways to solve a problem in code.  But how do you decide which way is the best?  One criterion to decide on is to use the one that makes the most sense to you.  Another criterion is to use the version that is fastest.  Here&#8217;s how to quickly determine which way is fastest using the interactive interpreter IPython.<span id="more-109"></span></p>
<p>Recently I posted a couple of <a href="http://scienceoss.com/sort-one-list-by-another-list/">different ways to sort one list by another list</a>.  But how to decide which one is faster?</p>
<p>I like using the <span class="c">timeit</span> magic function in IPython to time Python code.  It takes, as its argument, a single Python expression.  <span class="c">timeit</span> then returns the time it took the CPU to run that code.   Problem is, <span class="c">timeit</span> takes a single expression, but those sorting methods are multi-line statements. </p>
<p>No matter, we just wrap them in a function, and call the function as a single expression.  Here are the functions I&#8217;ll be testing:</p>
<pre class="prettyprint"><code class="code">import numpy
def method1(x,y):
    z = zip(x,y)
    z.sort()
    return zip(*z)

def method2(x,y):
    inds = numpy.argsort(x)
    return numpy.take(y,inds)

def method3(x,y):
    xa = numpy.array(x)
    ya = numpy.array(y)
    inds = xa.argsort()
    return ya[inds]
</code></pre>
<p>OK, we need some lists to use for the test sorting.  How about the ones used from the previously mentioned post:</p>
<pre class="prettyprint"><code class="code">people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]</code></pre>
<p>And now, to test each method.  If it&#8217;s a really fast bit of code, <span class="c">timeit</span> will run it many times (here, 100,000 or 10,000 times) to get a good estimate of how long it takes.</p>
<pre class="prettyprint"><code class="code">timeit method1(ages,people)  # 100000 loops, best of 3: 3.6 µs per loop
timeit method2(ages,people)  # 10000 loops, best of 3: 63.7 µs per loop
timeit method3(ages,people)  # 10000 loops, best of 3: 36.8 µs per loop</code></pre>
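<p>If you are not working in IPython, the standard library&#8217;s <span class="c">timeit</span> module can make a similar measurement from a plain script.  The sketch below is written for Python 3, so <span class="c">method1</span> is restated with <span class="c">sorted()</span>; the repeat and loop counts are simply chosen to mirror the &#8220;best of 3&#8221; output above, and the numbers you get will differ from mine.</p>

```python
import timeit


def method1(x, y):
    """Sort y by x using plain Python (same idea as method1 above)."""
    pairs = sorted(zip(x, y))
    return list(zip(*pairs))


people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]

loops = 10000
# Run the call `loops` times, three times over, and keep the best run.
timings = timeit.repeat(lambda: method1(ages, people), repeat=3, number=loops)
best = min(timings) / loops

print("%d loops, best of 3: %.2f microseconds per loop" % (loops, best * 1e6))
```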
<p>These results suggest that the first method is an order of magnitude faster.  But wait a minute, method3() converts the input lists into arrays . . . I bet that takes some time.  How fast does it run if the data are already in an array?  Time for another function.  This one expects its arguments to be arrays already.</p>
<pre class="prettyprint"><code class="code">def method3a(x,y):
    inds = x.argsort()
    return y[inds]</code></pre>
<p>And let&#8217;s convert ages and people into arrays ahead of time so we can use <span class="c">method3a</span>:</p>
<pre class="prettyprint"><code class="code">array_ages = numpy.array(ages)
array_people = numpy.array(people)
</code></pre>
<p>The other methods still ought to run when the data are arrays.  Here are the results on my machine for all the methods so far:</p>
<pre class="prettyprint"><code class="code">timeit method1(array_ages, array_people)   # 100000 loops, best of 3: 6.29 µs per loop
timeit method2(array_ages, array_people)   # 100000 loops, best of 3: 6.88 µs per loop
timeit method3(array_ages, array_people)   # 100000 loops, best of 3: 5.12 µs per loop
timeit method3a(array_ages, array_people)  # 100000 loops, best of 3: 4.02 µs per loop</code></pre>
<p>So if the data are already in array form, method3a looks like the fastest.</p>
<p>Let&#8217;s see if the results are consistent with a larger dataset, where you might actually perceive a difference in speed.</p>
<pre class="prettyprint"><code class="code">#The test arrays
xa = numpy.random.random(100000)
ya = numpy.random.random(100000)

# The test arrays converted into test lists
xlist = xa.tolist()
ylist = ya.tolist()

# Test the speed of sorting arrays
timeit method1(xa,ya)  # 10 loops, best of 3: 443 ms per loop
timeit method2(xa,ya)  # 10 loops, best of 3: 20.6 ms per loop
timeit method3(xa,ya)  # 10 loops, best of 3: 18.9 ms per loop
timeit method3a(xa,ya) # 10 loops, best of 3: 19.1 ms per loop

# Test the speed of sorting lists
timeit method1(xlist,ylist)  # 10 loops, best of 3: 391 ms per loop
timeit method2(xlist,ylist)  # 10 loops, best of 3: 51.3 ms per loop
timeit method3(xlist,ylist)  # 10 loops, best of 3: 55.1 ms per loop
# (can't test method3a since it won't accept lists)</code></pre>
<p>Interesting.  When your data are already in an array, <span class="c">method3</span> (which converts x and y into arrays) is actually faster than <span class="c">method3a</span> (which assumes they are already arrays)!  Not by much (the gap is small enough that it could be measurement noise), but I wouldn&#8217;t have expected that two extra lines of code would actually make it faster.</p>
<p>The final results show that the answer depends on what form your data are in:</p>
<ul>
<li>For small lists, use <span class="c">method1</span>.</li>
<li>For large lists, use <span class="c">method2</span>.</li>
<li>For small arrays, use <span class="c">method3a</span>.</li>
<li>For large arrays, use <span class="c">method3</span>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/test-the-speed-of-your-code-interactively-in-ipython/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Sort one list by another list</title>
		<link>http://scienceoss.com/sort-one-list-by-another-list/</link>
		<comments>http://scienceoss.com/sort-one-list-by-another-list/#comments</comments>
		<pubDate>Fri, 11 Apr 2008 14:14:26 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
		
		<category><![CDATA[NumPy]]></category>

		<category><![CDATA[Python]]></category>

		<category><![CDATA[argsort]]></category>

		<category><![CDATA[sort]]></category>

		<category><![CDATA[sorting]]></category>

		<guid isPermaLink="false">http://scienceoss.com/sort-one-list-by-another-list/</guid>
		<description><![CDATA[Here are a couple of ways of sorting one list by another list in Python.  The first uses plain ol&#8217; Python, and the others use NumPy.
In each case imagine we want to sort a list of people&#8217;s names by their ages.
Method 1
Zip the lists together, making sure that the one to sort by is [...]]]></description>
			<content:encoded><![CDATA[<p>Here are a couple of ways of sorting one list by another list in Python.  The first uses plain ol&#8217; Python, and the others use NumPy.</p>
<p>In each case imagine we want to sort a list of people&#8217;s names by their ages.<span id="more-108"></span></p>
<h3>Method 1</h3>
<p>Zip the lists together, making sure that the one to sort by is passed first to zip().  The result is a list of tuples.  When you sort a list of tuples, it sorts using the first item in each tuple (and falls back to the second item to break ties).  Then use the zip(*...) trick to unzip the now sorted tuples into separate variables.</p>
<pre class="prettyprint"><code class="code">people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]

agesAndPeople = zip(ages, people)
agesAndPeople.sort()
sortedAges, sortedPeople = zip(*agesAndPeople)</code></pre>
<p>Note that if you want to sort in reverse, you can use <span class="c">agesAndPeople.sort(reverse=True)</span>.</p>
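<p>A note for newer Python versions: in Python 3, <span class="c">zip()</span> returns an iterator rather than a list, so it has no <span class="c">sort()</span> method.  A minimal sketch of the same idea using <span class="c">sorted()</span>, which accepts any iterable:</p>

```python
people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]

# sorted() consumes the zip iterator and returns a new, sorted list of tuples.
agesAndPeople = sorted(zip(ages, people))
sortedAges, sortedPeople = zip(*agesAndPeople)

print(sortedAges)
print(sortedPeople)
```

<p>Here too, <span class="c">sorted(zip(ages, people), reverse=True)</span> sorts in reverse.</p>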
<h3>Method 2</h3>
<p>This method uses NumPy, and you don&#8217;t have to convert the lists into arrays.  The argsort() function doesn&#8217;t return the sorted ages . . . instead, it returns the indices that would put the ages in sorted order (try it to see what I mean).  take() is a way of using NumPy-style indexing on a list.  See the next example for something that might be more straightforward for Matlab users.</p>
<pre class="prettyprint"><code class="code">people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]

import numpy
inds = numpy.argsort(ages)
sortedPeople = numpy.take(people, inds)</code></pre>
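<p>If it is not obvious what argsort() hands back, the following pure-Python sketch mimics it for a plain list.  This <span class="c">argsort</span> is a hypothetical stand-in written for illustration, not NumPy&#8217;s implementation.</p>

```python
people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]


def argsort(values):
    """Return the indices that would put values in sorted order."""
    # Sort the positions 0..n-1, comparing them by the value they point at.
    return sorted(range(len(values)), key=values.__getitem__)


inds = argsort(ages)
# Use those indices to pull items out of the other list, like take() does.
sortedPeople = [people[i] for i in inds]

print(inds)
print(sortedPeople)
```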
<h3>Method 3</h3>
<p>This method also uses NumPy, but first it converts the lists into arrays.  Then it uses the argsort() of one to index into the other.</p>
<pre class="prettyprint"><code class="code">people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]

import numpy
people = numpy.array(people)
ages = numpy.array(ages)
inds = ages.argsort()
sortedPeople = people[inds]</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/sort-one-list-by-another-list/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
