<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>scienceoss.com &#187; Python</title>
	<atom:link href="http://scienceoss.com/tags/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://scienceoss.com</link>
	<description>useful tidbits for using open source software in science</description>
	<lastBuildDate>Wed, 26 May 2010 03:34:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Python script to package Latex projects for distribution</title>
		<link>http://scienceoss.com/python-script-to-package-latex-projects-for-distribution/</link>
		<comments>http://scienceoss.com/python-script-to-package-latex-projects-for-distribution/#comments</comments>
		<pubDate>Sat, 14 Mar 2009 02:20:25 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[latex]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[archiving]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=191</guid>
		<description><![CDATA[This is probably one of those scripts that will evolve over time, but I&#8217;m posting it now in case someone can get some use out of it. My problem was this: I had many, many figures in my working directory, but I didn&#8217;t use all of them in the Latex document. I was trying to [...]]]></description>
			<content:encoded><![CDATA[<p>This is probably one of those scripts that will evolve over time, but I&#8217;m posting it now in case someone can get some use out of it.  My problem was this:</p>
<p>I had many, many figures in my working directory, but I didn&#8217;t use all of them in the Latex document.  I was trying to figure out a way to send the source files (*.tex, *.cls, *.bst, *.bib, etc., plus only the image files that were actually in the document) to someone else so they could edit and compile on their own.  I didn&#8217;t want to set up version control (SVN, etc.); I just wanted a tar file.</p>
<p>After some poking around I couldn&#8217;t find anything already made that would do this (Kile has an Archive menu item, but this doesn&#8217;t include figures).  It was easy enough to get a Python script going.  </p>
<p>This script parses an input file, looks at the various documents and figures that are included, and archives them in a tar.gz file which can then be sent to someone.  Note that as it stands, it only looks two levels deep for \include tags.  If I use this more I&#8217;ll have to make it recursive (it&#8217;s not obvious to me how to do that, I haven&#8217;t used recursion much before). </p>
<p>Consider this script a rough draft.  It worked perfectly for me, but your mileage may vary.</p>
<pre class="brush: python; title: ; notranslate">

&quot;&quot;&quot;
This script gathers the necessary images and files (from
an arbitrarily large number of unneeded figures) and
puts it all in a tarball for distribution.

Usage: latexpackager.py main.tex dissertation.tar.gz
&quot;&quot;&quot;

import sys
import re
import os
import tarfile

def find_references(f):
    '''Returns a list of Latex files that f refers to,
    by parsing \include, \bibliography, \bibliographystyle,
    \input, etc.

    If nothing was found, returns an empty list.'''

    s = open(f).read()

    # Find the .tex files.
    texs = []
    for i in re.finditer(r&quot;&quot;&quot;[^%]\\include\{(.*)\}&quot;&quot;&quot;, s):
        texs.append(i.groups()[0]+'.tex')

    # Find the .bib files.
    bibs = []
    for i in re.finditer(r&quot;&quot;&quot;[^%]\\bibliography\{(.*)\}&quot;&quot;&quot;, s):
        bibs.append(i.groups()[0]+'.bib')

    # Find the styles.
    styles = []
    for i in re.finditer(r&quot;&quot;&quot;[^%]\\bibliographystyle\{(.*)\}&quot;&quot;&quot;, s):
        styles.append(i.groups()[0]+'.bst')

    # Find the document class description file
    docclass = []
    for i in re.finditer(r&quot;&quot;&quot;[^%]\\documentclass\{(.*)\}&quot;&quot;&quot;, s):
        docclass.append(i.groups()[0]+'.cls')

    # Look for any inputs.
    inputs = []
    for i in re.finditer(r&quot;&quot;&quot;[^%]\\input\{(.*)\}&quot;&quot;&quot;, s):
        inputs.append(i.groups()[0]+'.tex')

    # Here is everything that was referenced in f:
    return texs + bibs + styles + docclass + inputs

def find_figures(f):
    '''Returns a list of figures found in the file.  Only
    looks in .tex files.  If not a .tex file or no figures found,
    returns an empty list.'''

    # Short circuit if not a .tex file.
    if f[-4:] != '.tex':
        return []

    includegraphics = r&quot;&quot;&quot;[^%].*\\includegraphics\[.*\]\{([^\}]*)\}&quot;&quot;&quot;
    figures = []
    s = open(f).read()
    matches = re.finditer(includegraphics, s)

    for match in matches:
        basename = match.groups()[0]
        if os.path.splitext(basename)[1]:
            # that is, it has an extension already.
            # This is for things like .png images.
            figures.append(basename)
        else:
            figures.append(basename + '.pdf')
            figures.append(basename + '.eps')

    return figures

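main = sys.argv[1]
# Resolve the tarball path now, before the working directory changes.
tarfn = os.path.abspath(sys.argv[2])

# Paths inside the .tex files are relative to the project directory,
# so work from there.
projectdir, main = os.path.split(main)
if projectdir == '':
    projectdir = os.getcwd()
else:
    os.chdir(projectdir)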

keepers = find_references(main)

# Don't forget to add the main .tex file.
keepers.append(main)

# For each of those that main.tex referenced, look for more.
# These are files referenced two levels deep.

for f in keepers:
    if f[-4:] != '.tex':
        continue
    keepers.extend(find_references(f))

# Now look for graphics.

figures = []
for f in keepers:
    figures.extend(find_figures(f))

#paths = [os.path.join(projectdir, i) for i in keepers + figures]
paths = keepers + figures

tarball = tarfile.open(tarfn, 'w:gz')
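for path in paths:
    if not os.path.exists(path):
        # e.g. only one of figure.pdf / figure.eps exists, or a class or
        # style file lives in the TeX distribution, not in the project.
        continue
    print(path)
    tarball.add(path)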
tarball.close()
</pre>
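<p>For what it&#8217;s worth, here is a rough sketch of how the recursive version could look.  It is not part of the script above: the names <span class="c">collect_tex_files</span>, <span class="c">strip_comments</span> and <span class="c">INCLUDE_RE</span> are made up for illustration, it only follows \include and \input, and it assumes all paths are relative to the current directory.  The list of files already seen stops it from looping forever if two files include each other.</p>

```python
import os
import re

# Matches \include{name} or \input{name}; the non-greedy group stops
# at the first closing brace.
INCLUDE_RE = re.compile(r'\\(?:include|input)\{(.*?)\}')


def strip_comments(text):
    """Drop everything after an unescaped % on each line."""
    return '\n'.join(re.sub(r'(?<!\\)%.*', '', line)
                     for line in text.splitlines())


def collect_tex_files(filename, seen=None):
    """Return filename plus every .tex file it pulls in, at any depth."""
    if seen is None:
        seen = []
    # Skip files already visited (circular includes) and missing files.
    if filename in seen or not os.path.exists(filename):
        return seen
    seen.append(filename)
    with open(filename) as handle:
        text = strip_comments(handle.read())
    for name in INCLUDE_RE.findall(text):
        child = name if name.endswith('.tex') else name + '.tex'
        # The recursive step: do to the child what we just did to the parent.
        collect_tex_files(child, seen)
    return seen
```

<p>Calling <span class="c">collect_tex_files('main.tex')</span> would then stand in for the two-level loop, with the figure search run over the returned list.</p>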
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/python-script-to-package-latex-projects-for-distribution/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>RPy: statistics in R from Python</title>
		<link>http://scienceoss.com/rpy-statistics-in-r-from-python/</link>
		<comments>http://scienceoss.com/rpy-statistics-in-r-from-python/#comments</comments>
		<pubDate>Sat, 26 Jul 2008 03:37:51 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[linear regression]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=4</guid>
		<description><![CDATA[R is a free, open source statistics package written by statisticians, for statisticians. Python on the other hand lacks a comprehensive statistics package. RPy allows you to combine the power of Python with the power of R for an unbeatable combination in data analysis. Note that in order to use R from Python, you need [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.r-project.org/">R</a> is a free, open source statistics package written by statisticians, for statisticians.  Python on the other hand lacks a comprehensive statistics package.  <a href="http://rpy.sourceforge.net/">RPy</a> allows you to combine the power of Python with the power of R for an unbeatable combination in data analysis.</p>
<p>Note that in order to use R from Python, you need to know a little of both . . . so the learning curve can be steep.  You also need to have a feel for what would be easy in R and what would be easy in Python.</p>
<p>There are some detailed examples below if you want to skip right to &#8216;em.</p>
<p>I use Python for most tasks, but when I need high-powered stats, I embed R code in my Python scripts to perform the analysis.</p>
<p>Disclaimer: I figured all of this stuff out by trial and error.  The RPy documentation, while complete, was difficult for me to make sense of when I was learning.  If there&#8217;s a better way to do things, please let me know!  For the details that I don&#8217;t cover here, check the <a href="http://rpy.sourceforge.net/rpy/doc/rpy_html/index.html">online documentation</a>.</p>
<h3>Why use R?</h3>
<p>You&#8217;ll need R if you want to do any sort of sophisticated (or even not-so-sophisticated) statistical analysis.  There are no solid statistics libraries that I&#8217;ve come across for Python . . . but maybe that&#8217;s because R is the best possible statistics library there could be.  </p>
<p>Be warned however that accessing  R from Python can get tricky at times.  I&#8217;ve tried to outline some of what I&#8217;ve learned here to make it easier for others.</p>
<p>Why use RPy instead of writing files out to R, then using R scripts to deal with it?  I did this for a little while and found that it was too much work to maintain two separate code bases . . . one for Python, then one for R.  If I changed anything in the output of a Python script, I&#8217;d have to fire up R and open my R scripts to modify and debug them.  I&#8217;ve found that using RPy lets me put all my code in one spot, resulting in fewer bugs and less maintenance.  </p>
<h3>R and Python are separate . . .</h3>
<p>I found that the easiest way to think about this is to think about doing things &#8220;inside R&#8221; or &#8220;inside Python&#8221;.  Things that are to be done inside R are typically wrapped in a string (a Python string).  For example, this creates a variable inside R called <span class="c">x</span> with a value of 5.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
r('x=5')</pre>
<p>Assuming this was typed into a fresh Python session, Python has no idea about the existence of the variable <span class="c">x</span>!  It works in reverse, too: R has no idea about what&#8217;s in the Python namespace.  So you can do this in Python:</p>
<pre class="brush: python; title: ; notranslate">x = &quot;I'm a Python string&quot;</pre>
<p>and the variable x inside R is still the same:</p>
<pre class="brush: python; title: ; notranslate">r('print(x)')  # still 5</pre>
<h3>. . . but they can talk to each other</h3>
<p>RPy does some automatic conversions:</p>
<pre class="brush: python; title: ; notranslate">x_from_R = r('x')  # 5</pre>
<p>What happened here is that RPy looked at what <span class="c">x</span> was inside R, saw that it was a number, and returned that number to Python, which assigned it to the Python variable <span class="c">x_from_R</span>.  So that&#8217;s how you get data from R to Python: by sending a string (the variable name you want to retrieve in R) to the <span class="c">r</span> object.</p>
<p>At first you might think this is how you send data from Python to R:</p>
<pre class="brush: python; title: ; notranslate">r('x_from_python') = x
#SyntaxError: can't assign to function call</pre>
<p>Nope.  Turns out you have to use the <span class="c">r.assign()</span> function to do that:</p>
<pre class="brush: python; title: ; notranslate">r.assign('x_from_python', x)
r('print(x_from_python)')  # 'I'm a Python string'</pre>
<p>So that&#8217;s how you get data from Python to R: by using the <span class="c">r.assign()</span> function, first giving the name of the variable you want to be assigned in R followed by the Python object to be sent to R.</p>
<h3>Other data types</h3>
<p>OK, so you can get numbers back from R.  And as you can imagine, strings work the same way.  But what about more complex data types?  This <a href="http://rpy.sourceforge.net/rpy/doc/rpy_html/Basic-conversion.html#Basic-conversion">list of conversions</a> tells you which R objects will be converted into which Python objects.  It&#8217;s pretty intuitive: a string becomes a string, a list becomes a list, etc.</p>
<p>But then there are things like data frames in R, which have row names and column names.</p>
<p>It&#8217;s not on that list linked above, but an R data frame is converted to a Python dictionary.  For example, the Motor Trend car data set, which comes standard in R, is a data frame.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
r('print(head(mtcars))') # print just the first 6 lines.  Note the variable names.

# Returns:
#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
</pre>
<p>Now send the whole thing to Python and check the keys of the dictionary that is created:</p>
<pre class="brush: python; title: ; notranslate">mt = r('mtcars')
mt.keys()</pre>
<p>Note that the keys are the same as the variable names in the dataframe.</p>
<p>Just like you get a Python dictionary from a dataframe, you can send a dictionary to R:</p>
<pre class="brush: python; title: ; notranslate">r.assign('df', dict(a=1, b=2, c=3))
r('print(df)')
r('names(df)')
</pre>
<p>You may have to convert it into a dataframe once inside R, though:</p>
<pre class="brush: python; title: ; notranslate">
r('df = data.frame(df)')
</pre>
<h3>R functions</h3>
<p>So far, with the exception of <span class="c">r.assign()</span>, we&#8217;ve just been sending strings to the <span class="c">r</span> object.  But the <span class="c">r</span> object also has methods.  Unfortunately, you can&#8217;t see them all using IPython&#8217;s introspection.  Personally I find that I don&#8217;t use this functionality that much (I use <span class="c">r.assign()</span> to get the data into R and then operate on it in there), but here it is for completeness.</p>
<p>There is a trick here.  Remember, before we were sending a string to the <span class="c">r</span> object and it was executing the code inside R:</p>
<pre class="brush: python; title: ; notranslate">r('x=5')</pre>
<p>But when you use a method of the <span class="c">r</span> object, you pass it raw Python objects.  For example, you can plot a Python list in R using the <span class="c">plot()</span> method of the <span class="c">r</span> object:</p>
<pre class="brush: python; title: ; notranslate">x = [1,2,3]
r.plot(x)</pre>
<p>There are some slight name changes though.  R tends to use a &#8220;.&#8221; as a spacer in function names, like &#8220;_&#8221; tends to be used in Python.  The &#8220;.&#8221; however is special in Python, so in method names of the <span class="c">r</span> object, &#8220;.&#8221; is converted to &#8220;_&#8221;.  For example, R&#8217;s <span class="c">t.test()</span> function becomes <span class="c">r.t_test()</span>. </p>
<p>These methods of the <span class="c">r</span> object are what Python sees, so that&#8217;s why their names have to be changed.  On the other hand, you call an R function by its true name when you send the <span class="c">r</span> object a string, like we were doing before.  So both of these refer to the same underlying t-test function in R:</p>
<pre class="brush: python; title: ; notranslate">r.t_test
r('t.test')</pre>
<p>This next one is tricky.  First, since <span class="c">print</span> is a reserved word in Python, it needs to have a slightly different name when you want to use the version in R.  So an underscore is added to the end.  Second, what&#8217;s in the parentheses is a Python string.  So all that will get printed is the string, &#8216;x&#8217; . . . not 5, or &#8220;I&#8217;m a Python string&#8221; or anything else.</p>
<pre class="brush: python; title: ; notranslate">r.print_('x') # 'x'</pre>
<p>In practice though, if I want to print something I&#8217;ll either use Python&#8217;s <span class="c">print</span> or if I want to print something from R, I&#8217;ll do this:</p>
<pre class="brush: python; title: ; notranslate">r('print(x)')  # prints 5</pre>
<h3>Plotting examples</h3>
<p>Here are a couple of examples of creating a plot.  In each case a plot is created of the list 1,2,3.  These are trivial examples, but they illustrate different ways of getting data to and from R.</p>
<h4>Option 1: Do everything in R</h4>
<p>You can execute arbitrary R commands by sending them as a string to the <span class="c">r</span> object.  Here, everything is done in R: a list is created and plotted.  In this example, the variable <span class="c">y</span> is never seen by Python.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
r(&quot;&quot;&quot;
y = c(1,2,3)
plot(y)
&quot;&quot;&quot;)</pre>
<p>Note that you can send many R commands in a multi-line string.</p>
<h4>Option 2: Use a method of the <span class="c">r</span> object</h4>
<p>Here, we start with a Python list, and then send it as the argument to the <span class="c">r.plot()</span> method.</p>
<pre class="brush: python; title: ; notranslate">from  rpy import *
y = [1,2,3]
r.plot(y)</pre>
<h4>Option 3: Get a list from R and plot it with matplotlib in Python</h4>
<p>This is trivial because you don&#8217;t gain anything from making a list in R instead of Python, but it shows that you can send data both ways.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
import pylab as p
y = r('c(1,2,3)')
p.plot(y)
p.show()</pre>
<h4>Option 4: Use <span class="c">r.assign()</span> to get data to R, then call it inside R</h4>
<p>I tend to use this method a lot with large data sets. The idea is to pass the data into R once, then you can use it from inside R.  The trick is to use the <span class="c">r.assign()</span> method.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
y = [1,2,3]
r.assign('Y', y)
r('plot(Y)')</pre>
<h3>Getting help on R functions</h3>
<p>Use the <span class="c">r.help()</span> function.   For example, to view the help on anova:</p>
<pre>r.help('anova')</pre>
<p>This displays the help on screen; it doesn&#8217;t return a string.</p>
<h3>Non-trivial examples</h3>
<p>Plotting and printing things are not what you&#8217;d want to use R and RPy for.  Instead, you&#8217;d want to use them for things that you can&#8217;t do in available packages for Python.  </p>
<p>Here are some examples where R can really fill in the gaps in Python&#8217;s statistical functionality.  Anything you can do in R, you can do from Python.  Given the wide variety of packages available for R, this is some stupendous power at your fingertips.  Now to learn how to wield it!</p>
<h4>Linear models in R</h4>
<p>Say I have a Python script already up and running, and it returns some data . . . and I want to know if the slope of the regression of one variable on another is significant.  I haven&#8217;t found any statistics libraries for Python, but in R this kind of functionality comes standard, in the function <span class="c">lm()</span>.</p>
<p>Viewing the help for <span class="c">lm()</span>, you can see that it takes a model specification, like &#8220;y~x&#8221; which means &#8220;y on x&#8221;.  Now, the components of this model specification, y and x, can either refer to variables in the R workspace (which is separate from Python, remember) or they can be variables in a dataframe which is supplied in an optional argument to <span class="c">lm()</span>.</p>
<p>So first we need to figure out how to send the data to R; performing the linear regression should be trivial, then we need to get the data back out.</p>
<p>First, let&#8217;s set up some test data in Python:</p>
<pre class="brush: python; title: ; notranslate">
import numpy as npy
x = npy.arange(10)
y = npy.arange(10) + npy.random.standard_normal(x.shape)</pre>
<p>Now send it to R:</p>
<pre class="brush: python; title: ; notranslate">r.assign('x',x)
r.assign('y',y)</pre>
<p>(exercise for the reader: instead of assigning x and y individually, how would you get them into R as a dataframe?)</p>
<p>In R, run the linear model and save it as a variable in R.  Here, I&#8217;m simultaneously saving it as a Python dictionary (sneaky!)</p>
<pre class="brush: python; title: ; notranslate">LM = r('linear_model = lm(y~x)')</pre>
<p>OK, here&#8217;s where it takes a little exploring.  The dictionary you get back may take some navigating.  Looking at it for a little bit, you might notice the &#8216;coefficients&#8217; key of the dictionary LM, which in turn has two more keys: &#8216;(Intercept)&#8217; and &#8216;x&#8217;.</p>
<pre class="brush: plain; title: ; notranslate">{'assign': [0, 1],
 'call': &lt;Robj object at 0xb7d3e790&gt;,
 'coefficients': {'(Intercept)': 0.28490682478866736,
                  'x': 0.86209804871669171},
 'df.residual': 8,
 'effects': array([-13.16882479,   7.83039439,   1.22245056,   0.18398967,
         0.51108108,   0.8141431 ,  -0.45120018,  -1.1985602 ,
         1.54636612,   0.51341949]),
 'fitted.values': array([ 0.28490682,  1.14700487,  2.00910292,  2.87120097,  3.73329902,
        4.59539707,  5.45749512,  6.31959317,  7.18169121,  8.04378926]),
 'model': {'x': array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]),
           'y': array([-0.64212347,  1.39389811,  3.06676323,  2.84957073,  3.99793052,
        5.12226093,  4.67818603,  4.7520944 ,  8.3182891 ,  8.10661086])},
 'qr': {'pivot': [1, 2],
        'qr': array([[ -3.16227766, -14.23024947],
       [  0.31622777,   9.08295106],
       [  0.31622777,   0.15621147],
       [  0.31622777,   0.0461151 ],
       [  0.31622777,  -0.06398128],
       [  0.31622777,  -0.17407766],
       [  0.31622777,  -0.28417403],
       [  0.31622777,  -0.39427041],
       [  0.31622777,  -0.50436679],
       [  0.31622777,  -0.61446316]]),
        'qraux': [1.316227766016838, 1.2663078500948464],
        'rank': 2,
        'tol': 9.9999999999999995e-08},
 'rank': 2,
 'residuals': array([-0.92703029,  0.24689324,  1.05766031, -0.02163025,  0.2646315 ,
        0.52686386, -0.77930909, -1.56749877,  1.13659789,  0.0628216 ]),
 'terms': &lt;Robj object at 0xb7d3e780&gt;,
 'xlevels': {}}</pre>
<p>So if all we were after were the slope and intercept, then </p>
<pre class="brush: python; title: ; notranslate">
slope = LM['coefficients']['x']
intercept = LM['coefficients']['(Intercept)']</pre>
<p>But what about a P-value for the slope?  It&#8217;s nowhere to be seen in that dictionary.  Turns out, you need the <span class="c">summary()</span> function in R, and it takes as its input a linear model (among other possible inputs, but here we&#8217;re just using a linear model).  So save it in R (just in case) and simultaneously save it in Python:</p>
<pre class="brush: python; title: ; notranslate">summary = r('LM_summary = summary(linear_model)')</pre>
<p>Hmm.  </p>
<pre class="brush: plain; title: ; notranslate">{'adj.r.squared': 0.88847497651170382,
 'aliased': {'(Intercept)': False, 'x': False},
 'call': &lt;Robj object at 0xb7d3e770&gt;,
 'coefficients': array([[  2.84906825e-01,   5.39776217e-01,   5.27823968e-01,
          6.11943659e-01],
       [  8.62098049e-01,   1.01109349e-01,   8.52639301e+00,
          2.75251311e-05]]),
 'cov.unscaled': array([[ 0.34545455, -0.05454545],
       [-0.05454545,  0.01212121]]),
 'df': [2, 8, 2],
 'fstatistic': {'dendf': 8.0, 'numdf': 1.0, 'value': 72.699377758431851},
 'r.squared': 0.90086664578818121,
 'residuals': array([-0.92703029,  0.24689324,  1.05766031, -0.02163025,  0.2646315 ,
        0.52686386, -0.77930909, -1.56749877,  1.13659789,  0.0628216 ]),
 'sigma': 0.9183712712215929,
 'terms': &lt;Robj object at 0xb7d3e7c0&gt;}</pre>
<p>There&#8217;s the r-squared and adjusted r-squared,</p>
<pre class="brush: python; title: ; notranslate">R_squared = summary['r.squared']
adj_R_squared = summary['adj.r.squared']</pre>
<p>but no obvious P value.  What gives?  Turns out RPy can&#8217;t convert everything perfectly (here the row and column labels of the coefficients table are lost), and a little more exploration is in order.  Try printing the summary from R:</p>
<pre class="brush: python; title: ; notranslate">r('print(LM_summary)')</pre>
<p>Well, that makes more sense, and you can see the P value for the slope is 2.75E-5.  But how to extract it from Python?</p>
<pre class="brush: plain; title: ; notranslate">Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.5675 -0.5899  0.1549  0.4613  1.1366 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)
(Intercept)   0.2849     0.5398   0.528    0.612
x             0.8621     0.1011   8.526 2.75e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.9184 on 8 degrees of freedom
Multiple R-squared: 0.9009,	Adjusted R-squared: 0.8885
F-statistic:  72.7 on 1 and 8 DF,  p-value: 2.753e-05</pre>
<p>The trick is to match output from the summary printout in R with the dictionary returned to Python.  Here, the key &#8216;coefficients&#8217; in the summary dictionary in Python holds the same table as the Coefficients section of the printout, so the P value for the slope is in the 2nd row, 4th column (indices 1 and 3, counting from zero):</p>
<pre class="brush: python; title: ; notranslate">P = summary['coefficients'][1,3]</pre>
<p>Whew, and there you have it.  See, it takes some digging around to get what you need, but since I&#8217;ve done the work for you, you can now do linear regressions from Python.  All together it looks like this (can be wrapped in a function or class for your own reuse):</p>
<pre class="brush: python; title: ; notranslate">r.assign('x', x)
r.assign('y', y)
LM = r('linear_model = lm(y~x)')
summary = r('LM_summary = summary(linear_model)')
slope = LM['coefficients']['x']
intercept = LM['coefficients']['(Intercept)']
# Columns: Estimate, Std. Error, t value, Pr(&gt;|t|); row 1 is the slope.
P = summary['coefficients'][1,3]</pre>
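<p>If you want to convince yourself that the numbers coming back from R are what you think they are, the slope, intercept, r-squared and t value can be recomputed by hand with the usual least-squares formulas.  This is only a cross-check in plain Python, not something RPy provides; the data are copied from the &#8216;model&#8217; entry of the dictionary shown earlier, and the results should come out close to R&#8217;s 0.8621, 0.2849, 0.9009 and 8.526.</p>

```python
import math

# Data copied from the 'model' entry of the lm() dictionary shown earlier.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [-0.64212347, 1.39389811, 3.06676323, 2.84957073, 3.99793052,
     5.12226093, 4.67818603, 4.7520944, 8.3182891, 8.10661086]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
syy = sum((yi - mean_y) ** 2 for yi in y)

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual sum of squares, then r-squared and the residual standard error.
residual_ss = sum((yi - (intercept + slope * xi)) ** 2
                  for xi, yi in zip(x, y))
r_squared = 1 - residual_ss / syy
sigma = math.sqrt(residual_ss / (n - 2))

# t value for the slope: the estimate divided by its standard error.
t_slope = slope / (sigma / math.sqrt(sxx))
```

<p>Turning the t value into a P value needs the t distribution, which is exactly the kind of thing it is easier to leave to R.</p>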
<h4>Redundancy analysis</h4>
<p>OK, say you have this data set to perform redundancy analysis (RDA) on.  First, you need the package <a href="http://vegan.r-forge.r-project.org/">vegan</a> installed, which is fantastic for multivariate stats.  It&#8217;s probably best to fire up R proper (from a command line, or the GUI if you have it in Windows or OSX) and run</p>
<pre class="brush: plain; title: ; notranslate">install.packages(&quot;vegan&quot;, dep=T)</pre>
<p>Here&#8217;s a heavily commented script, <a href='http://scienceoss.com/wp-content/uploads/2008/07/rpy-demo.py'>rpy-demo.py</a>, that will:</p>
<ul>
<li>load and format the data included in the script</li>
<li>send the data to R</li>
<li>perform an RDA in R</li>
<li>plot the ordination</li>
<li>save the ordination as a PNG</li>
<li>print the variance explained by constrained and unconstrained axes as well as each RDA axis.</li>
</ul>
<p>If you have RPy installed and the vegan package installed, you should be able to just run this Python script.</p>
<p>Often-run analyses that you need R for can be wrapped in a class or module to encapsulate your data analysis needs, so you don&#8217;t need to clutter your code with it. Once things are set up that way, it would be as easy as</p>
<pre class="brush: python; title: ; notranslate">
from myRstuff import lm, rda
results = lm(x,y)
ordination = rda(data)</pre>
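<p>As a rough idea of what the <span class="c">lm</span> half of such a module could contain, here is a sketch.  It assumes the dictionaries RPy returns have the layout shown in the linear model example above, and it takes the <span class="c">r</span> object as an extra argument (my own choice, so the function can be tried out without R installed); the return format is made up for illustration.</p>

```python
def lm(x, y, r):
    """Regress y on x in R and return the numbers of interest.

    r is the RPy r object (what `from rpy import *` gives you); it is
    passed in explicitly so this function does not import RPy itself.
    """
    # Send the data to R, fit the model there, and pull back both the
    # model and its summary as Python dictionaries.
    r.assign('x', x)
    r.assign('y', y)
    model = r('linear_model = lm(y~x)')
    summary = r('LM_summary = summary(linear_model)')
    coefficients = summary['coefficients']
    return {
        'slope': model['coefficients']['x'],
        'intercept': model['coefficients']['(Intercept)'],
        'r_squared': summary['r.squared'],
        # Row 1 is the slope; column 3 is Pr(>|t|) in R's printout.
        'p_value': coefficients[1, 3],
    }
```

<p>With RPy loaded, <span class="c">results = lm(x, y, r)</span> followed by <span class="c">results['p_value']</span> would then replace the seven lines of bookkeeping above.</p>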
<p>For much, much more see the <a href="http://rpy.sourceforge.net/rpy/doc/rpy_html/index.html">online documentation</a> for RPy, but hopefully I gave you enough to at least get started.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/rpy-statistics-in-r-from-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Advanced sorting: sorting by key</title>
		<link>http://scienceoss.com/advanced-sorting-sorting-by-key/</link>
		<comments>http://scienceoss.com/advanced-sorting-sorting-by-key/#comments</comments>
		<pubDate>Mon, 14 Apr 2008 18:30:29 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[key]]></category>
		<category><![CDATA[sort]]></category>

		<guid isPermaLink="false">http://scienceoss.com/advanced-sorting-sorting-by-key/</guid>
		<description><![CDATA[The sort() method of list objects in Python is quite flexible. By default, it sorts on the first thing in each item of the list, which is exactly what you would expect. For example, a list of strings is sorted by the first letter of each string. What if you wanted to sort by the [...]]]></description>
			<content:encoded><![CDATA[<p>The <span class="c">sort()</span> method of list objects in Python is quite flexible.  By default, it sorts on the first thing in each item of the list, which is exactly what you would expect.  For example, a list of strings is sorted by the first letter of each string.    What if you wanted to sort by the second letter of each string?  Or sort a list of people&#8217;s names by last name?<span id="more-110"></span></p>
<p>By default, the <span class="c">key</span> to sort by is the first letter of each string.  Or the first item in a sequence if it&#8217;s list of sequences.  But Python allows you to specify any key that you want, using the <span class="c">key</span> parameter for the <span class="c">sort()</span> function.  The <span class="c">key</span> is the name of a function.  </p>
<p>The way it works is this:  <span class="c">sort</span> runs every item in the list through the function.  Whatever the function returns is used to sort, overriding the default of using the first thing.  </p>
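<p>To make that concrete, here is roughly what happens behind the scenes, spelled out by hand.  This is only an illustration (the helper <span class="c">sort_by_key</span> is made up, and it returns a new list rather than sorting in place); the real <span class="c">sort()</span> does the equivalent internally.</p>

```python
def sort_by_key(items, key):
    """Return items sorted the way list.sort(key=...) would sort them."""
    # Decorate: compute the key once per item.  The index keeps ties in
    # their original order and stops Python from comparing the items.
    decorated = [(key(item), index, item)
                 for index, item in enumerate(items)]
    decorated.sort()
    # Undecorate: throw the keys away and keep the items.
    return [item for _, _, item in decorated]


words = ['orange', 'banana', 'apple']
by_length = sort_by_key(words, key=len)
```

<p>Here <span class="c">by_length</span> matches what <span class="c">sorted(words, key=len)</span> gives, and <span class="c">words</span> itself is left alone.</p>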
<h3>Example 1</h3>
<p>So to sort a list of strings by the second letter instead of the first letter then we simply need a function that returns the second letter of a string.  Here&#8217;s such a function, and how to use it as a key to <span class="c">sort</span>.  Note that <span class="c">key=secondletter</span>, NOT <span class="c">key=secondletter()</span>.  We&#8217;re specifying the reference to <span class="c">secondletter</span>, not trying to call it.</p>
<pre class="brush: python; title: ; notranslate">
def secondletter(x):
    return x[1]

mylist = ['orange', 'banana', 'apple']

mylist.sort(key=secondletter)

# ['banana', 'apple', 'orange']
</pre>
<p>By the way, for simple one-liner functions like this, we could have used the <a href="http://docs.python.org/tut/node6.html#SECTION006750000000000000000">lambda syntax</a> instead of defining the <span class="c">secondletter</span> function:</p>
<pre class="brush: python; title: ; notranslate">
mylist = ['orange', 'banana', 'apple']
mylist.sort(key=lambda x: x[1])
</pre>
<h3>Example 2</h3>
<p>OK, while it&#8217;s a good first example, sorting on the second letter isn&#8217;t terribly useful.  How about sorting a list of people&#8217;s names by their last name?  We simply need a function to return the last name, and use the name of that function as the sort key.</p>
<pre class="brush: python; title: ; notranslate">def lastname(x):
    firstname, lastname = x.split()
    return lastname

presidents = ['Abraham Lincoln', 'George Washington',
              'Benjamin Harrison', 'Millard Fillmore']

presidents.sort(key=lastname)

#['Millard Fillmore',
# 'Benjamin Harrison',
# 'Abraham Lincoln',
# 'George Washington']
</pre>
<p>Of course, in practice you would have to be careful with this . . . if there&#8217;s a middle name in there, then it would break the <span class="c">lastname</span> function.  This one works better:</p>
<pre class="brush: python; title: ; notranslate">
def lastname2(x):
    return x.split()[-1]

not_all_presidents = ['Abraham Lincoln', 'George Washington', 'Benjamin Harrison',
                     'Millard Fillmore', 'Prince', 'Madonna', 'Arthur C. Clarke']

not_all_presidents.sort(key=lastname2)

#['Arthur C. Clarke',
# 'Millard Fillmore',
# 'Benjamin Harrison',
# 'Abraham Lincoln',
# 'Madonna',
# 'Prince',
# 'George Washington']
</pre>
<p>&#8230;but if you have names like &#8216;King George III&#8217;, you&#8217;ll have to fix the function to deal with them.</p>
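<p>One possible fix, as a sketch: skip a trailing Roman numeral before picking the last word.  The rule is my own guess at what you would want (<span class="c">lastname3</span> and <span class="c">ROMAN_NUMERAL</span> are made-up names), and it will get some real names wrong, for example &#8216;Malcolm X&#8217;.</p>

```python
import re

# Hypothetical rule: a trailing word made only of Roman-numeral letters
# (I, V, X, ...) is treated as a suffix, not a name.
ROMAN_NUMERAL = re.compile(r'^[IVXLCDM]+$')


def lastname3(x):
    parts = x.split()
    # Drop a trailing numeral such as 'III', but never the only word.
    if len(parts) > 1 and ROMAN_NUMERAL.match(parts[-1]):
        parts = parts[:-1]
    return parts[-1]


royalty = ['King George III', 'Abraham Lincoln',
           'Queen Elizabeth II', 'Madonna']
royalty.sort(key=lastname3)
```

<p>The names are now ordered by Elizabeth, George, Lincoln, Madonna.</p>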
<h3>Example 3</h3>
<p>How about sorting a list of stocks by their maximum closing price for this week?  (Use <span class="c">reverse=True</span> so that highest are listed first)</p>
<pre class="brush: python; title: ; notranslate">
stocks = [ [56, 94, 13, 90, 91], [33, 76, 22, 34, 105], [25, 28, 29, 30, 35] ]
stocks.sort(key=max, reverse=True)

# [[33, 76, 22, 34, 105], [56, 94, 13, 90, 91], [25, 28, 29, 30, 35]]
</pre>
<h3>Example 4</h3>
<p>You can get tricky&#8230;knowing that <span class="c">sort()</span> changes the list in-place, you can sort the prices within each stock from high to low, and then sort the stocks themselves by their max.</p>
<pre class="brush: python; title: ; notranslate">
def mymax(x):
    x.sort(reverse=True)
    return x[0]

stocks = [ [56, 94, 13, 90, 91], [33, 76, 22, 34, 105], [25, 28, 29, 30, 35] ]
stocks2 = [s[:] for s in stocks] # copy the inner lists too, because mymax sorts them in place

stocks2.sort(key=mymax, reverse=True)

#[[105, 76, 34, 33, 22], [94, 91, 90, 56, 13], [35, 30, 29, 28, 25]]
</pre>
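<p>If you would rather not modify anything inside a key function, here is an alternative sketch that gets the same result with <span class="c">sorted()</span>, which always returns new lists.  This is a swapped-in technique, not the approach used above.</p>
<pre class="brush: python; title: ; notranslate">
stocks = [[56, 94, 13, 90, 91], [33, 76, 22, 34, 105], [25, 28, 29, 30, 35]]

# Sort each stock's prices into a new list, then order the stocks by their max
stocks2 = sorted((sorted(s, reverse=True) for s in stocks), key=max, reverse=True)

print(stocks2)    # [[105, 76, 34, 33, 22], [94, 91, 90, 56, 13], [35, 30, 29, 28, 25]]
print(stocks[0])  # [56, 94, 13, 90, 91], the original is unchanged
</pre>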
<h3>Example 5</h3>
<p>Or sort a list by absolute value instead of paying attention to negative signs:</p>
<pre class="brush: python; title: ; notranslate">
deviations = [10, -34, -5, 90, -87]
deviations.sort(key=abs)
# [-5, 10, -34, -87, 90]
</pre>
<p>As you can see, specifying the sort key can be pretty useful if you know it&#8217;s there.  Coming up with these examples really helped me see where this technique would be useful.  You can find more info on sorting on the <a href="http://wiki.python.org/moin/HowTo/Sorting">Python wiki</a>, and <a href="http://xahlee.org/perl-python/sort_list.html">comparisons between Python and Perl sorting</a>. </p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/advanced-sorting-sorting-by-key/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Foreign keys in MySQL</title>
		<link>http://scienceoss.com/foreign-keys-in-mysql/</link>
		<comments>http://scienceoss.com/foreign-keys-in-mysql/#comments</comments>
		<pubDate>Tue, 08 Apr 2008 16:47:43 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[alter]]></category>
		<category><![CDATA[foreign key]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[SQLAlchemy]]></category>
		<category><![CDATA[tables]]></category>

		<guid isPermaLink="false">http://scienceoss.com/foreign-keys-in-mysql/</guid>
		<description><![CDATA[The default engine in MySQL before version 5.5, MyISAM, does not support foreign keys. You need foreign keys to use the SQLAlchemy Python package effectively. In order to use foreign keys, you need to convert your tables to the InnoDB engine: To add a foreign key to mytable where the unique keys are coming from othertable, use this: [...]]]></description>
			<content:encoded><![CDATA[<p>The default engine in MySQL before version 5.5, MyISAM, does not support foreign keys.  You need foreign keys to use the SQLAlchemy Python package effectively.  </p>
<p>In order to use foreign keys, you need to convert your tables to the InnoDB engine:</p>
<pre class="brush: sql; title: ; notranslate">
ALTER TABLE mytable ENGINE = INNODB;
</pre>
<p>To add a foreign key to <span class="c">mytable</span> where the unique keys are coming from <span class="c">othertable</span>, use this:</p>
<pre class="brush: sql; title: ; notranslate">
ALTER TABLE mytable ADD FOREIGN KEY (otherID) REFERENCES othertable (otherID);
</pre>
<p>If you run that same line several times, several identical foreign keys will be created, which will confuse SQLAlchemy.  In that case you need to delete the keys.  To do so, you need their name.  To see the name, use </p>
<pre class="brush: sql; title: ; notranslate">SHOW CREATE TABLE mytable;</pre>
<p>The foreign key&#8217;s name will end in something like <span class="c">ibfk_1</span> or <span class="c">ibfk_2</span>.  Using that name, you can then delete the foreign key:</p>
<pre class="brush: sql; title: ; notranslate">ALTER TABLE mytable DROP FOREIGN KEY mytable_ibfk_2;</pre>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/foreign-keys-in-mysql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>MySQLdb &#8211; accessing MySQL databases from Python</title>
		<link>http://scienceoss.com/mysqldb-accessing-mysql-databases-from-python/</link>
		<comments>http://scienceoss.com/mysqldb-accessing-mysql-databases-from-python/#comments</comments>
		<pubDate>Mon, 24 Mar 2008 00:31:04 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[MySQLdb]]></category>
		<category><![CDATA[syntax]]></category>

		<guid isPermaLink="false">http://scienceoss.com/mysqldb-accessing-mysql-databases-from-python/</guid>
		<description><![CDATA[MySQL is a popular open-source database engine, and Python interfaces quite nicely with MySQL with the MySQLdb package. For more on why you would want to use a database for your data, check out this post. Here I&#8217;ll show you how to connect to your existing MySQL database with Python. Assumptions I&#8217;m assuming you have [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.mysql.com/">MySQL</a> is a popular open-source database engine, and Python interfaces quite nicely with MySQL with the <a href="http://sourceforge.net/projects/mysql-python">MySQLdb</a> package.  For more on why you would want to use a database for your data, check out <a href="http://scienceoss.com/why-should-i-use-a-database-for-my-data/">this post</a>.  Here I&#8217;ll show you how to connect to your existing MySQL database with Python.<span id="more-5"></span></p>
<h3>Assumptions</h3>
<ul>
<li>You have a MySQL database running. <a href="http://scienceoss.com/why-should-i-use-a-database-for-my-data/">More info here</a></li>
<li>You have the <a href="http://sourceforge.net/projects/mysql-python">MySQLdb</a> package installed for Python.</li>
<li>The database is running on <span class="c">localhost</span>, the user is <span class="c">root</span>, and the password is <span class="c">p@55w0rd</span>.</li>
<li>You <a href="http://www.w3schools.com/sql/default.asp">know some SQL</a> (at least enough to appreciate some of these examples)</li>
</ul>
<h3>Caveats</h3>
<p>While the code below is specific to MySQLdb, no matter what database API you use you should be able to use the same syntax (as outlined in <a href="http://www.python.org/dev/peps/pep-0249/">PEP 249</a>).</p>
<p>For more details, see the <a href="http://mysql-python.sourceforge.net/MySQLdb.html">official documentation for MySQLdb</a>.  Here I&#8217;m just trying to explain things slightly differently.</p>
<h2>Example usage</h2>
<h3>Import MySQLdb, and connect to the database</h3>
<pre class="brush: python; title: ; notranslate">import MySQLdb
my_connection = MySQLdb.connect(host='localhost', user='root', passwd='p@55w0rd')
cursor = my_connection.cursor()</pre>
<p>That&#8217;s it!  You&#8217;re ready to start sending SQL statements to your MySQL database!</p>
<h3>The cursor is everything!</h3>
<p>The <a href="http://en.wikipedia.org/wiki/Cursor_(databases)">cursor</a> now contains all the information it needs to send information to and get information from the running MySQL server.  It&#8217;s your key to the database.</p>
<p>The two most often-used methods of the MySQLdb cursor are</p>
<ol>
<li><span class="c"><strong>cursor.execute()</strong></span>, which executes a query (but doesn&#8217;t return the data)</li>
<li><span class="c"><strong>cursor.fetchall()</strong></span>, which fetches the data from the most recently executed query.</li>
</ol>
<p>You send commands to MySQL by passing strings of SQL statements to <span class="c">cursor.execute()</span>.  When doing so, you can take advantage of Python&#8217;s multi-line strings (delimited by triple quotes, <span class="c">'''</span> or <span class="c">&quot;&quot;&quot;</span>) and the fact that SQL syntax doesn&#8217;t care that there are newlines in the query.  Furthermore, you don&#8217;t need to end a statement with a semicolon when you send it through <span class="c">cursor.execute()</span>.</p>
<h3>Interacting with the database</h3>
<h4>Create the database and a table</h4>
<p>Make a new database by sending the standard SQL query, <span class="c">&#8216;CREATE DATABASE testdb&#8217;</span>, to the server you connected to.  No trailing semicolon is needed.</p>
<pre class="brush: python; title: ; notranslate">cursor.execute('CREATE DATABASE testdb')</pre>
<p>If you do this in an interactive session, you will notice that this method returned a long integer (1L).  This is the number of rows affected by the statement.  Don&#8217;t worry about it quite yet.</p>
<p>Now make that new database the active one:</p>
<pre class="brush: python; title: ; notranslate">cursor.execute('USE testdb')</pre>
<p>Now create a table in the <span class="c">testdb</span> database to hold some addresses:</p>
<pre class="brush: python; title: ; notranslate">cursor.execute('''CREATE TABLE addresses (
                    name VARCHAR(20),
                    street VARCHAR(20),
                    zipcode INT,
                    city VARCHAR(20),
                    state CHAR(2)
                    )
                    ''')</pre>
<p>Note the use of triple quotes so that you can visually organize the SQL query for clarity.</p>
<h4>Import data from Python into MySQL</h4>
<p>The general syntax for passing Python data to an SQL query through the cursor is:</p>
<pre class="brush: python; title: ; notranslate">cursor.execute(SQL,tuple)</pre>
<p>where <span class="c">SQL</span> is a valid SQL statement.  If <span class="c">SQL</span> has N placeholders of the form <span class="c">%s</span>, then <span class="c">tuple</span> must have length N.  Hopefully an example will make more sense.</p>
<p>Let&#8217;s create some Python lists that we&#8217;ll import into this table.  The beauty of it is that these data could have been parsed from a text file with hundreds or thousands of names, and we can import them into the database automatically.  For now we&#8217;ll just enter three records though.</p>
<p>Here&#8217;s the data that will go into the database:</p>
<pre class="brush: python; title: ; notranslate">names = ['Bob', 'Alfred', 'Jen']
streets = ['123 Elm Street', '55 Ninth Ave', '1 Paved Rd']
zips = [123, 34565, 30094]  # 123 stands for 00123; a literal 00123 would be read as octal (83) in Python 2
cities = ['Newark', 'Salinas', 'Los Angeles']
states = ['NJ', 'CA', 'CA']</pre>
<p>And here&#8217;s how to get that data into the <span class="c">addresses</span> table:</p>
<pre class="brush: python; title: ; notranslate">cursor.executemany('''INSERT INTO addresses
                     (name, street, zipcode, city, state)
                     VALUES
                     (%s, %s, %s, %s, %s)''',
                     zip(names, streets, zips, cities, states))</pre>
<p>A couple of things to note here:</p>
<ul>
<li>This time we used <span class="c">cursor.executemany()</span>, which will accept a list of lists as input, instead of <span class="c">cursor.execute()</span>.</li>
<li>There were 5 fields into which we inserted data (name, street, zipcode, city, and state)</li>
<li>There were 5 <span class="c">%s</span> placeholders in the SQL query.</li>
<li>Even though zipcode is an INT field and not a string, we used %s.  This will always be the case:<em> use %s as a placeholder no matter what the datatype</em>.</li>
<li>There were 5 lists that were zipped together.  They need to be zipped so that the result is a list of tuples, one tuple per record, and the length of each tuple = 5.</li>
<li>The order in which these lists were zipped corresponded to the fields into which they were to be inserted.</li>
</ul>
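<p>To see what <span class="c">zip()</span> actually hands to <span class="c">executemany()</span>, here is a small sketch that needs no database at all.  The name <span class="c">rows</span> is only for illustration, and the first zip code is written as 123, without its leading zeros.</p>
<pre class="brush: python; title: ; notranslate">names = ['Bob', 'Alfred', 'Jen']
streets = ['123 Elm Street', '55 Ninth Ave', '1 Paved Rd']
zips = [123, 34565, 30094]
cities = ['Newark', 'Salinas', 'Los Angeles']
states = ['NJ', 'CA', 'CA']

# list() makes the result printable on Python 3, where zip() is lazy
rows = list(zip(names, streets, zips, cities, states))

print(len(rows))  # 3
print(rows[0])    # ('Bob', '123 Elm Street', 123, 'Newark', 'NJ')</pre>
<p>Each tuple is one record, with its items in the same order as the field names in the INSERT statement.</p>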
<h3>Retrieving data from the database</h3>
<p>There are two steps to retrieving data: executing the query, then fetching the results.</p>
<p>To select all addresses in California, first execute this query (it&#8217;s a one-liner so triple quoting isn&#8217;t really needed)</p>
<pre class="brush: python; title: ; notranslate">cursor.execute(&quot;SELECT * FROM addresses WHERE state = 'CA' &quot;)</pre>
<p>Alternatively . . . often you will want to feed Python variables into the query.  Say the state abbreviation &#8216;CA&#8217; is saved in a Python variable called <span class="c">my_state</span>.  Then this query will do the same thing as the one above:</p>
<pre class="brush: python; title: ; notranslate">cursor.execute('''SELECT * FROM addresses WHERE state = %s''', (my_state,))</pre>
<p>By the way, note the trailing comma: <span class="c">(my_state,)</span> is a one-element tuple.  Older versions of MySQLdb also accept a bare string when there is only one <span class="c">%s</span> placeholder in the query, but passing a tuple is the safer habit.</p>
<p>Now to retrieve the results:</p>
<pre class="brush: python; title: ; notranslate">results = cursor.fetchall()</pre>
<p>Note that a cursor object is similar to a file object or an iterator: <em>once you fetch everything, there is nothing left in the cursor to retrieve</em>.  So executing the command above a second time would give you an empty result until the query is executed again.</p>
<p><span class="c">results</span> is a tuple of tuples and looks like this:</p>
<pre class="brush: python; title: ; notranslate">(('Alfred', '55 Ninth Ave', 34565L, 'Salinas', 'CA'),
 ('Jen', '1 Paved Rd', 30094L, 'Los Angeles', 'CA'))</pre>
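<p>Since each row is just a tuple, you can loop over the results and unpack the fields by position.  A minimal sketch, with the rows written out literally so it runs without a database; the <span class="c">labels</span> list is only for illustration.</p>
<pre class="brush: python; title: ; notranslate"># The same rows that fetchall() returned above, written out literally
results = (('Alfred', '55 Ninth Ave', 34565, 'Salinas', 'CA'),
           ('Jen', '1 Paved Rd', 30094, 'Los Angeles', 'CA'))

# Each row is a tuple in column order, so it can be unpacked directly
labels = []
for name, street, zipcode, city, state in results:
    labels.append('%s, %s, %s %s %05d' % (name, street, city, state, zipcode))

print(labels[0])  # Alfred, 55 Ninth Ave, Salinas CA 34565</pre>
<p>The <span class="c">%05d</span> format pads the zip code back out to five digits, which matters for zip codes that start with zeros and were stored in an INT field.</p>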
<p>That&#8217;s all there is to it!  Armed with this knowledge, now you can execute queries from Python to import, retrieve, and plot data from your database.  This was a simple demo of what MySQL and Python can do, but you can construct ever-larger databases and ever-more-sophisticated queries to manipulate data in ways that would be impossible without these tools.</p>
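<p>As a rough illustration of the PEP 249 point from the caveats, here is the same connect, cursor, execute, fetchall pattern using the <span class="c">sqlite3</span> module that ships with Python.  This is a stand-in, not MySQLdb, and the trimmed-down two-column table is an assumption made to keep the sketch short.  One real difference: sqlite3 uses <span class="c">?</span> as its placeholder where MySQLdb uses <span class="c">%s</span>.</p>
<pre class="brush: python; title: ; notranslate">import sqlite3

# An in-memory database, so nothing is written to disk
connection = sqlite3.connect(':memory:')
cursor = connection.cursor()

cursor.execute('CREATE TABLE addresses (name TEXT, state TEXT)')

# sqlite3 uses ? placeholders where MySQLdb uses %s
cursor.executemany('INSERT INTO addresses (name, state) VALUES (?, ?)',
                   [('Bob', 'NJ'), ('Alfred', 'CA'), ('Jen', 'CA')])

cursor.execute('SELECT name FROM addresses WHERE state = ? ORDER BY name', ('CA',))
results = cursor.fetchall()
print(results)  # [('Alfred',), ('Jen',)]

connection.close()</pre>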
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/mysqldb-accessing-mysql-databases-from-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Sending command line options to Python scripts</title>
		<link>http://scienceoss.com/sending-command-line-options-to-python-scripts/</link>
		<comments>http://scienceoss.com/sending-command-line-options-to-python-scripts/#comments</comments>
		<pubDate>Thu, 06 Dec 2007 14:44:38 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[command line]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=52</guid>
		<description><![CDATA[If you have some different options in your program and you want to turn them on or off, or feed your functions different arguments, then you can specify all of this from the command line. You can read about the details of the optparse module here, but here are the basics: Create an OptionParser object. [...]]]></description>
			<content:encoded><![CDATA[<p>If you have some different options in your program and you want to turn them on or off, or feed your functions different arguments, then you can specify all of this from the command line.</p>
<p>You can read about the details of the optparse module <a href="http://docs.python.org/lib/module-optparse.html">here</a>, but here are the basics:<span id="more-52"></span></p>
<p>Create an OptionParser object.</p>
<pre class = "prettyprint"><code class = "code">from optparse import OptionParser
parser = OptionParser()</code></pre>
<p>Use <span class="c">parser.add_option()</span> to add options that will be passed to the script from the command line.  For example,</p>
<pre class = "prettyprint"><code class = "code">parser.add_option('-i', action='store', dest='input_file')</code></pre>
<p>Each time you add an option, you have to specify:</p>
<ul>
<li>the option flag (say, <span class="c">-i</span> or <span class="c">--infile</span> for an input file).  One dash for single characters, two dashes for anything longer.</li>
<li>the action.  This is typically <span class="c">store</span> to store the argument&#8217;s value, or <span class="c">store_true</span> to use the option as an on-off switch.</li>
<li>the destination.  The option will be stored in an options object (described shortly . .)</li>
</ul>
<p>There are other things you can add too, like defaults and help strings. See the  <a href="http://docs.python.org/lib/module-optparse.html">documentation</a> for more.</p>
<p>To get everything that was sent on the command line into your script, tell the <span class="c">OptionParser</span> to parse the arguments like this:</p>
<pre class = "prettyprint"><code class = "code">(options,args) = parser.parse_args()</code></pre>
<p>Now <span class="c">options</span> has attributes named after the destination.  In the example above, I added the command line option, <span class="c">-i</span>, which will store the argument given right after it in the command line in <span class="c">options.input_file</span>.</p>
<p>So if I called my script from the commandline like this:</p>
<pre class = "prettyprint"><code class = "code">python script.py -i mydata.txt</code></pre>
<p>Then in my script, I could access the &#8216;mydata.txt&#8217; that was provided on the commandline using</p>
<pre class = "prettyprint"><code class = "code">options.input_file</code></pre>
<p>Or, to open the file specified on the command line within my script,</p>
<pre class = "prettyprint"><code class = "code">
f = open(options.input_file)
</code></pre>
<p>Here&#8217;s a longer example:</p>
<pre class = "prettyprint"><code class = "code">
from optparse import OptionParser

parser = OptionParser()

parser.add_option("-i",
                         action="store",
                         dest="infile",
                         help="specify the input file")

parser.add_option("-o",
                         action="store",
                         dest="outfile",
                         help="specify the output file")

# This is a boolean (True/False) option.
# I set the default to False; using the --tabs
# option at the command line will make it true.
parser.add_option("--tabs",
                         action="store_true",
                         dest="use_tabs",
                         default=False,
                         help="tell the script to export as tabs")

parser.add_option("--useless",
                         action="store",
                         dest="dummy_variable",
                         help="not used for anything")

(options, args) = parser.parse_args()

# This is how you access the options:

print(options.infile)
print(options.outfile)
print(options.use_tabs)
print(options.dummy_variable)
</code></pre>
<p>Now if you run this at the command prompt:</p>
<pre class = "prettyprint"><code class = "code">python test.py -i input.txt -o output.txt</code></pre>
<p>then you&#8217;ll see that the options were passed to the script.  If you don&#8217;t specify an option, its default value is None.  You can also set a default value, like I&#8217;ve done for the --tabs option above.</p>
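<p>You can check this behavior without going to the command prompt, because <span class="c">parse_args()</span> also accepts a list of arguments.  A small sketch: the <span class="c">-n</span> option and its <span class="c">count</span> destination are made up for this example, to show a default combined with <span class="c">type='int'</span>.</p>
<pre class = "prettyprint"><code class = "code">
from optparse import OptionParser

parser = OptionParser()
parser.add_option('-i', action='store', dest='infile')
parser.add_option('-o', action='store', dest='outfile')
# type='int' converts the argument; default applies when -n is not given
parser.add_option('-n', action='store', dest='count', type='int', default=10)
parser.add_option('--tabs', action='store_true', dest='use_tabs', default=False)

# Passing a list stands in for what would normally come from sys.argv[1:]
(options, args) = parser.parse_args(['-i', 'input.txt', '--tabs'])

print(options.infile)    # input.txt
print(options.outfile)   # None
print(options.count)     # 10
print(options.use_tabs)  # True
</code></pre>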
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/sending-command-line-options-to-python-scripts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Test your code: so easy there&#8217;s no excuse!</title>
		<link>http://scienceoss.com/test-your-code-so-easy-theres-no-excuse/</link>
		<comments>http://scienceoss.com/test-your-code-so-easy-theres-no-excuse/#comments</comments>
		<pubDate>Thu, 06 Dec 2007 04:52:20 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=51</guid>
		<description><![CDATA[I had heard of unittest and how I really needed to use it to make sure my code is doing what I expect . . . but it just seemed so clunky. Plus, a program would have to reach a certain threshold of complexity before I would make the effort to test with unittest. Then [...]]]></description>
			<content:encoded><![CDATA[<p>I had heard of unittest and how I really needed to use it to make sure my code is doing what I expect . . . but it just seemed so clunky.  Plus, a program would have to reach a certain threshold of complexity before I would make the effort to test with unittest.</p>
<p>Then I ran across doctest.  And it is so astoundingly easy to use that I might start writing tests even for one-line scripts.<span id="more-51"></span></p>
<p>Here are the three files you need, along with what I&#8217;m going to be calling them:</p>
<ol>
<li>some code to test, &#8220;code.py&#8221;</li>
<li>a text file containing documentation for that code, &#8220;tutorial.txt&#8221;</li>
<li>a two line test-runner script, &#8220;test.py&#8221;</li>
</ol>
<h2>1. code.py</h2>
<p>code.py, the stuff I want to test, looks like this:</p>
<pre class = "prettyprint"><code class = "code">
def addxy(x,y):
    return x+y
def sayHi():
    print("Hi!")
</code></pre>
<h2>2. tutorial.txt</h2>
<p>tutorial.txt contains what to expect when you type certain things at the prompt, along with text.  This is plain text.</p>
<pre class = "prettyprint">
<code class = "code">
This tests the module "code.py".

First, make sure that addxy() works:

>>> import code
>>> code.addxy(4,5)
9

Now make sure sayHi() works:

>>> code.sayHi()
Hi!
</code></pre>
<h2>3. test.py</h2>
<p>test.py contains these two lines.  Note that tutorial.txt is referred to in this file, so make sure it points to whatever you called your testing text.</p>
<pre class = "prettyprint"><code class = "code">
import doctest
doctest.testfile('tutorial.txt')
</code></pre>
<h2>Now test!</h2>
<p>Run test.py to test everything in the tutorial.txt file:</p>
<pre class = "prettyprint"><code class = "code">python test.py</code></pre>
<p>In this case, it tests the stuff in code.py.  If all goes well, <em>nothing is printed</em>.  If you would like some feedback, then use the -v option at the commandline when running test.py.</p>
<pre class = "prettyprint"><code class = "code">python test.py -v</code></pre>
<h2>Results</h2>
<p>Here&#8217;s what you get when all tests pass and you use the -v option:</p>
<pre class = "prettyprint"><code class = "code">Trying:
    import code
Expecting nothing
ok
Trying:
    code.addxy(4,5)
Expecting:
    9
ok
Trying:
    code.sayHi()
Expecting:
    Hi!
ok
1 items passed all tests:
   3 tests in tutorial.txt
3 tests in 1 items.
3 passed and 0 failed.
Test passed.</code></pre>
<p>To see what happens when a test fails, try changing the &#8220;+&#8221; in the definition of addxy() to a &#8220;-&#8221;.  When you run the test again using:</p>
<pre class = "prettyprint"><code class = "code">python test.py
</code></pre>
<p>then you get something like this:</p>
<pre class = "prettyprint"><code class = "code">
**********************************************************************
File "tutorial.txt", line 3, in tutorial.txt
Failed example:
    code.addxy(4,5)
Expected:
    9
Got:
    -1
**********************************************************************
1 items had failures:
   1 of   3 in tutorial.txt
***Test Failed*** 1 failures.</code></pre>
<p>So it tells you where the testing failed: on line 3 when it tried running code.addxy(4,5).  It said it was expecting 9 but instead got -1.  After changing that &#8220;-&#8221; back to a &#8220;+&#8221;, everything should work again.</p>
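<p>The same kind of examples can also live in a function&#8217;s docstring instead of a separate text file.  Here is a minimal sketch of that variation, which is not the setup described above.  In a normal script you would usually just call <span class="c">doctest.testmod()</span>; the lower-level parser and runner are used here only so the pass/fail counts end up in a variable.</p>
<pre class = "prettyprint"><code class = "code">
import doctest

def addxy(x, y):
    """Return the sum of x and y.

    >>> addxy(4, 5)
    9
    """
    return x + y

# Parse the examples out of the docstring and run them
parser = doctest.DocTestParser()
test = parser.get_doctest(addxy.__doc__, {'addxy': addxy}, 'addxy', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed)     # 0
print(results.attempted)  # 1
</code></pre>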
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/test-your-code-so-easy-theres-no-excuse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Store Python objects so you can use them later with &#8220;shelve&#8221;</title>
		<link>http://scienceoss.com/store-python-objects-so-you-can-use-them-later/</link>
		<comments>http://scienceoss.com/store-python-objects-so-you-can-use-them-later/#comments</comments>
		<pubDate>Tue, 04 Dec 2007 22:41:35 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[persistence]]></category>
		<category><![CDATA[shelve]]></category>
		<category><![CDATA[store variables]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=46</guid>
		<description><![CDATA[I had a couple of Python lists that I needed to use in another script on another computer. The annoying way would be to write out to a tab-delimited file, but luckily the shelve module (standard with Python), makes things much easier. It&#8217;s the equivalent of Matlab&#8217;s .mat files, where you store variables for later [...]]]></description>
			<content:encoded><![CDATA[<p>I had a couple of Python lists that I needed to use in another script on another computer.  The annoying way would be to write out to a tab-delimited file, but luckily the shelve module (standard with Python), makes things much easier.  It&#8217;s the equivalent of Matlab&#8217;s .mat files, where you store variables for later use.  Here&#8217;s how to use it:<span id="more-46"></span></p>
<h3>First, the ugly way</h3>
<p>To show how useful the shelve module is, think for a moment what it would take to save variables as tab-delimited text files:</p>
<ol>
<li>write each list out to a tab-delimited text file</li>
<li>copy each list&#8217;s text file to the other computer</li>
<li>for each list, create an empty list</li>
<li>open its file (a way of automatically knowing the filenames would be nice &#8212; sequential numbering perhaps?)</li>
<li>append each line to the new list</li>
</ol>
<p>Ugh! Now what if I wanted to use two lists and a dictionary on another computer?  That would get pretty annoying, pretty quickly.</p>
<h3>A better way</h3>
<p>Instead, I used the shelve module, which comes standard with Python.  The way it works is almost self-explanatory (though maybe I&#8217;d call it a cupboard rather than a shelf since you open and close it).</p>
<h4>Add stuff to be transported</h4>
<pre class = "prettyprint"><code class = "code">
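import shelve

# Example objects to store (any picklable Python objects will do)
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']
dictionary1 = {'x': 1, 'y': 2}

x = shelve.open('my_shelf.dat')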

# Add stuff to the shelve object
x['first list'] = list1
x['second list'] = list2
x['a dictionary'] = dictionary1

# close it when you're done.
x.close()
</code></pre>
<p>As you can see, the shelve object acts much like a dictionary.  It can store arbitrary objects, too.  But now that it&#8217;s in there, how do you use it?</p>
<h3>Use the stuff you added</h3>
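<p>To use the shelve object you just created, say, on another computer, just copy that file over (in this case, my_shelf.dat; depending on the database library Python uses underneath, shelve may add an extension to the name or create more than one file, so copy whatever it created).  Then on that other computer, open it up and ask for what was in it.  Like so:</p>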
<pre class = "prettyprint"><code class = "code">
import shelve

y = shelve.open('my_shelf.dat')

list_a = y['first list']
list_b = y['second list']
dict_1 = y['a dictionary']

y.close()

# insert code here that uses list_a, list_b, or dict_1 . . .
</code></pre>
<p>So the shelve object acts almost like a flat database.</p>
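<p>Here is a self-contained round trip you can run in one go, as a sketch.  It writes to a temporary folder (an assumption for the sake of the example, so nothing is left lying around) and also shows that you can list the keys of a shelf, just like a dictionary.</p>
<pre class = "prettyprint"><code class = "code">
import os
import shelve
import shutil
import tempfile

# Work in a throwaway folder that is deleted at the end
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'my_shelf')

x = shelve.open(path)
x['first list'] = [1, 2, 3]
x['a dictionary'] = {'a': 1}
x.close()

# Reopen the shelf, as the second script would
y = shelve.open(path)
list_a = y['first list']
dict_1 = y['a dictionary']
keys = sorted(y.keys())
y.close()

shutil.rmtree(folder)

print(list_a)  # [1, 2, 3]
print(keys)    # ['a dictionary', 'first list']
</code></pre>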
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/store-python-objects-so-you-can-use-them-later/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Run programs in the background in IPython</title>
		<link>http://scienceoss.com/running-programs-in-the-background-in-ipython/</link>
		<comments>http://scienceoss.com/running-programs-in-the-background-in-ipython/#comments</comments>
		<pubDate>Sun, 02 Dec 2007 17:36:35 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[IPython]]></category>
		<category><![CDATA[IPython help]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=40</guid>
		<description><![CDATA[You can start another thread in IPython to run a program in the background. bg go() runs the function go() in another thread jobs[0].status tells you what&#8217;s going on with that job jobs[0].result the result of the job jobs? see more help on this part of IPython]]></description>
			<content:encoded><![CDATA[<p>You can start another thread in IPython to run a program in the background. <span id="more-40"></span></p>
<table border="0" cellpadding="4" >
<tbody>
<tr>
<td class="command">bg go()</td>
<td>runs the function <span class="c">go()</span> in<br />
another thread</td>
</tr>
<tr>
<td class="command">jobs[0].status</td>
<td>tells you what&#8217;s going on with that job</td>
</tr>
<tr>
<td class="command">jobs[0].result</td>
<td>the result of the job</td>
</tr>
<tr>
<td class="command">jobs?</td>
<td>see more help on this part of IPython</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/running-programs-in-the-background-in-ipython/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Debug and optimize your code in IPython</title>
		<link>http://scienceoss.com/debugging-and-optimizing-in-ipython/</link>
		<comments>http://scienceoss.com/debugging-and-optimizing-in-ipython/#comments</comments>
		<pubDate>Sun, 02 Dec 2007 17:34:22 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[IPython]]></category>
		<category><![CDATA[IPython help]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=39</guid>
		<description><![CDATA[One great thing about Python is the interactive debugger, which lets you inspect the value of variables at the point an error occurred. Of course, IPython integrates nicely with the Python debugger and makes debugging code a cinch. xmode Plain plain exception mode xmode Context shows source code around an error xmode Verbose also shows [...]]]></description>
			<content:encoded><![CDATA[<p>One great thing about Python is the interactive debugger, which lets you inspect the value of variables at the point an error occurred.  Of course, IPython integrates nicely with the Python debugger and makes debugging code a cinch.<span id="more-39"></span></p>
<table border="0" cellpadding="4" >
<tbody>
<tr>
<td class="command">xmode Plain</td>
<td>plain exception mode</td>
</tr>
<tr>
<td class="command">xmode Context</td>
<td>shows source code around an error</td>
</tr>
<tr>
<td class="command">xmode Verbose</td>
<td>also shows the arguments going into the function<br />
where the error occurred</td>
</tr>
<tr>
<td class="command">pdb</td>
<td>Toggles automatic <span class="c">pdb</span>. Upon<br />
hitting an error, you are dropped into the Python debugger. See the documentation for the Python debugger (<span class="c">pdb</span>) for details.</td>
</tr>
<tr>
<td class="command">run -d scriptName</td>
<td>starts the debugger from the beginning of the<br />
script</td>
</tr>
<tr>
<td class="command">debug</td>
<td>go into the debugger right away.</td>
</tr>
</tbody>
</table>
<h2>IPython from any script</h2>
<p style="margin-bottom: 0in">IPython can be embedded in a script.&nbsp; Wherever the command ipshell() occurs in your code, an IPython shell will pop up, giving you full access to everything in the namespace of the script.&nbsp; This is extremely useful if you want to add interactive functionality to a program, or need to check on some variables without using the debugger.</p>
<pre class="prettyprint">
from IPython.Shell import IPShellEmbed
ipshell = IPShellEmbed()
ipshell()  # add this wherever you would like to open an IPython window
</pre>
<p style="margin-bottom: 0in">Each ipshell() will open up a new shell.&nbsp; You have to close the newly opened shell using <span class="c">exit()</span> or <span class="c">Exit</span> before the script will continue.</p>
<h2>Checking performance of your code</h2>
<table border="0" cellpadding="4" >
<tbody>
<tr>
<td class="command">time sum(range(10000000))</td>
<td>time the summing of the integers 0 through<br />
9,999,999 and print how long it took</td>
</tr>
<tr>
<td class="command">timeit sum(range(1000))</td>
<td>automatically figures out if it should try a couple times to get an accurate estimate for commands that take very little time.&nbsp; Prints average time.</td>
</tr>
<tr>
<td class="command">prun go()</td>
<td>runs the profiler on the function, <span class="c">go()</span>.&nbsp; The output from the profiler shows how long it took for each part of the script to run.&nbsp; <span class="c">prun</span> works on expressions, not files.</td>
</tr>
<tr>
<td class="command">run -p scriptname</td>
<td>runs the profiler on the script, <span class="c">scriptname</span>.&nbsp; <span class="c">run -p</span> works on files, not statements.</td>
</tr>
</tbody>
</table>
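<p>Outside of IPython, the standard library&#8217;s <span class="c">timeit</span> module does a similar job.  A minimal sketch; the choice of 1000 repetitions is arbitrary, and unlike the IPython command you have to pick that number yourself.</p>
<pre class="prettyprint">
import timeit

# Run the statement 1000 times and return the total time in seconds
total = timeit.timeit('sum(range(1000))', number=1000)

# Average time per run
print(total / 1000)
</pre>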
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/debugging-and-optimizing-in-ipython/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

