<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>scienceoss.com &#187; R</title>
	<atom:link href="http://scienceoss.com/tags/r/feed/" rel="self" type="application/rss+xml" />
	<link>http://scienceoss.com</link>
	<description>useful tidbits for using open source software in science</description>
	<lastBuildDate>Wed, 26 May 2010 03:34:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>RPy: statistics in R from Python</title>
		<link>http://scienceoss.com/rpy-statistics-in-r-from-python/</link>
		<comments>http://scienceoss.com/rpy-statistics-in-r-from-python/#comments</comments>
		<pubDate>Sat, 26 Jul 2008 03:37:51 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[linear regression]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=4</guid>
		<description><![CDATA[R is a free, open source statistics package written by statisticians, for statisticians. Python on the other hand lacks a comprehensive statistics package. RPy allows you to combine the power of Python with the power of R for an unbeatable combination in data analysis. Note that in order to use R from Python, you need [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.r-project.org/">R</a> is a free, open source statistics package written by statisticians, for statisticians.  Python on the other hand lacks a comprehensive statistics package.  <a href="http://rpy.sourceforge.net/">RPy</a> allows you to combine the power of Python with the power of R for an unbeatable combination in data analysis.</p>
<p>Note that in order to use R from Python, you need to know a little of both . . . so the learning curve can be steep.  You also need to have a feel for what would be easy in R and what would be easy in Python.</p>
<p>There are some detailed examples below if you want to skip right to &#8216;em.</p>
<p>I use Python for most tasks, but when I need high-powered stats, I embed R code in my Python scripts to perform the analysis.</p>
<p>Disclaimer: I figured all of this stuff out by trial and error.  The RPy documentation, while complete, was difficult for me to make sense of when I was learning.  If there&#8217;s a better way to do things, please let me know!  For the details that I don&#8217;t cover here, check the <a href="http://rpy.sourceforge.net/rpy/doc/rpy_html/index.html">online documentation </a></p>
<h3>Why use R?</h3>
<p>You&#8217;ll need R if you want to do any sort of sophisticated (or even not-so-sophisticated) statistical analysis.  There are no solid statistics libraries that I&#8217;ve come across for Python . . . but maybe that&#8217;s because R is the best possible statistics library there could be.  </p>
<p>Be warned however that accessing  R from Python can get tricky at times.  I&#8217;ve tried to outline some of what I&#8217;ve learned here to make it easier for others.</p>
<p>Why use RPy instead of writing files out to R, then using R scripts to deal with it?  I did this for a little while and found that it was too much work to maintain two separate code bases . . . one for Python, then one for R.  If I changed anything in the output of a Python script, I&#8217;d have to fire up R and open my R scripts to modify and debug them.  I&#8217;ve found that using RPy lets me put all my code in one spot, resulting in fewer bugs and less maintenance.  </p>
<h3>R and Python are separate . . .</h3>
<p>I found that the easiest way to think about this is to think about doing things &#8220;inside R&#8221; or &#8220;inside Python&#8221;.  Things that are to be done inside R are typically wrapped in a string (a Python string).  For example, this creates a variable inside R called <span class="c">x</span> with a value of 5.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
r('x=5')</pre>
<p>Assuming this was typed into a fresh Python session, Python has no idea about the existence of the variable <span class="c">x</span>!  It works in reverse, too: R has no idea about what&#8217;s in the Python namespace.  So you can do this in Python:</p>
<pre class="brush: python; title: ; notranslate">x = "I'm a Python string"</pre>
<p>and the variable x inside R is still the same:</p>
<pre class="brush: python; title: ; notranslate">r('print(x)')  # still 5</pre>
<h3>. . . but they can talk to each other</h3>
<p>RPy does some automatic conversions:</p>
<pre class="brush: python; title: ; notranslate">x_from_R = r('x')  # 5</pre>
<p>What happened here is that RPy looked at what <span class="c">x</span> was inside R, saw that it was a number, and returned that number to Python, which assigned it to the Python variable <span class="c">x_from_R</span>.  So that&#8217;s how you get data from R to Python: by sending a string (the variable name you want to retrieve in R) to the <span class="c">r</span> object.</p>
<p>At first you might think this is how you send data from Python to R:</p>
<pre class="brush: python; title: ; notranslate">r('x_from_python') = x
#SyntaxError: can't assign to function call</pre>
<p>Nope.  Turns out you have to use the <span class="c">r.assign()</span> function to do that:</p>
<pre class="brush: python; title: ; notranslate">r.assign('x_from_python', x)
r('print(x_from_python)')  # 'I'm a Python string'</pre>
<p>So that&#8217;s how you get data from Python to R: by using the <span class="c">r.assign()</span> function, first giving the name of the variable you want to be assigned in R followed by the Python object to be sent to R.</p>
<h3>Other data types</h3>
<p>OK, so you can get integers back from R.  And as you can imagine, strings work the same way.  But what about more complex data types?  This <a href="http://rpy.sourceforge.net/rpy/doc/rpy_html/Basic-conversion.html#Basic-conversion">list of conversions</a> tells you which R objects will be converted into which Python objects.  It&#8217;s pretty intuitive: a string becomes a string, a list becomes a list, etc.</p>
<p>But then there are things like data frames in R, which have row names and column names.</p>
<p>It&#8217;s not on that list linked above, but an R data frame is converted to a Python dictionary.  For example, the Motor Trend car data set, which comes standard in R, is a data frame.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
r('print(head(mtcars))') # print just the first 6 lines.  Note the variable names.

# Returns:
#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
</pre>
<p>Now send the whole thing to Python and check the keys of the dictionary that is created:</p>
<pre class="brush: python; title: ; notranslate">mt = r('mtcars')
mt.keys()</pre>
<p>Note that the keys are the same as the variable names in the dataframe.</p>
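<p>To get a feel for working with that kind of dictionary without firing up R, here is a small sketch.  It assumes the dictionary maps each column name to a list of values; the <span class="c">mt</span> below is a hand-typed stand-in holding two columns and three cars copied from the printout above, not the real return value of <span class="c">r('mtcars')</span>.</p>

```python
# Hand-typed stand-in for part of the dictionary (assumption: one list per column).
mt = {
    'mpg': [21.0, 21.0, 22.8],
    'cyl': [6, 6, 4],
}

# The keys play the role of the data frame's variable names.
columns = sorted(mt.keys())

# Rebuild row-wise records by walking all columns in parallel.
rows = list(zip(*[mt[name] for name in columns]))

print(columns)
for row in rows:
    print(row)
```

<p>Thinking in columns rather than rows is the main adjustment: each key gives you a whole variable at once.</p>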
<p>Just like you get a Python dictionary from a dataframe, you can send a dictionary to R:</p>
<pre class="brush: python; title: ; notranslate">r.assign('df', dict(a=1, b=2, c=3))
r('print(df)')
r('names(df)')
</pre>
<p>You may have to convert it into a dataframe once inside R though:</p>
<pre class="brush: python; title: ; notranslate">
r('df = data.frame(df)')
</pre>
<h3>R functions</h3>
<p>So far, with the exception of <span class="c">r.assign()</span>, we&#8217;ve just been sending strings to the <span class="c">r</span> object.  But the <span class="c">r</span> object also has methods.  Unfortunately, you can&#8217;t see them all using IPython&#8217;s introspection.  Personally I find that I don&#8217;t use this functionality that much, (I use <span class="c">r.assign()</span> to get the data into R and then operate on it in there) but here it is for completeness.</p>
<p>There is a trick here.  Remember, before we were sending a string to the <span class="c">r</span> object and it was executing the code inside R:</p>
<pre class="brush: python; title: ; notranslate">r('x=5')</pre>
<p>But when you use a method of the <span class="c">r</span> object, you pass it raw Python objects.  For example, you can plot a Python list in R using the <span class="c">plot()</span> method of the <span class="c">r</span> object:</p>
<pre class="brush: python; title: ; notranslate">x = [1,2,3]
r.plot(x)</pre>
<p>There are some slight name changes though.  R tends to use a &#8220;.&#8221; as a spacer in function names, like &#8220;_&#8221; tends to be used in Python.  The &#8220;.&#8221; however is special in Python, so in method names of the <span class="c">r</span> object, &#8220;.&#8221; is converted to &#8220;_&#8221;.  For example, R&#8217;s <span class="c">t.test()</span> function becomes <span class="c">r.t_test()</span>. </p>
<p>These methods of the <span class="c">r</span> object are what Python sees, so that&#8217;s why their names have to be changed.  On the other hand, you call an R function by its true name when you send the <span class="c">r</span> object a string, like we were doing before.  So both of these refer to the same underlying t-test function in R:</p>
<pre class="brush: python; title: ; notranslate">r.t_test
r('t.test')</pre>
<p>This next one is tricky.  First, since <span class="c">print</span> is a Python function, it needs to have a slightly different name when you want to use the version in R.  So an underscore is added to the end.  Second, what&#8217;s in the parentheses is a Python string.  So all that will get printed is the string, &#8216;x&#8217; . . . not 5, or &#8220;I&#8217;m a Python string&#8221; or anything else.</p>
<pre class="brush: python; title: ; notranslate">r.print_('x') # 'x'</pre>
<p>In practice though, if I want to print something I&#8217;ll either use Python&#8217;s <span class="c">print</span> or if I want to print something from R, I&#8217;ll do this:</p>
<pre class="brush: python; title: ; notranslate">r('print(x)')  # prints 5</pre>
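<p>The two naming rules described above (a trailing underscore is dropped, and the remaining underscores stand for dots) can be sketched in a few lines of plain Python.  This is only an illustration of those two rules, not RPy&#8217;s actual code, and the function name <span class="c">python_name_to_r</span> is made up for the example.</p>

```python
def python_name_to_r(name):
    """Map a method name used on the r object to the R name it stands for.

    Only covers the two rules described in the text (an assumption, not
    RPy's real implementation).
    """
    # A trailing underscore only exists to dodge Python's reserved words.
    if name.endswith('_'):
        name = name[:-1]
    # The remaining underscores stand in for R's dots.
    return name.replace('_', '.')


print(python_name_to_r('t_test'))
print(python_name_to_r('print_'))
```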
<h3>Plotting examples</h3>
<p>Here are a couple of examples of creating a plot.  In each case a plot is created of the list 1,2,3.  These are trivial examples, but they illustrate different ways of getting data to and from R.</p>
<h4>Option 1: Do everything in R</h4>
<p>You can execute arbitrary R commands by sending them as a string to the <span class="c">r</span> object.  Here, everything is done in R: a list is created and plotted.  In this example, the variable <span class="c">y</span> is never seen by Python.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
r(&quot;&quot;&quot;
y = c(1,2,3)
plot(y)
&quot;&quot;&quot;)</pre>
<p>Note that you can send many R commands in a multi-line string.</p>
<h4>Option 2: Use a method of the <span class="c">r</span> object</h4>
<p>Here, we start with a Python list, and then send it as the argument to the <span class="c">r.plot()</span> method.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
y = [1,2,3]
r.plot(y)</pre>
<h4>Option 3: Get a list from R and plot it with matplotlib in Python</h4>
<p>This is trivial because you don&#8217;t gain anything from making a list in R instead of Python, but it shows that you can send data both ways.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
import pylab as p
y = r('c(1,2,3)')
p.plot(y)
p.show()</pre>
<h4>Option 4: Use <span class="c">r.assign()</span> to get data to R, then call it inside R</h4>
<p>I tend to use this method a lot with large data sets. The idea is to pass the data into R once, then you can use it from inside R.  The trick is to use the <span class="c">r.assign()</span> method.</p>
<pre class="brush: python; title: ; notranslate">from rpy import *
y = [1,2,3]
r.assign('Y', y)
r('plot(Y)')</pre>
<h3>Getting help on R functions</h3>
<p>Use the <span class="c">r.help()</span> function.   For example, to view the help on anova:</p>
<pre>r.help('anova')</pre>
<p>This displays the help on screen; it doesn&#8217;t return a string.</p>
<h3>Non-trivial examples</h3>
<p>Plotting and printing things are not what you&#8217;d want to use R and RPy for.  Instead, you&#8217;d want to use them for things that you can&#8217;t do in available packages for Python.  </p>
<p>Here are some examples where R can really fill in the gaps in Python&#8217;s statistical functionality.  Anything you can do in R, you can do from Python.  Given the wide variety of packages available for R, this is some stupendous power at your fingertips.  Now to learn how to wield it!</p>
<h4>Linear models in R</h4>
<p>Say I have a Python script already up and running, and it returns some data . . . and I want to know if the slope of two variables is significant.  I haven&#8217;t found any statistics libraries for Python, but in R this kind of functionality comes standard, in the function <span class="c">lm()</span>.</p>
<p>Viewing the help for <span class="c">lm()</span>, you can see that it takes a model specification, like &#8220;y~x&#8221; which means &#8220;y on x&#8221;.  Now, the components of this model specification, y and x, can either refer to variables in the R workspace (which is separate from Python, remember) or they can be variables in a dataframe which is supplied in an optional argument to <span class="c">lm()</span>.</p>
<p>So first we need to figure out how to send the data to R.  Performing the linear regression itself should be trivial; after that, we need to get the results back out.</p>
<p>First, let&#8217;s set up some test data in Python:</p>
<pre class="brush: python; title: ; notranslate">
import numpy as npy
x = npy.arange(10)
y = npy.arange(10) + npy.random.standard_normal(x.shape)</pre>
<p>Now send it to R:</p>
<pre class="brush: python; title: ; notranslate">r.assign('x',x)
r.assign('y',y)</pre>
<p>(exercise for the reader: instead of assigning x and y individually, how would you get them into R as a dataframe?)</p>
<p>In R, run the linear model and save it as a variable in R.  Here, I&#8217;m simultaneously saving it as a Python dictionary (sneaky!)</p>
<pre class="brush: python; title: ; notranslate">LM = r('linear_model = lm(y~x)')</pre>
<p>OK, here&#8217;s where it takes a little exploring.  The dictionary you get back may take some navigating.  Looking at it for a little bit, you might notice the &#8216;coefficients&#8217; key of the dictionary LM, which in turn has two more keys: &#8216;(Intercept)&#8217; and &#8216;x&#8217;.</p>
<pre class="brush: plain; title: ; notranslate">{'assign': [0, 1],
 'call': &lt;Robj object at 0xb7d3e790&gt;,
 'coefficients': {'(Intercept)': 0.28490682478866736,
                  'x': 0.86209804871669171},
 'df.residual': 8,
 'effects': array([-13.16882479,   7.83039439,   1.22245056,   0.18398967,
         0.51108108,   0.8141431 ,  -0.45120018,  -1.1985602 ,
         1.54636612,   0.51341949]),
 'fitted.values': array([ 0.28490682,  1.14700487,  2.00910292,  2.87120097,  3.73329902,
        4.59539707,  5.45749512,  6.31959317,  7.18169121,  8.04378926]),
 'model': {'x': array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]),
           'y': array([-0.64212347,  1.39389811,  3.06676323,  2.84957073,  3.99793052,
        5.12226093,  4.67818603,  4.7520944 ,  8.3182891 ,  8.10661086])},
 'qr': {'pivot': [1, 2],
        'qr': array([[ -3.16227766, -14.23024947],
       [  0.31622777,   9.08295106],
       [  0.31622777,   0.15621147],
       [  0.31622777,   0.0461151 ],
       [  0.31622777,  -0.06398128],
       [  0.31622777,  -0.17407766],
       [  0.31622777,  -0.28417403],
       [  0.31622777,  -0.39427041],
       [  0.31622777,  -0.50436679],
       [  0.31622777,  -0.61446316]]),
        'qraux': [1.316227766016838, 1.2663078500948464],
        'rank': 2,
        'tol': 9.9999999999999995e-08},
 'rank': 2,
 'residuals': array([-0.92703029,  0.24689324,  1.05766031, -0.02163025,  0.2646315 ,
        0.52686386, -0.77930909, -1.56749877,  1.13659789,  0.0628216 ]),
 'terms': &lt;Robj object at 0xb7d3e780&gt;,
 'xlevels': {}}</pre>
<p>So if all we were after were the slope and intercept, then </p>
<pre class="brush: python; title: ; notranslate">
slope = LM['coefficients']['x']
intercept = LM['coefficients']['(Intercept)']</pre>
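<p>If you want to convince yourself that these two numbers really are the slope and intercept, you can redo the fit by hand.  The sketch below uses the textbook least-squares formulas in plain Python (no R involved) on the <span class="c">x</span> and <span class="c">y</span> values copied from the &#8216;model&#8217; entry of the dictionary above; the variable names are just for this example.</p>

```python
# x and y copied from the 'model' entry of the dictionary printed above
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [-0.64212347, 1.39389811, 3.06676323, 2.84957073, 3.99793052,
     5.12226093, 4.67818603, 4.7520944, 8.3182891, 8.10661086]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for a straight line:
# slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(slope, intercept)
```

<p>Up to rounding of the copied values, the result should agree with the &#8216;coefficients&#8217; entry of the dictionary.</p>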
<p>But what about a P-value for the slope?  It&#8217;s nowhere to be seen in that dictionary.  Turns out, you need the <span class="c">summary()</span> function in R, and it takes as its input a linear model (among other possible inputs, but here we&#8217;re just using a linear model).  So save it in R (just in case) and simultaneously save it in Python:</p>
<pre class="brush: python; title: ; notranslate">summary = r('LM_summary = summary(linear_model)')</pre>
<p>Hmm.  </p>
<pre class="brush: plain; title: ; notranslate">{'adj.r.squared': 0.88847497651170382,
 'aliased': {'(Intercept)': False, 'x': False},
 'call': &lt;Robj object at 0xb7d3e770&gt;,
 'coefficients': array([[  2.84906825e-01,   5.39776217e-01,   5.27823968e-01,
          6.11943659e-01],
       [  8.62098049e-01,   1.01109349e-01,   8.52639301e+00,
          2.75251311e-05]]),
 'cov.unscaled': array([[ 0.34545455, -0.05454545],
       [-0.05454545,  0.01212121]]),
 'df': [2, 8, 2],
 'fstatistic': {'dendf': 8.0, 'numdf': 1.0, 'value': 72.699377758431851},
 'r.squared': 0.90086664578818121,
 'residuals': array([-0.92703029,  0.24689324,  1.05766031, -0.02163025,  0.2646315 ,
        0.52686386, -0.77930909, -1.56749877,  1.13659789,  0.0628216 ]),
 'sigma': 0.9183712712215929,
 'terms': &lt;Robj object at 0xb7d3e7c0&gt;}</pre>
<p>There&#8217;s the r-squared and adjusted r-squared,</p>
<pre class="brush: python; title: ; notranslate">R_squared = summary['adj.r.squared']</pre>
<p>but no P value.  What gives?  Turns out Python can&#8217;t convert everything perfectly, and a little more exploration is in order.  Try printing the summary from R:</p>
<pre class="brush: python; title: ; notranslate">r('print(LM_summary)')</pre>
<p>Well, that makes more sense, and you can see the P value for the slope is 2.75E-5.  But how to extract it from Python?</p>
<pre class="brush: plain; title: ; notranslate">Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.5675 -0.5899  0.1549  0.4613  1.1366 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)
(Intercept)   0.2849     0.5398   0.528    0.612
x             0.8621     0.1011   8.526 2.75e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.9184 on 8 degrees of freedom
Multiple R-squared: 0.9009,	Adjusted R-squared: 0.8885
F-statistic:  72.7 on 1 and 8 DF,  p-value: 2.753e-05</pre>
<p>The trick is to match output from the summary printout in R with the dictionary returned to Python.  Here, the &#8216;coefficients&#8217; key in the summary dictionary holds the same table as the &#8220;Coefficients&#8221; section of the printout, so the P value for the slope is in the 2nd row, 4th column (indices start at 0 in Python; index 2 would give the t value instead):</p>
<pre class="brush: python; title: ; notranslate">P = summary['coefficients'][1,3]</pre>
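<p>Counting rows and columns by eye is easy to get wrong, so one option is to label them.  The sketch below copies the numbers from the &#8216;coefficients&#8217; array shown earlier into a plain nested list and takes the row and column names from R&#8217;s printout; the <span class="c">lookup</span> helper is made up for this example and is not part of RPy.</p>

```python
# Values copied from the 'coefficients' array in the summary dictionary above
coefficients = [
    [2.84906825e-01, 5.39776217e-01, 5.27823968e-01, 6.11943659e-01],
    [8.62098049e-01, 1.01109349e-01, 8.52639301e+00, 2.75251311e-05],
]

# Labels taken from the "Coefficients" table in R's printout
row_names = ['(Intercept)', 'x']
col_names = ['Estimate', 'Std. Error', 't value', 'Pr(>|t|)']


def lookup(row, col):
    """Return one cell of the coefficient table by its row and column label."""
    return coefficients[row_names.index(row)][col_names.index(col)]


p_value = lookup('x', 'Pr(>|t|)')   # row index 1, column index 3
t_value = lookup('x', 't value')    # row index 1, column index 2

print(p_value, t_value)
```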
<p>Whew, and there you have it.  See, it takes some digging around to get what you need, but since I&#8217;ve done the work for you, you can now do linear regressions from Python.  All together it looks like this (can be wrapped in a function or class for your own reuse):</p>
<pre class="brush: python; title: ; notranslate">r.assign('x', x)
r.assign('y', y)
LM = r('linear_model = lm(y~x)')
summary = r('summary_LM = summary(linear_model)')
slope = LM['coefficients']['x']
intercept = LM['coefficients']['(Intercept)']
P = summary['coefficients'][1,3]</pre>
<h4>Redundancy analysis</h4>
<p>OK, say you have this data set to perform redundancy analysis (RDA) on.  First, you need the package <a href="http://vegan.r-forge.r-project.org/">vegan</a> installed, which is fantastic for multivariate stats.  It&#8217;s probably best to fire up R proper (from a command line, or the GUI if you have it in Windows or OSX) and run</p>
<pre class="brush: plain; title: ; notranslate">install.packages(&quot;vegan&quot;, dep=T)</pre>
<p>Here&#8217;s a heavily commented script, <a href='http://scienceoss.com/wp-content/uploads/2008/07/rpy-demo.py'>rpy-demo.py</a>, that will:</p>
<ul>
<li>load and format the data included in the script</li>
<li>send the data to R</li>
<li>perform an RDA in R</li>
<li>plot the ordination</li>
<li>save the ordination as a PNG</li>
<li>print the variance explained by constrained and unconstrained axes as well as each RDA axis.</li>
</ul>
<p>If you have RPy installed and the vegan package installed, you should be able to just run this Python script.</p>
<p>Often-run analyses that you need R for can be wrapped in a class or module to encapsulate your data analysis needs, so you don&#8217;t need to clutter your code with it. Once things are set up that way, it would be as easy as</p>
<pre class="brush: python; title: ; notranslate">
from myRstuff import lm, rda
results = lm(x,y)
ordination = rda(data)</pre>
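<p>One way such a wrapper could look is sketched below.  It assumes an RPy-style <span class="c">r</span> object (callable with a string, with an <span class="c">assign()</span> method, returning dictionaries like the ones shown earlier) and takes it as an explicit first argument, which differs from the <span class="c">lm(x,y)</span> call above but means the function can be tried out without R installed.  The names are hypothetical.</p>

```python
def lm(r, x, y):
    """Regress y on x through an RPy-style r object.

    Returns the slope, intercept and the P value of the slope.
    Assumes r behaves like the object imported from rpy.
    """
    # Send the data into R's workspace.
    r.assign('x', x)
    r.assign('y', y)

    # Fit the model and summarise it inside R, keeping the converted results.
    model = r('linear_model = lm(y~x)')
    summary = r('LM_summary = summary(linear_model)')

    coefficients = model['coefficients']
    return {
        'slope': coefficients['x'],
        'intercept': coefficients['(Intercept)'],
        # 2nd row, 4th column of the coefficient table is Pr(>|t|) for x
        'P': summary['coefficients'][1][3],
    }
```

<p>With RPy available, usage would be along the lines of <span class="c">results = lm(r, x, y)</span> after <span class="c">from rpy import r</span>.</p>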
<p>For much, much more see the <a href="http://rpy.sourceforge.net/rpy/doc/rpy_html/index.html">online documentation</a> for RPy, but hopefully I gave you enough to at least get started.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/rpy-statistics-in-r-from-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Restructure or reformat dataframes in R with melt</title>
		<link>http://scienceoss.com/restructure-or-reformat-dataframes-in-r-with-melt/</link>
		<comments>http://scienceoss.com/restructure-or-reformat-dataframes-in-r-with-melt/#comments</comments>
		<pubDate>Sun, 23 Mar 2008 22:06:17 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[dataframe]]></category>
		<category><![CDATA[melt]]></category>

		<guid isPermaLink="false">http://scienceoss.com/restructure-or-reformat-dataframes-in-r-with-melt/</guid>
		<description><![CDATA[The basic idea is to take an R dataframe like this one containing abundance of three species at each site, and elevation at each site site sp1 sp2 sp3 elev a 3 4 9 100 b 1 8 10 210 c 4 8 15 165 and reorganize into something like this (perhaps so we can [...]]]></description>
			<content:encoded><![CDATA[<p>The basic idea is to take an R dataframe like this one containing abundance of three species at each site, and elevation at each site</p>
<pre class="prettyprint"><code class="code">site  sp1 sp2 sp3 elev
a      3   4   9   100
b      1   8   10  210
c      4   8   15  165
</code></pre>
<p>and reorganize into something like this (perhaps so we can do an ANOVA using species as a factor):</p>
<pre class="prettyprint"><code class="code">site  elev  sp  abundance
a     100  sp1  3
a     100  sp2  4
a     100  sp3  9
b     210  sp1  1
b     210  sp2  8
b     210  sp3  10
c     165  sp1  4
c     165  sp2  8
c     165  sp3  15</code></pre>
<p>Assuming the first dataframe above is called <span class="c">d</span>, the second dataframe can be obtained using the following code:</p>
<pre class="prettyprint"><code class="code">> library(ggplot2)
> m = melt(d, id=c('site','elev'))
</code></pre>
<p><span class="c">melt</span> works like this: You specify the ID variables, which are those variables that will REMAIN as dataframe variables.  Any others will be considered measured variables.  If it&#8217;s easier for your data, you can do it the other way: specify the measured variables and the others will be considered ID variables.</p>
<p>Melting results in two new variables, <span class="c">variable</span> and <span class="c">value</span>.  <span class="c">variable</span> contains the names of the original columns of the dataframe as factors, and <span class="c">value</span> contains the corresponding values.</p>
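<p>If the idea is easier to follow in code, here is a rough Python sketch of the same reshaping, applied to the small species table from the top of this post.  It is a toy re-implementation for illustration only (rows as dictionaries, ID variables given explicitly), not how the R function is actually written.</p>

```python
def melt(rows, id_vars):
    """Reshape wide rows into long rows with 'variable' and 'value' fields."""
    # Every column that is not an ID variable counts as a measured variable.
    measured = [name for name in rows[0] if name not in id_vars]
    melted = []
    for name in measured:
        for row in rows:
            # ID variables are carried over unchanged.
            record = {key: row[key] for key in id_vars}
            record['variable'] = name
            record['value'] = row[name]
            melted.append(record)
    return melted


d = [
    {'site': 'a', 'sp1': 3, 'sp2': 4, 'sp3': 9, 'elev': 100},
    {'site': 'b', 'sp1': 1, 'sp2': 8, 'sp3': 10, 'elev': 210},
    {'site': 'c', 'sp1': 4, 'sp2': 8, 'sp3': 15, 'elev': 165},
]

m = melt(d, id_vars=['site', 'elev'])
for record in m:
    print(record)
```

<p>Three sites times three measured variables gives nine long-format rows, one per site and species.</p>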
<h3>Another example</h3>
<p>Here&#8217;s another example using the built-in dataset, airquality.  First, unmelted:</p>
<pre class="prettyprint"><code class="code"># make all the variable names lowercase
names(airquality) <- tolower(names(airquality))
head(airquality)
  ozone solar.r wind temp month day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
</code></pre>
<p>and melted:</p>
<pre class="prettyprint"><code class="code">> head(melt(airquality,id=c('month','day')))
  month day variable value
1     5   1    ozone    41
2     5   2    ozone    36
3     5   3    ozone    12
4     5   4    ozone    18
5     5   5    ozone    NA
6     5   6    ozone    28</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/restructure-or-reformat-dataframes-in-r-with-melt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reorder factors for ggplot</title>
		<link>http://scienceoss.com/reorder-factors-for-ggplot/</link>
		<comments>http://scienceoss.com/reorder-factors-for-ggplot/#comments</comments>
		<pubDate>Sun, 23 Mar 2008 21:35:45 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[facet grid]]></category>
		<category><![CDATA[factors]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[reorder]]></category>

		<guid isPermaLink="false">http://scienceoss.com/reorder-factors-for-ggplot/</guid>
		<description><![CDATA[A somewhat contrived example . . . first, illustrate the problem: library(ggplot2) ggplot(iris)+aes(x=Sepal.Width)+geom_histogram()+facet_grid(Species~.) How do we get these histograms to be better sorted? The following will reorder the factor variable, Species, by the mean of Sepal.Width: iris$Species = reorder(iris$Species, iris$Sepal.Width, mean) ggplot(iris)+aes(x=Sepal.Width)+geom_histogram()+facet_grid(Species~.) Now the histograms are sorted by the mean sepal width.]]></description>
			<content:encoded><![CDATA[<p>A somewhat contrived example . . . first, illustrate the problem:</p>
<pre class="prettyprint"><code class="code">library(ggplot2)
ggplot(iris)+aes(x=Sepal.Width)+geom_histogram()+facet_grid(Species~.)</code></pre>
<p><a href='http://scienceoss.com/wp-content/uploads/2008/03/hist-unsorted.png' title='Unsorted histogram'><img src='http://scienceoss.com/wp-content/uploads/2008/03/hist-unsorted.png' alt='Unsorted histogram' /></a><br />
How do we get these histograms to be better sorted?  The following will reorder the factor variable, <span class="c">Species</span>, by the mean of <span class="c">Sepal.Width</span>:</p>
<pre class="prettyprint"><code class="code">iris$Species = reorder(iris$Species, iris$Sepal.Width, mean)
ggplot(iris)+aes(x=Sepal.Width)+geom_histogram()+facet_grid(Species~.)</code></pre>
<p><a href='http://scienceoss.com/wp-content/uploads/2008/03/hist-sorted.png' title='Sorted histogram'><img src='http://scienceoss.com/wp-content/uploads/2008/03/hist-sorted.png' alt='Sorted histogram' /></a><br />
Now the histograms are sorted by the mean sepal width.</p>
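<p>What the reordering does can be sketched in plain Python: compute a summary (here the mean) for each level of the factor, then sort the levels by it.  The function name and the six measurements below are made up for illustration; this is not R&#8217;s implementation.</p>

```python
from collections import defaultdict


def reorder_levels(groups, values):
    """Return the distinct groups sorted by the mean of their values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in zip(groups, values):
        totals[group] += value
        counts[group] += 1
    # Sort the levels by their group mean, smallest first.
    return sorted(totals, key=lambda group: totals[group] / counts[group])


# Made-up sepal widths, two per species
species = ['setosa', 'setosa', 'versicolor', 'versicolor',
           'virginica', 'virginica']
widths = [3.5, 3.0, 3.2, 2.3, 3.3, 2.7]

levels = reorder_levels(species, widths)
print(levels)
```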
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/reorder-factors-for-ggplot/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Linear interpolation</title>
		<link>http://scienceoss.com/linear-interpolation/</link>
		<comments>http://scienceoss.com/linear-interpolation/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 14:46:13 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[interpolation]]></category>
		<category><![CDATA[time series]]></category>

		<guid isPermaLink="false">http://scienceoss.com/linear-interpolation/</guid>
		<description><![CDATA[In R, use the approx function. x is the original x data y is the original y data xi is the x data you want new y data for. z = approx(x, y, xi) plot(z) plot(z$x, z$y) # does the same thing. See ?approx for more options.]]></description>
			<content:encoded><![CDATA[<p>In R, use the <span class="c">approx</span> function.</p>
<ul>
<li><span class="c">x </span>is the original x data</li>
<li><span class="c">y</span> is the original y data</li>
<li><span class="c">xi</span> is the x data you want new y data for.</li>
</ul>
<pre class="prettyprint"><code class="code">z = approx(x, y, xi)
plot(z)
plot(z$x, z$y) # does the same thing.</code></pre>
<p>See <span class="c">?approx</span> for more options.</p>
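<p>For reference, the arithmetic behind linear interpolation is short enough to sketch in Python.  The function below is a stand-in written for this example, not R&#8217;s code: it assumes <span class="c">x</span> is sorted in strictly increasing order with at least two points, and it gives <span class="c">None</span> for any <span class="c">xi</span> outside the range of <span class="c">x</span>.</p>

```python
from bisect import bisect_right


def approx(x, y, xi):
    """Linearly interpolate y at the points xi (x must be sorted ascending)."""
    yi = []
    for value in xi:
        # Outside the original data there is nothing to interpolate between.
        if value < x[0] or value > x[-1]:
            yi.append(None)
            continue
        # Index of the right-hand neighbour, capped so value == x[-1] works.
        right = min(bisect_right(x, value), len(x) - 1)
        left = right - 1
        # Fraction of the way from the left neighbour to the right one.
        fraction = (value - x[left]) / (x[right] - x[left])
        yi.append(y[left] + fraction * (y[right] - y[left]))
    return yi


print(approx([0, 1, 2], [0, 10, 20], [0.5, 1, 2, 3]))
```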
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/linear-interpolation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Longitudinal data in rggobi</title>
		<link>http://scienceoss.com/longitudinal-data-in-rggobi/</link>
		<comments>http://scienceoss.com/longitudinal-data-in-rggobi/#comments</comments>
		<pubDate>Sun, 09 Mar 2008 16:05:15 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[ggobi]]></category>
		<category><![CDATA[multivariate]]></category>
		<category><![CDATA[rggobi]]></category>
		<category><![CDATA[timeseries]]></category>

		<guid isPermaLink="false">http://scienceoss.com/longitudinal-data-in-rggobi/</guid>
		<description><![CDATA[This displays lines between points in RGgobi. time_var is what separates each unique object, specified by id_var, over time. ggobi_longitudinal(dataframe, time_var, id_var) A specific example would be temperatures at different sites over different years: ggobi_lon]]></description>
			<content:encoded><![CDATA[<p>This displays lines between points in RGgobi.  time_var is what separates each unique object, specified by id_var, over time.</p>
<pre class="prettyprint"><code class="code">ggobi_longitudinal(dataframe, time_var, id_var)</code></pre>
<p>A specific example would be temperatures at different sites over different years:<br />
ggobi_lon</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/longitudinal-data-in-rggobi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Change properties of ggplot plots</title>
		<link>http://scienceoss.com/change-properties-of-ggplot-plots/</link>
		<comments>http://scienceoss.com/change-properties-of-ggplot-plots/#comments</comments>
		<pubDate>Mon, 04 Feb 2008 16:28:54 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[plotting]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[grayscale]]></category>
		<category><![CDATA[parameters]]></category>

		<guid isPermaLink="false">http://scienceoss.com/change-properties-of-ggplot-plots/</guid>
		<description><![CDATA[Try ?ggopt to see the different ways of adjusting plot background, axes, aspect ratio, border colors, and strip labels. Change the font size of the labels. This acts on the currently active plot. grid.gedit('label', gp=gpar(fontsize=16)) Or just change one type of label (here, the yaxis). grid.gedit(gPath("yaxis", "labels"), gp=gpar(col="red")) Use a black and white theme The [...]]]></description>
			<content:encoded><![CDATA[<p>Try </p>
<pre class="prettyprint"><code class="code">?ggopt</code></pre>
<p>to see the different ways of adjusting plot background, axes, aspect ratio, border colors, and strip labels.</p>
<h3>Change the font size of the labels.</h3>
<p>This acts on the currently active plot.</p>
<pre class="prettyprint"><code class="code">grid.gedit('label', gp=gpar(fontsize=16))</code></pre>
<p>Or just change one type of label (here, the yaxis).</p>
<pre class="prettyprint"><code class="code">grid.gedit(gPath("yaxis", "labels"), gp=gpar(col="red"))</code></pre>
<h3>Use a black and white theme</h3>
<p>As of ggplot2 0.5.7 (the newest version at the time of writing), you can use a black and white theme.</p>
<pre class="prettyprint"><code class="code">pl = ggplot(diamonds)+aes(x=carat, y=price) +
    geom_point()+theme_bw
pl
</code></pre>
<p>I like to tweak the <span class="c">theme_bw</span> a little before using it as above:</p>
<pre class="prettyprint"><code class="code">theme_bw$grid.colour = "grey80"
theme_bw$border.colour = "gray70"</code></pre>
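<p>The list-style <span class="c">theme_bw</span> above is specific to early ggplot2 releases.  In later releases <span class="c">theme_bw()</span> is a function and individual elements are set with <span class="c">theme()</span>.  A rough equivalent, assuming a recent ggplot2 and with grey values picked to mirror the tweaks above, might look like this:</p>
<pre class="prettyprint"><code class="code">library(ggplot2)

# Assumes a recent ggplot2, where theme_bw() is a function
pl = ggplot(diamonds, aes(x = carat, y = price)) +
    geom_point() +
    theme_bw() +
    # Lighten the grid lines and the panel border
    theme(panel.grid.major = element_line(colour = "grey80"),
          panel.border = element_rect(colour = "grey70", fill = NA))
pl
</code></pre>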
<h3>Change the strip labels</h3>
<pre class="prettyprint"><code class="code">pl$strip.gp = gpar(fill="grey90")
pl$strip.txt.gp = gpar(col="black", fontsize=16)
pl</code></pre>
<h3>Change factor colors to grayscale</h3>
<pre class="prettyprint"><code class="code">pl = pl+scale_colour_grey(end=0.7,start=0,name='')
pl</code></pre>
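<p>The grey scale only has a visible effect when a factor is mapped to colour.  A minimal sketch, assuming a recent ggplot2 and using the <span class="c">cut</span> column of <span class="c">diamonds</span> as the factor (the column and the name <span class="c">pg</span> are chosen here purely for illustration):</p>
<pre class="prettyprint"><code class="code">library(ggplot2)

# Map the factor "cut" to colour, then replace the default
# colours with greys running from black (0) to light grey (0.7)
pg = ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
    geom_point() +
    scale_colour_grey(start = 0, end = 0.7, name = "")
pg
</code></pre>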
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/change-properties-of-ggplot-plots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Combine dataframes in R</title>
		<link>http://scienceoss.com/combine-dataframes-in-r/</link>
		<comments>http://scienceoss.com/combine-dataframes-in-r/#comments</comments>
		<pubDate>Mon, 04 Feb 2008 15:15:11 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[dataframe]]></category>
		<category><![CDATA[merge]]></category>

		<guid isPermaLink="false">http://scienceoss.com/combine-dataframes-in-r/</guid>
		<description><![CDATA[Quick answer: use merge, which joins data frames on their shared columns. You can also use rbind, but it only stacks rows of data frames that have the same columns, so it is useful only for simple cases. See http://www.statmethods.net/management/merging.html for more details.]]></description>
			<content:encoded><![CDATA[<p>Quick answer:  use <span class="c">merge</span>, which joins data frames on their shared columns.  </p>
<p>You can also use <span class="c">rbind</span>, but it only stacks rows of data frames that have the same columns, so it is useful only for simple cases.  </p>
<p>See <a href="http://www.statmethods.net/management/merging.html">http://www.statmethods.net/management/merging.html</a> for more details.</p>
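<p>A minimal sketch of the difference, using small made-up data frames (the names <span class="c">temps</span>, <span class="c">rain</span>, and <span class="c">more_temps</span> and all of their values are illustrative only):</p>
<pre class="prettyprint"><code class="code"># Two hypothetical data frames that share a "site" column
temps = data.frame(site = c("A", "B", "C"), temp = c(12.1, 15.3, 9.8))
rain  = data.frame(site = c("A", "B", "D"), rain = c(30, 45, 12))

# merge joins on the shared key; by default only matching rows are kept
both = merge(temps, rain, by = "site")

# all = TRUE keeps unmatched rows from both sides and fills gaps with NA
everything = merge(temps, rain, by = "site", all = TRUE)

# rbind stacks rows, so both data frames need the same column names
more_temps = data.frame(site = c("D", "E"), temp = c(11.0, 14.2))
stacked = rbind(temps, more_temps)
</code></pre>
<p>Here <span class="c">both</span> holds only sites A and B, while <span class="c">everything</span> holds all four sites with <span class="c">NA</span> where a value is missing.</p>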
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/combine-dataframes-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Run an R script</title>
		<link>http://scienceoss.com/run-an-r-script/</link>
		<comments>http://scienceoss.com/run-an-r-script/#comments</comments>
		<pubDate>Wed, 23 Jan 2008 19:40:11 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[command line]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=70</guid>
		<description><![CDATA[From the R prompt: source('filename.R') From the shell: Rscript filename.R]]></description>
			<content:encoded><![CDATA[<p>From the R prompt:</p>
<pre class="prettyprint"><code class="code">source('filename.R')</code></pre>
<p>From the shell:</p>
<pre class="prettyprint"><code class="code">Rscript filename.R</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/run-an-r-script/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Change axis labels in ggplot</title>
		<link>http://scienceoss.com/change-axis-labels-in-ggplot/</link>
		<comments>http://scienceoss.com/change-axis-labels-in-ggplot/#comments</comments>
		<pubDate>Mon, 14 Jan 2008 00:17:39 +0000</pubDate>
		<dc:creator>ryan</dc:creator>
				<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://scienceoss.com/?p=60</guid>
		<description><![CDATA[By default, ggplot uses the variable names as the axis labels. Change them by passing a new label to scale_x_continuous or scale_y_continuous. p = ggplot(diamonds)+geom_point()+aes(x=carat,y=price) p + scale_x_continuous('x axis label') + scale_y_continuous('y axis label')]]></description>
			<content:encoded><![CDATA[<p>By default, ggplot uses the variable names as the axis labels.  Change them by passing a new label to <span class="c">scale_x_continuous</span> or <span class="c">scale_y_continuous</span>.</p>
<pre class = "prettyprint"><code class = "code">
p = ggplot(diamonds)+geom_point()+aes(x=carat,y=price)

p + scale_x_continuous('x axis label') + scale_y_continuous('y axis label')
</code></pre>
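<p>If you only want to rename the axes and leave the scales alone, ggplot2 also provides the <span class="c">xlab</span> and <span class="c">ylab</span> helpers.  A short sketch, assuming a recent ggplot2 (the label text and the name <span class="c">p2</span> are just examples):</p>
<pre class="prettyprint"><code class="code">library(ggplot2)

p = ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

# xlab and ylab change only the axis titles
p2 = p + xlab("Carat") + ylab("Price (US dollars)")
p2
</code></pre>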
]]></content:encoded>
			<wfw:commentRss>http://scienceoss.com/change-axis-labels-in-ggplot/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

