Archive for the ‘matplotlib’ Category

RPy: statistics in R from Python

Friday, July 25th, 2008

R is a free, open source statistics package written by statisticians, for statisticians. Python on the other hand lacks a comprehensive statistics package. RPy allows you to combine the power of Python with the power of R for an unbeatable combination in data analysis.

Note that in order to use R from Python, you need to know a little of both . . . so the learning curve can be steep. You also need to have a feel for what would be easy in R and what would be easy in Python.

There are some detailed examples below if you want to skip right to ‘em.

I use Python for most tasks, but when I need high-powered stats, I embed R code in my Python scripts to perform the analysis.

Disclaimer: I figured all of this stuff out by trial and error. The RPy documentation, while complete, was difficult for me to make sense of when I was learning. If there’s a better way to do things, please let me know! For the details that I don’t cover here, check the online documentation

Why use R?

You’ll need R if you want to do any sort of sophisticated (or even not-so-sophisiticated) statistical analysis. There are no solid statistics libraries that I’ve come across for Python . . . but maybe that’s because R is the best possible statistics library there could be.

Be warned however that accessing R from Python can get tricky at times. I’ve tried to outline some of what I’ve learned here to make it easier for others.

Why use RPy instead of writing files out to R, then using R scripts to deal with it? I did this for a little while and found that it was too much work to maintain two separate code bases . . . one for Python, then one for R. If I changed anything in the output of a Python script, I’d have to fire up R and open my R scripts to modify and debug them. I’ve found that using RPy lets me put all my code in one spot, resulting in fewer bugs and less maintenance.

R and Python are separate . . .

I found that the easiest way to think about this is to think about doing things “inside R” or “inside Python”. Things that are to be done inside R are typically wrapped in a string (a Python string). For example, this creates a variable inside R called x with a value of 5.

from rpy import *
r('x=5')

Assuming this was typed into a fresh Python session, Python has no idea about the existence of the variable x! It works in reverse, too: R has no idea about what’s in the Python namespace. So you can do this in Python:

x = "I'm a Python string"

and the variable x inside R is still the same:

r('print(x)')  # still 5

. . . but they can talk to each other

RPy does some automatic conversions:

x_from_R = r('x')  # 5

What happened here is that RPy looked at what x was inside R, saw that it was an integer, and returned that integer to Python, which assigned it to the Python variable x_from_R. So that’s how you get data from R to Python: by sending a string (the variable name you want to retrieve in R) to the r object.

At first you might think this is how you send data from Python to R:

r('x_from_python') = x
#SyntaxError: can't assign to function call

Nope. Turns out you have to use the r.assign() function to do that:

r.assign('x_from_python', x)
r('print(x_from_python)')  # "I'm a Python string"

So that’s how you get data from Python to R: by using the r.assign() function, first giving the name of the variable you want to be assigned in R followed by the Python object to be sent to R.

Other data types

OK, so you can get integers back from R. And as you can imagine, strings work the same way. But what about more complex data types? This list of conversions tells you which R objects will be converted into which Python objects. It’s pretty intuitive, a string becomes a string, a list becomes a list, etc.

But then there are things like data frames in R, which have row names and column names.

It’s not on that list linked above, but an R data frame is converted to a Python dictionary. For example, the Motor Trend car data set, which comes standard in R, is a data frame.

from rpy import *
r('print(head(mtcars))') # print just the first 6 lines.  Note the variable names.

# Returns:
#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Now send the whole thing to Python and check the keys of the dictionary that is created:

mt = r('mtcars')
mt.keys()

Note that the keys are the same as the variable names in the dataframe.

Just like you get a Python dictionary from a dataframe, you can send a dictionary to R:

r.assign('df', dict(a=1, b=2, c=3))
r('print(df)')
r('names(df)')

May have to convert it into a dataframe once inside R though:

r('df = data.frame(df)')

R functions

So far, with the exception of r.assign(), we’ve just been sending strings to the r object. But the r object also has methods. Unfortunately, you can’t see them all using IPython’s introspection. Personally I find that I don’t use this functionality that much, (I use r.assign() to get the data into R and then operate on it in there) but here it is for completeness.

There is a trick here. Remember, before we were sending a string to the r object and it was executing the code inside R:

r('x=5')

But when you use a method of the r object, you pass it raw Python objects. For example, you can plot a Python list in R using the plot() method of the r object:

x = [1,2,3]
r.plot(x)

There are some slight name changes though. R tends to use a “.” as a spacer in function names, like “_” tends to be used in Python. The “.” however is special in Python, so in method names of the r object, “.” is converted to “_”. For example, R’s t.test() function becomes r.t_test().

These methods of the r object are what Python sees, so that’s why their names have to be changed. On the other hand, you call R function with its true name when you send the r object a string, like we were doing before. So both of these refer to the same underlying t-test function in R:

r.t_test
r('t.test')

This next one is tricky. First, since print is a Python function, it needs to have a slightly different name when you want to use the version in R. So an underscore is added to the end. Second, what’s in the parentheses is a Python string. So all that will get printed is the string, ‘x’ . . . not 5, or “I’m a Python string” or anything else.

r.print_('x') # 'x'

In practice though, if I want to print something I’ll either use Python’s print or if I want to print something from R, I’ll do this:

r('print(x)')  # prints 5

Plotting examples

Here’s are a couple of examples of creating a plot. In each case a plot is created of the list 1,2,3. These are trivial examples, but they illustrate different ways of getting data to and from R.

Option 1: Do everything in R

You can execute arbitrary R commands by sending them as a string to the r object. Here, everything is done in R: a list is created and plotted. In this example, the variable x is never seen by Python.

from rpy import *
r("""
y = c(1,2,3)
plot(y)
""")

Note that you can send many R commands in a multi-line string.

Option 2: Use a method of the r object

Here, we start with a Python list, and then send it as the argument to the r.plot() method.

from  rpy import *
y = [1,2,3]
r.plot(y)

Option 3: Get a list from R and plot it with matplotlib in Python

This trivial because you don’t gain anything from making a list in R instead of Python, but it shows that you can send data both ways.

from r import *
import pylab as p
y = r('c(1,2,3)')
p.plot(y)
p.show()

Option 4: Use r.assign() to get data to R, then call it inside R

I tend to use this method a lot with large data sets. The idea is to pass the data into R once, then you can use it from inside R. The trick is to use the r.assign() method.

from rpy import *
y = [1,2,3]
r.assign('Y', y)
r('plot(Y)')

Getting help on R functions

Use the r.help() function. For example, to view the help on anova:

r.help(anova)

This displays the help on screen; it doesn’t return a string.

Non-trivial examples

Plotting and printing things are not what you’d want to use R and RPy for. Instead, you’d want to use them for things that you can’t do in available packages for Python.

Here are some examples where R can really fill in the gaps in Python’s statistical functionality. Anything you can do in R, you can do from Python. Given the wide variety of packages available for R, this is some stupendous power at your fingertips. Now to learn how to wield it!

Linear models in R

Say I have a Python script already up and running, and it returns some data . . . and I want to know if the slope of two variables is significant. I haven’t found any statistics libraries for Python, but in R this kind of functionality comes standard, in the function lm().

Viewing the help for lm(), you can see that it takes a model specification, like “y~x” which means “y on x”. Now, the components of this model specification, y and x, can either refer to variables in the R workspace (which is separate from Python, remember) or they can be variables in a dataframe which is supplied in an optional argument to lm().

So first we need to figure out how to send the data to R; performing the linear regression should be trivial, then we need to get the data back out.

First, let’s set up some test data in Python:

import numpy as npy
x = npy.arange(10)
y = npy.arange(10) + npy.random.standard_normal(x.shape)

Now send it to R:

r.assign('x',x)
r.assign('y',y)

(exercise for the reader: instead of assigning x and y individually, how would you get them into R as a dataframe?)

In R, run the linear model and save it as a variable in R. Here, I’m simultaneously saving it as a Python dictionary (sneaky!)

LM = r('linear_model = lm(y~x)')

OK, here’s where it take a little exploring. The dictionary you get back may take some navigating. Looking at it for a little bit, you might notice the ‘coefficients’ key of the dictionary LM, which in turn has two more keys: ‘(Intercept)’ and ‘x’.

{'assign': [0, 1],
 'call': ,
 'coefficients': {'(Intercept)': 0.28490682478866736,
                  'x': 0.86209804871669171},
 'df.residual': 8,
 'effects': array([-13.16882479,   7.83039439,   1.22245056,   0.18398967,
         0.51108108,   0.8141431 ,  -0.45120018,  -1.1985602 ,
         1.54636612,   0.51341949]),
 'fitted.values': array([ 0.28490682,  1.14700487,  2.00910292,  2.87120097,  3.73329902,
        4.59539707,  5.45749512,  6.31959317,  7.18169121,  8.04378926]),
 'model': {'x': array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]),
           'y': array([-0.64212347,  1.39389811,  3.06676323,  2.84957073,  3.99793052,
        5.12226093,  4.67818603,  4.7520944 ,  8.3182891 ,  8.10661086])},
 'qr': {'pivot': [1, 2],
        'qr': array([[ -3.16227766, -14.23024947],
       [  0.31622777,   9.08295106],
       [  0.31622777,   0.15621147],
       [  0.31622777,   0.0461151 ],
       [  0.31622777,  -0.06398128],
       [  0.31622777,  -0.17407766],
       [  0.31622777,  -0.28417403],
       [  0.31622777,  -0.39427041],
       [  0.31622777,  -0.50436679],
       [  0.31622777,  -0.61446316]]),
        'qraux': [1.316227766016838, 1.2663078500948464],
        'rank': 2,
        'tol': 9.9999999999999995e-08},
 'rank': 2,
 'residuals': array([-0.92703029,  0.24689324,  1.05766031, -0.02163025,  0.2646315 ,
        0.52686386, -0.77930909, -1.56749877,  1.13659789,  0.0628216 ]),
 'terms': ,
 'xlevels': {}}

So if all we were after were the slope and intercept, then

slope = LM['coefficients']['x']
intercept = LM['coefficients']['(Intercept)']

But what about a P-value for the slope? It’s nowhere to be seen in that dictionary. Turns out, you need the summary() function in R, and it takes as its input a linear model (among other possible inputs, but here we’re just using a linear model). So save it in R (just in case) and simultaneously save it in Python:

summary = r('LM_summary = summary(linear_model)')

Hmm.

{'adj.r.squared': 0.88847497651170382,
 'aliased': {'(Intercept)': False, 'x': False},
 'call': ,
 'coefficients': array([[  2.84906825e-01,   5.39776217e-01,   5.27823968e-01,
          6.11943659e-01],
       [  8.62098049e-01,   1.01109349e-01,   8.52639301e+00,
          2.75251311e-05]]),
 'cov.unscaled': array([[ 0.34545455, -0.05454545],
       [-0.05454545,  0.01212121]]),
 'df': [2, 8, 2],
 'fstatistic': {'dendf': 8.0, 'numdf': 1.0, 'value': 72.699377758431851},
 'r.squared': 0.90086664578818121,
 'residuals': array([-0.92703029,  0.24689324,  1.05766031, -0.02163025,  0.2646315 ,
        0.52686386, -0.77930909, -1.56749877,  1.13659789,  0.0628216 ]),
 'sigma': 0.9183712712215929,
 'terms': }

There’s the r-squared and adjusted r-squared,

R_squared = summary['adj.r.squared']

but no P value. What gives? Turns out Python can’t convert everything perfectly, and a little more exploration is in order. Try printing the summary from R:

r('print(LM_summary)')

Well, that makes more sense, and you can see the P value for the slope is 2.75E-5. But how to extract it from Python?

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.5675 -0.5899  0.1549  0.4613  1.1366 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.2849     0.5398   0.528    0.612
x             0.8621     0.1011   8.526 2.75e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.9184 on 8 degrees of freedom
Multiple R-squared: 0.9009,	Adjusted R-squared: 0.8885
F-statistic:  72.7 on 1 and 8 DF,  p-value: 2.753e-05

The trick is to match output from the summary printout in R with the dictionary returned to Python. Here, it looks like the key ‘coefficients’ in the summary dictionary in Python gives the numbers in the 2nd row, 3rd column:

P = summary['coefficients'][1,2]

Whew, and there you have it. See, it takes some digging around to get what you need, but now since I’ve done the work for you, you can now do linear regressions from Python. All together it looks like this (can be wrapped in a function or class for your own reuse):

r.assign('x', x)
r.assign('y', y)
LM = r('linear_model = lm(y~x)')
summary = r('summary_LM = summary(linear_model)')
slope = LM['coefficients']['x']
intercept = LM['coefficients']['(Intercept)']
P = summary['coefficients'][1,2]

Redundancy analysis

OK, say you have this data set to perform redundancy analysis (RDA) on. First, you need the package vegan installed, which is fantastic for multivariate stats. It’s probably best to fire up R proper (from a command line, or the GUI if you have it in Windows or OSX) and run

install.packages("vegan", dep=T)

Here’s a heavily commented script, rpy-demo.py, that will:

  • load and format the data included in the script
  • send the data to R
  • perform an RDA in R
  • plot the ordination
  • save the ordination as a PNG
  • print the variance explained by constrained and unconstrained axes as well as each RDA axis.

If you have RPy installed and the vegan package installed, you should be able to just run this Python script.

Often-run analyses that you need R for can be wrapped in a class or module to encapsulate your data analysis needs, so you don’t need to clutter your code with it. Once things are set up that way, it would be as easy as

from myRstuff import lm, rda
results = lm(x,y)
ordination = rda(data)

For much, much more see the online documentation for RPy, but hopefully I gave you enough to at least get started.

Polar bar plot in Python

Sunday, July 20th, 2008

Here’s how to create a polar bar plot in matplotlib.


The trick is just to specify that you want polar coordinates when you create the axis. Then create a bar plot as normal.


from matplotlib.pyplot import figure, show
from math import pi

fig = figure()
ax = fig.add_subplot(111, polar=True)
x = [30,60,90,120,150,180]
x = [i*pi/180 for i in x]  # convert to radians

ax.bar(x,[1,2,3,4,5,6], width=0.4)
show()

Note that in the above example the “right” or “clockwise-most” edge is lined up with each specified x value. You can change this by subtracting width / 2 to each of the x values to center the bars on the x-values, like this:


from matplotlib.pyplot import figure, show
from math import pi

width = 0.4  # width of the bars (in radians)

fig = figure()
ax = fig.add_subplot(111, polar=True)
x = [30,60,90,120,150,180]

# Convert to radians and subtract half the width
# of a bar to center it.
x = [i*pi/180 - width/2 for i in x]
ax.bar(x,[1,2,3,4,5,6], width=width)
show()

Get funky . . .

The following is slightly modifed from the matplotlib examples:


import numpy as npy
import matplotlib.cm as cm
from matplotlib.pyplot import figure, show, rc

# force square figure and square axes (looks better for polar, IMHO)
fig = figure(figsize=(8,8))
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8], polar=True)

N = 20
theta = npy.arange(0.0, 2*npy.pi, 2*npy.pi/N)  # random angles
radii = 10*npy.random.rand(N)  # random bar heights
width = npy.pi/4*npy.random.rand(N) # random widths

# Create the bar plot
bars = ax.bar(theta, radii, width=width, bottom=0.0)

# Step through bars (a list of Rectangle objects) and
# change color based on its height and set its alpha transparency
# to 0.5

for r,bar in zip(radii, bars):
    bar.set_facecolor( cm.jet(r/10.))
    bar.set_alpha(0.5)

show()

Interactive subplots: make all x-axes move together

Saturday, May 3rd, 2008

It’s very easy to make subplots that share an x-axis, so that when you pan and zoom on one axis, the others automatically pan and zoom as well. The key to this functionality is the sharex keyword argument, which is used when creating an axis. Here’s some example code and a video of the resulting interaction. (more…)

Interactively select points from a plot in matplotlib

Monday, January 28th, 2008

In Matlab, there is a very useful function called ginput(). There is no such ready-built function for matplotlib/pylab, but all the pieces are there. Copy and paste this code to get ginput functionality.
(more…)

Errorbars in matplotlib

Sunday, January 27th, 2008

Here’s how to plot x or y errorbars (or both) and how to customize the resulting plot.
(more…)

Adjust settings for matplotlib using rc and matplotlibrc

Monday, January 21st, 2008

The matplotlibrc file contains many useful parameters for tweaking your setup to your liking, and it’s worth at least skimming through to get an idea of what it contains. Editing the file makes more permanent changes, while using the pylab rcParams dictionary or rc() and rcdefaults() functions lets you make and revert changes on the fly.

View the rc parameters by using

import pylab as p
print p.rcParams

Examples ensue.
(more…)

Change distance of tick labels from axis

Thursday, January 17th, 2008

Set the rc parameters using the rc function.


import pylab as p
p.rc(('xtick.major','xtick.minor','ytick.major','ytick.minor'), pad=10)
p.plot([1,2,3])

If you only want to change one of the tick labels, say, the x major ticks, use
p.rc(’xtick.major), pad=10)

When you’re done and want to reset the rc settings, use

p.rcdefaults()

See matplotlibrc for more settings you can change via the rc command.

Interacting with figures in Python

Thursday, January 17th, 2008

Here is some code from the matplotlib mailing list, sent by Rob Hetland, for selecting points from a plot.


from matplotlib.pyplot import *

class ginput(object):
    """docstring for on_click"""

    def __init__(self):
        self.x = []
        self.y = []
        connect('button_press_event', self)

    def __call__(self, event):
        xd, yd = event.xdata, event.ydata
        if event.button==1:
            if event.inaxes is not None:
                # print 'data coords', event.xdata, event.ydata
                self.x.append(xd)
                self.y.append(yd)
                plot((xd,), (yd,), 'r+', ms=5)

map_points = ginput()

#!/usr/bin/env python
# encoding: utf-8
"""Polygon geometry.

Copyright (C) 2006, Robert Hetland
Copyright (C) 2006, Stefan van der Walt

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the
   distribution.
3. The name of the author may not be used to endorse or promote
   products derived from this software without specific prior written
   permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
"""

import numpy as np
import sys

try:
    import scipy.weave as weave

    def npnpoly(verts,points):
        verts = verts.astype(np.float64)
        points = points.astype(np.float64)

        xp = np.ascontiguousarray(verts[:,0])
        yp = np.ascontiguousarray(verts[:,1])
        x = np.ascontiguousarray(points[:,0])
        y = np.ascontiguousarray(points[:,1])
        out = np.empty(len(points),dtype=np.uint8)

        code = """
        /* Code from:
           http://www.ecse.rpi.edu/Homepages/wrf/Research/Short_Notes/pnpoly.html

           Copyright (c) 1970-2003, Wm. Randolph Franklin

           Permission is hereby granted, free of charge, to any person
           obtaining a copy of this software and associated documentation
           files (the "Software"), to deal in the Software without
           restriction, including without limitation the rights to use, copy,
           modify, merge, publish, distribute, sublicense, and/or sell copies
           of the Software, and to permit persons to whom the Software is
           furnished to do so, subject to the following conditions:

        	1. Redistributions of source code must retain the above
                 copyright notice, this list of conditions and the following
                 disclaimers.
        	2. Redistributions in binary form must reproduce the above
                 copyright notice in the documentation and/or other materials
                 provided with the distribution.
        	3. The name of W. Randolph Franklin may not be used to endorse
                 or promote products derived from this Software without
                 specific prior written permission.

           THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
           EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
           MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
           NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
           BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
           ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
           CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
           SOFTWARE. */

        int i,j,n;
        unsigned int c;
        int nr_verts = Nxp[0];
        for (n = 0; n < Nx[0]; n++) {
            c = 0;
        	for (i = 0, j = nr_verts-1; i < nr_verts; j = i++) {
                if ((((yp(i)<=y(n)) && (y(n)= 3, 'Need 3 vertices to create polygon.'

        # close polygon, if needed
        if not np.all(verts[0]==verts[-1]):
            verts = np.vstack((verts,verts[0]))

        self.verts = verts

        return verts.view(Polygeom).copy()

    def inside(self,points):
        points = np.atleast_2d(points)
        assert points.shape[1] == 2, \
               "Points should be of shape Nx2, is %s" % str(points.shape)
        return npnpoly(self.verts,points).astype(bool)

    def get_area(self):
        """
        Return the area of the polygon.

        From Paul Bourke's webpage:
          http://astronomy.swin.edu.au/~pbourke/geometry
        """
        v = self.verts
        v_first = v[:-1][:,[1,0]]
        v_second = v[1:]
        return np.diff(v_first*v_second).sum()/2.0

    def get_centroid(self):
        "Return the centroid of the polygon"
        v = self.verts
        a = np.diff(v[:-1][:,[1,0]]*v[1:])
        area = a.sum()/2.0
        return ((v[:-1,:] + v[1:,:])*a).sum(axis=0) / (6.0*area)

    area = property(get_area)
    centroid = property(get_centroid)

if __name__ == '__main__':
    import pylab as pl
    grid = np.mgrid[0:1:10j,0:1:10j].reshape(2,-1).swapaxes(0,1)

    # simple area test
    verts = np.array([[0.15,0.15],
                      [0.85,0.15],
                      [0.85,0.85],
                      [0.15,0.85]])
    pa = Polygeom(verts)
    print pa.area
    print (0.85-0.15)**2

    print pa

    print pa.centroid

    # concave enclosure test-case for inside.
    verts = np.array([[0.15,0.15],
                      [0.25,0.15],
                      [0.45,0.15],
                      [0.45,0.25],
                      [0.25,0.25],
                      [0.25,0.55],
                      [0.65,0.55],
                      [0.65,0.15],
                      [0.85,0.15],
                      [0.85,0.85],
                      [0.15,0.85]])
    pb = Polygeom(verts)
    inside = pb.inside(grid)
    pl.plot(grid[:,0][inside], grid[:,1][inside], 'g.')
    pl.plot(grid[:,0][~inside], grid[:,1][~inside],'r.')
    pl.plot(pb.verts[:,0],pb.verts[:,1], '-k')
    print pb.centroid
    xc, yc = pb.centroid
    print xc, yc
    pl.plot([xc], [yc], 'co')
    pl.show()

    pl.figure()
    # many points in a semicircle, to test speed.
    grid = np.mgrid[0:1:1000j,0:1:1000j].reshape(2,-1).swapaxes(0,1)
    xp = np.sin(np.arange(0,np.pi,0.01))
    yp = np.cos(np.arange(0,np.pi,0.01))
    pc = Polygeom(np.hstack([xp[:,np.newaxis],yp[:,np.newaxis]]))
    print "%d points inside %d vertex poly..." % (grid.size/2,len(verts)),
    sys.stdout.flush()
    inside = pc.inside(grid)
    print "done."
    pl.plot(grid[:,0][inside], grid[:,1][inside], 'g+')
    pl.plot(grid[:,0][~inside], grid[:,1][~inside], 'r.')
    pl.plot(pc.verts[:,0], pc.verts[:,1], '-k')
    xc, yc = pc.centroid
    print xc, yc
    pl.plot([xc], [yc], 'co')
    pl.show()

Timezones in Python

Thursday, January 10th, 2008
  • I had a Python datetime object, T.
  • I knew it was in Eastern time and wanted to convert to UTC.
  • I also wanted it to keep track of daylight savings time, for when I have dates later in the year.
  • Python datetime objects have “support” for timezone info, but you have to implement it yourself. Read: more coding overhead that I’m trying to avoid.

Here’s how to do it the easy way. (more…)

Create a second y-axis in matplotlib

Sunday, December 2nd, 2007

Today I was trying to figure out how to plot two time series with differently scaled values. I found two_scales.py example in the matplotlib examples which describes how to do it.

Here’s a slightly simplified version of the code, and afterward a detailed explanation of what’s going on. (more…)