Restructure or reformat dataframes in R with melt
The basic idea is to take an R dataframe like this one containing abundance of three species at each site, and elevation at each site
site sp1 sp2 sp3 elev
a 3 4 9 100
b 1 8 10 210
c 4 8 15 165
and reorganize into something like this (perhaps so we can do an ANOVA using species as a factor):
site elev sp abundance
a 100 sp1 3
a 100 sp2 4
a 100 sp3 9
b 210 sp1 1
b 210 sp2 8
b 210 sp3 10
c 165 sp1 4
c 165 sp2 8
c 165 sp3 15
Assuming the first dataframe above is called d, the second dataframe can be obtained using the following code:
> library(ggplot2)
> m = melt(d, id=c('site','elev'))
melt works like this: You specify the ID variables, which are those variables that will REMAIN as dataframe variables. Any others will be considered measured variables. If it’s easier for your data, you can do it the other way: specify the measured variables and the others will be considered ID variables.
Melting results in two new variables, variable and value. variable contains the names of the original columns of the dataframe as factors, and value contains the corresponding values.
Another example
Here’s another example using the built-in dataset, airquality. First, unmelted:
# make all the variable names lowercase
names(airquality) <- tolower(names(airquality))
head(airquality)
ozone solar.r wind temp month day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
and melted:
> head(melt(airquality,id=c('month','day')))
month day variable value
1 5 1 ozone 41
2 5 2 ozone 36
3 5 3 ozone 12
4 5 4 ozone 18
5 5 5 ozone NA
6 5 6 ozone 28
Tags: data management, dataframe, melt, R