
Question:
I have a data frame that I would like to reshape from long to wide format, but I want the time embedded in the variable names of the wide format. Here is an example data set in long format:
    id <- as.numeric(rep(1,16))
    time <- rep(c(5,10,15,20), 4)
    varname <- c(rep("var1",4), rep("var2", 4), rep("var3", 4), rep("var4", 4))
    value <- rnorm(16)
    tmpdata <- as.data.frame(cbind(id, time, varname, value))

    > tmpdata
    id time varname             value
     1    5    var1 0.713888426169224
     1   10    var1  1.71483653545922
     1   15    var1 -1.51992072577836
     1   20    var1 0.556992407683219
    ....
     4   20    var4  1.03752019932467
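A note on the sample data: cbind() on a mix of numeric and character vectors returns a character matrix, so every column of tmpdata ends up as text (character in R 4.0 and later, factors in earlier versions with the default stringsAsFactors). A minimal sketch, assuming you want id, time and value to stay numeric, that builds the same data with data.frame() instead:

```r
id      <- as.numeric(rep(1, 16))
time    <- rep(c(5, 10, 15, 20), 4)
varname <- c(rep("var1", 4), rep("var2", 4), rep("var3", 4), rep("var4", 4))
value   <- rnorm(16)

# cbind() of mixed vectors builds a character matrix, so every column is text
as_text <- as.data.frame(cbind(id, time, varname, value))

# data.frame() keeps each vector's own type: id, time and value stay numeric
tmpdata <- data.frame(id, time, varname, value)
str(tmpdata)
```

Keeping value numeric avoids the as.numeric(as.character(...)) round trip later and lets time sort numerically rather than alphabetically.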
I would like this in a wide format with the following output:
    id var1.5 var1.10 var1.15 var1.20 ....
     1   0.71    1.71   -1.51    0.55 (and so on)
I've tried the reshape() function in base R without success, and I was not sure how to do this with the reshape package, as all of its examples keep time as a separate variable in the wide format. Any ideas?
Solution:1
This is trivial with the reshape package:
    library(reshape)
    cast(tmpdata, ... ~ varname + time)
Solution:2
I had to do it in two reshape() steps. The column headings may not be exactly what you need, but they can be renamed easily.
    id <- as.numeric(rep(1, 16))
    time <- rep(c(5,10,15,20), 4)
    varname <- c(rep("var1",4), rep("var2", 4), rep("var3", 4), rep("var4", 4))
    value <- rnorm(16)
    tmpdata <- as.data.frame(cbind(id, time, varname, value))

    # step 1: one row per id/varname, one value column per time
    first <- reshape(tmpdata, timevar="time", idvar=c("id", "varname"), direction="wide")
    # step 2: one row per id, one column per time/varname combination
    second <- reshape(first, timevar="varname", idvar="id", direction="wide")
And the output:
    > tmpdata
       id time varname               value
    1   1    5    var1  -0.231227494628982
    2   1   10    var1   -1.80887236653438
    3   1   15    var1  -0.443229294431553
    4   1   20    var1    1.33719337048763
    5   1    5    var2   0.673109282347586
    6   1   10    var2   -0.42142267953938
    7   1   15    var2   0.874367622725874
    8   1   20    var2   -1.19917678039462
    9   1    5    var3    1.13495606258399
    10  1   10    var3 -0.0779385346672042
    11  1   15    var3  -0.126775240288037
    12  1   20    var3  -0.760739300144526
    13  1    5    var4   -1.94626587907069
    14  1   10    var4    1.25643195699455
    15  1   15    var4   -0.50986941213717
    16  1   20    var4   -1.01324846239812
    > first
       id varname            value.5            value.10           value.15
    1   1    var1 -0.231227494628982   -1.80887236653438 -0.443229294431553
    5   1    var2  0.673109282347586   -0.42142267953938  0.874367622725874
    9   1    var3   1.13495606258399 -0.0779385346672042 -0.126775240288037
    13  1    var4  -1.94626587907069    1.25643195699455  -0.50986941213717
                 value.20
    1    1.33719337048763
    5   -1.19917678039462
    9  -0.760739300144526
    13  -1.01324846239812
    > second
      id       value.5.var1     value.10.var1      value.15.var1    value.20.var1
    1  1 -0.231227494628982 -1.80887236653438 -0.443229294431553 1.33719337048763
           value.5.var2     value.10.var2     value.15.var2     value.20.var2
    1 0.673109282347586 -0.42142267953938 0.874367622725874 -1.19917678039462
          value.5.var3       value.10.var3      value.15.var3      value.20.var3
    1 1.13495606258399 -0.0779385346672042 -0.126775240288037 -0.760739300144526
           value.5.var4    value.10.var4     value.15.var4     value.20.var4
    1 -1.94626587907069 1.25643195699455 -0.50986941213717 -1.01324846239812
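The renaming can be done with one regular expression. A minimal sketch, assuming the two-step column names always have the form value.<time>.<varname>; the data are rebuilt with data.frame() here only so the block runs on its own:

```r
set.seed(1)
tmpdata <- data.frame(id = 1,
                      time = rep(c(5, 10, 15, 20), 4),
                      varname = rep(paste0("var", 1:4), each = 4),
                      value = rnorm(16))

# The same two reshape() steps as above
first  <- reshape(tmpdata, timevar = "time", idvar = c("id", "varname"),
                  direction = "wide")
second <- reshape(first, timevar = "varname", idvar = "id", direction = "wide")

# "value.5.var1" -> "var1.5"; "id" does not match the pattern and is left alone
names(second) <- sub("^value\\.([^.]+)\\.(.+)$", "\\2.\\1", names(second))
second
```

The pattern assumes the time part contains no dots; a time such as 2.5 would need a different pattern.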
Solution:3
I gave up on the old base R reshape() function (not Hadley's reshape package) two years ago. Figuring that damn thing out every time was actually harder than just doing it the 'hard' way, which is also much more flexible.
The data in your example are already nicely sorted. You might have to sort your real data by id, varname and time first.
(I renamed your tmpdata to tmp and made value numeric.)
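    tmp <- tmpdata
    tmp$value <- as.numeric(as.character(tmp$value))  # cbind() stored value as text
    y <- lapply(split(tmp, tmp$id), function(x) x$value)  # one vector of values per id
    df <- data.frame(unique(tmp$id), do.call(rbind, y))
    one <- tmp[tmp$id == tmp$id[1], ]  # the rows of one id give the column order
    names(df) <- c("id", paste(one$varname, one$time, sep = "."))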
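For the sorting step mentioned above, order() on several columns does the job. A minimal sketch on hypothetical shuffled data in the same layout as tmp; note that time has to be numeric, because a character time column sorts alphabetically:

```r
# Hypothetical long data in the same layout as tmp, rows shuffled on purpose
set.seed(2)
tmp <- data.frame(id = 1,
                  time = rep(c(5, 10, 15, 20), 4),
                  varname = rep(paste0("var", 1:4), each = 4),
                  value = rnorm(16))
tmp <- tmp[sample(nrow(tmp)), ]

# Sort by id, then varname, then time, as the approach above assumes
tmp <- tmp[order(tmp$id, tmp$varname, tmp$time), ]

# Why time must be numeric: text sorts character by character
sort(c("5", "10", "15", "20"))  # "10" "15" "20" "5"
```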
Solution:4
Why not just paste varname and time together before you reshape?
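A minimal sketch of that idea in base R, assuming value is numeric (the data are rebuilt with data.frame() for that reason). Once varname and time are pasted into one column (called key here, a hypothetical name), a single reshape() call is enough:

```r
set.seed(1)
tmpdata <- data.frame(id = 1,
                      time = rep(c(5, 10, 15, 20), 4),
                      varname = rep(paste0("var", 1:4), each = 4),
                      value = rnorm(16))

# Paste varname and time into one key, e.g. "var1.5"
tmpdata$key <- paste(tmpdata$varname, tmpdata$time, sep = ".")

# Keep only id, key and value so that value is the only column spread out
wide <- reshape(tmpdata[, c("id", "key", "value")],
                timevar = "key", idvar = "id", direction = "wide")

# reshape() names the new columns "value.<key>"; drop the "value." prefix
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

Subsetting to id, key and value matters: otherwise reshape() would treat time and varname as additional varying columns and spread them too.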