reshape                 package:stats                 R Documentation

Reshape Grouped Data

Description:

     This function reshapes a data frame between 'wide' format with
     repeated measurements in separate columns of the same record and
     'long' format with the repeated measurements in separate records.

Usage:

     reshape(data, varying = NULL, v.names = NULL, timevar = "time",
             idvar = "id", ids = 1:NROW(data),
             times = seq_along(varying[[1]]),
             drop = NULL, direction, new.row.names = NULL,
             sep = ".",
             split = if (sep == "") {
                 list(regexp = "[A-Za-z][0-9]", include = TRUE)
             } else {
                 list(regexp = sep, include = FALSE, fixed = TRUE)}
             )

Arguments:

    data: a data frame.

 varying: names of sets of variables in the wide format that
          correspond to single variables in long format
          ('time-varying'). This is canonically a list of vectors of
          variable names, but it can optionally be a matrix of names,
          or a single vector of names. In each case, the names can be
          replaced by indexes which are interpreted as referring to
          'names(data)'. See below for more details and options.

 v.names: names of variables in the long format that correspond to
          multiple variables in the wide format. See below for
          details.

 timevar: the variable in long format that differentiates multiple
          records from the same group or individual.

   idvar: names of one or more variables in long format that identify
          multiple records from the same group/individual. These
          variables may also be present in wide format.

     ids: the values to use for a newly created 'idvar' variable in
          long format.

   times: the values to use for a newly created 'timevar' variable in
          long format. See below for details.

    drop: a vector of names of variables to drop before reshaping.

direction: character string, either '"wide"' to reshape to wide
          format, or '"long"' to reshape to long format.

new.row.names: character or 'NULL': a non-null value will be used for
          the row names of the result.

     sep: A character vector of length 1, indicating a separating
          character in the variable names in the wide format. This is
          used for guessing 'v.names' and 'times' arguments based on
          the names in 'varying'. If 'sep == ""', the split is just
          before the first numeral that follows an alphabetic
          character.

   split: A list with three components, 'regexp', 'include', and
          (optionally) 'fixed'. This allows an extended interface to
          variable name splitting. See below for details.

Details:

     The arguments to this function are described in terms of
     longitudinal data, as that is the application motivating the
     functions. A 'wide' longitudinal dataset will have one record for
     each individual with some time-constant variables that occupy
     single columns and some time-varying variables that occupy a
     column for each time point. In 'long' format there will be
     multiple records for each individual, with some variables being
     constant across these records and others varying across the
     records. A 'long' format dataset also needs a 'time' variable
     identifying which time point each record comes from and an 'id'
     variable showing which records refer to the same person.

     If the data frame resulted from a previous 'reshape' then the
     operation can be reversed simply by 'reshape(a)'. The 'direction'
     argument is optional and the other arguments are stored as
     attributes on the data frame.

     If 'direction = "wide"' and no 'varying' or 'v.names' arguments
     are supplied it is assumed that all variables except 'idvar' and
     'timevar' are time-varying. They are all expanded into multiple
     variables in wide format.

     If 'direction = "long"' the 'varying' argument can be a vector of
     column names (or a corresponding index). The function will
     attempt to guess the 'v.names' and 'times' from these names. The
     default is variable names like 'x.1', 'x.2', where 'sep = "."'
     specifies to split at the dot and drop it from the name. To have
     alphabetic followed by numeric times use 'sep = ""'.

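
The name guessing described above can be sketched as follows. This is only an illustration on made-up data: the data frames 'wide' and 'wide2' and their column names and values are assumptions introduced here, not part of any shipped dataset.

```r
# Hypothetical wide data: two time-varying measures, x and y,
# recorded at times 1 and 2 (values are invented for illustration).
wide <- data.frame(id  = 1:3,
                   x.1 = c(5, 6, 7),       x.2 = c(8, 9, 10),
                   y.1 = c(0.1, 0.2, 0.3), y.2 = c(0.4, 0.5, 0.6))

# 'varying' is an atomic vector (here column indexes), so 'v.names'
# and 'times' are guessed by splitting each name at the default sep ".".
long <- reshape(wide, direction = "long", varying = 2:5, idvar = "id")
long

# Hypothetical names with a letter followed directly by a numeral:
# sep = "" splits just before the numeral.
wide2 <- data.frame(id = 1:2, dose1 = c(1, 2), dose2 = c(2, 1))
long2 <- reshape(wide2, direction = "long",
                 varying = c("dose1", "dose2"), sep = "", idvar = "id")
long2
```

In both calls the long result should carry one row per individual and time point, with a 'time' column built from the numeric suffixes.
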
     Variable name splitting as described above is only attempted in
     the case where 'varying' is an atomic vector. If it is a list or
     a matrix, 'v.names' and 'times' will generally need to be
     specified, although they will default to, respectively, the first
     variable name in each set, and sequential times.

     Also, guessing is not attempted if 'v.names' is given explicitly.
     Notice that the order of variables in 'varying' is like
     'x.1','y.1','x.2','y.2'.

     The 'split' argument should not usually be necessary. The
     'split$regexp' component is passed to either 'strsplit()' or
     'regexpr()', where the latter is used if 'split$include' is
     'TRUE', in which case the splitting occurs after the first
     character of the matched string. In the 'strsplit()' case, the
     separator is not included in the result, and it is possible to
     specify fixed-string matching using 'split$fixed'.

Value:

     The reshaped data frame with added attributes to simplify
     reshaping back to the original form.

See Also:

     'stack', 'aperm'; 'relist' for reshaping the result of 'unlist'.

Examples:

     summary(Indometh)
     wide <- reshape(Indometh, v.names = "conc", idvar = "Subject",
                     timevar = "time", direction = "wide")
     wide

     reshape(wide, direction = "long")
     reshape(wide, idvar = "Subject", varying = list(2:12),
             v.names = "conc", direction = "long")

     ## times need not be numeric
     df <- data.frame(id = rep(1:4, rep(2, 4)),
                      visit = I(rep(c("Before", "After"), 4)),
                      x = rnorm(4), y = runif(4))
     df
     reshape(df, timevar = "visit", idvar = "id", direction = "wide")
     ## warns that y is really varying
     reshape(df, timevar = "visit", idvar = "id", direction = "wide",
             v.names = "x")

     ## unbalanced 'long' data leads to NA fill in 'wide' form
     df2 <- df[1:7, ]
     df2
     reshape(df2, timevar = "visit", idvar = "id", direction = "wide")

     ## Alternative regular expressions for guessing names
     df3 <- data.frame(id = 1:4, age = c(40, 50, 60, 50),
                       dose1 = c(1, 2, 1, 2), dose2 = c(2, 1, 2, 1),
                       dose4 = c(3, 3, 3, 3))
     reshape(df3, direction = "long", varying = 3:5, sep = "")

     ## an example that isn't longitudinal data
     state.x77 <- as.data.frame(state.x77)
     long <- reshape(state.x77, idvar = "state",
                     ids = row.names(state.x77),
                     times = names(state.x77),
                     timevar = "Characteristic",
                     varying = list(names(state.x77)),
                     direction = "long")

     reshape(long, direction = "wide")

     reshape(long, direction = "wide",
             new.row.names = unique(long$state))

     ## multiple id variables
     df3 <- data.frame(school = rep(1:3, each = 4),
                       class = rep(9:10, 6),
                       time = rep(c(1, 1, 2, 2), 3),
                       score = rnorm(12))
     wide <- reshape(df3, idvar = c("school", "class"),
                     direction = "wide")
     wide
     ## transform back
     reshape(wide)
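
One further case from the Details, 'varying' given as a list with explicit 'v.names' and 'times', might look like the sketch below. The data frame 'scores' and all of its column names and values are hypothetical, chosen only because such names cannot be split by the guessing rules.

```r
# Hypothetical wide data whose column names put the time label first,
# so the default name guessing would not produce the intended variables.
scores <- data.frame(pupil     = c("a", "b"),
                     pre_test  = c(10, 12), post_test = c(14, 15),
                     pre_mins  = c(30, 28), post_mins = c(25, 27))

# Each list element names the wide columns that form one long variable,
# in time order; 'v.names' and 'times' are supplied explicitly.
long <- reshape(scores, direction = "long",
                idvar   = "pupil",
                varying = list(c("pre_test", "post_test"),
                               c("pre_mins", "post_mins")),
                v.names = c("test", "mins"),
                timevar = "phase",
                times   = c("pre", "post"))
long

# The attributes stored on 'long' let a bare call reverse the reshape.
back <- reshape(long)
back
```

Supplying the list form trades convenience for control: nothing is inferred from the column names, so the pairing of columns to times depends entirely on the order within each vector.
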