reshape - Reproduce a dataset in a different format in R
I have the dataset data below:
dput(data)
structure(list(
  fn = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    .Label = "20131202-0985 ", class = "factor"),
  values = structure(c(1L, 8L, 7L, 6L, 5L, 9L, 2L, 4L, 3L),
    .Label = c("|639778|21|nanyang circle|103.686721631628|1.34640300329567",
      "|8121|b01|somerset stn", "|96942883", "|sn30|smrt\n", "central",
      "four seasons hotel", "hotel", "ikea", "nanyang avenue"),
    class = "factor"),
  ind = structure(c(4L, 1L, 1L, 1L, 1L, 6L, 3L, 2L, 5L),
    .Label = c("bn", "br", "bs", "loc", "pn", "rn"), class = "factor")),
  .Names = c("fn", "values", "ind"), class = "data.frame",
  row.names = c(NA, -9L))
I want the above dataset converted into a data frame (out_data) in the format below. At present, data has 3 columns; I need to convert these into the 16 columns listed below. The input to reshape is the data frame given above. I cannot change the structure below:
colnames(out_data) <- c("fn","h_blk","s_n/r_n","b_n","fl_n","u_n","pc","xc","yc","bs","brf","lct_dec","brn","bo","pn","s_ty_cd")
The multi-value entries in the input are in the format below:
|639778|21|nanyang circle|103.686721631628|1.34640300329567
->|pc|h_blk|s_n/r_n|xc|yc
|8121|b01|somerset stn
->|bs|brf|lct_dec
|sn30|smrt
->|brn|bo
The rules are:
ind = loc: |pc|h_blk|s_n/r_n|xc|yc should be updated, s_ty_cd = loc
ind = bn: the b_n column should be updated, s_ty_cd = bn
ind = rn: the s_n/r_n column should be updated, s_ty_cd = rn
ind = bs: |bs|brf|lct_dec should be updated, s_ty_cd = bs
ind = br: |brn|bo should be updated, s_ty_cd = br
ind = pn: the pn column should be updated, s_ty_cd = pn
Is there an efficient way of doing this?
Here's one method of transformation. First, define helper functions for the various sub-problems.
# define the output columns
outcols <- c("fn", "h_blk", "s_n/r_n", "b_n", "fl_n", "u_n", "pc", "xc", "yc",
             "bs", "brf", "lct_dec", "brn", "bo", "pn", "s_ty_cd")

# identify the parts of each compound value
namevals <- function(ind, vals) {
  names <- if (ind == "loc") {
    c("pc", "h_blk", "s_n/r_n", "xc", "yc")
  } else if (ind == "bn") {
    c("b_n")
  } else if (ind == "rn") {
    c("s_n/r_n")
  } else if (ind == "bs") {
    c("bs", "brf", "lct_dec")
  } else if (ind == "br") {
    c("brn", "bo")
  } else if (ind == "pn") {
    c("pn")
  }
  # the number of parts must match the number of target columns
  stopifnot(length(names) == length(vals))
  stopifnot(all(names %in% outcols))
  names(vals) <- names
  vals
}

# add missing values to a row
fillrow <- function(nvals) {
  r <- rep(NA, length(outcols))
  r[match(names(nvals), outcols)] <- nvals
  r
}
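As a side note, the indicator-to-column mapping inside namevals could also be kept in a lookup table rather than an if/else chain. The following is only a sketch of that idea, not part of the method above; the names field_map and name_vals_lookup are made up for illustration, while the column names come from the question.

```r
# Hypothetical alternative to the if/else chain: a named list keyed by ind.
# `field_map` and `name_vals_lookup` are illustrative names, not from the answer.
field_map <- list(
  loc = c("pc", "h_blk", "s_n/r_n", "xc", "yc"),
  bn  = "b_n",
  rn  = "s_n/r_n",
  bs  = c("bs", "brf", "lct_dec"),
  br  = c("brn", "bo"),
  pn  = "pn"
)

name_vals_lookup <- function(ind, vals) {
  fields <- field_map[[ind]]
  # Fail early on an unknown indicator or a wrong number of parts
  stopifnot(!is.null(fields), length(fields) == length(vals))
  setNames(vals, fields)
}

# One of the "bs" values from the question, already split on the pipe
res <- name_vals_lookup("bs", c("8121", "b01", "somerset stn"))
print(res)
```

With this layout, supporting a new indicator would only mean adding one entry to the list.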
Now apply these to each row of data with mapply; each call returns a character vector. Here we make sure to split the "values" column on the pipe and to remove the leading pipe.
# combine the rows into a character matrix (one column per input row)
dt <- mapply(function(fn, vals, ind) {
    x <- c(fn = fn, namevals(ind, vals), "s_ty_cd" = ind)
    fillrow(x)
  },
  as.character(data$fn),
  strsplit(gsub("^\\|", "", as.character(data$values)), "|", fixed = TRUE),
  as.character(data$ind)
)
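To see the splitting step in isolation, here is a small sketch on three of the values from the question. It assumes nothing beyond base R. fixed = TRUE matters because "|" would otherwise be read as regex alternation. The trimws call is an optional extra that the code above does not perform; it is shown only because one of the input values ("|sn30|smrt\n") carries a trailing newline.

```r
# Three sample values copied from the question's data
vals <- c("|8121|b01|somerset stn", "ikea", "|sn30|smrt\n")

# Drop the leading pipe, then split on a literal "|"
stripped <- sub("^\\|", "", vals)
parts <- strsplit(stripped, "|", fixed = TRUE)
print(parts)

# Optional cleanup (an assumption, not in the answer): trim stray whitespace
clean <- lapply(parts, trimws)
print(clean)
```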
Finally, the tidy data can be written out to a file with write.table. Note that the missing values are true R NA values. In write.table, you can set na = "" if you'd rather print blank values than the default "NA" value.
# turn the matrix into a data.frame with proper names
dd <- data.frame(unname(t(dt)), stringsAsFactors = FALSE)
names(dd) <- outcols
dd
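The effect of na = "" can be checked on a toy data frame. This is a minimal sketch: the data frame toy, the temporary file, and the comma separator are all assumptions made for illustration and are not taken from the answer.

```r
# Toy data frame with missing values (made up for this sketch)
toy <- data.frame(
  fn  = c("a", "b"),
  b_n = c("ikea", NA),
  pn  = c(NA, "96942883"),
  stringsAsFactors = FALSE
)

# Write to a temporary file, printing NA as an empty field
out_file <- tempfile(fileext = ".csv")
write.table(toy, file = out_file, sep = ",", na = "",
            row.names = FALSE, quote = FALSE)

# Read the raw lines back to inspect what was written
lines <- readLines(out_file)
print(lines)
```

Leaving out the na argument would write the string NA in those positions instead.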