r - Creating a new column in a data frame whose entries depend on multiple columns in another data frame
I want to make a new column in one data set whose values are determined by values in another data set, and it's not as simple as the values in one column being a function of the values in another. Here's an example:
> df1
   chromosome position
1           1        1
2           1        2
3           1        4
4           1        5
5           1        7
6           1       12
7           1       13
8           1       15
9           1       21
10          1       23
11          1       24
12          2        1
13          2        5
14          2        7
15          2        8
16          2       12
17          2       15
18          2       18
19          2       21
20          2       22
and
> df2
  chromosome segment_start segment_end segment.number
1          1             1           5            1.1
2          1             6          20            1.2
3          1            21          25            1.3
4          2             1           7            2.1
5          2             8          16            2.2
6          2            18          22            2.3
I want to make a new column in df1 called 'segment', where the value in 'segment' is determined by which segment (as defined by 'segment_start', 'segment_end', and 'chromosome' in df2) the value in 'position' belongs to. For example, in df1, row 7 has position=13 and chromosome=1. The entry in the hypothetical 'segment' column would be 1.2, from row 2 of df2, because 13 falls between the segment_start and segment_end of that row (6 and 20, respectively), and the 'chromosome' value in df1 row 7 is 1, just as 'chromosome' in df2 row 2 is 1.
Each row in df1 belongs to one of the segments described in df2; that is, it lies on the same chromosome as one of the segments, and its 'position' is >= segment_start and <= segment_end. I want to add that information to df1, so that it says which segment each position belongs to.
I was thinking of using an if statement, and started with:
if(df1$position>=df2$segment_start & df1$position<=df2$segment_end & df1$chromosome==df2$chromosome) df1$segment<-df2$segment.number
but I'm not sure this way is feasible. If nothing else, maybe the code can illustrate what I'm trying to do. Basically, I want to match each row, by its position and chromosome, to a segment in df2. Thanks.
This appears to be a rolling join. You can use data.table for this:
require(data.table)
dt1 <- data.table(df1, key = c('chromosome', 'position'))
dt2 <- data.table(df2, key = c('chromosome', 'segment_start'))
# this will perform the join you want (but retain the
# column names as the names of dt2)
# dt2[dt1, roll = TRUE]
# which is why I have renamed and subset here
dt2[dt1, roll = TRUE][, list(chromosome, position = segment_start, segment.number)]
#     chromosome position segment.number
#  1:          1        1            1.1
#  2:          1        2            1.1
#  3:          1        4            1.1
#  4:          1        5            1.1
#  5:          1        7            1.2
#  6:          1       12            1.2
#  7:          1       13            1.2
#  8:          1       15            1.2
#  9:          1       21            1.3
# 10:          1       23            1.3
# 11:          1       24            1.3
# 12:          2        1            2.1
# 13:          2        5            2.1
# 14:          2        7            2.1
# 15:          2        8            2.2
# 16:          2       12            2.2
# 17:          2       15            2.2
# 18:          2       18            2.3
# 19:          2       21            2.3
# 20:          2       22            2.3
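Note that a rolling join only looks at segment_start: each position is matched to the most recent segment start on the same chromosome, and segment_end is never checked. That is fine for the example data, since every position falls inside a segment, but a position in a gap between segments (for instance, 17 on chromosome 2) would still be assigned to the preceding segment. If that matters, here is a minimal base R sketch that tests both bounds explicitly. The helper name find_segment is just an illustration, not something from the question, and it assumes the segments on a chromosome do not overlap.

```r
# Rebuild the example data from the question.
df1 <- data.frame(
  chromosome = rep(c(1, 2), times = c(11, 9)),
  position   = c(1, 2, 4, 5, 7, 12, 13, 15, 21, 23, 24,
                 1, 5, 7, 8, 12, 15, 18, 21, 22)
)
df2 <- data.frame(
  chromosome     = c(1, 1, 1, 2, 2, 2),
  segment_start  = c(1, 6, 21, 1, 8, 18),
  segment_end    = c(5, 20, 25, 7, 16, 22),
  segment.number = c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3)
)

# Hypothetical helper: return the segment.number of the single df2 row
# that is on the same chromosome and whose [segment_start, segment_end]
# range contains the position; return NA if there is no unique match.
find_segment <- function(chrom, pos) {
  hit <- which(df2$chromosome == chrom &
               df2$segment_start <= pos &
               df2$segment_end >= pos)
  if (length(hit) == 1) df2$segment.number[hit] else NA_real_
}

# Apply the helper row by row and store the result as a new column.
df1$segment <- mapply(find_segment, df1$chromosome, df1$position)
```

This loops over the rows of df1, so it will be slower than the data.table join on large data, but it needs no extra packages and leaves NA where a position is not covered by any segment.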