python - pandas - merging with missing values -
there appears quirk pandas merge function. considers nan values equal, , merge nans other nans:
>>> foo = dataframe([ ['a',1,2], ['b',4,5], ['c',7,8], [np.nan,10,11] ], columns=['id','x','y']) >>> bar = dataframe([ ['a',3], ['c',9], [np.nan,12] ], columns=['id','z']) >>> pd.merge(foo, bar, how='left', on='id') out[428]: id x y z 0 1 2 3 1 b 4 5 nan 2 c 7 8 9 3 nan 10 11 12 [4 rows x 4 columns] this unlike rdb i've seen, missing values treated agnosticism , won't merged if equal. problematic datasets sparse data (every nan merged every other nan, resulting in huge dataframe!)
is there way ignore missing values during merge without first slicing them out?
you exclude values bar (and indeed foo if wanted) id null during merge. not sure it's you're after, though, sliced out.
(i've assumed left join you're interested in retaining of foo, want merge parts of bar match , not null.)
foo.merge(bar[pd.notnull(bar.id)], how='left', on='id') out[11]: id x y z 0 1 2 3 1 b 4 5 nan 2 c 7 8 9 3 nan 10 11 nan
Comments
Post a Comment