python - pandas - merging with missing values -
there appears quirk pandas merge function. considers nan
values equal, , merge nan
s other nan
s:
>>> foo = dataframe([ ['a',1,2], ['b',4,5], ['c',7,8], [np.nan,10,11] ], columns=['id','x','y']) >>> bar = dataframe([ ['a',3], ['c',9], [np.nan,12] ], columns=['id','z']) >>> pd.merge(foo, bar, how='left', on='id') out[428]: id x y z 0 1 2 3 1 b 4 5 nan 2 c 7 8 9 3 nan 10 11 12 [4 rows x 4 columns]
this unlike rdb i've seen, missing values treated agnosticism , won't merged if equal. problematic datasets sparse data (every nan merged every other nan, resulting in huge dataframe!)
is there way ignore missing values during merge without first slicing them out?
you exclude values bar
(and indeed foo
if wanted) id
null during merge. not sure it's you're after, though, sliced out.
(i've assumed left join you're interested in retaining of foo
, want merge parts of bar
match , not null.)
foo.merge(bar[pd.notnull(bar.id)], how='left', on='id') out[11]: id x y z 0 1 2 3 1 b 4 5 nan 2 c 7 8 9 3 nan 10 11 nan
Comments
Post a Comment