python - pandas - merging with missing values -


there appears quirk pandas merge function. considers nan values equal, , merge nans other nans:

>>> foo = dataframe([     ['a',1,2],     ['b',4,5],     ['c',7,8],     [np.nan,10,11] ], columns=['id','x','y'])  >>> bar = dataframe([     ['a',3],     ['c',9],     [np.nan,12] ], columns=['id','z'])  >>> pd.merge(foo, bar, how='left', on='id') out[428]:      id   x   y   z 0      1   2   3 1    b   4   5 nan 2    c   7   8   9 3  nan  10  11  12  [4 rows x 4 columns] 

this unlike rdb i've seen, missing values treated agnosticism , won't merged if equal. problematic datasets sparse data (every nan merged every other nan, resulting in huge dataframe!)

is there way ignore missing values during merge without first slicing them out?

you exclude values bar (and indeed foo if wanted) id null during merge. not sure it's you're after, though, sliced out.

(i've assumed left join you're interested in retaining of foo, want merge parts of bar match , not null.)

foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')  out[11]:  id   x   y   z 0      1   2   3 1    b   4   5 nan 2    c   7   8   9 3  nan  10  11 nan 

Comments

Popular posts from this blog

c++ - OpenCV Error: Assertion failed <scn == 3 ::scn == 4> in unknown function, -

php - render data via PDO::FETCH_FUNC vs loop -

The canvas has been tainted by cross-origin data in chrome only -