Python Pandas - Remove values from first dataframe if not in second dataframe

- June 15, 2012

I have user/item data for a recommender. I'm splitting it into test and train data, and I need to be sure that any new users or items in the test data are omitted before evaluating the recommender. My approach works for small datasets, but when the data gets big, it takes forever. Is there a better way to do this?

    import pandas as pd

    # Test set: rows whose user or item does not appear in train should be removed
    te = pd.DataFrame({'user': [1,2,3,1,6,1], 'item':[16,12,19,15,13,12]})
    tr = pd.DataFrame({'user': [1,2,3,4,5], 'item':[11,12,13,14,15]})
    print("training_______")
    print(tr)
    print("\ntesting_______")
    print(te)

    # Using 2 joins and selecting the proper columns, the 'new' members of the test set are removed
    b = pd.merge(pd.merge(te, tr, on='user', suffixes=('', '_d')), tr, on='item', suffixes=('', '_d'))[['user', 'item']]
    print("\nsolution_______")
    print(b)

This gives:

    training_______
       item  user
    0    11     1
    1    12     2
    2    13     3
    3    14     4
    4    15     5

    testing_______
       item  user
    0    16     1
    1    12     2
    2    19     3
    3    15     1
    4    13     6
    5    12     1

    solution_______
       user  item
    0     1    15
    1     1    12
    2     2    12

The solution is correct (any new user or item causes the whole row to be removed from the test set), but it is slow at scale.

Thanks in advance.

I think you can achieve what you want using the isin Series method on each of the columns:

    In [11]: te['item'].isin(tr['item']) & te['user'].isin(tr['user'])
    Out[11]:
    0    False
    1     True
    2    False
    3     True
    4    False
    5     True
    dtype: bool

    In [12]: te[te['item'].isin(tr['item']) & te['user'].isin(tr['user'])]
    Out[12]:
       item  user
    1    12     2
    3    15     1
    5    12     1
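The same per-column isin filter can be wrapped in a small helper so that it extends to any number of key columns. This is only a sketch: the function name `keep_known` and its `columns` parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

te = pd.DataFrame({'user': [1, 2, 3, 1, 6, 1], 'item': [16, 12, 19, 15, 13, 12]})
tr = pd.DataFrame({'user': [1, 2, 3, 4, 5], 'item': [11, 12, 13, 14, 15]})


def keep_known(test, train, columns=('user', 'item')):
    """Keep only the test rows whose value in every listed column also occurs in train.

    Hypothetical helper, not a pandas API.
    """
    # Start with every row allowed, then AND in one membership test per column.
    mask = pd.Series(True, index=test.index)
    for col in columns:
        mask &= test[col].isin(train[col])
    return test[mask]


filtered = keep_known(te, tr)
print(filtered)
```

Unlike the double merge, this never joins the two frames, so the result keeps the original index of the test set and cannot contain more rows than the test set itself.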

In 0.13 you'll be able to use the new DataFrame isin method (on current master):

    In [21]: te[te.isin(tr.to_dict(outtype='list')).all(1)]
    Out[21]:
       item  user
    1    12     2
    3    15     1
    5    12     1

Hopefully the syntax will be a bit better by the time of the release:

te[te.isin(tr).all(1)]
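For reference, here is a sketch of how the DataFrame isin approach can be written against later pandas releases, assuming a version in which `to_dict` takes `orient` rather than `outtype`. In released versions, passing a DataFrame directly to `isin` also requires the index labels to match, so the dict form below is the one that tests plain column-wise membership.

```python
import pandas as pd

te = pd.DataFrame({'user': [1, 2, 3, 1, 6, 1], 'item': [16, 12, 19, 15, 13, 12]})
tr = pd.DataFrame({'user': [1, 2, 3, 4, 5], 'item': [11, 12, 13, 14, 15]})

# Map each column name to the list of values seen in the training set.
allowed = tr.to_dict(orient='list')

# isin with a dict checks each column of te against the list stored under its name;
# all(axis=1) keeps a row only if every column passed the check.
mask = te.isin(allowed).all(axis=1)
filtered = te[mask]
print(filtered)
```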
