r - How to delete rows from a dataframe that contain n*NA
I have a number of large datasets with ~10 columns and ~200000 rows. Not all columns contain values for each row, although at least 1 column must contain a value for the row to be present. I would like to set a threshold for how many NAs are allowed in a row.
My dataframe looks like this:

    ID   q   r   s   t   u   v   w   x   y   z
    A    1   5  NA   3   8   9  NA   8   6   4
    B    5  NA   4   6   1   9   7   4   9   3
    C   NA   9   4  NA   4   8   4  NA   5  NA
    D    2   2   6   8   4  NA   3   7   1  32

and I would like to be able to delete the rows that contain more than 2 cells containing NA, to get:

    ID   q   r   s   t   u   v   w   x   y   z
    A    1   5  NA   3   8   9  NA   8   6   4
    B    5  NA   4   6   1   9   7   4   9   3
    D    2   2   6   8   4  NA   3   7   1  32

complete.cases removes all rows containing any NA, and I know one can delete rows that contain NA in certain columns. Is there a way to modify it so that it is non-specific about which columns contain NA, and only considers how many of the total do?
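Alternatively, this dataframe is generated by merging several dataframes using

    file1 <- read.delim("~/file1.txt")
    # args is not defined in this snippet; presumably it comes from commandArgs()
    file2 <- read.delim(file = args[1])
    # all = TRUE keeps unmatched rows from both inputs, which introduces the NAs
    file1 <- merge(file1, file2, by = "chr.pos", all = TRUE)

Perhaps the merge function could be altered?

Thanks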
Use rowSums. To remove rows from a data frame (df) that contain precisely n NA values:

    df <- df[rowSums(is.na(df)) != n, ]

or to remove rows that contain n or more NA values:

    df <- df[rowSums(is.na(df)) < n, ]

In both cases, of course, replace n with the number that's required.
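As a rough sketch of how this plays out on the example from the question, the snippet below rebuilds the four-row data frame and keeps only rows with at most 2 NA cells. The names max_na, na_count and filtered are made up for illustration, and the first row's ID is assumed to be A.

```r
# Rebuild the example data frame from the question.
# Assumption: the first row's ID is "A" (rows B, C and D follow it).
df <- data.frame(
  ID = c("A", "B", "C", "D"),
  q = c(1, 5, NA, 2),
  r = c(5, NA, 9, 2),
  s = c(NA, 4, 4, 6),
  t = c(3, 6, NA, 8),
  u = c(8, 1, 4, 4),
  v = c(9, 9, 8, NA),
  w = c(NA, 7, 4, 3),
  x = c(8, 4, NA, 7),
  y = c(6, 9, 5, 1),
  z = c(4, 3, NA, 32),
  stringsAsFactors = FALSE
)

# Hypothetical threshold: the largest number of NA cells a row may have.
max_na <- 2

# is.na(df) is a logical matrix; rowSums counts the TRUEs in each row.
na_count <- rowSums(is.na(df))

# Keep rows at or below the threshold ("delete rows with more than 2 NAs").
filtered <- df[na_count <= max_na, ]
print(filtered)
```

Writing the condition as na_count <= max_na is the same test as rowSums(is.na(df)) < n with n set to 3. The ID column is included in the count here, which is harmless as long as it never contains NA.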