r - How to delete rows from a dataframe that contain n*NA
I have a number of large datasets with ~10 columns and ~200000 rows. Not all columns contain values for each row, although at least one column must contain a value for the row to be present. I would like to set a threshold for how many NAs are allowed in a row.
My dataframe looks something like this:
ID  q   r   s   t   u   v   w   x   y   z
A   1   5   NA  3   8   9   NA  8   6   4
B   5   NA  4   6   1   9   7   4   9   3
C   NA  9   4   NA  4   8   4   NA  5   NA
D   2   2   6   8   4   NA  3   7   1   32
I would like to be able to delete the rows that have more than 2 cells containing NA, to get:
ID  q   r   s   t   u   v   w   x   y   z
A   1   5   NA  3   8   9   NA  8   6   4
B   5   NA  4   6   1   9   7   4   9   3
D   2   2   6   8   4   NA  3   7   1   32
complete.cases removes all rows containing any NA, and I know one can delete rows that contain NA in certain columns. Is there a way to modify this so that it does not depend on which columns contain NA, but on how many of the total do?
Alternatively, this dataframe is generated by merging several dataframes using:
file1 <- read.delim("~/file1.txt")
file2 <- read.delim(file = args[1])
file1 <- merge(file1, file2, by = "chr.pos", all = TRUE)
Perhaps the merge function could be altered?
Thanks
Use rowSums. To remove rows from a data frame (df) that contain precisely n NA values:
df <- df[rowSums(is.na(df)) != n, ]
Or, to remove rows that contain n or more NA values:
df <- df[rowSums(is.na(df)) < n, ]
In both cases, of course, replace n with the number that's required.
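As a rough illustration, here is how the rowSums approach might look on the example data from the question. The data frame is retyped by hand from the table above, and the names na_per_row, max_na and kept are made up for this sketch. Since the question asks to drop rows with more than 2 NA values, the condition keeps rows with at most 2.

```r
# Example data retyped from the question (ID plus ten value columns).
df <- data.frame(
  ID = c("A", "B", "C", "D"),
  q  = c(1, 5, NA, 2),
  r  = c(5, NA, 9, 2),
  s  = c(NA, 4, 4, 6),
  t  = c(3, 6, NA, 8),
  u  = c(8, 1, 4, 4),
  v  = c(9, 9, 8, NA),
  w  = c(NA, 7, 4, 3),
  x  = c(8, 4, NA, 7),
  y  = c(6, 9, 5, 1),
  z  = c(4, 3, NA, 32)
)

# is.na(df) gives a logical matrix; rowSums counts the TRUEs in each row.
na_per_row <- rowSums(is.na(df))

# Hypothetical threshold: the most NA values a row may have and still be kept.
max_na <- 2

# Keep only the rows at or below the threshold; row C (4 NAs) is dropped.
kept <- df[na_per_row <= max_na, ]
print(kept)
```

This is the same as using the "n or more" form above with n = 3. Note that is.na(df) also looks at the ID column, which makes no difference here because no ID is missing.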