bash - Shell: select one unique row per id from a flat file
I have a flat file that looks like this:
cat file
id1, value1_1
id1, value1_2
id1, value1_3
id2, value2_1
id2, value2_1
id3, value3_1
id3...
As you can see from the data sample, for each id there are several values, which may or may not be the same. I don't care which value gets picked up; any value works for me.
So I just want one value for each id. I don't care which one, but if I have to choose, it would be the row that has the longest length.
id1, value1_2
id2, value2_1
id3, value3_1
It might be done in Python, but I think there is an easy way in the shell itself. I am open to using sed or awk, but please don't write a whole paragraph of awk code.
It might look like:
# pseudo code
# sort -k 1 file | uniq (max(length) by id)
Thanks a lot!
This will find the first line for each id:
awk -F, '!seen[$1]++' file
Explained:
- Awk associative arrays do not have to be pre-declared, so the first time an id is encountered, seen[$1] will have a value of 0 (in a numeric context).
- seen[$1]++ post-increments the associative array element, so the expression evaluates to 0 the first time an id is seen, and to a positive integer every other time.
- Awk treats 0 as false and any other number as true, so we negate the post-increment expression with the ! operator. That way we have a true expression only when an id is seen for the first time: !seen[$1]++
- Awk programs have the form condition1 {body1} condition2 {body2} ...
- The body is executed when the corresponding condition evaluates to true.
- If the condition is present but the body is omitted, the default action is {print}.
- To be complete, when the body is present but the condition is omitted, the default condition evaluates to true, and the action is performed for every record.
To sum up, this awk program will print the current record whenever the expression evaluates to true, that is, the first time an id is seen.
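As a quick check of the explanation above, here is a minimal sketch that runs the one-liner on a small sample. The file contents, the temporary file, and the variable names (sample, first_per_id) are assumptions made for illustration, modeled on the sample in the question.

```shell
# Build a small sample file (hypothetical data modeled on the question).
sample=$(mktemp)
printf '%s\n' \
  'id1, value1_1' \
  'id1, value1_2' \
  'id2, value2_1' \
  'id2, value2_1' \
  'id3, value3_1' > "$sample"

# seen[$1]++ is 0 (false) the first time an id appears, so the negated
# expression is true only then, and the default action {print} runs
# exactly once per id.
first_per_id=$(awk -F, '!seen[$1]++' "$sample")
printf '%s\n' "$first_per_id"

rm -f "$sample"
```

On this sample the command should keep the first row of each id and preserve the input order.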
If you want the longest line for each id:
awk '
length($2) > max[$1] {max[$1] = length($2); line[$1] = $0}
END {for (id in line) {print line[id]}}
' file
This may shuffle the order of the ids (associative arrays are unordered collections). You can pipe the output to sort if that is a problem.
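For illustration, the longest-value selection combined with sort might look like the sketch below. The sample data and the variable names (sample, longest_per_id) are assumptions, not part of the original question. The sketch relies on awk's default whitespace field splitting, so $1 is the id with its trailing comma and $2 is the value.

```shell
# Hypothetical sample with ids deliberately out of order.
sample=$(mktemp)
printf '%s\n' \
  'id2, bb' \
  'id1, a' \
  'id1, aaa' \
  'id3, c' \
  'id2, b' > "$sample"

# For each id, remember the line whose second field is longest.
# The END loop visits ids in an unspecified order, so the result is
# piped through sort to get a deterministic order by id.
longest_per_id=$(awk '
  length($2) > max[$1] { max[$1] = length($2); line[$1] = $0 }
  END { for (id in line) print line[id] }
' "$sample" | sort)
printf '%s\n' "$longest_per_id"

rm -f "$sample"
```

If the values themselves can contain spaces, comparing length($0) instead of length($2) may be closer to "the row with the longest length"; that variant is a suggestion rather than something the answer states.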