bash - Shell: select a unique row per ID from a flat file


I have a flat file that looks like this:

    cat file
    id1, value1_1
    id1, value1_2
    id1, value1_3
    id2, value2_1
    id2, value2_1
    id3, value3_1
    id3...

As you can see from the data sample, each ID has several values, which may or may not be the same. I don't care which value gets picked; any value works for me.

So I just want one value for each ID. I don't care which one, but if I have to choose, I'd take the row with the longest length.

    id1, value1_2
    id2, value2_1
    id3, value3_1

It could be done in Python, but I think there is an easy way in the shell itself. I'm open to using sed or awk, but please don't write a whole paragraph of awk code.

It might look something like:

    # pseudo code
    # sort -k 1 file | uniq (max(length) id)

Thanks a lot!

This will find the first line for each ID:

    awk -F, '!seen[$1]++' file

Explained:

  • awk associative arrays do not have to be pre-declared, so the first time an ID is encountered, seen[$1] has the value 0 (in a numeric context).
  • seen[$1]++ post-increments the associative array element, so the expression evaluates to 0 the first time an ID is seen, and to a positive integer every other time.
  • awk treats 0 as false and any other number as true, so we negate the post-increment expression with the ! operator. We then have a true expression when the ID is seen for the first time: !seen[$1]++
  • awk programs look like condition1 {body1} condition2 {body2} ...
    • The body is executed when the corresponding condition evaluates to true.
    • If the condition is present but the body is omitted, the default action is {print}.
    • To be complete: when the body is present but the condition is omitted, the default condition evaluates to true, and the action is performed for every record.

To sum up, this awk program prints the current record whenever the expression evaluates to true, which is the first time an ID is seen.
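As a quick check of the explanation above, here is a minimal sketch that runs the one-liner against a small sample and compares it with a spelled-out equivalent. The file name sample.txt and its contents are assumptions modeled on the data in the question, and the variable names first_compact and first_explicit are made up for illustration.

```shell
# Hypothetical sample data modeled on the question (sample.txt is an assumed name).
printf '%s\n' \
    'id1, value1_1' \
    'id1, value1_2' \
    'id2, value2_1' \
    'id2, value2_1' \
    'id3, value3_1' > sample.txt

# Compact form: the condition is true only the first time an ID appears,
# and the omitted body defaults to {print}.
first_compact=$(awk -F, '!seen[$1]++' sample.txt)

# Spelled-out equivalent: test the counter, print if it is still 0, then increment.
first_explicit=$(awk -F, '{ if (seen[$1] == 0) print; seen[$1]++ }' sample.txt)

printf '%s\n' "$first_compact"
```

Both forms should keep only the first row for each ID. The explicit version is longer, but it makes the "test, then increment" order of the post-increment visible.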


If you want the longest line for each ID:

    awk -F, '
        length($2) > max[$1] {max[$1] = length($2); line[$1] = $0}
        END {for (id in line) {print line[id]}}
    ' file

This may shuffle the order of the IDs (associative arrays are unordered collections). You can pipe the output to sort if that's a problem.
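If the original order of the IDs matters more than sorted order, one option is to remember the order in which each ID first appears and print in that order. The following is a sketch under that assumption, not part of the answer itself. The array order, the counter n, the variable longest, and the file sample.txt are hypothetical names introduced for illustration, and the value value1_22 is made up so that one row is clearly the longest.

```shell
# Hypothetical sample data modeled on the question (sample.txt is an assumed name).
printf '%s\n' \
    'id1, value1_1' \
    'id1, value1_22' \
    'id2, value2_1' \
    'id2, value2_1' \
    'id3, value3_1' > sample.txt

# Keep the longest line per ID, printing IDs in first-seen order.
longest=$(awk -F, '
    !seen[$1]++          { order[++n] = $1 }                      # remember first-seen order
    length($2) > max[$1] { max[$1] = length($2); line[$1] = $0 }  # keep the longest value so far
    END { for (i = 1; i <= n; i++) print line[order[i]] }
' sample.txt)

printf '%s\n' "$longest"
```

Because the comparison is a strict >, ties go to the first row seen for an ID. Iterating over a numerically indexed array in END avoids relying on the unspecified order of for (id in line).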

