bash - Shell: select unique row of flat file


I have a flat file that looks like this:

$ cat file
id1, value1_1
id1, value1_2
id1, value1_3
id2, value2_1
id2, value2_1
id3, value3_1
id3...

As you can see from the data sample, each ID has several values, which may or may not be the same. I don't care which value gets picked up; any value works for me.

So I want just one value for each ID. I don't care which one, but if I have to choose, I'd take the row with the longest length.

id1, value1_2
id2, value2_1
id3, value3_1

It could be done in Python, but if there is an easy way in the shell itself, I am open to using sed or awk. Please don't write a whole paragraph of awk code, please.

It might look like:

# pseudo code
# sort -k 1 file | uniq (max(length) id)

Thanks a lot!

This will find the first line for each ID:

awk -F, '!seen[$1]++' file

Explained:

  • awk associative arrays do not have to be pre-declared, so the first time an ID is encountered, seen[$1] has the value 0 (in a numeric context).
  • seen[$1]++ post-increments the associative array element, so the expression evaluates to 0 the first time an ID is seen, and to a positive integer every other time.
  • awk treats 0 as false and any other number as true, so we negate the post-increment expression with the ! operator. We then have a true expression exactly when the ID is seen for the first time: !seen[$1]++
  • awk programs look like: condition1 {body1} condition2 {body2} ...
    • The body is executed when the corresponding condition evaluates to true.
    • If the condition is present but the body is omitted, the default action is {print}.
    • To be complete: when the body is present but the condition is omitted, the default condition evaluates to true, and the action is performed for every record.
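The three program shapes described in the list above can be tried out with a small sketch. The three-line input and the variable names (data, both, cond_only, body_only) are made up for illustration and are not part of the question's file.

```shell
# Hypothetical three-record input, used only to illustrate the rule shapes.
data=$(printf '%s\n' 'a 1' 'b 2' 'c 3')

# Condition and body: print the first field of records whose second field exceeds 1.
both=$(printf '%s\n' "$data" | awk '$2 > 1 { print $1 }')

# Condition only: the default action {print} prints the whole matching record.
cond_only=$(printf '%s\n' "$data" | awk '$2 > 1')

# Body only: the default condition is true, so the action runs for every record.
body_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

printf '%s\n' "$both" "--" "$cond_only" "--" "$body_only"
```

The one-liner above is the "condition only" shape: !seen[$1]++ is the condition, and the default {print} does the rest.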

To sum up, this awk program prints the current record whenever the expression evaluates to true, which is the first time each ID is seen.
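To watch the counter change, here is a rough sketch that prints the value of seen[$1] before the increment next to each ID. The sample rows are modeled on the question's data, and the temporary file and the variable names (sample, trace, first) are assumptions for illustration only.

```shell
# Sample rows modeled on the question (assumed for illustration).
sample=$(mktemp)
printf '%s\n' 'id1, value1_1' 'id1, value1_2' 'id2, value2_1' \
              'id2, value2_1' 'id3, value3_1' > "$sample"

# n holds the value of seen[$1] BEFORE the increment:
# 0 on the first sighting of an ID, a positive integer afterwards.
trace=$(awk -F, '{ n = seen[$1]++; print $1, n, (n ? "skipped" : "printed") }' "$sample")
printf '%s\n' "$trace"

# The one-liner itself: keep only the first record for each ID.
first=$(awk -F, '!seen[$1]++' "$sample")
printf '%s\n' "$first"

rm -f "$sample"
```

The trace shows "id1 0 printed" for the first id1 row and "id1 1 skipped" for the second, which is exactly the 0-then-positive behaviour that the negation relies on.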


If you want the longest line for each ID:

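awk -F, '
    # remember the longest value (and its whole line) seen so far for each ID
    length($2) > max[$1] {max[$1] = length($2); line[$1] = $0}
    # after the last record, print the stored line for each ID
    END {for (id in line) {print line[id]}}
' file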

This may shuffle the order of the IDs (associative arrays are unordered collections). You can pipe the output to sort if that's a problem.
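Two ways to get a predictable order, sketched under assumptions: the sample rows, the temporary file, and the order[] array are made up for illustration and are not part of the answer above. The first pipes the result through sort; the second is a variant that remembers the order in which IDs first appear.

```shell
# Hypothetical sample with IDs deliberately out of order.
sample=$(mktemp)
printf '%s\n' 'id2, value2_1' 'id1, v1' 'id1, value1_22' \
              'id3, value3_1' 'id1, value1_3' > "$sample"

# Option 1: longest line per ID, then sort on the ID field.
sorted=$(awk -F, '
    length($2) > max[$1] { max[$1] = length($2); line[$1] = $0 }
    END { for (id in line) print line[id] }
' "$sample" | sort -t, -k1,1)

# Option 2: keep first-appearance order by recording each new ID in order[].
# max starts at -1 so that even an empty value is stored for a new ID.
in_order=$(awk -F, '
    !($1 in line) { order[++n] = $1; max[$1] = -1 }
    length($2) > max[$1] { max[$1] = length($2); line[$1] = $0 }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
' "$sample")

printf '%s\n' "$sorted" "--" "$in_order"
rm -f "$sample"
```

With this sample, the sorted variant lists id1, id2, id3, while the second variant lists id2, id1, id3, matching the input order. Both pick "id1, value1_22" as the longest id1 row.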

