bash - Shell: select a unique row per id from a flat file

I have a flat file that looks like this:

cat file
id1, value1_1
id1, value1_2
id1, value1_3
id2, value2_1
id2, value2_1
id3, value3_1
id3...

As you can see from the data sample, each id has several values, which may or may not be the same. I don't care which value gets picked; any value works for me.

So I just want one value for each id. I don't care which one, but if I have to choose, it would be the row with the longest length.

id1, value1_2
id2, value2_1
id3, value3_1

It could be done in Python, but I think there is an easy way in the shell itself. I'm open to using sed or awk, but please don't write a whole paragraph of awk code.

It might look like:

# pseudo code
# sort -k 1 file | uniq (max(length) id)

Thanks a lot!

This will find the first line for each id:

awk -F, '!seen[$1]++' file

Explained:

awk associative arrays do not have to be pre-declared, so the first time an id is encountered, seen[$1] has the value 0 (in a numeric context).
seen[$1]++ post-increments the associative array element, so the expression evaluates to 0 the first time an id is seen, and to a positive integer every other time.
awk treats 0 as false and any other number as true, so we negate the post-increment expression with the ! operator. That gives an expression that is true only when the id is seen for the first time: !seen[$1]++
awk programs look like: condition1 {body1} condition2 {body2} ...
- The body is executed when the corresponding condition evaluates to true.
- If the condition is present but the body is omitted, the default action is {print}.
- For completeness: when the body is present but the condition is omitted, the default condition evaluates to true, and the action is performed for every record.
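
The three forms of an awk rule can be sketched on a few sample rows. This is only an illustration: the `input` variable and the conditions used here are made up for the example, not part of the original command.

```shell
# Hypothetical sample rows, modelled on the data in the question.
input='id1, value1_1
id1, value1_2
id2, value2_1'

# Condition only: the default action {print} runs for matching records.
cond_only=$(printf '%s\n' "$input" | awk -F, '$1 == "id2"')

# Body only: the default condition is true, so the body runs for every record.
body_only=$(printf '%s\n' "$input" | awk -F, '{print $1}')

# Condition and body together: the body runs only for matching records.
both=$(printf '%s\n' "$input" | awk -F, '$1 == "id1" {print $2}')

printf '%s\n' "$cond_only" "$body_only" "$both"
```

Note that with -F, the second field keeps the space that follows the comma.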

To sum up, the awk program prints the current record whenever the expression evaluates to true, that is, the first time an id is seen.
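
To watch the counter at work, here is a small sketch that prints the value of seen[$1] before the increment next to each record, followed by the one-liner itself. The `input` variable and the helper variable `n` are assumptions made for this illustration.

```shell
# Hypothetical sample rows, modelled on the data in the question.
input='id1, value1_1
id1, value1_2
id2, value2_1
id3, value3_1'

# Show the counter value before the increment, then the record.
# A 0 in the first column marks the first occurrence of an id.
trace=$(printf '%s\n' "$input" | awk -F, '{n = seen[$1]++; print n, $0}')

# The one-liner keeps exactly the records whose counter was 0.
first=$(printf '%s\n' "$input" | awk -F, '!seen[$1]++')

printf '%s\n' "$trace" "---" "$first"
```

Only the second id1 row gets a non-zero counter, so it is the only row the one-liner drops.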

If you want the longest line for each id:

awk -F, '
    length($2) > max[$1] {max[$1] = length($2); line[$1] = $0}
    END {for (id in line) {print line[id]}}
' file

This may shuffle the order of the ids (associative arrays are unordered collections). You can pipe the output to sort if that's a problem.
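
Two ways to get a predictable order, sketched under assumptions: the first simply pipes the result through sort; the second is a variant (not part of the original answer) that records each id the first time it appears in a hypothetical `order` array and prints in that order. The `input` rows are made up for the example.

```shell
# Hypothetical sample rows, deliberately not sorted by id.
input='id2, value2_1
id1, v1
id1, value1_22
id2, v2'

# Variant 1: keep the longest row per id, then sort the output by line.
sorted=$(printf '%s\n' "$input" | awk -F, '
    length($2) > max[$1] {max[$1] = length($2); line[$1] = $0}
    END {for (id in line) print line[id]}
' | sort)

# Variant 2: remember ids in first-seen order (order[] and n are assumed names).
ordered=$(printf '%s\n' "$input" | awk -F, '
    !($1 in max) {order[++n] = $1}
    length($2) > max[$1] {max[$1] = length($2); line[$1] = $0}
    END {for (i = 1; i <= n; i++) if (order[i] in line) print line[order[i]]}
')

printf '%s\n' "$sorted" "---" "$ordered"
```

The sort variant is shorter; the second one is only worth the extra rule if the original order of the ids matters.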
