Hadoop Pig: count the number of 'true' and 'false' values
I am learning how to use Hadoop Pig.
If I have an input file like this:

    a,b,c,true
    s,c,v,false
    a,s,b,true
    ...

the last field is the one I need to count. I want to know how many 'true' and 'false' values there are in the file.

I tried:
    records = LOAD 'test/input.csv' USING PigStorage(',');
    boolean = FOREACH records GENERATE $3;
    groups = GROUP boolean ALL;

Now I am stuck. I want to use:

    count = FOREACH groups GENERATE count('true');

to get the number of "true" values, but I get this error:

    2013-08-07 16:32:36,677 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.] Details at logfile: /etc/pig/pig_1375911119028.log

Can anyone tell me what the problem is?
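Two things. First, count should be COUNT. In Pig, function names are case-sensitive, so built-in functions must be called in all caps (keywords such as FOREACH and GROUP, by contrast, are case-insensitive).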
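A minimal sketch of the capitalization fix on its own, assuming the question's records, boolean and groups relations (the alias totalCount is hypothetical): with the name spelled COUNT the function resolves, but because GROUP ... ALL puts every row into a single bag, the result is a count of all rows rather than of the 'true' values. That is why the second point below matters.

```pig
-- Same relations as in the question; only the function name is fixed.
records = LOAD 'test/input.csv' USING PigStorage(',');
boolean = FOREACH records GENERATE $3;
groups  = GROUP boolean ALL;

-- COUNT (upper case) resolves to the built-in COUNT, which takes a bag.
-- GROUP ... ALL produced one bag holding every row, so this counts all rows
-- whatever their last field is (COUNT skips tuples whose first field is null).
-- totalCount is a hypothetical alias used only for illustration.
totalCount = FOREACH groups GENERATE COUNT(boolean);
DUMP totalCount;
```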
Second, COUNT computes the number of elements in a bag, not the number of times a particular value occurs. So you should group by the true/false field and then COUNT each group:
    boolean = FOREACH records GENERATE (chararray)$3 AS trueOrFalse;
    groups = GROUP boolean BY trueOrFalse;
    counts = FOREACH groups GENERATE group AS trueOrFalse, COUNT(boolean);

The output of DUMP counts will then look something like:
    (true,2)
    (false,1)

If you want the counts of true and false in their own relations, you can FILTER the output of counts. However, it would probably be better to SPLIT boolean and then do two separate counts:
    boolean = FOREACH records GENERATE (chararray)$3 AS trueOrFalse;
    SPLIT boolean INTO allTrue IF trueOrFalse == 'true', allFalse IF trueOrFalse == 'false';
    tCount = FOREACH (GROUP allTrue ALL) GENERATE COUNT(allTrue);
    fCount = FOREACH (GROUP allFalse ALL) GENERATE COUNT(allFalse);
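For comparison, the FILTER alternative mentioned above can be sketched like this, assuming the same input file; the aliases trueCount and falseCount and the field name n are hypothetical. It groups once and then picks out one row per value, so each result still carries its key column.

```pig
records = LOAD 'test/input.csv' USING PigStorage(',');
boolean = FOREACH records GENERATE (chararray)$3 AS trueOrFalse;

-- One group per distinct value of the last field, counted once.
groups  = GROUP boolean BY trueOrFalse;
counts  = FOREACH groups GENERATE group AS trueOrFalse, COUNT(boolean) AS n;

-- FILTER keeps only the row whose key matches, giving one relation per value.
-- trueCount and falseCount are hypothetical aliases for illustration.
trueCount  = FILTER counts BY trueOrFalse == 'true';
falseCount = FILTER counts BY trueOrFalse == 'false';

DUMP trueCount;
DUMP falseCount;
```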