mapreduce - Hadoop Pig count number
I am learning how to use Hadoop Pig now.
If I have an input file like this:
a,b,c,true
s,c,v,false
a,s,b,true
...
The last field is the one I need to count. I want to know how many 'true' and 'false' values there are in this file.
I tried:
records = LOAD 'test/input.csv' USING PigStorage(',');
boolean = FOREACH records GENERATE $3;
groups = GROUP boolean ALL;
Now I am stuck. I want to use:
count = FOREACH groups GENERATE count('true');
to get the number of "true" values, but I get this error:
2013-08-07 16:32:36,677 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.] Details at logfile: /etc/pig/pig_1375911119028.log
Can anyone tell me where the problem is?
Two things. First, count should be COUNT. Function names in Pig are case-sensitive, and built-in functions such as COUNT must be written in all caps.
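As a rough sketch of what the capitalization fix does on its own, here is the question's pipeline with only the function name changed. The alias names `flags`, `everything`, and `total` are illustrative and not from the original post; `boolean` is also a Pig type name, so a different alias is used here to avoid any clash with the parser.

```pig
-- Sketch only: alias names flags / everything / total are illustrative.
records    = LOAD 'test/input.csv' USING PigStorage(',');
flags      = FOREACH records GENERATE $3;
everything = GROUP flags ALL;

-- COUNT (upper-case) resolves, and takes the bag of grouped tuples as its argument.
-- It returns the number of tuples in that bag, 'true' and 'false' rows together.
total      = FOREACH everything GENERATE COUNT(flags);
DUMP total;
```

This should run without error 1070, but it yields a single total rather than one count per value, which is the problem the next point addresses.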
Second, COUNT counts the number of values in a bag, not the occurrences of a particular value. Therefore, you should group by true/false and then COUNT:
boolean = FOREACH records GENERATE $3 AS trueorfalse ;
groups = GROUP boolean BY trueorfalse ;
counts = FOREACH groups GENERATE group AS trueorfalse, COUNT(boolean) ;
So the output of a DUMP of counts will look something like:
(true,2)
(false,1)
If you want the counts of true and false in their own relations, you can FILTER the output of counts. However, it would probably be better to SPLIT boolean and then do two separate counts:
boolean = FOREACH records GENERATE $3 AS trueorfalse ;
SPLIT boolean INTO alltrue IF trueorfalse == 'true',
                   allfalse IF trueorfalse == 'false' ;
tcount = FOREACH (GROUP alltrue ALL) GENERATE COUNT(alltrue) ;
fcount = FOREACH (GROUP allfalse ALL) GENERATE COUNT(allfalse) ;
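For comparison, the FILTER route mentioned above might look roughly like the following. This is a sketch under a few assumptions: the alias names `flags`, `by_value`, `true_count`, `false_count` and the field name `n` are made up for illustration, and the fourth field is cast to chararray explicitly because the file is loaded without a schema.

```pig
-- Sketch only: alias and field names are illustrative, not from the original post.
records  = LOAD 'test/input.csv' USING PigStorage(',');
flags    = FOREACH records GENERATE (chararray)$3 AS trueorfalse;

-- One grouped count, as in the first snippet of the answer.
by_value = GROUP flags BY trueorfalse;
counts   = FOREACH by_value GENERATE group AS trueorfalse, COUNT(flags) AS n;

-- Pull each value's row out into its own relation.
true_count  = FILTER counts BY trueorfalse == 'true';
false_count = FILTER counts BY trueorfalse == 'false';

DUMP true_count;
DUMP false_count;
```

Unlike tcount and fcount in the SPLIT version, each relation here still carries the true/false label next to its count.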