mapreduce - Hadoop Pig count number


I am learning how to use Hadoop Pig.

If I have an input file like this:

    a,b,c,true
    s,c,v,false
    a,s,b,true
    ...

The last field is the one I need to count. I want to know how many 'true' and 'false' values are in this file.

I tried:

    records = LOAD 'test/input.csv' USING PigStorage(',');
    boolean = FOREACH records GENERATE $3;
    groups = GROUP boolean ALL;

Now I am stuck. I want to use:

    count = FOREACH groups GENERATE count('true');

to get the number of "true" values, but I get this error:

    2013-08-07 16:32:36,677 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    Details at logfile: /etc/pig/pig_1375911119028.log

Can anybody tell me where the problem is?

Two things. Firstly, count should be COUNT. In Pig, function names are case-sensitive, and the built-in functions must be written in all caps.
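As a minimal sketch of this first fix, assuming the groups relation from the question: COUNT takes a bag as its argument, so the example passes the grouped bag rather than the literal 'true'. The alias total is hypothetical and not part of the question.

```pig
-- Function names are case-sensitive: COUNT resolves, count does not.
-- COUNT expects a bag; after GROUP ... ALL the bag is named after the
-- grouped relation (here: boolean, the alias used in the question).
-- Assumption: if your Pig version rejects "boolean" as an alias because
-- it is also a type name, rename the relation.
total = FOREACH groups GENERATE COUNT(boolean);
DUMP total;
```

This removes error 1070, but the result is the number of tuples in the whole bag, not the number of 'true' values.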

Secondly, COUNT counts the number of values in a bag, not the occurrences of a particular value. Therefore, you should group by true/false and then COUNT:

    boolean = FOREACH records GENERATE $3 AS trueorfalse ;
    groups = GROUP boolean BY trueorfalse ;
    counts = FOREACH groups GENERATE group AS trueorfalse, COUNT(boolean) ;

So the output of a DUMP of counts will look like:

    (true, 2)
    (false, 1)

If you want the counts of true and false in their own relations, you can FILTER the output of counts. However, it would probably be better to SPLIT boolean and then do two separate counts:

    boolean = FOREACH records GENERATE $3 AS trueorfalse ;
    SPLIT boolean INTO alltrue IF trueorfalse == 'true',
                       allfalse IF trueorfalse == 'false' ;

    tcount = FOREACH (GROUP alltrue ALL) GENERATE COUNT(alltrue) ;
    fcount = FOREACH (GROUP allfalse ALL) GENERATE COUNT(allfalse) ;
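For comparison, the FILTER alternative mentioned above might look like the following sketch. It assumes the counts relation from the grouped example, whose first field is named trueorfalse; the aliases true_count and false_count are hypothetical.

```pig
-- counts holds one tuple per distinct value: (trueorfalse, count).
-- Each FILTER keeps only the tuple for one of the two values.
true_count  = FILTER counts BY trueorfalse == 'true';
false_count = FILTER counts BY trueorfalse == 'false';
```

Unlike tcount and fcount from the SPLIT version, each of these relations still carries the value alongside its count.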
