hadoop - Calculate Average Count Using MapReduce in HBase


I have a table called log in which every row represents a single activity. The table structure is this:

info:date, info:ip_address, info:action, info:info

Example data looks like this:

column family : info
date | ip_address | action | info
3 march 2014 | 191.2.2.2 | delete | blabla
4 march 2014 | 191.2.2.3 | view | blabla
5 march 2014 | 191.2.2.4 | create | blabla
3 march 2014 | 191.2.2.5 | delete | blabla
4 march 2014 | 191.2.2.5 | create | blabla
4 march 2014 | 191.2.2.6 | delete | blabla

What I want is to calculate the average of the total activity based on time. The first step is to compute the total activity for each date:

time | total_activity
3 march 2014 | 2
4 march 2014 | 3
5 march 2014 | 1

Then I want to calculate the average of total_activity, so the output would represent this:

(2 + 3 + 1) / 3 = 2
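
The two steps above can be sketched in plain Python to pin down the expected result. This is only an illustration of the map and reduce logic, not Hadoop or HBase code: the rows are copied from the example table, and the function names (`map_phase`, `reduce_phase`, `average_total_activity`) are hypothetical.

```python
from collections import defaultdict

# Sample rows copied from the example table: (date, ip_address, action, info).
ROWS = [
    ("3 march 2014", "191.2.2.2", "delete", "blabla"),
    ("4 march 2014", "191.2.2.3", "view", "blabla"),
    ("5 march 2014", "191.2.2.4", "create", "blabla"),
    ("3 march 2014", "191.2.2.5", "delete", "blabla"),
    ("4 march 2014", "191.2.2.5", "create", "blabla"),
    ("4 march 2014", "191.2.2.6", "delete", "blabla"),
]


def map_phase(rows):
    """Emit (date, 1) for every activity row, as a mapper would."""
    for date, _ip, _action, _info in rows:
        yield date, 1


def reduce_phase(pairs):
    """Sum the emitted ones per date, as a reducer keyed on date would."""
    totals = defaultdict(int)
    for date, one in pairs:
        totals[date] += one
    return dict(totals)


def average_total_activity(rows):
    """Average of the per-date totals; 0.0 when there are no rows."""
    totals = reduce_phase(map_phase(rows))
    if not totals:
        return 0.0
    return sum(totals.values()) / len(totals)


totals = reduce_phase(map_phase(ROWS))
average = average_total_activity(ROWS)
print(totals)   # {'3 march 2014': 2, '4 march 2014': 3, '5 march 2014': 1}
print(average)  # 2.0
```

Note that the average equals the number of rows divided by the number of distinct dates (6 / 3 here), so the per-date totals do not have to be kept once they have been summed.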

How can I do this in HBase using MapReduce? I am thinking of using a single reducer, which would be capable of computing the total activity based on time.

thanks

I suggest Scalding. It's the easiest and fastest way to write production Hadoop jobs, and it can tie in with HBase and other stuff. Here is a project that uses HBase with Scalding: https://github.com/parallelai/spyglass/blob/master/src/main/scala/parallelai/spyglass/hbase/example/simplehbasesourceexample.scala

Then have a look at the Scalding API to work out how to do what you want: https://github.com/twitter/scalding/wiki/fields-based-api-reference
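
On the single-reducer idea from the question: the per-date counts themselves do not seem to need one reducer, since only the final average requires a global view. Below is a minimal plain-Python sketch of that reasoning, not Hadoop or Scalding code. It assumes each date is routed to exactly one simulated reducer, and that each reducer reports a partial pair (sum of its totals, number of its dates). The names `partition`, `run_reducer` and `average_with_reducers` are hypothetical.

```python
import zlib
from collections import Counter

# Date column of the six example rows from the question.
DATES = [
    "3 march 2014", "4 march 2014", "5 march 2014",
    "3 march 2014", "4 march 2014", "4 march 2014",
]


def partition(date, num_reducers):
    """Deterministically route a date to one reducer (assumed scheme)."""
    return zlib.crc32(date.encode("utf-8")) % num_reducers


def run_reducer(dates):
    """Count activity per date, then return (sum of totals, distinct dates)."""
    totals = Counter(dates)
    return sum(totals.values()), len(totals)


def average_with_reducers(dates, num_reducers):
    """Combine the partial pairs from all reducers into one average."""
    buckets = [[] for _ in range(num_reducers)]
    for date in dates:
        buckets[partition(date, num_reducers)].append(date)
    partials = [run_reducer(bucket) for bucket in buckets]
    total_activity = sum(s for s, _ in partials)
    distinct_dates = sum(n for _, n in partials)
    return total_activity / distinct_dates if distinct_dates else 0.0


print(average_with_reducers(DATES, 1))  # 2.0
print(average_with_reducers(DATES, 3))  # 2.0
```

Because every date lands on exactly one reducer, the partial date counts add up without double counting, so the result should not depend on the number of reducers. Where the final combining step would run in a real job (a small second job, or the driver) is left open here.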

