hadoop - Calculate Average Count Using MapReduce in HBase
I have a table called log.
Every row represents a single activity, and the table has this structure:
info:date, info:ip_address, info:action, info:info
Example data:
column family : info
date | ip_address | action | info
3 march 2014 | 191.2.2.2 | delete | blabla
4 march 2014 | 191.2.2.3 | view | blabla
5 march 2014 | 191.2.2.4 | create | blabla
3 march 2014 | 191.2.2.5 | delete | blabla
4 march 2014 | 191.2.2.5 | create | blabla
4 march 2014 | 191.2.2.6 | delete | blabla
What I want is to calculate the average of the total activity based on time. The first step is to compute the total activity for each date:
time | total_activity
3 march 2014 | 2
4 march 2014 | 3
5 march 2014 | 1
Then I want to calculate the average of total_activity, so the output looks like this:
(2 + 3 + 1) / 3 = 2
How can I do this in HBase using MapReduce? I am thinking of using one reducer that is capable of computing the total activity based on time.
Thanks.
I suggest Scalding. It's the easiest and fastest way to write production Hadoop jobs, and they can tie in with HBase and other systems. Here is a project for HBase and Scalding: https://github.com/parallelai/spyglass/blob/master/src/main/scala/parallelai/spyglass/hbase/example/simplehbasesourceexample.scala
Then have a look at the Scalding API to work out how to do what you want: https://github.com/twitter/scalding/wiki/fields-based-api-reference
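Whichever framework you pick, the logic is the same two steps from the question: count rows per date, then average those counts. Here is a minimal sketch in plain Python, not HBase or Scalding code. The function names (map_phase, reduce_phase, average_activity) and the in-memory rows list are made up for illustration; in a real job the rows would come from a scan over the log table.

```python
from collections import defaultdict

# Sample rows mirroring the info:date, info:ip_address and info:action
# columns from the question (hypothetical in-memory stand-in for the table).
rows = [
    {"date": "3 march 2014", "ip_address": "191.2.2.2", "action": "delete"},
    {"date": "4 march 2014", "ip_address": "191.2.2.3", "action": "view"},
    {"date": "5 march 2014", "ip_address": "191.2.2.4", "action": "create"},
    {"date": "3 march 2014", "ip_address": "191.2.2.5", "action": "delete"},
    {"date": "4 march 2014", "ip_address": "191.2.2.5", "action": "create"},
    {"date": "4 march 2014", "ip_address": "191.2.2.6", "action": "delete"},
]


def map_phase(rows):
    """Map step: emit (date, 1) for every activity row."""
    for row in rows:
        yield row["date"], 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per date."""
    totals = defaultdict(int)
    for date, count in pairs:
        totals[date] += count
    return dict(totals)


def average_activity(totals):
    """Final step: average the per-date totals (0.0 if there are no dates)."""
    if not totals:
        return 0.0
    return sum(totals.values()) / len(totals)


totals = reduce_phase(map_phase(rows))
average = average_activity(totals)

print(totals)   # per-date totals: 2, 3 and 1
print(average)  # (2 + 3 + 1) / 3 = 2.0
```

Note that the average is just the total number of rows divided by the number of distinct dates, so the single-reducer idea from the question should work as long as that reducer sees every date and keeps both counters.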