scala - Compress Output Scalding / Cascading TsvCompressed


So people have been having problems compressing the output of Scalding jobs, myself included. After googling you get the odd whiff of an answer in an obscure forum somewhere, but nothing suitable for people's copy and paste needs.

I want an output like Tsv, but one that writes compressed output.

Anyway, after some faffification I managed to write a TsvCompressed output which seems to do the job (you still need to set the Hadoop job configuration properties, i.e. set compress to true, and set the codec to something sensible, or it defaults to crappy deflate).
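
For reference, one way to set those properties from inside a Scalding job is to override config in the Job. This is only a sketch: the exact signature of config has changed between Scalding versions, and the property names below are the classic mapred ones, with GzipCodec picked purely as an example codec.

import com.twitter.scalding._

// Sketch only: assumes a Scalding version where Job.config takes no arguments.
// Property names are the old mapred API ones; GzipCodec is an example choice.
class MyCompressedJob(args: Args) extends Job(args) {
  override def config: Map[AnyRef, AnyRef] =
    super.config ++ Map(
      "mapred.output.compress" -> "true",
      "mapred.output.compression.codec" -> "org.apache.hadoop.io.compress.GzipCodec"
    )

  // ... pipeline goes here
}

The same properties can usually also be passed as -D flags on the hadoop command line when submitting the job. With that sorted, here is the TsvCompressed source: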

import com.twitter.scalding._
import cascading.tuple.Fields
import cascading.scheme.local
import cascading.scheme.hadoop.{TextLine, TextDelimited}
import cascading.scheme.Scheme
import org.apache.hadoop.mapred.{OutputCollector, RecordReader, JobConf}

case class TsvCompressed(p: String) extends FixedPathSource(p) with DelimitedSchemeCompressed

trait DelimitedSchemeCompressed extends Source {
  val types: Array[Class[_]] = null

  // Local mode: plain tab-delimited text, no compression.
  override def localScheme = new local.TextDelimited(Fields.ALL, false, false, "\t", types)

  // HDFS mode: tab-delimited text with sink compression enabled.
  override def hdfsScheme = {
    val temp = new TextDelimited(Fields.ALL, false, false, "\t", types)
    temp.setSinkCompression(TextLine.Compress.ENABLE)
    temp.asInstanceOf[Scheme[JobConf, RecordReader[_, _], OutputCollector[_, _], _, _]]
  }
}
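
Using it from a job is then just a matter of writing to it. Below is a hypothetical word-count style job for illustration; the job name, input argument and field names are made up, and it assumes TsvCompressed as defined above is in scope.

import com.twitter.scalding._

// Hypothetical example: reads lines, counts words, writes a compressed Tsv.
class WordCountCompressedJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.split("\\s+") }
    .groupBy('word) { _.size }
    .write(TsvCompressed(args("output")))
}

Remember that the compression properties from the job configuration still decide which codec actually gets used for the output files.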
