java - How to use the linear regression of MLlib of Apache Spark?


I'm new to Apache Spark, and in the MLlib documentation I found an example in Scala, but I don't know Scala. Does anyone know this example in Java? Thanks! Example code:

```scala
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

// Load and parse the data
val data = sc.textFile("mllib/data/ridge-data/lpsa.data")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, parts(1).split(' ').map(x => x.toDouble).toArray)
}

// Building the model
val numIterations = 20
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.reduce(_ + _) / valuesAndPreds.count
println("training Mean Squared Error = " + MSE)
```

from the MLlib documentation. Thanks!

As indicated in the documentation:

All of MLlib's methods use Java-friendly types, so you can import and call them there the same way you do in Scala. The only caveat is that the methods take Scala RDD objects, while the Spark Java API uses a separate JavaRDD class. You can convert a Java RDD to a Scala one by calling .rdd() on your JavaRDD object.

This is not always easy, since you still have to reproduce the Scala code in Java, but it works (at least in this case).
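For instance, the line-parsing step of the Scala example translates into plain Java like this (a Spark-free sketch; `LpsaParser`, `parseLabel`, and `parseFeatures` are hypothetical helper names used only for illustration — in the real job this logic lives inside the function passed to `map()`):

```java
import java.util.Arrays;

// Spark-free sketch of the parsing step from the Scala example.
// Input lines look like: "label,feature1 feature2 feature3 ..."
public class LpsaParser {

    // The part before the comma is the label
    public static double parseLabel(String line) {
        return Double.parseDouble(line.split(",")[0]);
    }

    // The part after the comma is a space-separated feature vector
    public static double[] parseFeatures(String line) {
        String[] pointsStr = line.split(",")[1].split(" ");
        double[] points = new double[pointsStr.length];
        for (int i = 0; i < pointsStr.length; i++) {
            points[i] = Double.parseDouble(pointsStr[i]);
        }
        return points;
    }

    public static void main(String[] args) {
        String line = "-0.43,1.5 2.0 -0.7";
        System.out.println(parseLabel(line));                     // -0.43
        System.out.println(Arrays.toString(parseFeatures(line))); // [1.5, 2.0, -0.7]
    }
}
```

The verbosity comes from Java needing an explicit loop and array allocation where Scala chains `split`, `map`, and `toArray`.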

Having said that, here is a Java implementation:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionModel;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
import scala.Tuple2;

public void linReg() {
    String master = "local";
    SparkConf conf = new SparkConf().setAppName("CSVParser").setMaster(master);
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> data = sc.textFile("mllib/data/ridge-data/lpsa.data");
    JavaRDD<LabeledPoint> parsedData = data
            .map(new Function<String, LabeledPoint>() {
                // I see no way of using a lambda here, hence more verbosity than Scala
                @Override
                public LabeledPoint call(String line) throws Exception {
                    String[] parts = line.split(",");
                    String[] pointsStr = parts[1].split(" ");
                    double[] points = new double[pointsStr.length];
                    for (int i = 0; i < pointsStr.length; i++)
                        points[i] = Double.valueOf(pointsStr[i]);
                    return new LabeledPoint(Double.valueOf(parts[0]),
                            Vectors.dense(points));
                }
            });

    // Building the model
    int numIterations = 20;
    LinearRegressionModel model = LinearRegressionWithSGD.train(
            parsedData.rdd(), numIterations); // notice the .rdd()

    // Evaluate model on training examples and compute training error
    JavaRDD<Tuple2<Double, Double>> valuesAndPred = parsedData
            .map(point -> new Tuple2<Double, Double>(point.label(), model
                    .predict(point.features())));
    // The important point here is the explicit Tuple2 creation.

    double MSE = valuesAndPred.mapToDouble(
            tuple -> Math.pow(tuple._1 - tuple._2, 2)).mean();
    // You can compute this with the mean function, which is easier

    System.out.println("Training Mean Squared Error = " + String.valueOf(MSE));
}
```
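The final MSE step is just "mean of squared differences", which can be checked without Spark. This sketch mirrors the `mapToDouble(...).mean()` call using plain Java streams over an in-memory list of (label, prediction) pairs (`MseSketch` is a hypothetical class name for illustration):

```java
import java.util.Arrays;
import java.util.List;

// Spark-free sketch of the MSE computation: the same "mean of squared
// differences" that valuesAndPred.mapToDouble(...).mean() performs.
public class MseSketch {

    // Each element is a {label, prediction} pair
    public static double mse(List<double[]> valuesAndPreds) {
        return valuesAndPreds.stream()
                .mapToDouble(pair -> Math.pow(pair[0] - pair[1], 2))
                .average()
                .orElse(Double.NaN); // empty input has no defined mean
    }

    public static void main(String[] args) {
        List<double[]> pairs = Arrays.asList(
                new double[] {1.0, 0.5},   // squared error 0.25
                new double[] {2.0, 2.5});  // squared error 0.25
        System.out.println("Training Mean Squared Error = " + mse(pairs)); // 0.25
    }
}
```

Spark's `JavaDoubleRDD.mean()` plays the role of `average()` here, just computed in a distributed fashion.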

It is far from being perfect, but I hope it helps you better understand how to use the Scala examples in the MLlib documentation.

