java - How to use the linear regression of MLlib of Apache Spark? -
I'm new to Apache Spark, and in the documentation of MLlib I found an example in Scala, but I don't know Scala. Does anyone know this example in Java? Thanks! Example code:
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

// Load and parse the data
val data = sc.textFile("mllib/data/ridge-data/lpsa.data")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, parts(1).split(' ').map(x => x.toDouble).toArray)
}

// Building the model
val numIterations = 20
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate the model on training examples and compute the training error
val valuesAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.reduce(_ + _) / valuesAndPreds.count
println("training Mean Squared Error = " + MSE)
This is from the MLlib documentation. Thanks!
As indicated in the documentation:
All of MLlib's methods use Java-friendly types, so you can import and call them there the same way you do in Scala. The only caveat is that the methods take Scala RDD objects, while the Spark Java API uses a separate JavaRDD class. You can convert a Java RDD to a Scala one by calling .rdd() on your JavaRDD object.
This is not always easy, since you still have to reproduce the Scala code in Java, but it works (at least in this case).
Having said that, here is a Java implementation:
public void linReg() {
    String master = "local";
    SparkConf conf = new SparkConf().setAppName("CSVParser").setMaster(master);
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> data = sc.textFile("mllib/data/ridge-data/lpsa.data");
    JavaRDD<LabeledPoint> parsedData = data
            .map(new Function<String, LabeledPoint>() {
                // I see no way of using a lambda here, hence the extra
                // verbosity compared to Scala
                @Override
                public LabeledPoint call(String line) throws Exception {
                    String[] parts = line.split(",");
                    String[] pointsStr = parts[1].split(" ");
                    double[] points = new double[pointsStr.length];
                    for (int i = 0; i < pointsStr.length; i++)
                        points[i] = Double.valueOf(pointsStr[i]);
                    return new LabeledPoint(Double.valueOf(parts[0]),
                            Vectors.dense(points));
                }
            });

    // Building the model
    int numIterations = 20;
    LinearRegressionModel model = LinearRegressionWithSGD.train(
            parsedData.rdd(), numIterations); // Notice the .rdd()

    // Evaluate the model on training examples and compute the training error
    JavaRDD<Tuple2<Double, Double>> valuesAndPred = parsedData
            .map(point -> new Tuple2<Double, Double>(point.label(),
                    model.predict(point.features())));
    // The important point here is the explicit creation of a Tuple2

    double MSE = valuesAndPred.mapToDouble(
            tuple -> Math.pow(tuple._1 - tuple._2, 2)).mean();
    // The mean can be computed directly with the mean() function, which is easier

    System.out.println("training Mean Squared Error = " + String.valueOf(MSE));
}
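To make the last step clearer: the training error is the mean squared error, i.e. the mean of the squared differences between each label and its prediction. Here is a minimal plain-Java sketch of that same computation, with no Spark dependency; the class and method names are illustrative, not from the example above:

```java
public class MseExample {
    // Mean squared error over paired labels and predictions, mirroring
    // valuesAndPred.mapToDouble(t -> Math.pow(t._1 - t._2, 2)).mean()
    // from the Spark code above.
    static double mse(double[] labels, double[] preds) {
        double sum = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double diff = labels[i] - preds[i];
            sum += diff * diff; // squared error for one point
        }
        return sum / labels.length; // mean over all points
    }

    public static void main(String[] args) {
        double[] labels = {1.0, 2.0, 3.0};
        double[] preds  = {1.5, 2.0, 2.0};
        // (0.25 + 0.0 + 1.0) / 3 ≈ 0.4167
        System.out.println("training Mean Squared Error = " + mse(labels, preds));
    }
}
```

Spark's mapToDouble(...).mean() does exactly this, except that the sum and count are accumulated across partitions of the distributed RDD instead of in a local loop.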
It is far from perfect, but I hope it helps you better understand how to use the Scala examples from the MLlib documentation.