python - Decision Tree Of SkLearn: Overfitting or Bug?


I'm analyzing the training error and validation error of a decision tree model using the tree package of sklearn.

    import numpy as np

    # compute the error (fraction of misclassified samples, not an RMS error)
    def compute_error(x, y, model):
        yfit = model.predict(x.toarray())
        return np.mean(y != yfit)

    def drawlearningcurve(model, xtrain, ytrain, xtest, ytest):
        sizes = np.linspace(2, 25000, 50).astype(int)
        train_error = np.zeros(sizes.shape)
        crossval_error = np.zeros(sizes.shape)

        for i, size in enumerate(sizes):
            model = model.fit(xtrain[:size, :].toarray(), ytrain[:size])

            # compute validation error
            crossval_error[i] = compute_error(xtest, ytest, model)

            # compute training error
            train_error[i] = compute_error(xtrain[:size, :], ytrain[:size], model)

    from sklearn import tree
    clf = tree.DecisionTreeClassifier()
    drawlearningcurve(clf, xtr, ytr, xte, yte)

The problem (I don't know whether it is a problem) is that if I give a decision tree model to the function drawlearningcurve, I receive a training error of 0.0 in each loop. Is this related to the nature of my dataset, or to the tree package of sklearn? Or is there something else wrong?

PS: the training error is absolutely not 0.0 with other models such as naive Bayes, kNN or ANN.

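The comments give pretty useful directions. I'd add that there is a parameter you might want to tweak called max_depth. With the default max_depth=None the tree keeps splitting until its leaves are pure, so it can memorize the training set; limiting the depth prevents that.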
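As a rough sketch of the effect of max_depth: the synthetic data and the helper training_error below are assumptions made for illustration and are not part of the question's code. An unrestricted tree should reach a training error of 0.0 even on noisy labels, while a depth-limited tree should not.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration): the label depends on two
# features, and 20% of the labels are flipped at random to add noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flip = rng.random(500) < 0.2
y[flip] = 1 - y[flip]


def training_error(model, X, y):
    """Fit on (X, y) and return the fraction of misclassified training samples."""
    model.fit(X, y)
    return float(np.mean(model.predict(X) != y))


# Default settings: the tree grows until every leaf is pure.
full_error = training_error(DecisionTreeClassifier(random_state=0), X, y)

# Depth-limited tree: at most four leaves, so it cannot fit the label noise.
shallow_error = training_error(
    DecisionTreeClassifier(max_depth=2, random_state=0), X, y
)

print("unrestricted tree:", full_error)
print("max_depth=2:      ", shallow_error)
```

A training error of 0.0 from the default tree is therefore expected behaviour rather than a bug; whether it hurts depends on the validation error.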

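What worries me more is that your compute_error function is odd. The fact that you get an error of 0 says that your classifier makes no errors on the training set. However, if it did make mistakes, your error function won't tell you that, at least when the labels are plain Python lists: != then compares the two lists as a whole and returns a single boolean rather than one value per element.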

    import numpy as np

    np.mean([0,0,0,0] != [0,0,0,0]) # perfect match, error is 0
    0.0

    np.mean([0,0,0,0] != [1, 1, 1, 1]) # 100% wrong answers
    1.0

    np.mean([0,0,0,0] != [1, 1, 1, 0]) # 75% wrong answers
    1.0

    np.mean([0,0,0,0] != [1, 1, 0, 0]) # 50% wrong answers
    1.0

    np.mean([0,0,0,0] != [1, 1, 2, 2]) # 100% wrong answers
    1.0

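What you want is np.sum(y != yfit) on NumPy arrays (so that the comparison is elementwise), or better, one of the error functions that come with sklearn, such as accuracy_score (the error is then 1 - accuracy).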
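A minimal sketch of the difference between comparing plain lists and comparing NumPy arrays, next to sklearn's accuracy_score. The variable names and the toy labels are chosen here for illustration only.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = [0, 0, 0, 0]
y_pred = [1, 1, 0, 0]  # half of the predictions are wrong

# Plain lists: != compares the lists as a whole and yields one boolean,
# so the "error" is 1.0 as soon as anything differs.
list_error = float(np.mean(y_true != y_pred))

# NumPy arrays: != is elementwise, so the mean is the misclassification rate.
array_error = float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# sklearn: accuracy_score returns the fraction of correct predictions.
sklearn_error = 1.0 - accuracy_score(y_true, y_pred)

print(list_error, array_error, sklearn_error)
```

If y and yfit in compute_error are already NumPy arrays, np.mean(y != yfit) gives the misclassification rate directly; converting with np.asarray first is a cheap way to make sure of that.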

