Decision Tree Example

%**************************************************************
%* mex interface to Andy Liaw et al.'s C code (used in R package randomForest)
%* Added by Abhishek Jaiantilal ( abhishek.jaiantilal@ )
%* License: GPLv2
%* Version: 0.02
%
% Calls Classification Random Forest
% A wrapper MATLAB file that calls the mex file
% Trains a model given the data and labels
% Documentation copied from the R package PDF
% /web/packages/randomForest/randomForest.pdf
% A tutorial on getting this working is in tutorial_ClassRF.m
%%**************************************************************

% function model = classRF_train(X,Y,ntree,mtry, extra_options)
%
%___Options
% requires 2 arguments; the remaining 3 are optional
% X: data matrix
% Y: target values
% ntree (optional): number of trees (default is 500); if set to 0,
%     the default of 500 is used
% mtry (optional): number of features tried at each split (default is floor(sqrt(D)),
%     where D = size(X,2) is the number of features in X); if set to 0,
%     the default is used
%

% Note: TRUE = 1 and FALSE = 0 below
% extra_options is a structure containing miscellaneous options that
%     control the RF
% extra_options.replace = 0 or 1 (default is 1): sample without (0) or with (1)
%     replacement
% extra_options.strata = (not implemented)
% extra_options.sampsize = Size(s) of sample to draw. For classification,
%     if sampsize is a vector whose length equals the number of strata, then sampling is stratified by strata,
%     and the elements of sampsize indicate the numbers to be drawn from the strata. I don't yet know how this works.
% extra_options.nodesize = Minimum size of terminal nodes. Setting this number larger causes smaller trees
%     to be grown (and thus take less time). Note that the default values are different
%     for classification (1) and regression (5).
% extra_options.importance = Should importance of predictors be assessed?
% extra_options.localImp = Should casewise importance measure be computed? (Setting this to TRUE will
%     override importance.)
% extra_options.proximity = Should proximity measure among the rows be calculated?
% extra_options.oob_prox = Should proximity be calculated only on 'out-of-bag' data?
% extra_options.do_trace = If set to TRUE, give a more verbose output as randomForest is run. If set to
%     some integer, then running output is printed for every do_trace trees.
% extra_options.keep_inbag = Should an n by ntree matrix be returned that keeps track of which samples are
%     'in-bag' in which trees (but not how many times, if sampling with replacement)?
% extra_options.corr_bias = Regression only: perform bias correction? Note: Experimental. Use at your own
%     risk.
% extra_options.nPerm = Number of times the OOB data are permuted per tree for assessing variable
%     importance. A number larger than 1 gives a slightly more stable estimate, but is not
%     very effective. Currently only implemented for regression.
%

%___Returns model which has
% importance = a matrix with nclass + 2 (for classification) or two (for regression) columns.
%     For classification, the first nclass columns are the class-specific measures
%     computed as mean decrease in accuracy. The nclass + 1st column is the
%     mean decrease in accuracy over all classes. The last column is the mean decrease
%     in Gini index. For regression, the first column is the mean decrease in
%     accuracy and the second the mean decrease in MSE. If importance=FALSE,
%     the last measure is still returned as a vector.
% importanceSD = The "standard errors" of the permutation-based importance measure. For classification,
%     a p by nclass + 1 matrix corresponding to the first nclass + 1
%     columns of the importance matrix. For regression, a length p vector.
% localImp = a p by n matrix containing the casewise importance measures, the [i,j] element
%     of which is the importance of i-th variable on the j-th case. NULL if
%     localImp=FALSE.
% ntree = number of trees grown.
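
The header above only describes the interface, so a short usage sketch may help show how the arguments fit together. It builds the optional arguments as documented and calls `classRF_train` only if the wrapper is found on the path. The toy data, the chosen option values, and the `exist` guard are assumptions made for illustration; they are not part of the original file.

```matlab
% Hypothetical toy data: 100 samples, 10 features, labels in {1, 2}.
X = rand(100, 10);
Y = double(X(:, 1) > 0.5) + 1;

% Optional arguments, following the header's description.
ntree = 0;                          % 0 falls back to the documented default (500 trees)
mtry  = floor(sqrt(size(X, 2)));    % documented default, written out explicitly

% extra_options is a plain struct; TRUE = 1 and FALSE = 0.
extra_options = struct();
extra_options.replace    = 1;       % sample with replacement (documented default)
extra_options.nodesize   = 1;       % documented default for classification
extra_options.importance = 1;       % ask for predictor importance
extra_options.do_trace   = 100;     % print running output every 100 trees

% Train only if the wrapper (and its compiled mex file) is available.
if exist('classRF_train', 'file')
    model = classRF_train(X, Y, ntree, mtry, extra_options);
    disp(model.ntree);              % number of trees grown
    disp(size(model.importance));   % nclass + 2 columns for classification
else
    disp('classRF_train not found on the path; skipping training.');
end
```

Fields left out of `extra_options` are expected to keep the defaults listed in the header, so in practice only the options that differ from those defaults need to be set.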
