Package weka.classifiers.meta
Class ThresholdSelector
- java.lang.Object
  - weka.classifiers.Classifier
    - weka.classifiers.SingleClassifierEnhancer
      - weka.classifiers.RandomizableSingleClassifierEnhancer
        - weka.classifiers.meta.ThresholdSelector
- All Implemented Interfaces:
- java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, Drawable, OptionHandler, Randomizable, RevisionHandler
public class ThresholdSelector extends RandomizableSingleClassifierEnhancer implements OptionHandler, Drawable
A metaclassifier that selects a mid-point threshold on the probability output by a Classifier. The mid-point threshold is set so that a given performance measure is optimized. By default this is the F-measure; other measures can be chosen with -M. Performance is measured on the training data, on a hold-out set, or using cross-validation. In addition, the probabilities returned by the base learner can have their range expanded so that the output probabilities lie between 0 and 1 (this is useful if the scheme normally produces probabilities in a very narrow range).
Valid options are:
-C <integer> The class for which the threshold is determined. Valid values are: 1, 2 (for the first and second classes, respectively), 3 (for whichever class is least frequent), 4 (for whichever class is most frequent), and 5 (for the first class named any of "yes", "pos(itive)", "1", or method 3 if there are no matches). (default 5).
-X <number of folds> Number of folds used for cross validation. If just a hold-out set is used, this determines the size of the hold-out set (default 3).
-R <integer> Sets whether confidence range correction is applied. This can be used to ensure the confidences range from 0 to 1. Use 0 for no range correction, 1 for correction based on the min/max values seen during threshold selection (default 0).
-E <integer> Sets the evaluation mode. Use 0 for evaluation using cross-validation, 1 for evaluation using hold-out set, and 2 for evaluation on the training data (default 1).
-M [FMEASURE|ACCURACY|TRUE_POS|TRUE_NEG|TP_RATE|PRECISION|RECALL] Measure used for evaluation (default is FMEASURE).
-manual <real> Set a manual threshold to use. This option overrides automatic selection and options pertaining to automatic selection will be ignored. (default -1, i.e. do not use a manual threshold).
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.functions.Logistic)
Options specific to classifier weka.classifiers.functions.Logistic:
-D Turn on debugging output.
-R <ridge> Set the ridge in the log-likelihood.
-M <number> Set the maximum number of iterations (default -1, until convergence).
Options after -- are passed to the designated sub-classifier.
- Version:
- $Revision: 1.43 $
- Author:
- Eibe Frank (eibe@cs.waikato.ac.nz)
- See Also:
- Serialized Form
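The threshold selection described in the class overview can be pictured with a small stand-alone sketch. It does not use Weka and is not the class's actual implementation: the class name `ThresholdSketch`, its methods, and the choice of candidate thresholds (midpoints between adjacent distinct probabilities) are assumptions made for illustration. It scores each candidate by F-measure on a labelled set of probabilities and keeps the best one.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and candidate strategy are assumptions, not Weka code. */
final class ThresholdSketch {

    /** F-measure of the designated class when predicting it iff prob >= threshold. */
    static double fMeasure(double[] probs, boolean[] isPositive, double threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < probs.length; i++) {
            boolean predictedPositive = probs[i] >= threshold;
            if (predictedPositive && isPositive[i]) tp++;
            else if (predictedPositive) fp++;
            else if (isPositive[i]) fn++;
        }
        int denominator = 2 * tp + fp + fn;
        return denominator == 0 ? 0.0 : (2.0 * tp) / denominator;
    }

    /** Tries the midpoint between each pair of adjacent distinct probabilities. */
    static double selectThreshold(double[] probs, boolean[] isPositive) {
        double[] sorted = probs.clone();
        Arrays.sort(sorted);
        double best = 0.5;        // fallback when no two distinct probabilities exist
        double bestScore = -1.0;
        for (int i = 0; i + 1 < sorted.length; i++) {
            if (sorted[i] == sorted[i + 1]) continue;
            double mid = (sorted[i] + sorted[i + 1]) / 2.0;
            double score = fMeasure(probs, isPositive, mid);
            if (score > bestScore) {   // ties keep the lower threshold
                bestScore = score;
                best = mid;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.10, 0.40, 0.45, 0.60, 0.90};
        boolean[] labels = {false, false, true, true, true};
        System.out.println("selected threshold: " + selectThreshold(probs, labels));
    }
}
```

In this sketch the probabilities would come from whichever evaluation mode is in use (training data, hold-out set, or cross-validation); only the scoring step is shown.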
-
-
Field Summary
Fields
Modifier and Type  Field                    Description
static int         ACCURACY                 accuracy
static int         EVAL_CROSS_VALIDATION    n-fold cross-validation
static int         EVAL_TRAINING_SET        entire training set
static int         EVAL_TUNED_SPLIT         single tuned fold
static int         FMEASURE                 F-measure
static int         OPTIMIZE_0               first class value
static int         OPTIMIZE_1               second class value
static int         OPTIMIZE_LFREQ           least frequent class value
static int         OPTIMIZE_MFREQ           most frequent class value
static int         OPTIMIZE_POS_NAME        class value name, either 'yes' or 'pos(itive)'
static int         PRECISION                precision
static int         RANGE_BOUNDS             Correct based on min/max observed
static int         RANGE_NONE               no range correction
static int         RECALL                   recall
static Tag[]       TAGS_EVAL                The evaluation modes
static Tag[]       TAGS_MEASURE             the measure to use
static Tag[]       TAGS_OPTIMIZE            How to determine which class value to optimize for
static Tag[]       TAGS_RANGE               Type of correction applied to threshold range
static int         TP_RATE                  true-positive rate
static int         TRUE_NEG                 true-negative
static int         TRUE_POS                 true-positive
-
Fields inherited from interface weka.core.Drawable
BayesNet, Newick, NOT_DRAWABLE, TREE
-
-
Constructor Summary
Constructors
Constructor          Description
ThresholdSelector()  Constructor.
-
Method Summary
Modifier and Type      Method                                      Description
void                   buildClassifier(Instances instances)        Generates the classifier.
java.lang.String       designatedClassTipText()
double[]               distributionForInstance(Instance instance)  Calculates the class membership probabilities for the given test instance.
java.lang.String       evaluationModeTipText()
Capabilities           getCapabilities()                           Returns default capabilities of the classifier.
SelectedTag            getDesignatedClass()                        Gets the method to determine which class value to optimize.
SelectedTag            getEvaluationMode()                         Gets the evaluation mode used.
double                 getManualThresholdValue()                   Returns the value of the manual threshold.
SelectedTag            getMeasure()                                get measure used for determining threshold
int                    getNumXValFolds()                           Get the number of folds used for cross-validation.
java.lang.String[]     getOptions()                                Gets the current settings of the Classifier.
SelectedTag            getRangeCorrection()                        Gets the confidence range correction mode used.
java.lang.String       getRevision()                               Returns the revision string.
java.lang.String       globalInfo()
java.lang.String       graph()                                     Returns graph describing the classifier (if possible).
int                    graphType()                                 Returns the type of graph this classifier represents.
java.util.Enumeration  listOptions()                               Returns an enumeration describing the available options.
static void            main(java.lang.String[] argv)               Main method for testing this class.
java.lang.String       manualThresholdValueTipText()
java.lang.String       measureTipText()                            Tooltip for this property.
java.lang.String       numXValFoldsTipText()
java.lang.String       rangeCorrectionTipText()
void                   setDesignatedClass(SelectedTag newMethod)   Sets the method to determine which class value to optimize.
void                   setEvaluationMode(SelectedTag newMethod)    Sets the evaluation mode used.
void                   setManualThresholdValue(double threshold)   Sets the value for a manual threshold.
void                   setMeasure(SelectedTag newMeasure)          set measure used for determining threshold
void                   setNumXValFolds(int newNumFolds)            Set the number of folds used for cross-validation.
void                   setOptions(java.lang.String[] options)      Parses a given list of options.
void                   setRangeCorrection(SelectedTag newMethod)   Sets the confidence range correction mode used.
java.lang.String       toString()                                  Returns description of the cross-validated classifier.
-
Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer
getSeed, seedTipText, setSeed
-
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, getClassifier, setClassifier
-
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, setDebug
-
-
-
-
Field Detail
-
RANGE_NONE
public static final int RANGE_NONE
no range correction
- See Also:
- Constant Field Values
-
RANGE_BOUNDS
public static final int RANGE_BOUNDS
Correct based on min/max observed
- See Also:
- Constant Field Values
-
TAGS_RANGE
public static final Tag[] TAGS_RANGE
Type of correction applied to threshold range
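As a rough picture of what correction "based on min/max observed" could look like, the sketch below linearly rescales a probability from an observed range onto 0..1. This is an assumption for illustration, not the class's documented algorithm: `RangeCorrectionSketch`, `rescale`, and the clamping of values outside the observed range are all hypothetical.

```java
/** Illustrative sketch only; a linear min/max rescaling is assumed, not taken from Weka. */
final class RangeCorrectionSketch {

    /**
     * Maps prob from the observed interval [low, high] onto [0, 1].
     * Values outside the observed interval are clamped to 0 or 1.
     */
    static double rescale(double prob, double low, double high) {
        if (high <= low) return prob;   // degenerate range: leave the value unchanged
        if (prob <= low) return 0.0;
        if (prob >= high) return 1.0;
        return (prob - low) / (high - low);
    }

    public static void main(String[] args) {
        // A base learner whose probabilities only span 0.4 .. 0.6.
        double low = 0.4, high = 0.6;
        for (double p : new double[] {0.40, 0.45, 0.50, 0.60}) {
            System.out.println(p + " -> " + rescale(p, low, high));
        }
    }
}
```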
-
EVAL_TRAINING_SET
public static final int EVAL_TRAINING_SET
entire training set
- See Also:
- Constant Field Values
-
EVAL_TUNED_SPLIT
public static final int EVAL_TUNED_SPLIT
single tuned fold
- See Also:
- Constant Field Values
-
EVAL_CROSS_VALIDATION
public static final int EVAL_CROSS_VALIDATION
n-fold cross-validation
- See Also:
- Constant Field Values
-
TAGS_EVAL
public static final Tag[] TAGS_EVAL
The evaluation modes
-
OPTIMIZE_0
public static final int OPTIMIZE_0
first class value
- See Also:
- Constant Field Values
-
OPTIMIZE_1
public static final int OPTIMIZE_1
second class value
- See Also:
- Constant Field Values
-
OPTIMIZE_LFREQ
public static final int OPTIMIZE_LFREQ
least frequent class value
- See Also:
- Constant Field Values
-
OPTIMIZE_MFREQ
public static final int OPTIMIZE_MFREQ
most frequent class value
- See Also:
- Constant Field Values
-
OPTIMIZE_POS_NAME
public static final int OPTIMIZE_POS_NAME
class value name, either 'yes' or 'pos(itive)'
- See Also:
- Constant Field Values
-
TAGS_OPTIMIZE
public static final Tag[] TAGS_OPTIMIZE
How to determine which class value to optimize for
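The five ways of choosing the designated class can be sketched as a small selection routine. This is a stand-alone illustration, not Weka code: the enum `Mode`, the helper names, the case-insensitive name matching, and the tie-breaking (first index wins) are assumptions. The fallback from name matching to the least frequent class follows the description of the -C option.

```java
import java.util.Locale;

/** Illustrative sketch only; enum and method names are hypothetical, not Weka's. */
final class DesignatedClassSketch {

    /** Mirrors the five OPTIMIZE_* choices by meaning, not by numeric value. */
    enum Mode { FIRST, SECOND, LEAST_FREQUENT, MOST_FREQUENT, POSITIVE_NAME }

    /** Index of the largest (wantMax) or smallest count; ties go to the first index. */
    static int extreme(int[] counts, boolean wantMax) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            boolean better = wantMax ? counts[i] > counts[best] : counts[i] < counts[best];
            if (better) best = i;
        }
        return best;
    }

    /** Returns the index of the class value whose threshold would be tuned. */
    static int designatedClass(Mode mode, String[] classNames, int[] counts) {
        switch (mode) {
            case FIRST:
                return 0;
            case SECOND:
                return 1;
            case LEAST_FREQUENT:
                return extreme(counts, false);
            case MOST_FREQUENT:
                return extreme(counts, true);
            default: // POSITIVE_NAME
                for (int i = 0; i < classNames.length; i++) {
                    String name = classNames[i].toLowerCase(Locale.ROOT);
                    if (name.equals("yes") || name.equals("pos")
                            || name.equals("positive") || name.equals("1")) {
                        return i;
                    }
                }
                return extreme(counts, false); // no match: fall back to least frequent
        }
    }

    public static void main(String[] args) {
        String[] names = {"no", "yes"};
        int[] counts = {90, 10};
        System.out.println(designatedClass(Mode.POSITIVE_NAME, names, counts));
    }
}
```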
-
FMEASURE
public static final int FMEASURE
F-measure
- See Also:
- Constant Field Values
-
ACCURACY
public static final int ACCURACY
accuracy
- See Also:
- Constant Field Values
-
TRUE_POS
public static final int TRUE_POS
true-positive
- See Also:
- Constant Field Values
-
TRUE_NEG
public static final int TRUE_NEG
true-negative
- See Also:
- Constant Field Values
-
TP_RATE
public static final int TP_RATE
true-positive rate
- See Also:
- Constant Field Values
-
PRECISION
public static final int PRECISION
precision
- See Also:
- Constant Field Values
-
RECALL
public static final int RECALL
recall
- See Also:
- Constant Field Values
-
TAGS_MEASURE
public static final Tag[] TAGS_MEASURE
the measure to use
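For orientation, the measures named by these constants can be computed from the four confusion-matrix counts of the designated class. The sketch below uses the standard textbook definitions, under which the true-positive rate and recall coincide, while TRUE_POS and TRUE_NEG are plain counts. The class `MeasureSketch` and its method names are hypothetical; whether Weka computes each measure in exactly this way is not stated in this page.

```java
/** Illustrative sketch only; standard definitions, hypothetical names. */
final class MeasureSketch {

    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall, which is also the true-positive rate under the standard definition. */
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    static double fMeasure(int tp, int fp, int fn) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    static double accuracy(int tp, int tn, int fp, int fn) {
        int total = tp + tn + fp + fn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    public static void main(String[] args) {
        int tp = 6, fp = 2, fn = 4, tn = 8;
        System.out.println("precision = " + precision(tp, fp));
        System.out.println("recall    = " + recall(tp, fn));
        System.out.println("F-measure = " + fMeasure(tp, fp, fn));
        System.out.println("accuracy  = " + accuracy(tp, tn, fp, fn));
    }
}
```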
-
-
Method Detail
-
measureTipText
public java.lang.String measureTipText()
Tooltip for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMeasure
public void setMeasure(SelectedTag newMeasure)
Sets the measure used for determining the threshold.
- Parameters:
- newMeasure - Tag representing the measure to be used
-
getMeasure
public SelectedTag getMeasure()
Gets the measure used for determining the threshold.
- Returns:
- Tag representing measure used
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface OptionHandler
- Overrides:
- listOptions in class RandomizableSingleClassifierEnhancer
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. Valid options are:
-C <integer> The class for which the threshold is determined. Valid values are: 1, 2 (for the first and second classes, respectively), 3 (for whichever class is least frequent), 4 (for whichever class is most frequent), and 5 (for the first class named any of "yes", "pos(itive)", "1", or method 3 if there are no matches). (default 5).
-X <number of folds> Number of folds used for cross validation. If just a hold-out set is used, this determines the size of the hold-out set (default 3).
-R <integer> Sets whether confidence range correction is applied. This can be used to ensure the confidences range from 0 to 1. Use 0 for no range correction, 1 for correction based on the min/max values seen during threshold selection (default 0).
-E <integer> Sets the evaluation mode. Use 0 for evaluation using cross-validation, 1 for evaluation using hold-out set, and 2 for evaluation on the training data (default 1).
-M [FMEASURE|ACCURACY|TRUE_POS|TRUE_NEG|TP_RATE|PRECISION|RECALL] Measure used for evaluation (default is FMEASURE).
-manual <real> Set a manual threshold to use. This option overrides automatic selection and options pertaining to automatic selection will be ignored. (default -1, i.e. do not use a manual threshold).
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.functions.Logistic)
Options specific to classifier weka.classifiers.functions.Logistic:
-D Turn on debugging output.
-R <ridge> Set the ridge in the log-likelihood.
-M <number> Set the maximum number of iterations (default -1, until convergence).
Options after -- are passed to the designated sub-classifier.
- Specified by:
- setOptions in interface OptionHandler
- Overrides:
- setOptions in class RandomizableSingleClassifierEnhancer
- Parameters:
- options - the list of options as an array of strings
- Throws:
- java.lang.Exception - if an option is not supported
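To show the shape of the string array that setOptions expects, here is a small stand-alone sketch that assembles one from the flags listed above, with base-classifier options placed after the `--` separator. The helper `OptionsSketch.buildOptions` is hypothetical and the values are arbitrary examples. Passing the array to a real ThresholdSelector requires Weka on the classpath and is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; the helper is hypothetical, the flags come from the option list. */
final class OptionsSketch {

    /** Builds an option array; anything in baseOptions goes after the "--" separator. */
    static String[] buildOptions(int designatedClass, int folds, int evalMode,
                                 String measure, String baseClassifier,
                                 String... baseOptions) {
        List<String> opts = new ArrayList<>();
        opts.add("-C"); opts.add(Integer.toString(designatedClass));
        opts.add("-X"); opts.add(Integer.toString(folds));
        opts.add("-E"); opts.add(Integer.toString(evalMode));
        opts.add("-M"); opts.add(measure);
        opts.add("-W"); opts.add(baseClassifier);
        if (baseOptions.length > 0) {
            opts.add("--");
            opts.addAll(Arrays.asList(baseOptions));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Defaults from the option list, plus an example iteration cap for Logistic.
        String[] options = buildOptions(5, 3, 1, "FMEASURE",
                "weka.classifiers.functions.Logistic", "-M", "100");
        System.out.println(String.join(" ", options));
    }
}
```

Note that `-M` appears twice with different meanings: before `--` it selects the evaluation measure, and after `--` it is the base classifier's own iteration limit.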
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the Classifier.
- Specified by:
- getOptions in interface OptionHandler
- Overrides:
- getOptions in class RandomizableSingleClassifierEnhancer
- Returns:
- an array of strings suitable for passing to setOptions
-
getCapabilities
public Capabilities getCapabilities()
Returns default capabilities of the classifier.
- Specified by:
- getCapabilities in interface CapabilitiesHandler
- Overrides:
- getCapabilities in class SingleClassifierEnhancer
- Returns:
- the capabilities of this classifier
- See Also:
Capabilities
-
buildClassifier
public void buildClassifier(Instances instances) throws java.lang.Exception
Generates the classifier.
- Specified by:
- buildClassifier in class Classifier
- Parameters:
- instances - set of instances serving as training data
- Throws:
- java.lang.Exception - if the classifier has not been generated successfully
-
distributionForInstance
public double[] distributionForInstance(Instance instance) throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.
- Overrides:
- distributionForInstance in class Classifier
- Parameters:
- instance - the instance to be classified
- Returns:
- predicted class probability distribution
- Throws:
- java.lang.Exception - if instance could not be classified successfully
-
globalInfo
public java.lang.String globalInfo()
- Returns:
- a description of the classifier suitable for displaying in the explorer/experimenter gui
-
designatedClassTipText
public java.lang.String designatedClassTipText()
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getDesignatedClass
public SelectedTag getDesignatedClass()
Gets the method to determine which class value to optimize. Will be one of OPTIMIZE_0, OPTIMIZE_1, OPTIMIZE_LFREQ, OPTIMIZE_MFREQ, OPTIMIZE_POS_NAME.
- Returns:
- the class selection mode.
-
setDesignatedClass
public void setDesignatedClass(SelectedTag newMethod)
Sets the method to determine which class value to optimize. Will be one of OPTIMIZE_0, OPTIMIZE_1, OPTIMIZE_LFREQ, OPTIMIZE_MFREQ, OPTIMIZE_POS_NAME.
- Parameters:
- newMethod - the new class selection mode.
-
evaluationModeTipText
public java.lang.String evaluationModeTipText()
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setEvaluationMode
public void setEvaluationMode(SelectedTag newMethod)
Sets the evaluation mode used. Will be one of EVAL_TRAINING_SET, EVAL_TUNED_SPLIT, or EVAL_CROSS_VALIDATION.
- Parameters:
- newMethod - the new evaluation mode.
-
getEvaluationMode
public SelectedTag getEvaluationMode()
Gets the evaluation mode used. Will be one of EVAL_TRAINING_SET, EVAL_TUNED_SPLIT, or EVAL_CROSS_VALIDATION.
- Returns:
- the evaluation mode.
-
rangeCorrectionTipText
public java.lang.String rangeCorrectionTipText()
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setRangeCorrection
public void setRangeCorrection(SelectedTag newMethod)
Sets the confidence range correction mode used. Will be one of RANGE_NONE or RANGE_BOUNDS.
- Parameters:
- newMethod - the new correction mode.
-
getRangeCorrection
public SelectedTag getRangeCorrection()
Gets the confidence range correction mode used. Will be one of RANGE_NONE or RANGE_BOUNDS.
- Returns:
- the confidence correction mode.
-
numXValFoldsTipText
public java.lang.String numXValFoldsTipText()
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getNumXValFolds
public int getNumXValFolds()
Get the number of folds used for cross-validation.
- Returns:
- the number of folds used for cross-validation.
-
setNumXValFolds
public void setNumXValFolds(int newNumFolds)
Set the number of folds used for cross-validation.
- Parameters:
- newNumFolds - the number of folds used for cross-validation.
-
graphType
public int graphType()
Returns the type of graph this classifier represents.
-
graph
public java.lang.String graph() throws java.lang.Exception
Returns graph describing the classifier (if possible).
-
manualThresholdValueTipText
public java.lang.String manualThresholdValueTipText()
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setManualThresholdValue
public void setManualThresholdValue(double threshold) throws java.lang.Exception
Sets the value for a manual threshold. If this option is set (to a value between 0 and 1), then options pertaining to automatic threshold selection are ignored.
- Parameters:
- threshold - the manual threshold to use
- Throws:
- java.lang.Exception
-
getManualThresholdValue
public double getManualThresholdValue()
Returns the value of the manual threshold. A negative value indicates that no manual threshold is being used.
- Returns:
- the value of the manual threshold.
-
toString
public java.lang.String toString()
Returns description of the cross-validated classifier.
- Overrides:
- toString in class java.lang.Object
- Returns:
- description of the cross-validated classifier as a string
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
- getRevision in interface RevisionHandler
- Overrides:
- getRevision in class Classifier
- Returns:
- the revision
-
main
public static void main(java.lang.String[] argv)
Main method for testing this class.
- Parameters:
- argv - the options
-
-