Package weka.classifiers.bayes
Class BayesianLogisticRegression
- java.lang.Object
-
- weka.classifiers.Classifier
-
- weka.classifiers.bayes.BayesianLogisticRegression
-
- All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler
public class BayesianLogisticRegression extends Classifier implements OptionHandler, TechnicalInformationHandler
Implements Bayesian Logistic Regression for both Gaussian and Laplace Priors.
For more information, see
Alexander Genkin, David D. Lewis, David Madigan (2004). Large-scale bayesian logistic regression for text categorization. URL http://www.stat.rutgers.edu/~madigan/PAPERS/shortFat-v3a.pdf.
BibTeX:
@techreport{Genkin2004,
  author = {Alexander Genkin and David D. Lewis and David Madigan},
  institution = {DIMACS},
  title = {Large-scale bayesian logistic regression for text categorization},
  year = {2004},
  URL = {http://www.stat.rutgers.edu/\~madigan/PAPERS/shortFat-v3a.pdf}
}
- Version:
- $Revision: 7984 $
- Author:
- Navendu Garg (gargnav at iit dot edu)
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type | Field | Description
double[] | BetaVector | Array for storing coefficients of Bayesian regression model.
double | Change | This variable is used to keep track of change in the value of delta summation of r(i).
int | ClassIndex | The class index from the training data
static int | CV_BASED |
double[] | Delta | Trust Region Radius
double[] | DeltaBeta | Array to store Regression Coefficient updates.
double[] | DeltaR | This vector is used to store the increments on the R(i).
double[] | DeltaUpdate | Trust Region Radius Update
static int | GAUSSIAN | Distributions available
java.lang.String | HyperparameterRange | CV Hyperparameter Range
double[] | Hyperparameters | Array to store Hyperparameter values for each feature.
int | HyperparameterSelection | Hyperparameter selection method
double | HyperparameterValue | Best hyperparameter for test phase
static double[] | InputHyperparameterValues | Set of values to be used as hyperparameter values during Cross-Validation.
int | iterationCounter | Iteration counter
static int | LAPLACIAN |
static double[] | LogLikelihood | Log-likelihood values to be used to choose the best hyperparameter.
Filter | m_Filter | Filter interface used to point to weka.filters.unsupervised.attribute.Normalize object
int | m_seed | seed for randomizing the instances before CV
int | maxIterations | Maximum number of iterations
static int | NORM_BASED | Methods for selecting the hyperparameter value
boolean | NormalizeData | Choose whether to normalize data or not
int | NumFolds | NumFolds for CV based Hyperparameters selection
int | PriorClass | Distribution Prior class
double[] | R | R(i)= BetaVector X x(i) X y(i).
static int | SPECIFIC_VALUE |
static Tag[] | TAGS_HYPER_METHOD |
static Tag[] | TAGS_PRIOR |
double | Threshold | Threshold for binary classification of probabilistic estimate
double | Tolerance | Tolerance criteria for the stopping criterion.
-
Constructor Summary
Constructors Constructor Description BayesianLogisticRegression()
-
Method Summary
Modifier and Type | Method | Description
static double | bigF(double r, double sigma) | This is a convenient function that defines an upper bound (Delta>0) for values of r(i) reachable by updates in the trust region.
void | buildClassifier(Instances data) | (1) Set the data to the class attribute m_Instances. (2) Call the method initialize() to initialize the values.
double | classifyInstance(Instance instance) | Classifies the given instance using the Bayesian Logistic Regression function.
static double | classSgn(double value) | This method is used to mask the internal class labels.
double | CVBasedHyperparameter() | Computes the best hyperparameter value by doing cross-validation on the training data and computing the likelihood.
java.lang.String | debugTipText() | Returns the tip text for this property
Capabilities | getCapabilities() | This method tests what kind of data this classifier can handle.
java.lang.String | getHyperparameterRange() | Get the range of hyperparameter values to consider during CV-based selection.
SelectedTag | getHyperparameterSelection() | Get the method used to select the hyperparameter
double | getHyperparameterValue() | Get the hyperparameter value.
double | getLoglikeliHood(double[] betas, Instances instances) |
int | getMaxIterations() | Get the maximum number of iterations to perform
int | getNumFolds() | Return the number of folds for CV-based hyperparameter selection
java.lang.String[] | getOptions() | Gets the current settings of the Classifier.
SelectedTag | getPriorClass() | Get the type of prior to use.
java.lang.String | getRevision() | Returns the revision string.
int | getSeed() | Get the seed for randomizing the instances for CV-based hyperparameter selection
TechnicalInformation | getTechnicalInformation() | Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
double | getThreshold() | Return the threshold being used.
double | getTolerance() | Get the tolerance value
java.lang.String | globalInfo() |
java.lang.String | hyperparameterRangeTipText() | Returns the tip text for this property
java.lang.String | hyperparameterSelectionTipText() | Returns the tip text for this property
java.lang.String | hyperparameterValueTipText() | Returns the tip text for this property
void | initialize() | (1) Initialize m_Beta[j] to 0.
boolean | isDebug() | Returns true if debug is turned on.
boolean | isNormalizeData() | Returns true if the data is to be normalized first
java.util.Enumeration | listOptions() | Returns an enumeration describing the available options.
static double | logisticLinkFunction(double r) | This method computes the values for the logistic link function.
static void | main(java.lang.String[] argv) | Main method for testing this class.
java.lang.String | maxIterationsTipText() | Returns the tip text for this property
java.lang.String | normalizeDataTipText() | Returns the tip text for this property
double | normBasedHyperParameter() | This function computes the norm-based hyperparameters and stores them in m_Hyperparameters.
java.lang.String | numFoldsTipText() | Returns the tip text for this property
java.lang.String | priorClassTipText() | Returns the tip text for this property
java.lang.String | seedTipText() | Returns the tip text for this property
void | setDebug(boolean debugMode) | Set debugging mode.
void | setHyperparameterRange(java.lang.String hyperparameterRange) | Set the range of hyperparameter values to consider during CV-based selection
void | setHyperparameterSelection(SelectedTag newMethod) | Set the method used to select the hyperparameter
void | setHyperparameterValue(double hyperparameterValue) | Set the hyperparameter value.
void | setMaxIterations(int maxIterations) | Set the maximum number of iterations to perform
void | setNormalizeData(boolean normalizeData) | Set whether to normalize the data or not
void | setNumFolds(int numFolds) | Set the number of folds to use for CV-based hyperparameter selection
void | setOptions(java.lang.String[] options) | Parses a given list of options.
void | setPriorClass(SelectedTag newMethod) | Set the type of prior to use.
void | setSeed(int seed) | Set the seed for randomizing the instances for CV-based hyperparameter selection
void | setThreshold(double threshold) | Set the threshold to use.
void | setTolerance(double tolerance) | Set the tolerance value
static double | sgn(double r) | Sign for a given value.
boolean | stoppingCriterion() | This method implements the stopping criterion function.
java.lang.String | thresholdTipText() | Returns the tip text for this property
java.lang.String | toleranceTipText() | Returns the tip text for this property
java.lang.String | toString() | Outputs the model as a string.
-
Methods inherited from class weka.classifiers.Classifier
distributionForInstance, forName, getDebug, makeCopies, makeCopy
-
-
-
-
Field Detail
-
LogLikelihood
public static double[] LogLikelihood
Log-likelihood values to be used to choose the best hyperparameter.
-
InputHyperparameterValues
public static double[] InputHyperparameterValues
Set of values to be used as hyperparameter values during Cross-Validation.
-
NormalizeData
public boolean NormalizeData
Choose whether to normalize data or not
-
Tolerance
public double Tolerance
Tolerance criteria for the stopping criterion.
-
Threshold
public double Threshold
Threshold for binary classification of the probabilistic estimate
-
GAUSSIAN
public static final int GAUSSIAN
Distributions available
- See Also:
- Constant Field Values
-
LAPLACIAN
public static final int LAPLACIAN
- See Also:
- Constant Field Values
-
TAGS_PRIOR
public static final Tag[] TAGS_PRIOR
-
PriorClass
public int PriorClass
Distribution Prior class
-
NumFolds
public int NumFolds
NumFolds for CV based Hyperparameters selection
-
m_seed
public int m_seed
seed for randomizing the instances before CV
-
NORM_BASED
public static final int NORM_BASED
Methods for selecting the hyperparameter value
- See Also:
- Constant Field Values
-
CV_BASED
public static final int CV_BASED
- See Also:
- Constant Field Values
-
SPECIFIC_VALUE
public static final int SPECIFIC_VALUE
- See Also:
- Constant Field Values
-
TAGS_HYPER_METHOD
public static final Tag[] TAGS_HYPER_METHOD
-
HyperparameterSelection
public int HyperparameterSelection
Hyperparameter selection method
-
ClassIndex
public int ClassIndex
The class index from the training data
-
HyperparameterValue
public double HyperparameterValue
Best hyperparameter for test phase
-
HyperparameterRange
public java.lang.String HyperparameterRange
CV Hyperparameter Range
-
maxIterations
public int maxIterations
Maximum number of iterations
-
iterationCounter
public int iterationCounter
Iteration counter
-
BetaVector
public double[] BetaVector
Array for storing coefficients of Bayesian regression model.
-
DeltaBeta
public double[] DeltaBeta
Array to store Regression Coefficient updates.
-
DeltaUpdate
public double[] DeltaUpdate
Trust Region Radius Update
-
Delta
public double[] Delta
Trust Region Radius
-
Hyperparameters
public double[] Hyperparameters
Array to store Hyperparameter values for each feature.
-
R
public double[] R
R(i) = BetaVector X x(i) X y(i). This is an intermediate value computed from the vector BETA, the input values and the corresponding class labels.
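The definition of R(i) can be illustrated with a small stand-alone sketch. It is not the Weka source: the class name `MarginSketch`, the method `margins` and the plain-array data layout are hypothetical, and it assumes class labels have already been mapped to -1/+1.

```java
// Hypothetical illustration of R(i) = (BetaVector . x(i)) * y(i); not the Weka implementation.
final class MarginSketch {

    /**
     * Computes one value per training instance.
     * beta: coefficient vector; x: one row per instance; y: labels in {-1, +1} (assumed).
     */
    static double[] margins(double[] beta, double[][] x, double[] y) {
        double[] r = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < beta.length; j++) {
                dot += beta[j] * x[i][j];
            }
            // A positive value means the linear score agrees with the label.
            r[i] = dot * y[i];
        }
        return r;
    }
}
```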
-
DeltaR
public double[] DeltaR
This vector is used to store the increments on the R(i). It is also used to determine the stopping criterion.
-
Change
public double Change
This variable is used to keep track of change in the value of delta summation of r(i).
-
m_Filter
public Filter m_Filter
Filter interface used to point to weka.filters.unsupervised.attribute.Normalize object
-
-
Method Detail
-
globalInfo
public java.lang.String globalInfo()
-
initialize
public void initialize() throws java.lang.Exception
(1) Initialize m_Beta[j] to 0. (2) Initialize m_DeltaUpdate[j].
- Throws:
java.lang.Exception
-
getCapabilities
public Capabilities getCapabilities()
Returns the capabilities of this classifier, i.e. the kinds of data it can handle.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Overrides:
getCapabilities in class Classifier
- Returns:
- the capabilities of this object
- See Also:
Capabilities
-
buildClassifier
public void buildClassifier(Instances data) throws java.lang.Exception
- (1) Set the data to the class attribute m_Instances.
- (2) Call the method initialize() to initialize the values.
- Specified by:
buildClassifier in class Classifier
- Parameters:
data- training data
- Throws:
java.lang.Exception- if classifier can't be built successfully.
-
classSgn
public static double classSgn(double value)
This method is used to mask the internal class labels.
- Parameters:
value- internal class label
- Returns:
- -1 for internal class label 0
- +1 for internal class label 1
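The mapping described above can be sketched in a few lines. This is a hypothetical stand-alone helper (`ClassSgnSketch` is not a Weka class); treating every non-zero label as class 1 is an assumption, since the documentation only covers the labels 0 and 1.

```java
// Hypothetical illustration of the 0/1 to -1/+1 label mapping; not the Weka implementation.
final class ClassSgnSketch {

    /** Maps internal class label 0 to -1 and label 1 to +1. */
    static double classSgn(double value) {
        // Assumption: any non-zero label is treated as the positive class.
        return (value == 0.0) ? -1.0 : 1.0;
    }
}
```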
-
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
bigF
public static double bigF(double r, double sigma)
This is a convenient function that defines an upper bound (Delta > 0) for values of r(i) reachable by updates in the trust region.
- Parameters:
r- BetaVector X x(i) X y(i)
sigma- a parameter where sigma > 0 (named delta in the original comment)
- Returns:
- double function value
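As a rough sketch of what such a bound can look like, the following follows the function F(r, delta) as I read it in the cited Genkin, Lewis and Madigan report: 0.25 inside the trust region and a decaying value outside it. Whether the Weka method matches this term for term is an assumption; `BigFSketch` is a hypothetical name.

```java
// Hypothetical sketch based on the cited technical report; not verified against the Weka source.
final class BigFSketch {

    /**
     * Bound used when updating a coefficient inside a trust region of radius sigma.
     * Returns 0.25 when |r| <= sigma, otherwise 1 / (2 + exp(|r| - sigma) + exp(sigma - |r|)).
     */
    static double bigF(double r, double sigma) {
        double absR = Math.abs(r);
        if (absR <= sigma) {
            return 0.25;
        }
        return 1.0 / (2.0 + Math.exp(absR - sigma) + Math.exp(sigma - absR));
    }
}
```

The two branches meet at |r| = sigma, where the second expression also evaluates to 1 / (2 + 1 + 1) = 0.25.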
-
stoppingCriterion
public boolean stoppingCriterion()
This method implements the stopping criterion function.
- Returns:
- boolean whether to stop or not.
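The page does not spell out the criterion itself. One plausible form, taken from the cited report rather than from the Weka source, compares the summed absolute change in r(i) against the summed absolute r(i). The sketch below assumes that form; `StoppingSketch` and `shouldStop` are hypothetical names.

```java
// Hypothetical sketch of a relative-change stopping rule; the exact Weka rule may differ.
final class StoppingSketch {

    /**
     * Returns true when sum(|deltaR[i]|) / (1 + sum(|r[i]|)) <= tolerance.
     * deltaR holds the per-instance increments from the last pass, r the current values.
     */
    static boolean shouldStop(double[] deltaR, double[] r, double tolerance) {
        double sumDelta = 0.0;
        double sumR = 0.0;
        for (int i = 0; i < r.length; i++) {
            sumDelta += Math.abs(deltaR[i]);
            sumR += Math.abs(r[i]);
        }
        // The "1 +" keeps the ratio finite when all r[i] are zero.
        return sumDelta / (1.0 + sumR) <= tolerance;
    }
}
```

With the documented default tolerance of 0.0005, a pass that moves the r(i) values by only a tiny fraction of their total magnitude would end the iteration.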
-
logisticLinkFunction
public static double logisticLinkFunction(double r)
This method computes the values for the logistic link function.
f(r) = exp(r) / (1 + exp(r))
- Returns:
- output value
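The formula f(r) = exp(r) / (1 + exp(r)) can be evaluated as written, but exp(r) overflows for large positive r. The sketch below is a stand-alone illustration (the class `LogisticLinkSketch` is hypothetical, not the Weka code) that branches on the sign of r to avoid this.

```java
// Hypothetical stand-alone helper; mathematically equal to exp(r) / (1 + exp(r)).
final class LogisticLinkSketch {

    /** Logistic link function, evaluated without overflow for large |r|. */
    static double logistic(double r) {
        if (r >= 0) {
            // Divide numerator and denominator by exp(r): 1 / (1 + exp(-r)).
            return 1.0 / (1.0 + Math.exp(-r));
        }
        // For negative r, exp(r) is at most 1, so the direct form is safe.
        double e = Math.exp(r);
        return e / (1.0 + e);
    }

    public static void main(String[] args) {
        System.out.println(logistic(0.0));   // midpoint of the curve
        System.out.println(logistic(800.0)); // direct formula would yield NaN here
    }
}
```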
-
sgn
public static double sgn(double r)
Sign for a given value.
- Parameters:
r- the value whose sign is returned
- Returns:
- double +1 if r>0, -1 if r<0
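A minimal sketch of the sign function follows. The documentation does not say what is returned for r == 0; returning 0 in that case is an assumption made here, and `SgnSketch` is a hypothetical name.

```java
// Hypothetical illustration; the r == 0 case is not specified by the documentation.
final class SgnSketch {

    /** Returns +1 if r > 0 and -1 if r < 0. */
    static double sgn(double r) {
        if (r > 0) {
            return 1.0;
        }
        if (r < 0) {
            return -1.0;
        }
        return 0.0; // assumption: zero maps to zero
    }
}
```

Under that assumption the result agrees with the standard library's `Math.signum(double)` for ordinary finite inputs.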
-
normBasedHyperParameter
public double normBasedHyperParameter()
This function computes the norm-based hyperparameters and stores them in m_Hyperparameters.
-
classifyInstance
public double classifyInstance(Instance instance) throws java.lang.Exception
Classifies the given instance using the Bayesian Logistic Regression function.
- Overrides:
classifyInstance in class Classifier
- Parameters:
instance- the test instance
- Returns:
- the classification
- Throws:
java.lang.Exception- if classification can't be done successfully
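Classification in a model of this kind typically means scoring the instance with the coefficient vector, passing the score through the logistic link and comparing the result with the threshold. The sketch below illustrates that flow on plain arrays. It is not the Weka code: `ThresholdSketch` is hypothetical, and the strict `>` comparison is an assumption.

```java
// Hypothetical illustration of threshold-based binary classification; not the Weka implementation.
final class ThresholdSketch {

    /**
     * Returns 1.0 when the estimated probability exceeds the threshold, otherwise 0.0.
     * beta: coefficient vector; x: attribute values of one instance.
     */
    static double classify(double[] beta, double[] x, double threshold) {
        double score = 0.0;
        for (int j = 0; j < beta.length; j++) {
            score += beta[j] * x[j];
        }
        // Logistic link: probability estimate for the positive class.
        double probability = 1.0 / (1.0 + Math.exp(-score));
        // Assumption: ties at exactly the threshold go to class 0.
        return probability > threshold ? 1.0 : 0.0;
    }
}
```

Raising the threshold above the default of 0.5 makes the positive class harder to predict, which trades recall for precision.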
-
toString
public java.lang.String toString()
Outputs the model as a string.
- Overrides:
toString in class java.lang.Object
- Returns:
- the model as string
-
CVBasedHyperparameter
public double CVBasedHyperparameter() throws java.lang.Exception
Computes the best hyperparameter value by running cross-validation on the training data and computing the likelihood. The method can parse a range of values or a list of values.
- Returns:
- Best hyperparameter value with the max likelihood value on the training data.
- Throws:
java.lang.Exception
-
getLoglikeliHood
public double getLoglikeliHood(double[] betas, Instances instances)
- Returns:
- likelihood for a given set of betas and instances
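For orientation, the data term of a logistic log-likelihood with labels in {-1, +1} is the sum over instances of -log(1 + exp(-r(i))), where r(i) = (beta . x(i)) * y(i). The sketch below computes only that term on plain arrays. Whether the Weka method also includes a prior penalty is not stated here, so treat this as an assumption-laden illustration; `LogLikelihoodSketch` is a hypothetical name.

```java
// Hypothetical sketch of the data log-likelihood only; any prior term is deliberately omitted.
final class LogLikelihoodSketch {

    /** Sum over instances of -log(1 + exp(-y_i * (beta . x_i))), with y_i in {-1, +1}. */
    static double logLikelihood(double[] beta, double[][] x, double[] y) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < beta.length; j++) {
                dot += beta[j] * x[i][j];
            }
            total -= log1pExp(-dot * y[i]);
        }
        return total;
    }

    /** Computes log(1 + exp(z)) without overflow for large positive z. */
    static double log1pExp(double z) {
        return z > 0 ? z + Math.log1p(Math.exp(-z)) : Math.log1p(Math.exp(z));
    }
}
```

With all coefficients at zero every instance contributes -log(2), which gives a convenient sanity check for any implementation.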
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class Classifier
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. Valid options are:
-D Show Debugging Output
-P <integer> Distribution of the Prior (1=Gaussian, 2=Laplacian) (default: 1=Gaussian)
-H <integer> Hyperparameter Selection Method (1=Norm-based, 2=CV-based, 3=specific value) (default: 1=Norm-based)
-V <double> Specified Hyperparameter Value (use in conjunction with -H 3) (default: 0.27)
-R <string> Hyperparameter Range (use in conjunction with -H 2) (format: R:start-end,multiplier OR L:val(1), val(2), ..., val(n)) (default: R:0.01-316,3.16)
-Tl <double> Tolerance Value (default: 0.0005)
-S <double> Threshold Value (default: 0.5)
-F <integer> Number Of Folds (use in conjunction with -H 2) (default: 2)
-I <integer> Max Number of Iterations (default: 100)
-N Normalize the data
-seed <number> Seed for randomizing instances order in CV-based hyperparameter selection (default: 1)
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class Classifier
- Parameters:
options- the list of options as an array of strings
- Throws:
java.lang.Exception- if an option is not supported
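As an illustration of the option list above, the sketch below assembles a String array requesting a Laplacian prior with CV-based hyperparameter selection. It uses only the flags documented on this page and has no Weka dependency; the class `BlrOptionsSketch` and its method are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds an options array from the flags documented above.
final class BlrOptionsSketch {

    /** Options for a Laplacian prior (-P 2) with CV-based selection (-H 2) and normalization (-N). */
    static String[] cvLaplaceOptions(String range, int folds, int seed) {
        List<String> opts = new ArrayList<>();
        opts.add("-P");
        opts.add("2");                        // 2 = Laplacian prior
        opts.add("-H");
        opts.add("2");                        // 2 = CV-based hyperparameter selection
        opts.add("-R");
        opts.add(range);                      // e.g. "R:0.01-316,3.16"
        opts.add("-F");
        opts.add(Integer.toString(folds));    // number of CV folds
        opts.add("-seed");
        opts.add(Integer.toString(seed));     // seed for shuffling before CV
        opts.add("-N");                       // normalize the data
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", cvLaplaceOptions("R:0.01-316,3.16", 5, 1)));
    }
}
```

The resulting array is in the form expected by setOptions(java.lang.String[]) on a BayesianLogisticRegression instance.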
-
getOptions
public java.lang.String[] getOptions()
Description copied from class: Classifier
Gets the current settings of the Classifier.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class Classifier
- Returns:
- an array of strings suitable for passing to setOptions
-
main
public static void main(java.lang.String[] argv)
Main method for testing this class.
- Parameters:
argv- the options
-
debugTipText
public java.lang.String debugTipText()
Returns the tip text for this property
- Overrides:
debugTipText in class Classifier
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDebug
public void setDebug(boolean debugMode)
Description copied from class: Classifier
Set debugging mode.
- Overrides:
setDebug in class Classifier
- Parameters:
debugMode- true if debug output should be printed
-
hyperparameterSelectionTipText
public java.lang.String hyperparameterSelectionTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getHyperparameterSelection
public SelectedTag getHyperparameterSelection()
Get the method used to select the hyperparameter
- Returns:
- the method used to select the hyperparameter
-
setHyperparameterSelection
public void setHyperparameterSelection(SelectedTag newMethod)
Set the method used to select the hyperparameter
- Parameters:
newMethod- the method used to set the hyperparameter
-
priorClassTipText
public java.lang.String priorClassTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setPriorClass
public void setPriorClass(SelectedTag newMethod)
Set the type of prior to use.
- Parameters:
newMethod- the type of prior to use.
-
getPriorClass
public SelectedTag getPriorClass()
Get the type of prior to use.
- Returns:
- the type of prior to use
-
thresholdTipText
public java.lang.String thresholdTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getThreshold
public double getThreshold()
Return the threshold being used.
- Returns:
- the threshold
-
setThreshold
public void setThreshold(double threshold)
Set the threshold to use.
- Parameters:
threshold- the threshold to use
-
toleranceTipText
public java.lang.String toleranceTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getTolerance
public double getTolerance()
Get the tolerance value
- Returns:
- the tolerance value
-
setTolerance
public void setTolerance(double tolerance)
Set the tolerance value
- Parameters:
tolerance- the tolerance value to use
-
hyperparameterValueTipText
public java.lang.String hyperparameterValueTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getHyperparameterValue
public double getHyperparameterValue()
Get the hyperparameter value. Used when the hyperparameter selection method is set to "specific value".
- Returns:
- the hyperparameter value
-
setHyperparameterValue
public void setHyperparameterValue(double hyperparameterValue)
Set the hyperparameter value. Used when the hyperparameter selection method is set to "specific value".
- Parameters:
hyperparameterValue- the value of the hyperparameter
-
numFoldsTipText
public java.lang.String numFoldsTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getNumFolds
public int getNumFolds()
Return the number of folds for CV-based hyperparameter selection
- Returns:
- the number of CV folds
-
setNumFolds
public void setNumFolds(int numFolds)
Set the number of folds to use for CV-based hyperparameter selection
- Parameters:
numFolds- number of folds to select
-
seedTipText
public java.lang.String seedTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setSeed
public void setSeed(int seed)
Set the seed for randomizing the instances for CV-based hyperparameter selection
- Parameters:
seed- the seed to use
-
getSeed
public int getSeed()
Get the seed for randomizing the instances for CV-based hyperparameter selection
- Returns:
- the seed to use
-
maxIterationsTipText
public java.lang.String maxIterationsTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMaxIterations
public int getMaxIterations()
Get the maximum number of iterations to perform
- Returns:
- the maximum number of iterations
-
setMaxIterations
public void setMaxIterations(int maxIterations)
Set the maximum number of iterations to perform
- Parameters:
maxIterations- maximum number of iterations
-
normalizeDataTipText
public java.lang.String normalizeDataTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
isNormalizeData
public boolean isNormalizeData()
Returns true if the data is to be normalized first
- Returns:
- true if the data is to be normalized
-
setNormalizeData
public void setNormalizeData(boolean normalizeData)
Set whether to normalize the data or not
- Parameters:
normalizeData- true if data is to be normalized
-
hyperparameterRangeTipText
public java.lang.String hyperparameterRangeTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getHyperparameterRange
public java.lang.String getHyperparameterRange()
Get the range of hyperparameter values to consider during CV-based selection.
- Returns:
- the range of hyperparameters as a String
-
setHyperparameterRange
public void setHyperparameterRange(java.lang.String hyperparameterRange)
Set the range of hyperparameter values to consider during CV-based selection
- Parameters:
hyperparameterRange- the range of hyperparameter values
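The range string uses the two formats listed under setOptions: `R:start-end,multiplier` or `L:val(1), val(2), ..., val(n)`. The sketch below shows one way such a string could be expanded into candidate values. It is not the Weka parser: `HyperparameterRangeSketch` is hypothetical, and reading the `R:` form as a geometric sequence that includes the end value when it is reached exactly is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the documented range formats; not the Weka implementation.
final class HyperparameterRangeSketch {

    /** Expands "R:start-end,multiplier" or "L:v1, v2, ..." into a list of candidate values. */
    static List<Double> parse(String spec) {
        List<Double> values = new ArrayList<>();
        if (spec.startsWith("R:")) {
            String[] parts = spec.substring(2).split(",");
            // Note: splitting on '-' does not support negative bounds or exponents such as 1e-3.
            String[] bounds = parts[0].split("-");
            double start = Double.parseDouble(bounds[0].trim());
            double end = Double.parseDouble(bounds[1].trim());
            double multiplier = Double.parseDouble(parts[1].trim());
            if (start <= 0 || multiplier <= 1) {
                throw new IllegalArgumentException("need start > 0 and multiplier > 1");
            }
            // Assumption: geometric progression start, start*m, start*m^2, ... up to end.
            for (double v = start; v <= end; v *= multiplier) {
                values.add(v);
            }
        } else if (spec.startsWith("L:")) {
            for (String token : spec.substring(2).split(",")) {
                values.add(Double.parseDouble(token.trim()));
            }
        } else {
            throw new IllegalArgumentException("expected a spec starting with R: or L:");
        }
        return values;
    }
}
```

Under these assumptions the documented default `R:0.01-316,3.16` expands to ten candidates, from 0.01 up to roughly 314.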
-
isDebug
public boolean isDebug()
Returns true if debug is turned on.
- Returns:
- true if debug is turned on
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class Classifier
- Returns:
- the revision
-
-