Class InterquartileRange
java.lang.Object
  weka.filters.Filter
    weka.filters.SimpleFilter
      weka.filters.SimpleBatchFilter
        weka.filters.unsupervised.attribute.InterquartileRange
- All Implemented Interfaces:
java.io.Serializable, CapabilitiesHandler, OptionHandler, RevisionHandler
public class InterquartileRange extends SimpleBatchFilter
A filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.
Outliers:
Q3 + OF*IQR < x <= Q3 + EVF*IQR
or
Q1 - EVF*IQR <= x < Q1 - OF*IQR
Extreme values:
x > Q3 + EVF*IQR
or
x < Q1 - EVF*IQR
Key:
Q1 = 25% quartile
Q3 = 75% quartile
IQR = Interquartile Range, difference between Q1 and Q3
OF = Outlier Factor
EVF = Extreme Value Factor
Valid options are:
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns to base outlier/extreme value detection on. If an instance is considered an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates an Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
Thanks to Dale for a few brainstorming sessions.
- Version:
- $Revision: 9529 $
- Author:
- Dale Fletcher (dale at cs dot waikato dot ac dot nz), fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
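The threshold rules in the class description can be sketched in plain Java as follows. This is an illustrative sketch only, not the filter's actual code: the class IqrClassificationSketch, its Tag enum and the classify method are hypothetical names, Q1 and Q3 are passed in rather than computed from data, and EVF >= OF is assumed.

```java
// Hypothetical sketch of the documented inequalities; not taken from the Weka sources.
class IqrClassificationSketch {

    enum Tag { NORMAL, OUTLIER, EXTREME_VALUE }

    /** Classifies x given the quartiles, the outlier factor (of) and the extreme value factor (evf). */
    static Tag classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1;
        // Extreme values: x > Q3 + EVF*IQR or x < Q1 - EVF*IQR
        if (x > q3 + evf * iqr || x < q1 - evf * iqr) {
            return Tag.EXTREME_VALUE;
        }
        // Outliers: beyond the OF fence but not beyond the EVF fence
        if (x > q3 + of * iqr || x < q1 - of * iqr) {
            return Tag.OUTLIER;
        }
        return Tag.NORMAL;
    }

    public static void main(String[] args) {
        double q1 = 10.0, q3 = 20.0;    // IQR = 10
        double of = 3.0, evf = 2 * of;  // documented defaults: 3 and 2*Outlier Factor
        System.out.println(classify(25.0, q1, q3, of, evf));   // NORMAL
        System.out.println(classify(60.0, q1, q3, of, evf));   // OUTLIER
        System.out.println(classify(-55.0, q1, q3, of, evf));  // EXTREME_VALUE
    }
}
```

With these inputs the upper fences are 50 (outlier) and 80 (extreme value), and the lower fences are -20 and -50.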
-
Field Summary
Fields
Modifier and Type | Field | Description
static int | NON_NUMERIC | indicator for non-numeric attributes
-
Constructor Summary
Constructors
InterquartileRange()
-
Method Summary
Modifier and Type | Method | Description
java.lang.String | attributeIndicesTipText() | Returns the tip text for this property.
java.lang.String | detectionPerAttributeTipText() | Returns the tip text for this property.
java.lang.String | extremeValuesAsOutliersTipText() | Returns the tip text for this property.
java.lang.String | extremeValuesFactorTipText() | Returns the tip text for this property.
java.lang.String | getAttributeIndices() | Gets the current range selection.
Capabilities | getCapabilities() | Returns the Capabilities of this filter.
boolean | getDetectionPerAttribute() | Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
boolean | getExtremeValuesAsOutliers() | Gets whether extreme values are also tagged as outliers.
double | getExtremeValuesFactor() | Gets the factor for determining the thresholds for extreme values.
java.lang.String[] | getOptions() | Gets the current settings of the filter.
double | getOutlierFactor() | Gets the factor for determining the thresholds for outliers.
boolean | getOutputOffsetMultiplier() | Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR.
java.lang.String | getRevision() | Returns the revision string.
java.lang.String | globalInfo() | Returns a string describing this filter.
java.util.Enumeration | listOptions() | Returns an enumeration describing the available options.
static void | main(java.lang.String[] args) | Main method for testing this class.
java.lang.String | outlierFactorTipText() | Returns the tip text for this property.
java.lang.String | outputOffsetMultiplierTipText() | Returns the tip text for this property.
void | setAttributeIndices(java.lang.String value) | Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
void | setAttributeIndicesArray(int[] value) | Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
void | setDetectionPerAttribute(boolean value) | Sets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
void | setExtremeValuesAsOutliers(boolean value) | Sets whether extreme values are also tagged as outliers.
void | setExtremeValuesFactor(double value) | Sets the factor for determining the thresholds for extreme values.
void | setOptions(java.lang.String[] options) | Parses a list of options for this object.
void | setOutlierFactor(double value) | Sets the factor for determining the thresholds for outliers.
void | setOutputOffsetMultiplier(boolean value) | Sets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR.
-
Methods inherited from class weka.filters.SimpleBatchFilter
batchFinished, input
-
Methods inherited from class weka.filters.SimpleFilter
debugTipText, getDebug, setDebug, setInputFormat
-
Methods inherited from class weka.filters.Filter
batchFilterFile, filterFile, getCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputPeek, toString, useFilter, wekaStaticWrapper
-
Field Detail
-
NON_NUMERIC
public static final int NON_NUMERIC
Indicator for non-numeric attributes.
- See Also:
- Constant Field Values
-
Method Detail
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing this filter.
- Specified by:
globalInfo in class SimpleFilter
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter GUI
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class SimpleFilter
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a list of options for this object. Valid options are:
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns to base outlier/extreme value detection on. If an instance is considered an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates an Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class SimpleFilter
- Parameters:
options - the list of options as an array of strings
- Throws:
java.lang.Exception - if an option is not supported
- See Also:
SimpleFilter.reset()
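As an illustration of the option format, the sketch below assembles a string array from the flags listed above. The helper class IqrOptionsSketch and its buildOptions method are hypothetical and not part of Weka; only the flag names (-R, -O, -E, -E-as-O, -P, -M) come from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; only the option flags themselves are taken from the documentation.
class IqrOptionsSketch {

    /** Builds an options array in the documented flag format. */
    static String[] buildOptions(String range, double outlierFactor, double extremeValuesFactor,
                                 boolean extremeAsOutliers, boolean perAttribute, boolean offset) {
        List<String> opts = new ArrayList<>();
        opts.add("-R");
        opts.add(range);                                  // 1-based range list, e.g. "first-3"
        opts.add("-O");
        opts.add(Double.toString(outlierFactor));
        opts.add("-E");
        opts.add(Double.toString(extremeValuesFactor));
        if (extremeAsOutliers) opts.add("-E-as-O");
        if (perAttribute) opts.add("-P");
        if (offset) opts.add("-M");                       // documented to imply -P
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions("first-3", 3.0, 6.0, false, true, false);
        System.out.println(Arrays.toString(options));
        // prints [-R, first-3, -O, 3.0, -E, 6.0, -P]
    }
}
```

An array of this shape is what setOptions(java.lang.String[]) expects. Because that method is declared to throw java.lang.Exception, a caller has to catch or propagate it.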
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the filter.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class SimpleFilter
- Returns:
- an array of strings suitable for passing to setOptions
-
attributeIndicesTipText
public java.lang.String attributeIndicesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getAttributeIndices
public java.lang.String getAttributeIndices()
Gets the current range selection.
- Returns:
- a string containing a comma separated list of ranges
-
setAttributeIndices
public void setAttributeIndices(java.lang.String value)
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
- Parameters:
value - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1, e.g.: first-3,5,6-last
- Throws:
java.lang.IllegalArgumentException - if an invalid range list is supplied
-
setAttributeIndicesArray
public void setAttributeIndicesArray(int[] value)
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
- Parameters:
value - an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
- Throws:
java.lang.IllegalArgumentException - if an invalid set of ranges is supplied
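The difference between the 1-based range string accepted by setAttributeIndices and the 0-based array accepted by setAttributeIndicesArray can be sketched as below. This is a simplified, hypothetical parser (AttributeRangeSketch, resolve and toZeroBased are illustrative names), not Weka's own range handling. It assumes ascending ranges and a known attribute count, and it does not remove duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified range parser for illustration only.
class AttributeRangeSketch {

    /** Resolves one 1-based token ("first", "last" or a number) to a 0-based index. */
    static int resolve(String token, int numAttributes) {
        String t = token.trim();
        int oneBased;
        if (t.equals("first")) {
            oneBased = 1;
        } else if (t.equals("last")) {
            oneBased = numAttributes;
        } else {
            oneBased = Integer.parseInt(t); // NumberFormatException is an IllegalArgumentException
        }
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return oneBased - 1;
    }

    /** Expands a 1-based range list such as "first-3,5,6-last" into 0-based indices. */
    static int[] toZeroBased(String ranges, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : ranges.split(",")) {
            String[] ends = part.split("-");
            if (ends.length != 1 && ends.length != 2) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            int from = resolve(ends[0], numAttributes);
            int to = ends.length == 2 ? resolve(ends[1], numAttributes) : from;
            if (from > to) {
                throw new IllegalArgumentException("Descending range: " + part);
            }
            for (int i = from; i <= to; i++) {
                result.add(i);
            }
        }
        int[] out = new int[result.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = result.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // With 8 attributes, "first-3,5,6-last" covers 1-based 1,2,3,5,6,7,8.
        System.out.println(Arrays.toString(toZeroBased("first-3,5,6-last", 8)));
        // prints [0, 1, 2, 4, 5, 6, 7]
    }
}
```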
-
outlierFactorTipText
public java.lang.String outlierFactorTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setOutlierFactor
public void setOutlierFactor(double value)
Sets the factor for determining the thresholds for outliers.
- Parameters:
value - the factor.
-
getOutlierFactor
public double getOutlierFactor()
Gets the factor for determining the thresholds for outliers.
- Returns:
- the factor.
-
extremeValuesFactorTipText
public java.lang.String extremeValuesFactorTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setExtremeValuesFactor
public void setExtremeValuesFactor(double value)
Sets the factor for determining the thresholds for extreme values.
- Parameters:
value - the factor.
-
getExtremeValuesFactor
public double getExtremeValuesFactor()
Gets the factor for determining the thresholds for extreme values.
- Returns:
- the factor.
-
extremeValuesAsOutliersTipText
public java.lang.String extremeValuesAsOutliersTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setExtremeValuesAsOutliers
public void setExtremeValuesAsOutliers(boolean value)
Sets whether extreme values are also tagged as outliers.
- Parameters:
value - whether or not to tag extreme values also as outliers.
-
getExtremeValuesAsOutliers
public boolean getExtremeValuesAsOutliers()
Gets whether extreme values are also tagged as outliers.
- Returns:
- true if extreme values are also tagged as outliers.
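The effect of this flag on the two indicator values can be sketched as follows. The sketch is an assumption-laden illustration rather than the filter's code: ExtremeAsOutlierSketch and tags are hypothetical names, the quartiles are supplied directly, and EVF >= OF is assumed.

```java
// Hypothetical sketch of how -E-as-O could affect the Outlier/ExtremeValue pair.
class ExtremeAsOutlierSketch {

    /** Returns {outlier, extremeValue} for x. */
    static boolean[] tags(double x, double q1, double q3, double of, double evf,
                          boolean extremeValuesAsOutliers) {
        double iqr = q3 - q1;
        boolean extreme = x > q3 + evf * iqr || x < q1 - evf * iqr;
        boolean beyondOutlierFence = x > q3 + of * iqr || x < q1 - of * iqr;
        // Without the flag, an extreme value is not reported as an outlier as well.
        boolean outlier = extreme ? extremeValuesAsOutliers : beyondOutlierFence;
        return new boolean[] {outlier, extreme};
    }

    public static void main(String[] args) {
        double q1 = 10.0, q3 = 20.0, of = 3.0, evf = 6.0; // fences: 50 and 80 above, -20 and -50 below
        boolean[] off = tags(100.0, q1, q3, of, evf, false);
        boolean[] on = tags(100.0, q1, q3, of, evf, true);
        System.out.println("flag off: Outlier=" + off[0] + ", ExtremeValue=" + off[1]);
        // prints flag off: Outlier=false, ExtremeValue=true
        System.out.println("flag on:  Outlier=" + on[0] + ", ExtremeValue=" + on[1]);
        // prints flag on:  Outlier=true, ExtremeValue=true
    }
}
```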
-
detectionPerAttributeTipText
public java.lang.String detectionPerAttributeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDetectionPerAttribute
public void setDetectionPerAttribute(boolean value)
Sets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
- Parameters:
value - whether or not to generate indicator attribute pairs for each numeric attribute.
-
getDetectionPerAttribute
public boolean getDetectionPerAttribute()
Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
- Returns:
- true if indicator attribute pairs are generated for each numeric attribute.
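When a single pair is generated for all attributes, the class description says an instance is tagged if at least one selected attribute flags it. A minimal sketch of that aggregation is shown below. DetectionPerAttributeSketch and collapse are hypothetical names, and the per-attribute flags are assumed to have been computed already.

```java
// Hypothetical sketch: collapsing per-attribute indicator pairs into one pair per instance.
class DetectionPerAttributeSketch {

    /**
     * flags[a] = {outlier, extremeValue} for selected numeric attribute a of one instance.
     * Returns a single {outlier, extremeValue} pair; each entry is true if any attribute set it.
     */
    static boolean[] collapse(boolean[][] flags) {
        boolean outlier = false;
        boolean extreme = false;
        for (boolean[] pair : flags) {
            outlier |= pair[0];
            extreme |= pair[1];
        }
        return new boolean[] {outlier, extreme};
    }

    public static void main(String[] args) {
        boolean[][] perAttribute = {
            {false, false},  // attribute 1: normal
            {true, false},   // attribute 2: outlier
            {false, false}   // attribute 3: normal
        };
        boolean[] single = collapse(perAttribute);
        System.out.println("Outlier=" + single[0] + ", ExtremeValue=" + single[1]);
        // prints Outlier=true, ExtremeValue=false
    }
}
```

With detection per attribute switched on, the three pairs would be kept as separate indicator attributes instead of being collapsed.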
-
outputOffsetMultiplierTipText
public java.lang.String outputOffsetMultiplierTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setOutputOffsetMultiplier
public void setOutputOffsetMultiplier(boolean value)
Sets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR.
- Parameters:
value - whether or not to generate the additional attribute.
-
getOutputOffsetMultiplier
public boolean getOutputOffsetMultiplier()
Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR.
- Returns:
- true if the additional attribute is generated.
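Rearranging value = median + 'multiplier' * IQR gives multiplier = (value - median) / IQR. The sketch below works through that arithmetic. OffsetMultiplierSketch and multiplier are hypothetical names, and rejecting a zero IQR is an assumption made for this example, not documented filter behaviour.

```java
// Hypothetical sketch of the Offset arithmetic; a zero IQR is rejected here by assumption.
class OffsetMultiplierSketch {

    /** Solves value = median + multiplier * iqr for the multiplier. */
    static double multiplier(double value, double median, double iqr) {
        if (iqr == 0.0) {
            throw new IllegalArgumentException("IQR must be non-zero");
        }
        return (value - median) / iqr;
    }

    public static void main(String[] args) {
        // median = 15, IQR = 10: a value of 45 lies 3 IQRs above the median.
        System.out.println(multiplier(45.0, 15.0, 10.0));  // prints 3.0
        // A value of -5 lies 2 IQRs below the median.
        System.out.println(multiplier(-5.0, 15.0, 10.0));  // prints -2.0
    }
}
```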
-
getCapabilities
public Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Overrides:
getCapabilities in class Filter
- Returns:
- the capabilities of this object
- See Also:
Capabilities
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class Filter
- Returns:
- the revision
-
main
public static void main(java.lang.String[] args)
Main method for testing this class.
- Parameters:
args - should contain arguments to the filter: use -h for help
-