Package weka.core
Class Stopwords
java.lang.Object
    weka.core.Stopwords
All Implemented Interfaces:
RevisionHandler
public class Stopwords extends java.lang.Object implements RevisionHandler
Class that can test whether a given string is a stop word. All words are lowercased before the test. Stopword files are read and written with one word per line; lines starting with '#' are interpreted as comments and therefore skipped. The default stopwords are based on Rainbow.
Accepts the following parameters:
-i file
    loads the stopwords from the given file
-o file
    saves the stopwords to the given file
-p
    outputs the current stopwords on stdout
Any additional parameters are interpreted as words to test as stopwords.
Version:
$Revision: 1.6 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Ashraf M. Kibriya (amk14@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
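The file format described in the class summary (one word per line, lines starting with '#' skipped, words lowercased) can be illustrated with a small stand-alone sketch. It does not use Weka and is not the library's implementation; the class name `StopwordFormatSketch` and the method `parse` are hypothetical, and skipping blank lines is an assumption that the documentation does not confirm.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in that only illustrates the documented file format.
// It is not part of Weka and does not reproduce Weka's code.
class StopwordFormatSketch {

  /** Parses stopword file content: one word per line, '#' lines are comments. */
  static Set<String> parse(String content) {
    Set<String> words = new TreeSet<>();
    for (String line : content.split("\\R")) {
      String trimmed = line.trim();
      // Comment lines are skipped as documented; skipping blank lines is an assumption.
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      words.add(trimmed.toLowerCase(Locale.ROOT));
    }
    return words;
  }

  public static void main(String[] args) {
    String content = "# custom stopwords\nthe\nAnd\n# another comment\nof\n";
    System.out.println(parse(content)); // prints [and, of, the]
  }
}
```

A file in this shape is what the -i option and the read methods are documented to accept.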
-
-
Method Summary
Modifier and Type | Method | Description
void | add(java.lang.String word) | Adds the given word to the stopword list (is automatically converted to lower case and trimmed).
void | clear() | Removes all stopwords.
java.util.Enumeration | elements() | Returns a sorted enumeration over all stored stopwords.
java.lang.String | getRevision() | Returns the revision string.
boolean | is(java.lang.String word) | Returns true if the given string is a stop word.
static boolean | isStopword(java.lang.String str) | Returns true if the given string is a stop word.
static void | main(java.lang.String[] args) | Accepts the following parameters: -i file, -o file, -p.
void | read(java.io.BufferedReader reader) | Generates a new Stopwords object from the reader.
void | read(java.io.File file) | Generates a new Stopwords object from the given file.
void | read(java.lang.String filename) | Generates a new Stopwords object from the given file.
boolean | remove(java.lang.String word) | Removes the word from the stopword list.
java.lang.String | toString() | Returns the current stopwords in a string.
void | write(java.io.BufferedWriter writer) | Writes the current stopwords to the given writer.
void | write(java.io.File file) | Writes the current stopwords to the given file.
void | write(java.lang.String filename) | Writes the current stopwords to the given file.
-
-
-
Constructor Detail
-
Stopwords
public Stopwords()
initializes the stopwords (based on Rainbow).
-
-
Method Detail
-
clear
public void clear()
removes all stopwords
-
add
public void add(java.lang.String word)
Adds the given word to the stopword list (is automatically converted to lower case and trimmed).
Parameters:
word - the word to add
-
remove
public boolean remove(java.lang.String word)
Removes the word from the stopword list.
Parameters:
word - the word to remove
Returns:
true if the word was found in the list and then removed
-
is
public boolean is(java.lang.String word)
Returns true if the given string is a stop word.
Parameters:
word - the word to test
Returns:
true if the word is a stopword
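The contract of add, remove and is can be illustrated with a small stand-in class. This is a sketch under assumptions, not Weka's code: the class `StopwordSetSketch` is hypothetical, ignoring empty input in `add` is an assumption, and `remove` is shown as an exact match because the documentation does not say whether it normalizes its argument.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in mirroring the documented add/remove/is contract.
// It is not part of Weka and does not reproduce Weka's code.
class StopwordSetSketch {
  private final Set<String> words = new HashSet<>();

  /** Trims and lowercases the word before storing it, as documented for add. */
  void add(String word) {
    String normalized = word.trim().toLowerCase(Locale.ROOT);
    if (!normalized.isEmpty()) { // ignoring empty input is an assumption
      words.add(normalized);
    }
  }

  /** Returns true if the word was present and has been removed (exact match assumed). */
  boolean remove(String word) {
    return words.remove(word);
  }

  /** Lowercases the word before the lookup, as documented for the test. */
  boolean is(String word) {
    return words.contains(word.toLowerCase(Locale.ROOT));
  }
}
```

With this stand-in, adding "  The " and then testing "THE" succeeds, because both sides are lowercased before comparison.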
-
elements
public java.util.Enumeration elements()
Returns a sorted enumeration over all stored stopwords.
Returns:
the enumeration over all stopwords
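One way to obtain a sorted java.util.Enumeration from an unordered collection of words is sketched below. The class `SortedEnumerationSketch` and its method are hypothetical, and the backing HashSet is an assumption; how Weka stores its stopwords internally is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: turns an unordered word set into a sorted Enumeration.
class SortedEnumerationSketch {

  static Enumeration<String> sortedElements(Set<String> words) {
    List<String> sorted = new ArrayList<>(words); // copy so the set is not modified
    Collections.sort(sorted);                     // natural (lexicographic) order
    return Collections.enumeration(sorted);
  }

  public static void main(String[] args) {
    Set<String> words = new HashSet<>(Arrays.asList("the", "and", "of"));
    Enumeration<String> e = sortedElements(words);
    while (e.hasMoreElements()) {
      System.out.println(e.nextElement()); // prints and, of, the (one per line)
    }
  }
}
```

Callers of elements() would iterate the result in the same way, with hasMoreElements() and nextElement().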
-
read
public void read(java.lang.String filename) throws java.lang.Exception
Generates a new Stopwords object from the given file.
Parameters:
filename - the file to read the stopwords from
Throws:
java.lang.Exception - if reading fails
-
read
public void read(java.io.File file) throws java.lang.Exception
Generates a new Stopwords object from the given file.
Parameters:
file - the file to read the stopwords from
Throws:
java.lang.Exception - if reading fails
-
read
public void read(java.io.BufferedReader reader) throws java.lang.Exception
Generates a new Stopwords object from the reader. The reader is closed automatically.
Parameters:
reader - the reader to get the stopwords from
Throws:
java.lang.Exception - if reading fails
-
write
public void write(java.lang.String filename) throws java.lang.Exception
Writes the current stopwords to the given file.
Parameters:
filename - the file to write the stopwords to
Throws:
java.lang.Exception - if writing fails
-
write
public void write(java.io.File file) throws java.lang.Exception
Writes the current stopwords to the given file.
Parameters:
file - the file to write the stopwords to
Throws:
java.lang.Exception - if writing fails
-
write
public void write(java.io.BufferedWriter writer) throws java.lang.Exception
Writes the current stopwords to the given writer. The writer is closed automatically.
Parameters:
writer - the writer to write the stopwords to
Throws:
java.lang.Exception - if writing fails
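Writing in the one-word-per-line format and closing the writer afterwards can be sketched as follows. This is a stand-alone illustration, not Weka's implementation: the class `StopwordWriteSketch` is hypothetical, and emitting the words in sorted order, with no comment lines, is an assumption.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical sketch of writing stopwords one per line and closing the writer.
class StopwordWriteSketch {

  static void write(Collection<String> words, BufferedWriter writer) {
    // try-with-resources closes the writer, matching the documented behaviour
    try (BufferedWriter w = writer) {
      for (String word : new TreeSet<>(words)) { // sorted output is an assumption
        w.write(word);
        w.newLine();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    StringWriter out = new StringWriter();
    write(Arrays.asList("the", "and"), new BufferedWriter(out));
    System.out.print(out); // prints "and" then "the", one per line
  }
}
```

Because the writer is closed by the call, a caller should not reuse it afterwards.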
-
toString
public java.lang.String toString()
Returns the current stopwords in a string.
Overrides:
toString in class java.lang.Object
Returns:
the current stopwords
-
isStopword
public static boolean isStopword(java.lang.String str)
Returns true if the given string is a stop word.
Parameters:
str - the word to test
Returns:
true if the word is a stopword
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
Specified by:
getRevision in interface RevisionHandler
Returns:
the revision
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
Accepts the following parameters:
-i file
    loads the stopwords from the given file
-o file
    saves the stopwords to the given file
-p
    outputs the current stopwords on stdout
Any additional parameters are interpreted as words to test as stopwords.
Parameters:
args - commandline parameters
Throws:
java.lang.Exception - if something goes wrong
-
-