maspack.util
Class ReaderTokenizer

java.lang.Object
  extended by maspack.util.ReaderTokenizer

public class ReaderTokenizer
extends java.lang.Object

A tokenizer class that implements the same functionality as java.io.StreamTokenizer, but with enhancements which allow it to read numbers with exponents, read integers formatted in hex, and save and restore settings.

The tokenizer reads characters from a Reader, removes comments and whitespace, and parses the remainder into tokens. One token is parsed for each call to nextToken. The following tokens may be generated:

numbers
A floating point number, or decimal or hexadecimal integer. Number recognition is enabled by default but can be enabled or disabled using parseNumbers.
word identifiers
A continuous sequence of word characters not beginning with a digit. By default, word characters consist of alphanumerics, underscores (_), and any character whose Unicode value exceeds 0xA0. Other characters can be designated as word characters using wordChar or its sibling methods.
strings
Any sequence of characters between a pair of identical quote characters. By default, ' and " are enabled as quote characters. Other characters can be designated as quote characters using quoteChar. Strings may include the usual C-style escape sequences beginning with backslash (\). Setting backslash to be a quote character is likely to produce strange results.
character token
Any character not parsed into one of the above tokens; such characters correspond to ordinary characters. A character can be designated as ordinary using ordinaryChar or its sibling methods. The method resetSyntax sets all characters to be ordinary.
end of line
An end of line. By default, recognition of this token is disabled, but it can be enabled or disabled using eolIsSignificant.
end of file
End of input from the reader.
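The following self-contained sketch shows the usual way of dispatching on the token types listed above. So that it runs without maspack on the classpath, it uses java.io.StreamTokenizer, the JDK class whose functionality ReaderTokenizer reproduces; with ReaderTokenizer the loop would compare ttype against ReaderTokenizer.TT_NUMBER, TT_WORD, TT_EOL, and so on in the same way. The class TokenKinds and its describe method are illustrative names only. The two classes differ in some defaults (for example, StreamTokenizer treats / rather than # as its comment character).

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TokenKinds {

    /** Returns a short description of every token found in the input. */
    public static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.eolIsSignificant(true); // report line ends as TT_EOL tokens
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_NUMBER:
                        out.add("number " + st.nval);
                        break;
                    case StreamTokenizer.TT_WORD:
                        out.add("word " + st.sval);
                        break;
                    case StreamTokenizer.TT_EOL:
                        out.add("eol");
                        break;
                    case '"':
                    case '\'':
                        // for a quoted string, ttype is the quote character itself
                        out.add("string " + st.sval);
                        break;
                    default:
                        // any other character is an ordinary single-character token
                        out.add("char " + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // prints six lines: word x / char = / number 42.0 / eol / string hi there / char ;
        describe("x = 42\n\"hi there\" ;").forEach(System.out::println);
    }
}
```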
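Three comment styles are supported: comments introduced by a comment character and extending to the end of the line (see commentChar), C-style slash-star comments (see slashStarComments), and C++-style slash-slash comments (see slashSlashComments). By default, # is designated as a comment character and C/C++ comments are disabled.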

By default, whitespace consists of any character with an ASCII value from 0x00 to 0x20. This includes newlines, carriage returns, and spaces. Other characters can be designated as whitespace using whitespaceChar and its sibling routines.
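The three comment styles correspond to the methods commentChar, slashSlashComments, and slashStarComments. java.io.StreamTokenizer has methods with the same names and meanings, so the following sketch uses it (not ReaderTokenizer) to show all three styles enabled at once. The class CommentStyles is an illustrative name.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CommentStyles {

    /** Returns the words left after all three comment styles are stripped. */
    public static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');        // StreamTokenizer makes '/' a comment char by default
        st.commentChar('#');         // style 1: comment character, runs to end of line
        st.slashSlashComments(true); // style 2: // ... to end of line
        st.slashStarComments(true);  // style 3: /* ... */, may span lines
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "alpha # one\nbeta // two\ngamma /* three\nfour */ delta";
        System.out.println(words(text)); // [alpha, beta, gamma, delta]
    }
}
```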

Basic Usage

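The basic usage paradigm for ReaderTokenizer is to create it from a Reader, adjust its settings if the defaults are not suitable, and then call nextToken repeatedly, examining the type and value of each token as it is read.

We first give a simple example that reads in pairs of words and numbers, such as the following: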

     foo 1.67
     dayton 678
     thadius 1e-4
 
This can be parsed with the following code:
 import java.io.FileReader;
 import java.io.IOException;
 import java.util.HashMap;
 import maspack.util.ReaderTokenizer;

 HashMap<String,Double> wordNumberPairs = new HashMap<String,Double>();
 ReaderTokenizer rtok = new ReaderTokenizer (new FileReader ("foo.txt"));
 // read (word, number) pairs until the end of the file
 while (rtok.nextToken() != ReaderTokenizer.TT_EOF) {
    if (rtok.ttype != ReaderTokenizer.TT_WORD) {
       throw new IOException ("word expected, line " + rtok.lineno());
    }
    String word = rtok.sval;
    if (rtok.nextToken() != ReaderTokenizer.TT_NUMBER) {
       throw new IOException ("number expected, line " + rtok.lineno());
    }
    wordNumberPairs.put (word, rtok.nval); // nval is autoboxed to Double
 }
 rtok.close(); // closes the underlying FileReader
 
The application uses nextToken to read tokens repeatedly until the end of the file. After a token is read, its type is inspected using the ttype field. If the type is inappropriate, an exception is thrown whose message includes the tokenizer's current line number, obtained using lineno. The numeric value of a number token is stored in the field nval, while the string value of a word token is stored in the field sval.

We now give a more complex example that expects input to consist of either vectors of numbers, or file names, as indicated by the keywords vector and file. Keywords should be followed by an equal sign (=), vectors can be any length but should be surrounded by square brackets ([ ]), and file names should be enclosed in double quotes ("), as in the following sample:

     vector=[ 0.4 0.4 6.7 ]
     file="bar.txt"
     vector = [ 1.3 0.4 6.7 1.2 ]
 
This can be parsed with the following code:
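 import java.io.IOException;
 import java.io.InputStreamReader;
 import java.util.ArrayList;
 import maspack.util.ReaderTokenizer;

 ReaderTokenizer rtok = new ReaderTokenizer (new InputStreamReader (System.in));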
 ArrayList<Double> vector = new ArrayList<Double>();
 String fileName = null;
 while (rtok.nextToken() != ReaderTokenizer.TT_EOF) {
    if (rtok.ttype != ReaderTokenizer.TT_WORD) {
       throw new IOException ("keyword expected, line " + rtok.lineno());
    }
    String keyword = rtok.sval;
    if (keyword.equals ("vector")) {
       rtok.scanToken ('=');
       rtok.scanToken ('[');
       vector.clear();
       while (rtok.nextToken() == ReaderTokenizer.TT_NUMBER) {
          vector.add (rtok.nval);
       }
       if (rtok.ttype != ']') {
          throw new IOException ("']' expected, line " + rtok.lineno());
       }
       // do something with vector ...
    }
    else if (keyword.equals ("file")) {
       rtok.scanToken ('=');
       rtok.nextToken();
       if (!rtok.tokenIsQuotedString ('"')) {
       throw new IOException ("quoted string expected, line "
                              + rtok.lineno());
       }
       fileName = rtok.sval;
       // do something with file ...
    }
    else {
       throw new IOException ("unrecognized keyword, line " + rtok.lineno());
    }
 }
 
This code is similar to the first example, except that it also uses the methods scanToken and tokenIsQuotedString. The first is a convenience routine that reads the next token and verifies that it is a specific ordinary character token, throwing a diagnostic exception if it is not. This keeps the code compact in cases where we know exactly what input is expected. Similar methods exist for other token types, such as scanNumber, scanWord, and scanQuotedString. The second method, tokenIsQuotedString, returns true if the most recently read token is a string delimited by a specific quote character. Similar methods exist to test for other token types: tokenIsNumber, tokenIsWord, and tokenIsBoolean.
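How scanToken and its siblings are implemented is not described here. As a minimal sketch of the pattern, assuming nothing beyond java.io.StreamTokenizer, the hypothetical helpers scanChar and scanWord below wrap nextToken and throw an IOException, whose message includes the tokenizer's description of the offending token and its line, when the input does not match. They are illustrations only, not ReaderTokenizer's API; the class ScanHelpers is likewise an illustrative name.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ScanHelpers {

    /** Hypothetical helper: reads one token and requires it to be the character ch. */
    public static void scanChar(StreamTokenizer st, char ch) throws IOException {
        if (st.nextToken() != ch) {
            throw new IOException("'" + ch + "' expected, got " + st);
        }
    }

    /** Hypothetical helper: reads one token and requires it to be a word. */
    public static String scanWord(StreamTokenizer st) throws IOException {
        if (st.nextToken() != StreamTokenizer.TT_WORD) {
            throw new IOException("word expected, got " + st);
        }
        return st.sval;
    }

    /** Parses input of the form "keyword = [ n1 n2 ... ]" and returns the numbers. */
    public static List<Double> parseVector(String input) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        scanWord(st);       // keyword (not checked further here)
        scanChar(st, '=');
        scanChar(st, '[');
        List<Double> values = new ArrayList<>();
        while (st.nextToken() == StreamTokenizer.TT_NUMBER) {
            values.add(st.nval);
        }
        if (st.ttype != ']') {
            throw new IOException("']' expected, got " + st);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseVector("vector=[ 0.4 0.4 6.7 ]")); // [0.4, 0.4, 6.7]
    }
}
```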

Token Lookahead

ReaderTokenizer supports one token of lookahead. That is, the application may read a token, and then, if it is not recognized, return it to the input stream to be read by another part of the application. Tokens are returned using the method pushBack. Push back is implemented very efficiently, and so applications should not be shy about using it.

Here is an example consisting of a routine that reads in a set of numbers and stops when a non-numeric token is read:

 public Double[] readNumbers (ReaderTokenizer rtok) throws IOException {
    ArrayList<Double> numbers = new ArrayList<Double>();
    while (rtok.nextToken() == ReaderTokenizer.TT_NUMBER) {
       numbers.add (rtok.nval);
    }
    rtok.pushBack();
    return numbers.toArray (new Double[0]);
 }
 
The non-numeric token is returned to the input stream so that it is available to whatever parsing code is invoked next.
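java.io.StreamTokenizer provides the same one-token pushBack, so the readNumbers routine above can be tried in a self-contained program by substituting that class. The class PushBackDemo is an illustrative name; its main method shows that the pushed-back word is returned again by the next call to nextToken.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class PushBackDemo {

    /** Reads numbers until a non-numeric token appears, then pushes that token back. */
    public static List<Double> readNumbers(StreamTokenizer st) throws IOException {
        List<Double> numbers = new ArrayList<>();
        while (st.nextToken() == StreamTokenizer.TT_NUMBER) {
            numbers.add(st.nval);
        }
        st.pushBack(); // the next call to nextToken() returns this token again
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader("1 2 3 end"));
        System.out.println(readNumbers(st)); // [1.0, 2.0, 3.0]
        st.nextToken();                      // re-reads the pushed-back word
        System.out.println(st.sval);         // end
    }
}
```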

Reading Numbers

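A major reason for implementing ReaderTokenizer is that java.io.StreamTokenizer does not properly handle long or integer values, or floating point numbers with exponents. ReaderTokenizer does handle these cases. In particular, numbers with exponents (such as 1e-4) are read as single numeric tokens, integers may also be given in hexadecimal, and when a numeric token is an integer, its exact value is available as a long in the field lval (in addition to the double value in nval).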

One can use tokenIsInteger to query whether or not a numeric token corresponds to an integer value. Similarly, the convenience routines scanInteger, scanLong, and scanShort can be used to require that an integer value is scanned and converted to either int, long, or short. Another convenience routine, scanNumbers, can be used to read a sequence of numbers and place them in an array.
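The following self-contained program illustrates the StreamTokenizer limitations mentioned above. It uses only java.io.StreamTokenizer (ReaderTokenizer is not involved), and the class and method names are illustrative. With default settings, StreamTokenizer splits 1e-4 into the number 1.0 and the word e-4, and because it keeps numeric values only as a double, an integer beyond 2^53 such as 9007199254740993 comes back rounded.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerNumbers {

    /** Lists the tokens StreamTokenizer produces, tagging numbers and words. */
    public static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("number " + st.nval);
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("word " + st.sval);
                } else {
                    out.add("char " + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    /** Returns nval for the first token; StreamTokenizer keeps only a double. */
    public static double firstNumber(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        try {
            st.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return st.nval;
    }

    public static void main(String[] args) {
        // the exponent is not understood: "1e-4" becomes 1.0 followed by the word "e-4"
        System.out.println(tokens("1e-4")); // [number 1.0, word e-4]
        // 2^53 + 1 has no exact double representation, so it is rounded
        System.out.println((long) firstNumber("9007199254740993")); // 9007199254740992
    }
}
```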

Saving and Restoring State

A deficiency of java.io.StreamTokenizer is that it does not permit its settings to be queried and set arbitrarily. This makes it impossible to transparently save and restore the tokenizer state in different contexts. For example, if a parsing routine requires a specific setting (say, accepting end-of-line as a token), then it has no way of knowing if it should undo this setting upon completion.

ReaderTokenizer allows its state variables to be queried, so that a parsing routine can save and restore state transparently:

 void specialParsingRoutine (ReaderTokenizer rtok) throws IOException {
    boolean eolTokenSave = rtok.getEolIsSignificant(); // save
    rtok.eolIsSignificant (true);
    try {
       // ... do parsing that requires EOL tokens ...
    }
    finally {
       rtok.eolIsSignificant (eolTokenSave); // restore, even if parsing fails
    }
 }
 

This includes the ability to save and restore the type settings for individual characters. In the following example, $, @, and & are set to word characters, and then restored to whatever their previous settings were:

 void anotherSpecialParsingRoutine (ReaderTokenizer rtok) {
    int[] charSaves = rtok.getCharSettings ("$@&");
    rtok.wordChars ("$@&");
    // do parsing that requires $, @, and & to be word characters
 
    rtok.setCharSettings ("$@&", charSaves); // restore settings
 }
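The setting values returned by getCharSettings are documented as opaque, so the following is only a hypothetical sketch of how a per-character settings table can support this save, modify, and restore pattern. The class CharTableSketch and its flag constants are invented for illustration and do not reflect ReaderTokenizer's actual encoding.

```java
/**
 * Hypothetical sketch of per-character settings that can be saved and
 * restored as int arrays. The flag values are invented for illustration;
 * ReaderTokenizer documents its own settings as opaque.
 */
public class CharTableSketch {
    public static final int ORDINARY = 0;
    public static final int WORD = 1;
    public static final int WHITESPACE = 2;

    // one setting per character code 0..255 (range chosen for illustration)
    private final int[] settings = new int[256];

    public CharTableSketch() {
        for (int c = 0; c <= ' '; c++) settings[c] = WHITESPACE;
        for (int c = 'a'; c <= 'z'; c++) settings[c] = WORD;
        for (int c = 'A'; c <= 'Z'; c++) settings[c] = WORD;
    }

    public int getCharSetting(int ch) {
        return settings[ch];
    }

    /** Returns a copy of the current setting of each character in str. */
    public int[] getCharSettings(String str) {
        int[] saved = new int[str.length()];
        for (int i = 0; i < str.length(); i++) {
            saved[i] = settings[str.charAt(i)];
        }
        return saved;
    }

    /** Gives each character in str the corresponding saved setting. */
    public void setCharSettings(String str, int[] saved) {
        for (int i = 0; i < str.length(); i++) {
            settings[str.charAt(i)] = saved[i];
        }
    }

    /** Marks each character in str as a word character. */
    public void wordChars(String str) {
        for (int i = 0; i < str.length(); i++) {
            settings[str.charAt(i)] = WORD;
        }
    }

    public static void main(String[] args) {
        CharTableSketch table = new CharTableSketch();
        int[] saves = table.getCharSettings("$@&");    // save
        table.wordChars("$@&");                        // temporary change
        System.out.println(table.getCharSetting('$')); // 1 (WORD)
        table.setCharSettings("$@&", saves);           // restore
        System.out.println(table.getCharSetting('$')); // 0 (ORDINARY)
    }
}
```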
 


Field Summary
 long lval
          If the current token is an integer as well as a number, then this field contains the long value of that integer.
 double nval
          If the current token is a number, then this field contains the value of that number.
 java.lang.String sval
          If the current token is a word or string, then this field contains the value of that word or string.
static int TT_EOF
          A constant indicating the end of the input.
static int TT_EOL
          A constant indicating the end of a line.
static int TT_NOTHING
          A constant indicating that the current token has no value.
static int TT_NUMBER
          A constant indicating that the current token is a number.
static int TT_WORD
          A constant indicating that the current token is a word.
 int ttype
          Contains the type of the token read after a call to nextToken.
 
Constructor Summary
ReaderTokenizer(java.io.Reader reader)
          Creates a new ReaderTokenizer from the specified Reader.
 
Method Summary
 void clearNumericExtensionChars()
          Clears all numeric extension characters.
 void close()
          Close the underlying reader for this tokenizer.
 void commentChar(int ch)
          Sets the specified character to be a comment character.
 void eolIsSignificant(boolean enable)
          Specifies whether or not end-of-line should be treated as a token.
 int getCharSetting(int ch)
          Gets the setting associated with a character.
 int[] getCharSettings(int low, int high)
          Gets the settings associated with a set of characters specified by the range low <= ch <= high.
 int[] getCharSettings(java.lang.String str)
          Gets the settings associated with a set of characters specified by a string and returns them in an array.
 boolean getEolIsSignificant()
          Returns true if end-of-line is treated as a token by this tokenizer.
 boolean getLowerCaseMode()
          Returns true if lower-case mode is enabled for this tokenizer.
 java.lang.String getNumericExtensionChars()
          Returns a String specifying all characters which are enabled as numeric extensions.
 boolean getParseNumbers()
          Returns true if number parsing is enabled for this tokenizer.
 java.io.Reader getReader()
          Returns the Reader which supplies the input for this tokenizer.
 java.lang.String getResourceName()
          Returns the name of the resource (e.g., File or URL) associated with this ReaderTokenizer.
 boolean getSlashSlashComments()
          Returns true if C++-style slash-slash comments are enabled.
 boolean getSlashStarComments()
          Returns true if C-style slash-star comments are enabled.
 boolean isCommentChar(int ch)
          Returns true if the specified character is a comment character.
 boolean isNumericExtensionChar(int ch)
          Returns true if the specified character is a numeric extension character.
 boolean isOrdinaryChar(int ch)
          Returns true if the specified character is an ordinary character.
 boolean isQuoteChar(int ch)
          Returns true if the specified character is a quote character.
 boolean isWhitespaceChar(int ch)
          Returns true if the specified character is a whitespace character.
 boolean isWordChar(int ch)
          Returns true if the specified character is a word character.
 java.lang.String lastCommentLine()
          Returns the last comment line (excluding the trailing newline) that was read by this tokenizer, or null if no comments have been read yet.
 int lineno()
          Returns the current line number.
 void lowerCaseMode(boolean enable)
          Enables or disables lower-case mode.
 int nextToken()
          Parses the next token from the input and returns its type.
 void numericExtensionChar(int ch)
          Sets the specified character to be a numeric extension character.
 void numericExtensionChars(int low, int high)
          Sets all characters in the range low <= ch <= high to be numeric extension characters.
 void numericExtensionChars(java.lang.String chars)
          Sets some specified characters to be numeric extension characters.
 void ordinaryChar(int ch)
          Sets the specified character to be "ordinary", so that it indicates a token whose type is given by the character itself.
 void ordinaryChars(int low, int high)
          Sets all characters in the range low <= ch <= high to be "ordinary".
 void ordinaryChars(java.lang.String str)
          Sets all characters specified by a string to be "ordinary".
 void parseNumbers(boolean enable)
          Enables parsing of numbers by this tokenizer.
 void pushBack()
          Pushes the current token back into the input, so that it may be read again using nextToken.
 void quoteChar(int ch)
          Sets the specified character to be a "quote" character, so that it delimits a quoted string.
 void resetSyntax()
          Sets all characters to be ordinary characters, so that they are treated as individual tokens, and disables number parsing.
 boolean scanBoolean()
          Reads the next token and checks that it represents a boolean.
 void scanCharacter(int ch)
          Reads the next token and verifies that it matches a specified character.
 int scanInteger()
          Reads the next token and checks that it is an integer.
 long scanLong()
          Reads the next token and checks that it is an integer.
 double scanNumber()
          Reads the next token and checks that it is a number.
 int scanNumbers(double[] vals, int max)
          Reads a series of numeric tokens and returns their values.
 java.lang.String scanQuotedString(char quoteChar)
          Reads the next token and checks that it is a quoted string delimited by the specified quote character.
 short scanShort()
          Reads the next token and checks that it is an integer.
 void scanToken(int type)
          Reads the next token and verifies that it is of the specified type.
 java.lang.String scanWord()
          Reads the next token and checks that it is a word.
 java.lang.String scanWord(java.lang.String word)
          Reads the next token and checks that it is a specific word.
 java.lang.String scanWordOrQuotedString(char quoteChar)
          Reads the next token and checks that it is either a word or a quoted string delimited by the specified quote character.
 void setCharSetting(int ch, int setting)
          Assigns the settings for a character.
 void setCharSettings(int low, int high, int[] settings)
          Assigns settings for a set of characters specified by the range low <= ch <= high.
 void setCharSettings(java.lang.String str, int[] settings)
          Assigns settings for a set of characters specified by a string.
 void setLineno(int num)
          Sets the current line number.
 void setReader(java.io.Reader reader)
          Sets the Reader which supplies the input for this tokenizer.
 void setResourceName(java.lang.String name)
          Sets the name of the resource (e.g., File or URL) associated with this ReaderTokenizer.
 void slashSlashComments(boolean enable)
          Enables the handling of C++-style slash-slash comments, commenting out all characters between // and the end of the line.
 void slashStarComments(boolean enable)
          Enables the handling of C-style slash-star comments, commenting out all characters, including new lines, between /* and */.
 boolean tokenIsBoolean()
          Returns true if the current token represents a boolean.
 boolean tokenIsInteger()
          Returns true if the current token is an integer.
 boolean tokenIsNumber()
          Returns true if the current token is a number.
 boolean tokenIsQuotedString(char quoteChar)
          Returns true if the current token is a quoted string delimited by the specified quote character.
 boolean tokenIsWord()
          Returns true if the current token is a word.
 boolean tokenIsWordOrQuotedString(char quoteChar)
          Returns true if the current token is either a word or a quoted string delimited by the specified quote character.
 java.lang.String tokenName()
          Returns a string identifying the current token.
 java.lang.String toString()
          Returns a string describing the type and value of the current token, as well as the current line number.
 void whitespaceChar(int ch)
          Sets the specified character to be a "white space" character, so that it delimits tokens and does not otherwise take part in their formation.
 void whitespaceChars(int low, int high)
          Sets all characters in the range low <= ch <= high to "whitespace" characters.
 void wordChar(int ch)
          Sets the specified character to be a "word" character, so that it can form part of a word token.
 void wordChars(int low, int high)
          Sets all characters in the range low <= ch <= high to be "word" characters.
 void wordChars(java.lang.String chars)
          Sets some specified characters to be "word" characters, so that they can form part of a word token.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

TT_EOF

public static final int TT_EOF
A constant indicating the end of the input.

See Also:
Constant Field Values

TT_EOL

public static final int TT_EOL
A constant indicating the end of a line.

See Also:
Constant Field Values

TT_NUMBER

public static final int TT_NUMBER
A constant indicating that the current token is a number.

See Also:
Constant Field Values

TT_WORD

public static final int TT_WORD
A constant indicating that the current token is a word.

See Also:
Constant Field Values

TT_NOTHING

public static final int TT_NOTHING
A constant indicating that the current token has no value.

See Also:
Constant Field Values

ttype

public int ttype
Contains the type of the token read after a call to nextToken.


nval

public double nval
If the current token is a number, then this field contains the value of that number.


lval

public long lval
If the current token is an integer as well as a number, then this field contains the long value of that integer.


sval

public java.lang.String sval
If the current token is a word or string, then this field contains the value of that word or string.

Constructor Detail

ReaderTokenizer

public ReaderTokenizer(java.io.Reader reader)
Creates a new ReaderTokenizer from the specified Reader.

Parameters:
reader - Reader that provides the input stream

Method Detail

lineno

public int lineno()
Returns the current line number.

Returns:
current line number

setLineno

public void setLineno(int num)
Sets the current line number.

Parameters:
num - new line number

eolIsSignificant

public void eolIsSignificant(boolean enable)
Specifies whether or not end-of-line should be treated as a token. If treated as a token, then end-of-line is recognized as a token indicated by the type TT_EOL.

Parameters:
enable - if true, then end-of-line is treated as a token
See Also:
getEolIsSignificant()

getEolIsSignificant

public boolean getEolIsSignificant()
Returns true if end-of-line is treated as a token by this tokenizer.

Returns:
true if end-of-line is treated as a token
See Also:
eolIsSignificant(boolean)

lowerCaseMode

public void lowerCaseMode(boolean enable)
Enables or disables lower-case mode. In lower-case mode, the values of word tokens that appear in the sval field are converted to lower case.

Parameters:
enable - if true, enables lower case mode

getLowerCaseMode

public boolean getLowerCaseMode()
Returns true if lower-case mode is enabled for this tokenizer.

Returns:
true if lower-case mode is enabled

pushBack

public void pushBack()
Pushes the current token back into the input, so that it may be read again using nextToken. One token of push-back is supported.


getCharSetting

public int getCharSetting(int ch)
Gets the setting associated with a character. The setting describes the type of a character (ordinary, word, comment, quote, or whitespace), and whether or not the character is a numeric extension. However, the information is opaque to the user, and is intended mainly to allow the saving and restoring of character settings within a parsing procedure.

Parameters:
ch - character for which setting is required
Returns:
setting associated with the character
See Also:
setCharSetting(int, int), getCharSettings(String), getCharSettings(int,int)

getCharSettings

public int[] getCharSettings(java.lang.String str)
Gets the settings associated with a set of characters specified by a string and returns them in an array. For more information on character settings, see getCharSetting.

Parameters:
str - characters for which settings are required
Returns:
settings associated with each character
See Also:
setCharSettings(String,int[]), getCharSetting(int), getCharSettings(int,int)

getCharSettings

public int[] getCharSettings(int low,
                             int high)
Gets the settings associated with a set of characters specified by the range low <= ch <= high. For more information on character settings, see getCharSetting.

Parameters:
low - lowest character whose setting is desired
high - highest character whose setting is desired
Returns:
settings associated with each character
See Also:
setCharSettings(int,int,int[]), getCharSetting(int), getCharSettings(String)

setCharSetting

public void setCharSetting(int ch,
                           int setting)
Assigns the settings for a character. For more information on character settings, see getCharSetting.

Parameters:
ch - character to be set
setting - setting for the character
See Also:
getCharSetting(int), setCharSettings(String,int[]), setCharSettings(int,int,int[])

setCharSettings

public void setCharSettings(java.lang.String str,
                            int[] settings)
Assigns settings for a set of characters specified by a string. The settings are provided in an accompanying array. For more information on character settings, see getCharSetting.

Parameters:
str - characters to be set
settings - setting for each character
See Also:
getCharSettings(String), setCharSetting(int,int), setCharSettings(int,int,int[])

setCharSettings

public void setCharSettings(int low,
                            int high,
                            int[] settings)
Assigns settings for a set of characters specified by the range low <= ch <= high. The settings are provided in an accompanying array. For more information on character settings, see getCharSetting.

Parameters:
low - lowest character to be set
high - highest character to be set
settings - setting for each character
See Also:
getCharSettings(int,int), setCharSetting(int,int), setCharSettings(String,int[])

ordinaryChar

public void ordinaryChar(int ch)
Sets the specified character to be "ordinary", so that it indicates a token whose type is given by the character itself.

Setting the end-of-line character to be ordinary may interfere with the ability of this tokenizer to count lines. Setting numeric characters to be ordinary may interfere with the ability of this tokenizer to parse numbers, if numeric parsing is enabled.

Parameters:
ch - character to be designated as ordinary.
See Also:
isOrdinaryChar(int), ordinaryChars(int, int)

ordinaryChars

public void ordinaryChars(int low,
                          int high)
Sets all characters in the range low <= ch <= high to be "ordinary". See ordinaryChar for more information on ordinary characters.

Parameters:
low - lowest character to be designated as ordinary.
high - highest character to be designated as ordinary.
See Also:
isOrdinaryChar(int), ordinaryChar(int)

ordinaryChars

public void ordinaryChars(java.lang.String str)
Sets all characters specified by a string to be "ordinary". See ordinaryChar for more information on ordinary characters.

Parameters:
str - string giving the ordinary characters
See Also:
isOrdinaryChar(int), ordinaryChar(int)

isOrdinaryChar

public final boolean isOrdinaryChar(int ch)
Returns true if the specified character is an ordinary character.

Parameters:
ch - character to be queried
Returns:
true if ch is an ordinary character
See Also:
ordinaryChar(int), ordinaryChars(int, int)

wordChar

public void wordChar(int ch)
Sets the specified character to be a "word" character, so that it can form part of a word token.

Setting the end-of-line character to be a word character may interfere with the ability of this tokenizer to count lines. Digits and other characters found in numbers may be specified as word characters, but if numeric parsing is enabled, the formation of numbers will take precedence over the formation of words.

Parameters:
ch - character to be designated as a word character
See Also:
isWordChar(int), wordChars(int,int), wordChars(String)

wordChars

public void wordChars(java.lang.String chars)
Sets some specified characters to be "word" characters, so that they can form part of a word token. See wordChar for more details.

Parameters:
chars - characters to be designated as word characters
See Also:
isWordChar(int), wordChar(int)

wordChars

public void wordChars(int low,
                      int high)
Sets all characters in the range low <= ch <= high to be "word" characters. See wordChar for more information on word characters.

Parameters:
low - lowest character to be designated as a word character
high - highest character to be designated as a word character
See Also:
isWordChar(int), wordChar(int)

isWordChar

public final boolean isWordChar(int ch)
Returns true if the specified character is a word character.

Parameters:
ch - character to be queried
Returns:
true if ch is a word character
See Also:
wordChar(int), wordChars(int,int), wordChars(String)

numericExtensionChar

public void numericExtensionChar(int ch)
Sets the specified character to be a numeric extension character. Other settings for the character are unaffected. Numeric extensions are sequences of characters which directly follow a number, without any intervening whitespace. They are generally used to provide qualifying information for numeric tokens, as in 100004L, 10msec, or 2.0f. Any detected numeric extension is placed in the sval field.

Parameters:
ch - character to be designated for numeric extension
See Also:
isNumericExtensionChar(int), numericExtensionChars(int,int), numericExtensionChars(String)
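As a hypothetical illustration of the splitting that numericExtensionChar enables, the following sketch separates a number from the extension characters that immediately follow it. The class NumericExtensionSketch and its simplified number syntax (decimal digits with an optional point) are assumptions made for illustration; this is not ReaderTokenizer's parser, which also handles exponents and hexadecimal integers.

```java
/**
 * Hypothetical sketch of numeric-extension splitting. Number syntax is
 * simplified to decimal digits with an optional point; this is not
 * ReaderTokenizer's parser.
 */
public class NumericExtensionSketch {

    /** A parsed number plus its extension (null when there is none). */
    public static final class Result {
        public final double value;
        public final String extension;

        Result(double value, String extension) {
            this.value = value;
            this.extension = extension;
        }
    }

    /** Splits text such as "10msec" into 10.0 and "msec". */
    public static Result split(String text, String extensionChars) {
        int i = 0;
        boolean seenDot = false;
        boolean seenDigit = false;
        // scan the (simplified) numeric part
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                seenDigit = true;
                i++;
            } else if (c == '.' && !seenDot) {
                seenDot = true;
                i++;
            } else {
                break;
            }
        }
        if (!seenDigit) {
            throw new IllegalArgumentException("no number at start of: " + text);
        }
        double value = Double.parseDouble(text.substring(0, i));
        // the extension is the run of extension characters directly after the number
        int j = i;
        while (j < text.length() && extensionChars.indexOf(text.charAt(j)) >= 0) {
            j++;
        }
        return new Result(value, j > i ? text.substring(i, j) : null);
    }

    public static void main(String[] args) {
        Result r = split("10msec", "msecLf");
        System.out.println(r.value + " " + r.extension); // 10.0 msec
    }
}
```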

numericExtensionChars

public void numericExtensionChars(java.lang.String chars)
Sets some specified characters to be numeric extension characters. Other settings for the characters are unaffected. For more information on numeric extensions, see numericExtensionChar.

Parameters:
chars - characters to be designated as numeric extensions
See Also:
isNumericExtensionChar(int), numericExtensionChar(int), numericExtensionChars(int,int)

numericExtensionChars

public void numericExtensionChars(int low,
                                  int high)
Sets all characters in the range low <= ch <= high to be numeric extension characters. Other settings for the characters are unaffected. For more information on numeric extensions, see numericExtensionChar.

Parameters:
low - lowest character to be designated as a numeric extension character
high - highest character to be designated as a numeric extension character
See Also:
isNumericExtensionChar(int), numericExtensionChar(int), numericExtensionChars(String)

isNumericExtensionChar

public final boolean isNumericExtensionChar(int ch)
Returns true if the specified character is a numeric extension character. For more information on numeric extensions, see numericExtensionChar.

Parameters:
ch - character to be queried
Returns:
true if ch is a numeric extension character
See Also:
numericExtensionChar(int), numericExtensionChars(String), numericExtensionChars(int,int)

getNumericExtensionChars

public java.lang.String getNumericExtensionChars()
Returns a String specifying all characters which are enabled as numeric extensions. For more information on numeric extensions, see numericExtensionChar.

Returns:
string giving all numeric extension characters
See Also:
isNumericExtensionChar(int), numericExtensionChar(int), numericExtensionChars(String)

clearNumericExtensionChars

public void clearNumericExtensionChars()
Clears all numeric extension characters. For more information on numeric extensions, see numericExtensionChar.

See Also:
isNumericExtensionChar(int), numericExtensionChar(int), numericExtensionChars(String)

whitespaceChar

public void whitespaceChar(int ch)
Sets the specified character to be a "white space" character, so that it delimits tokens and does not otherwise take part in their formation.

Parameters:
ch - character to be designated as whitespace
See Also:
isWhitespaceChar(int), whitespaceChars(int, int)

whitespaceChars

public void whitespaceChars(int low,
                            int high)
Sets all characters in the range low <= ch <= high to "whitespace" characters. See whitespaceChar for more information on whitespace characters.

Parameters:
low - lowest character to be designated as whitespace
high - highest character to be designated as whitespace
See Also:
isWhitespaceChar(int), whitespaceChar(int)

isWhitespaceChar

public final boolean isWhitespaceChar(int ch)
Returns true if the specified character is a whitespace character.

Parameters:
ch - character to be queried
Returns:
true if ch is a whitespace character
See Also:
whitespaceChar(int), whitespaceChars(int, int)

quoteChar

public void quoteChar(int ch)
Sets the specified character to be a "quote" character, so that it delimits a quoted string. When a quote character is encountered, a quoted string is formed consisting of all characters following the quote character, up to (but excluding) the next instance of that quote character, an end-of-line, or the end of input. The usual C-style escape sequences are recognized and may be used to include the quote character within the string.

Parameters:
ch - character to be designated as a quote character
See Also:
isQuoteChar(int)
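java.io.StreamTokenizer applies the same quoting rules (C-style escapes, and termination at the matching quote, an end-of-line, or the end of input), so the following self-contained sketch uses it to show both behaviors. ReaderTokenizer is not involved, and the class name QuoteDemo is illustrative.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class QuoteDemo {

    /** Returns the sval of each double-quoted string or word token in the input. */
    public static List<String> svals(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '"' || st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // escapes: \t becomes a tab character and \" an embedded quote
        System.out.println(svals("\"a\\tb\\\"c\""));
        // an unterminated string stops at the end of the line
        System.out.println(svals("\"abc\nxyz")); // [abc, xyz]
    }
}
```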

isQuoteChar

public final boolean isQuoteChar(int ch)
Returns true if the specified character is a quote character.

Parameters:
ch - character to be queried
Returns:
true if ch is a quote character
See Also:
quoteChar(int)

resetSyntax

public void resetSyntax()
Sets all characters to be ordinary characters, so that they are treated as individual tokens, and disables number parsing.

See Also:
parseNumbers(boolean)

parseNumbers

public void parseNumbers(boolean enable)
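Enables or disables parsing of numbers by this tokenizer. If number parsing is enabled, then floating point numbers (including those with exponents), decimal integers, and hexadecimal integers are recognized as numeric tokens.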

getParseNumbers

public boolean getParseNumbers()
Returns true if number parsing is enabled for this tokenizer.

Returns:
true if number parsing is enabled
See Also:
parseNumbers(boolean)

slashStarComments

public void slashStarComments(boolean enable)
Enables the handling of C-style slash-star comments, commenting out all characters, including new lines, between /* and */.

Parameters:
enable - if true, enables C-style comments
See Also:
getSlashStarComments()

getSlashStarComments

public boolean getSlashStarComments()
Returns true if C-style slash-star comments are enabled.

Returns:
true if slash-star comments are enabled
See Also:
slashStarComments(boolean)

slashSlashComments

public void slashSlashComments(boolean enable)
Enables the handling of C++-style slash-slash comments, which comment out all characters from // to the end of the line.

Parameters:
enable - if true, enables slash-slash comments
See Also:
getSlashSlashComments()

getSlashSlashComments

public boolean getSlashSlashComments()
Returns true if C++-style slash-slash comments are enabled.

Returns:
true if slash-slash comments are enabled
See Also:
slashSlashComments(boolean)

commentChar

public void commentChar(int ch)
Sets the specified character to be a comment character. When a comment character occurs, it and all remaining characters up to the end of the line are discarded.

Parameters:
ch - character to be designated as a comment character.
See Also:
isCommentChar(int)

isCommentChar

public final boolean isCommentChar(int ch)
Returns true if the specified character is a comment character.

Parameters:
ch - character to be queried
Returns:
true if ch is a comment character
See Also:
commentChar(int)
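All three comment styles can be active at once. This sketch enables them on java.io.StreamTokenizer, which provides commentChar, slashStarComments and slashSlashComments with matching descriptions. Unlike ReaderTokenizer, whose default comment character is #, StreamTokenizer makes / a comment character by default, so the sketch turns / back into an ordinary character first. CommentDemo is a hypothetical name.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentDemo {
    /** Returns the words left after all three comment styles are stripped. */
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');        // a lone '/' is then a token, not a line comment
        st.commentChar('#');         // '#' comments run to the end of the line
        st.slashStarComments(true);  // /* ... */ comments may span lines
        st.slashSlashComments(true); // // comments run to the end of the line
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "a # hash\nb /* star\nstill star */ c // slash\nd";
        System.out.println(words(text)); // [a, b, c, d]
    }
}
```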

toString

public java.lang.String toString()
Returns a string describing the type and value of the current token, as well as the current line number.

Overrides:
toString in class java.lang.Object
Returns:
string containing token and line information

nextToken

public int nextToken()
              throws java.io.IOException
Parses the next token from the input and returns its type. The token type is also placed in the field ttype. If the token is numeric, then the associated numeric value is placed in the field nval. If the token is a word or quoted string, then the associated string value is placed in the field sval.

Returns:
type of the token read
Throws:
java.io.IOException
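Because nextToken both returns the token type and stores it in ttype, a parser is usually a loop that switches on the returned value and then reads nval or sval. The sketch below shows that loop with java.io.StreamTokenizer, which shares this contract; TokenLoopDemo and its summary format are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenLoopDemo {
    /** Collects word tokens and sums numeric tokens, in input order. */
    static String summarize(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> words = new ArrayList<>();
        double sum = 0;
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                switch (type) {
                    case StreamTokenizer.TT_NUMBER:
                        sum += st.nval;       // numeric value is in nval
                        break;
                    case StreamTokenizer.TT_WORD:
                        words.add(st.sval);   // word text is in sval
                        break;
                    default:
                        break;                // ordinary chars and quoted strings ignored
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words + " sum=" + sum;
    }

    public static void main(String[] args) {
        System.out.println(summarize("width 3 height 4.5")); // [width, height] sum=7.5
    }
}
```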

setReader

public void setReader(java.io.Reader reader)
Sets the Reader which supplies the input for this tokenizer.

Parameters:
reader - new Reader

getReader

public java.io.Reader getReader()
Returns the Reader which supplies the input for this tokenizer.

Returns:
this tokenizer's reader

getResourceName

public java.lang.String getResourceName()
Returns the name of the resource (e.g., File or URL) associated with this ReaderTokenizer. This can then be used to provide diagnostic information when an input error occurs. It is up to the application to set the resource name in advance using setResourceName(java.lang.String). If a resource name has not been set, this method returns null.

Returns:
resource name, if set
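One way to use the resource name is to prefix error messages with it and the current line number, falling back when the name is null. The message format below is purely hypothetical, as are the names DiagnosticDemo and diagnostic; line numbers come from java.io.StreamTokenizer.lineno(), used here as a stand-in tokenizer.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class DiagnosticDemo {
    /** Hypothetical format "<resource>:<line>: <message>", or "line <line>: ..." if unnamed. */
    static String diagnostic(String resourceName, int line, String message) {
        String where = (resourceName == null) ? "line " + line : resourceName + ":" + line;
        return where + ": " + message;
    }

    /** Reports the first non-numeric token in input, or returns null if there is none. */
    static String firstNonNumber(String resourceName, String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype != StreamTokenizer.TT_NUMBER) {
                    // sval is null for ordinary characters, so fall back to the char
                    String tok = (st.sval != null) ? st.sval : String.valueOf((char) st.ttype);
                    return diagnostic(resourceName, st.lineno(), "unexpected token " + tok);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstNonNumber("data.txt", "1 2\n3 oops"));
        // data.txt:2: unexpected token oops
    }
}
```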

setResourceName

public void setResourceName(java.lang.String name)
Sets the name of the resource (e.g., File or URL) associated with this ReaderTokenizer.

Parameters:
name - name of the resource

close

public void close()
Closes the underlying reader for this tokenizer.

tokenName

public java.lang.String tokenName()
Returns a string identifying the current token.

Returns:
token name string

scanCharacter

public void scanCharacter(int ch)
                   throws java.io.IOException
Reads the next token and verifies that it matches a specified character. The token matches if ttype equals the character directly, or if ttype equals TT_WORD and sval is a one-character string containing that character.

Parameters:
ch - character to match
Throws:
java.io.IOException - if the character is not matched
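The matching rule above can be sketched as a helper over java.io.StreamTokenizer. The helper scanCharacter here is a hypothetical re-creation of the documented behaviour, not ReaderTokenizer's own code, and the exception message format is invented.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

class ScanCharacterDemo {
    /** Hypothetical helper: ttype equals ch, or the token is the one-character word ch. */
    static void scanCharacter(StreamTokenizer st, int ch) throws IOException {
        st.nextToken();
        boolean direct = st.ttype == ch;
        boolean oneCharWord = st.ttype == StreamTokenizer.TT_WORD
                && st.sval.length() == 1
                && st.sval.charAt(0) == ch;
        if (!direct && !oneCharWord) {
            throw new IOException("expected '" + (char) ch + "', got " + st);
        }
    }

    /** Returns true if the first token of input matches ch. */
    static boolean matches(String input, int ch) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        try {
            scanCharacter(st, ch);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("= 1", '='));   // true: '=' is an ordinary character
        System.out.println(matches("x = 1", 'x')); // true: the one-character word "x"
        System.out.println(matches("xy", 'x'));    // false: the word has two characters
    }
}
```

The second case matters because letters are word characters by default, so a lone letter arrives as TT_WORD rather than as its character code.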

scanToken

public void scanToken(int type)
               throws java.io.IOException
Reads the next token and verifies that it is of the specified type.

Parameters:
type - type of the expected token
Throws:
java.io.IOException - if the token is not of the expected type

scanNumber

public double scanNumber()
                  throws java.io.IOException
Reads the next token and checks that it is a number. If the token is a number, its numeric value is returned. Otherwise, an exception is thrown.

Returns:
numeric value of the next token
Throws:
java.io.IOException - if the token is not a number

scanInteger

public int scanInteger()
                throws java.io.IOException
Reads the next token and checks that it is an integer. If the token is an integer, its value is returned. Otherwise, an exception is thrown. If the value lies outside the allowed range for an integer, it is truncated in the high-order bits.

Returns:
integer value of the next token
Throws:
java.io.IOException - if the token is not an integer

scanLong

public long scanLong()
              throws java.io.IOException
Reads the next token and checks that it is an integer. If the token is an integer, its value is returned as a long. Otherwise, an exception is thrown. If the value lies outside the allowed range for a long, it is truncated in the high-order bits.

Returns:
long integer value of the next token
Throws:
java.io.IOException - if the token is not an integer

scanShort

public short scanShort()
                throws java.io.IOException
Reads the next token and checks that it is an integer. If the token is an integer, its value is returned as a short. Otherwise, an exception is thrown. If the value lies outside the allowed range for a short, it is truncated in the high-order bits.

Returns:
short integer value of the next token
Throws:
java.io.IOException - if the token is not an integer
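"Truncated in the high-order bits" presumably means the same thing as a Java narrowing cast, which keeps only the low 32 bits for int or the low 16 bits for short. Under that assumption, this sketch shows the values scanInteger and scanShort would return for out-of-range input. The class TruncationDemo and its methods are hypothetical.

```java
class TruncationDemo {
    /** Keeps only the low 32 bits, as a Java (int) cast does. */
    static int lowBitsInt(long value) {
        return (int) value;
    }

    /** Keeps only the low 16 bits, as a Java (short) cast does. */
    static short lowBitsShort(long value) {
        return (short) value;
    }

    public static void main(String[] args) {
        System.out.println(lowBitsInt(4_294_967_301L)); // 5: 2^32 + 5 loses its 2^32 bit
        System.out.println(lowBitsShort(70_000));       // 4464 = 70000 - 65536
        System.out.println(lowBitsShort(40_000));       // -25536: bit 15 becomes the sign bit
    }
}
```

The last case shows that truncation can also flip the sign, so a caller expecting large values should use scanLong instead.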

scanBoolean

public boolean scanBoolean()
                    throws java.io.IOException
Reads the next token and checks that it represents a boolean. A token represents a boolean if it is a word token whose value equals (ignoring case) either true or false. If the token represents a boolean, its value is returned. Otherwise, an exception is thrown.

Returns:
boolean value of the next token
Throws:
java.io.IOException - if the token does not represent a boolean
See Also:
tokenIsBoolean()
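A minimal sketch of the documented boolean check, written against java.io.StreamTokenizer: the token must be a word, and its value is compared to true and false ignoring case. The helper names scanBoolean and tryScan are hypothetical re-creations, not ReaderTokenizer's implementation.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

class ScanBooleanDemo {
    /** Hypothetical helper: accepts only word tokens equal to true/false, ignoring case. */
    static boolean scanBoolean(StreamTokenizer st) throws IOException {
        st.nextToken();
        if (st.ttype == StreamTokenizer.TT_WORD) {
            if (st.sval.equalsIgnoreCase("true")) {
                return true;
            }
            if (st.sval.equalsIgnoreCase("false")) {
                return false;
            }
        }
        throw new IOException("expected true or false, got " + st);
    }

    /** Returns "true" or "false" for a boolean token, or "error" otherwise. */
    static String tryScan(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        try {
            return String.valueOf(scanBoolean(st));
        } catch (IOException e) {
            return "error";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryScan("TRUE"));     // true
        System.out.println(tryScan("yes"));      // error
        System.out.println(tryScan("\"true\"")); // error: a quoted string is not a word
    }
}
```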

scanQuotedString

public java.lang.String scanQuotedString(char quoteChar)
                                  throws java.io.IOException
Reads the next token and checks that it is a quoted string delimited by the specified quote character.

Parameters:
quoteChar - quote character that delimits the string
Returns:
value of the quoted string
Throws:
java.io.IOException - if the token does not represent a quoted string delimited by the specified character
See Also:
tokenIsQuotedString(char)

scanWordOrQuotedString

public java.lang.String scanWordOrQuotedString(char quoteChar)
                                        throws java.io.IOException
Reads the next token and checks that it is either a word or a quoted string delimited by the specified quote character.

Parameters:
quoteChar - quote character that delimits strings
Returns:
value of the word or quoted string
Throws:
java.io.IOException - if the token does not represent a word or quoted string delimited by the specified character
See Also:
tokenIsQuotedString(char)
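Accepting either a word or a quoted string lets input such as file names contain spaces only when quoted. This hypothetical helper reproduces the documented check on java.io.StreamTokenizer; note that a string delimited by a different quote character is rejected.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

class WordOrQuotedDemo {
    /** Hypothetical helper mirroring the documented check. */
    static String scanWordOrQuotedString(StreamTokenizer st, char quoteChar) throws IOException {
        st.nextToken();
        // A quoted string reports its delimiter as ttype; this assumes
        // quoteChar has actually been designated a quote character.
        if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == quoteChar) {
            return st.sval;
        }
        throw new IOException("expected a word or " + quoteChar + "-quoted string, got " + st);
    }

    /** Returns the scanned value, or null if the first token does not qualify. */
    static String tryScan(String input, char quoteChar) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        try {
            return scanWordOrQuotedString(st, quoteChar);
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryScan("name", '"'));            // name
        System.out.println(tryScan("\"two words\"", '"'));   // two words
        System.out.println(tryScan("'single'", '"'));        // null: wrong quote character
    }
}
```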

scanWord

public java.lang.String scanWord()
                          throws java.io.IOException
Reads the next token and checks that it is a word.

Returns:
value of the word
Throws:
java.io.IOException - if the token does not represent a word

scanWord

public java.lang.String scanWord(java.lang.String word)
                          throws java.io.IOException
Reads the next token and checks that it is a specific word.

Parameters:
word - expected value of the word to be scanned
Returns:
value of the word
Throws:
java.io.IOException - if the token is not a word with the specified value
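The scan methods combine naturally into keyword-driven parsers: expect a specific word, then a value. The sketch below re-creates scanWord(String) and scanNumber as hypothetical helpers over java.io.StreamTokenizer and parses a made-up "radius <number>" format. Whether ReaderTokenizer compares words case-sensitively is not stated above; this sketch assumes it does.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

class KeywordDemo {
    /** Hypothetical helper: the next token must be exactly the given word. */
    static void scanWord(StreamTokenizer st, String word) throws IOException {
        st.nextToken();
        // Case-sensitive comparison is an assumption of this sketch.
        if (st.ttype != StreamTokenizer.TT_WORD || !st.sval.equals(word)) {
            throw new IOException("expected '" + word + "', got " + st);
        }
    }

    /** Hypothetical helper: the next token must be a number. */
    static double scanNumber(StreamTokenizer st) throws IOException {
        if (st.nextToken() != StreamTokenizer.TT_NUMBER) {
            throw new IOException("expected a number, got " + st);
        }
        return st.nval;
    }

    /** Parses the made-up format "radius <number>", returning NaN on mismatch. */
    static double parseRadius(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        try {
            scanWord(st, "radius");
            return scanNumber(st);
        } catch (IOException e) {
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseRadius("radius 2.5"));   // 2.5
        System.out.println(parseRadius("diameter 2.5")); // NaN
    }
}
```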

scanNumbers

public int scanNumbers(double[] vals,
                       int max)
                throws java.io.IOException
Reads a series of numeric tokens and returns their values. Reading halts when either a non-numeric token is encountered or max numbers have been read. Note that the token which halts reading will itself be numeric if the input contains more than max consecutive numeric tokens.

Parameters:
vals - used to return numeric values
max - maximum number of numeric tokens to read
Returns:
number of numeric tokens actually read
Throws:
java.io.IOException
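One reading of the description above is that the tokenizer reads one token past the last stored number, and that this halting token is left as the current token. The sketch below implements that reading on java.io.StreamTokenizer; the helper scanNumbers and the demo format are hypothetical, and ReaderTokenizer may differ in detail (for example, by pushing the halting token back).

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ScanNumbersDemo {
    /** Stores up to max numbers in vals; the halting token stays current. */
    static int scanNumbers(StreamTokenizer st, double[] vals, int max) throws IOException {
        int n = 0;
        st.nextToken();
        while (n < max && st.ttype == StreamTokenizer.TT_NUMBER) {
            vals[n++] = st.nval;
            st.nextToken(); // after the max-th number, this reads the halting token
        }
        return n;
    }

    /** Returns the count, the values read, and the kind of halting token. */
    static String demo(String input, int max) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        double[] vals = new double[max];
        try {
            int n = scanNumbers(st, vals, max);
            String halt = st.ttype == StreamTokenizer.TT_NUMBER ? "number"
                    : st.ttype == StreamTokenizer.TT_WORD ? "word:" + st.sval
                    : "other";
            return n + " " + Arrays.toString(Arrays.copyOf(vals, n)) + " halt=" + halt;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("1 2 3 4 end", 3)); // 3 [1.0, 2.0, 3.0] halt=number
        System.out.println(demo("1 2 end", 3));     // 2 [1.0, 2.0] halt=word:end
    }
}
```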

tokenIsNumber

public boolean tokenIsNumber()
Returns true if the current token is a number. This is a convenience routine for checking that ttype equals TT_NUMBER.

Returns:
true if the current token is a number.

tokenIsInteger

public boolean tokenIsInteger()
Returns true if the current token is an integer. (This will also imply that the token is a number.)

Returns:
true if the current token is an integer.
See Also:
scanInteger()

tokenIsWord

public boolean tokenIsWord()
Returns true if the current token is a word. This is a convenience routine for checking that ttype equals TT_WORD.

Returns:
true if the current token is a word.

tokenIsBoolean

public boolean tokenIsBoolean()
Returns true if the current token represents a boolean. A token represents a boolean if it is a word token equal (ignoring case) to either true or false.

Returns:
true if the current token is a boolean.
See Also:
scanBoolean()

tokenIsQuotedString

public boolean tokenIsQuotedString(char quoteChar)
Returns true if the current token is a quoted string delimited by the specified quote character.

Parameters:
quoteChar - quote character used to delimit the string
Returns:
true if the current token is a quoted string.

tokenIsWordOrQuotedString

public boolean tokenIsWordOrQuotedString(char quoteChar)
Returns true if the current token is either a word or a quoted string delimited by the specified quote character.

Parameters:
quoteChar - quote character used to delimit the string
Returns:
true if the current token is a word or a quoted string.

lastCommentLine

public java.lang.String lastCommentLine()
Returns the last comment line (excluding the trailing newline) that was read by this tokenizer, or null if no comments have been read yet.

Returns:
last read comment line