Object → TokenStream → Tokenizer → FastTokenizer
public class FastTokenizer
extends Tokenizer
Like Lucene's StandardTokenizer, but handles the easy cases very quickly. The hard cases are delegated to a real StandardTokenizer; because they are rare, the overall speedup is substantial. Chinese, Japanese, and Korean text is not currently supported, although adding that support should be straightforward.
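The fast-path/fallback split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual code: the rule for an "easy" token (a whitespace-delimited run of ASCII letters and digits) and the slow-path stand-in `slowTokenize` are hypothetical. In FastTokenizer itself, the slow path is a real StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

class FastPathSketch {
    // Counts how often the slow path was taken (for illustration only).
    static int slowPathCalls = 0;

    // Assumption: an "easy" character is an ASCII letter or digit.
    static boolean isEasyChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Hypothetical stand-in for the full tokenizer (StandardTokenizer in the real class).
    // It just keeps the chunk whole; a real implementation would apply the full grammar.
    static List<String> slowTokenize(String chunk) {
        slowPathCalls++;
        List<String> out = new ArrayList<>();
        out.add(chunk);
        return out;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        int len = text.length();
        while (pos < len) {
            // Skip whitespace between tokens.
            while (pos < len && Character.isWhitespace(text.charAt(pos))) pos++;
            if (pos >= len) break;
            int start = pos;
            boolean easy = true;
            // Scan to the end of this whitespace-delimited chunk, noting any hard character.
            while (pos < len && !Character.isWhitespace(text.charAt(pos))) {
                if (!isEasyChar(text.charAt(pos))) easy = false;
                pos++;
            }
            String chunk = text.substring(start, pos);
            if (easy) {
                tokens.add(chunk);                  // fast path: common case
            } else {
                tokens.addAll(slowTokenize(chunk)); // rare hard case: delegate
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello world 42 foo@bar.com"));
    }
}
```

Only the chunk containing `@` and `.` takes the slow path here; the payoff of this design depends on such chunks being rare in real text.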
| Nested Class Summary | |
|---|---|
| private class | FastTokenizer.DribbleReader: When the fast tokenizer encounters a questionable situation, this class dribbles characters out to a standard tokenizer that can do a more complete job. |

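One plausible shape for the dribbling mechanism is a `java.io.Reader` that serves a slice of the source array and then a sentinel word, so that the downstream tokenizer's output shows where the delegated text ends. The class name `SliceDribbleReader`, the sentinel character U+E000, and this exact protocol are assumptions. The documentation only states that a special character (fakeChar) marks the end of a DribbleReader and that DribbleReader uses a special word (fakeWord). Whatever character is chosen in practice must be one the downstream tokenizer keeps as part of a word; otherwise the sentinel would be silently dropped.

```java
import java.io.Reader;

// Hypothetical sketch of a "dribble" reader: serves source[start, end), then a sentinel word.
class SliceDribbleReader extends Reader {
    // Assumption: a Private Use Area character that should not occur in real text.
    // A real tokenizer must treat this character as part of a word, or it will be discarded.
    static final char FAKE_CHAR = '\uE000';
    // Assumption: the sentinel word is two fake characters in a row.
    static final String FAKE_WORD = "" + FAKE_CHAR + FAKE_CHAR;

    private final char[] source;
    private int pos;
    private final int end;
    // A leading space keeps the sentinel word separate from the last real token.
    private final String tail = " " + FAKE_WORD;
    private int tailPos = 0;

    SliceDribbleReader(char[] source, int start, int end) {
        this.source = source;
        this.pos = start;
        this.end = end;
    }

    // Overrides Reader.read; the checked IOException is dropped because nothing here can fail.
    @Override
    public int read(char[] buf, int off, int len) {
        if (len == 0) return 0;
        int n = 0;
        // First copy characters from the slice of the source array.
        while (n < len && pos < end) buf[off + n++] = source[pos++];
        // Then copy the separator and the sentinel word.
        while (n < len && tailPos < tail.length()) buf[off + n++] = tail.charAt(tailPos++);
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() {
        // Nothing to release.
    }
}
```

Under these assumptions, a caller could pull tokens from the standard tokenizer until one equals `FAKE_WORD`, at which point the delegated text is exhausted and scanning can resume on the fast path.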
| Field Summary | |
|---|---|
| private static char[] | charType: Character type table, set up by setCharType. |
| private FastTokenizer.DribbleReader | dribbleReader: Used to dribble input out to a standard tokenizer when we encounter a case that is hard to figure out. |
| (package private) static char | fakeChar: Special character used to mark the end of a FastTokenizer.DribbleReader. |
| (package private) static String | fakeWord: The special word used by FastTokenizer.DribbleReader. |
| private int | pos: Position within the source array. |
| private char[] | source: Array of characters to read from. |
| private Tokenizer | stdTokenizer: Standard tokenizer, used for hard cases only. |

| Fields inherited from class Tokenizer |
|---|
| input |

| Constructor Summary | |
|---|---|
| FastTokenizer(FastStringReader reader) | Create a tokenizer that will tokenize the stream of characters from the given reader. |

| Method Summary | |
|---|---|
| Token | next(): Retrieve the next token in the stream, or null if there are no more. |
| private static void | setCharType(char type, char from, char to): Utility method used when setting up the character type table. |

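A character type table like charType, filled in range by range with a helper shaped like setCharType(type, from, to), reduces per-character classification to a single array lookup. In the sketch below, the type codes, the table size (one entry per UTF-16 code unit), the inclusive treatment of `to`, and the particular ranges are all assumptions made for illustration.

```java
import java.util.Arrays;

class CharTypeTable {
    // Hypothetical type codes.
    static final char OTHER = 0;
    static final char LETTER = 1;
    static final char DIGIT = 2;
    static final char SPACE = 3;

    // One entry per UTF-16 code unit; unset entries default to OTHER (0).
    private static final char[] charType = new char[Character.MAX_VALUE + 1];

    // Mark every character in the inclusive range [from, to] with the given type.
    private static void setCharType(char type, char from, char to) {
        Arrays.fill(charType, from, to + 1, type);
    }

    static {
        setCharType(LETTER, 'a', 'z');
        setCharType(LETTER, 'A', 'Z');
        setCharType(DIGIT, '0', '9');
        setCharType(SPACE, ' ', ' ');
        setCharType(SPACE, '\t', '\r'); // tab, line feed, vertical tab, form feed, carriage return
    }

    // Constant-time classification of a single character.
    static char typeOf(char c) {
        return charType[c];
    }
}
```

Building the table once in a static initializer means the hot scanning loop never calls slower general-purpose methods such as `Character.isLetter`.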
| Methods inherited from class Tokenizer |
|---|
| close |

| Methods inherited from class Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

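| Field Detail |
|---|

private char[] source
Array of characters to read from.

private int pos
Position within the source array.

static final char fakeChar
Special character used to mark the end of a FastTokenizer.DribbleReader.

static final String fakeWord
The special word used by FastTokenizer.DribbleReader.

private FastTokenizer.DribbleReader dribbleReader
Used to dribble input out to a standard tokenizer when we encounter a case that is hard to figure out.

private Tokenizer stdTokenizer
Standard tokenizer, used for hard cases only.

private static final char[] charType
Character type table, set up by setCharType.
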
| Constructor Detail |
|---|
public FastTokenizer(FastStringReader reader)
Create a tokenizer that will tokenize the stream of characters from the given reader.
Parameters:
reader - Reader to get data from.

| Method Detail |
|---|
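private static void setCharType(char type, char from, char to)
Utility method used when setting up the character type table.

public Token next() throws IOException
Retrieve the next token in the stream, or null if there are no more.
Superclass method: next in class TokenStream
Throws:
IOException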
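Because next() signals the end of the stream by returning null rather than throwing, callers typically drain it with a loop like the one below. The `TokenSource` interface, the use of `String` in place of Lucene's `Token`, and the omission of the `IOException` that the real next() declares are simplifications so that the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal stand-in for a token stream whose next() returns null at the end.
interface TokenSource {
    String next();
}

class DrainExample {
    // Collect every token until next() returns null.
    static List<String> drain(TokenSource ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) {
            out.add(t);
        }
        return out;
    }

    // Wrap a fixed list of tokens as a TokenSource (for demonstration only).
    static TokenSource fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        System.out.println(drain(fromList(Arrays.asList("fast", "tokenizer"))));
    }
}
```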