|
|||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||
SUMMARY: INNER | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object | +--java.util.AbstractCollection | +--java.util.AbstractSet | +--nz.util.TrieStringSet
This Collection class provides a Set interface implemented by a data structure called a Trie. As such, it is efficient for certain set operations, like contains(). Note that this implementation is not synchronized. If multiple threads access this set concurrently, and at least one of the threads modifies the set, you must supply external synchronization. Otherwise, the internal data structures almost certainly will be become corrupted.
Because the trie data structure is wholly dependent on the nature of strings, this Collection stores all objects as their String representation; in other words, you can put any kind of object you want to into this collection, but you can only get Strings out of it (insertion is done using the toString() operator.) The most efficient operation of this class can be obtained when you put char[] array objects into it. Note that when you put a char[] array into the set, you still get a String out.
The TrieStringSet Iterator does not support the remove operation, however it is safe to remove the item just returned by the iterator or any previous item. If you the next item to be iterated over, then the iterator will return it anyway. See the description of the clear() method for more remarks about mixing operations.
Note that the Trie is naturally a sorted data structure, although it always sorts in numerical order of its character values, which is almost always not the same as dictionary order. That is why this class implements Set rather than SortedSet. However, you can count on this numerical-sort behavior. The iterator will ALWAYS deliver results in this order, shorter strings first, etc.
TrieStringSet is fully serializable, although the internal structure of the set practically guarantees that the serialized output will be quite large, and that reading the results will be slow. It is almost always faster to get all the strings out with toArray(), serialize that to a file, and later load the strings back into a TrieStringSet.
While the Trie data structure is slower than a good tree or hash table for just about everything, and uses more memory, it can do some operations very quickly. For example, in heavily query-oriented applications, the TrieStringSet outperforms the Java TreeSet and approaches the performance of the Java HashSet; it can even exceed the performance of HashSet in applications where char[] arrays are used instead of Strings. Also, the trie can do some useful string set operations efficiently that the other structures cannot do at all: prefix checking. In this class, the operation to check whether the given string is a prefix of any string in the set is implemented by the containsPrefix() method.
Note that there are a lot more things that trie data structures can do well, under the right circumstances. In this particular case, the TrieStringSet is kept slow by the requirement to handle unicode characters in a reasonable way.
The guts of a trie use an array to hold things. Since the Java character set is unicode, and thus quite large, a simple sparse array implementation is used for this purpose. Note that the current implementation will be more wasteful for strings composed mostly of characters in upper unicode pages. Of course, it is pretty darn wasteful now, actually.
Set
,
nom.ziring.util.PagedArray
, Serialized FormConstructor Summary | |
TrieStringSet()
Create a new, empty TrieStringSet. |
|
TrieStringSet(java.lang.String[] list)
Create a new TrieStringSet that contains all the elements from an array. |
Method Summary | |
boolean |
add(java.lang.Object o)
Add a string or char array to this TrieStringSet. |
protected boolean |
addInternal(nz.util.TrieStringSet.TSSPagedArray p,
char[] a,
int c,
int maxC)
|
protected boolean |
addInternal(nz.util.TrieStringSet.TSSPagedArray p,
java.lang.String s,
int c,
int maxC)
|
void |
clear()
Clear out this TrieStringSet. |
boolean |
contains(char[] a,
int n)
Check whether this TrieStringSet contains a particular element. |
boolean |
contains(java.lang.Object o)
Check whether this TrieStringSet contains a particular element. |
boolean |
containsPrefix(java.lang.Object o)
Check whether the supplied string or char array is in the set, or is a prefix of any string in the set. |
protected nz.util.TrieStringSet.TSSPagedArray |
internalContains(nz.util.TrieStringSet.TSSPagedArray p,
char[] s,
int c,
int maxC)
|
protected nz.util.TrieStringSet.TSSPagedArray |
internalContains(nz.util.TrieStringSet.TSSPagedArray p,
java.lang.String s,
int c,
int maxC)
|
java.util.Iterator |
iterator()
Create and return a new Iterator for this TrieStringSet. |
boolean |
remove(java.lang.Object o)
Remove an element from the TrieStringSet. |
int |
size()
Return the number of (unique) Strings in this TrieStringSet. |
java.lang.Object[] |
toArray(java.lang.Object[] a)
The toArray(Object []a) method is not supported. |
Methods inherited from class java.util.AbstractSet |
equals, hashCode, removeAll |
Methods inherited from class java.util.AbstractCollection |
addAll, containsAll, isEmpty, retainAll, toArray, toString |
Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface java.util.Set |
addAll, containsAll, isEmpty, retainAll, toArray |
Constructor Detail |
public TrieStringSet()
public TrieStringSet(java.lang.String[] list)
list
- An array of String valuesMethod Detail |
public int size()
size
in class java.util.AbstractCollection
public boolean add(java.lang.Object o)
add
in class java.util.AbstractCollection
o
- String object to be added to the set.protected boolean addInternal(nz.util.TrieStringSet.TSSPagedArray p, java.lang.String s, int c, int maxC)
protected boolean addInternal(nz.util.TrieStringSet.TSSPagedArray p, char[] a, int c, int maxC)
public void clear()
clear
in class java.util.AbstractCollection
public boolean contains(java.lang.Object o)
contains
in class java.util.AbstractCollection
o
- String or char[] object that might be in this setpublic boolean contains(char[] a, int n)
a
- A char arrayn
- How many elements of the char array to use, positivepublic boolean containsPrefix(java.lang.Object o)
Example: if the set contained the strings "keyboard", "mouse", and "monitor", then containsPrefix("key") would return true.
o
- String or char[] prefix to look for in the setprotected nz.util.TrieStringSet.TSSPagedArray internalContains(nz.util.TrieStringSet.TSSPagedArray p, java.lang.String s, int c, int maxC)
protected nz.util.TrieStringSet.TSSPagedArray internalContains(nz.util.TrieStringSet.TSSPagedArray p, char[] s, int c, int maxC)
public java.lang.Object[] toArray(java.lang.Object[] a)
toArray
in class java.util.AbstractCollection
ArrayStoreException
- alwayspublic boolean remove(java.lang.Object o)
remove
in class java.util.AbstractCollection
o
- An object purportedly in this setpublic java.util.Iterator iterator()
iterator
in class java.util.AbstractCollection
|
|||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||
SUMMARY: INNER | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |