DLESE Tools
v1.6.0

org.dlese.dpc.index
Class SimpleLuceneIndex

java.lang.Object
  extended by org.dlese.dpc.index.SimpleLuceneIndex

public final class SimpleLuceneIndex
extends Object

A simple API for searching, reading and writing Lucene indexes.

Author:
John Weatherley, Dave Deniman
See Also:
ResultDoc, DocReader

Field Summary
static boolean BLOCK
          Indicates update operations will be blocked until the current one returns.
static int DEFAULT_AND
          Use to set the boolean search operator to AND.
static int DEFAULT_OR
          Use to set the boolean search operator to OR.
static boolean NO_BLOCK
          Indicates update operations will be allowed while others are still in progress.
 
Constructor Summary
SimpleLuceneIndex(String indexDirPath)
          Initializes or creates an index at the given location using a default search field named "default" and a StandardAnalyzer for index searching and creation.
SimpleLuceneIndex(String indexDirPath, org.apache.lucene.analysis.Analyzer analyzer)
          Initializes or creates an index at the given location using a default search field named "default" and the given Analyzer.
SimpleLuceneIndex(String indexDirPath, String defaultField, org.apache.lucene.analysis.Analyzer analyzer)
          Initializes or creates an index at the given location using the given default search field and Analyzer.
 
Method Summary
 boolean addDoc(org.apache.lucene.document.Document doc)
          Adds a Document to the index.
 boolean addDoc(org.apache.lucene.document.Document doc, boolean block)
          Adds a Document to the index.
 boolean addDocs(org.apache.lucene.document.Document[] docs)
          Adds a group of Documents to the index.
 boolean addDocs(org.apache.lucene.document.Document[] docs, boolean block)
          Adds a group of Documents to the index.
 void close()
          Closes the writers and performs clean-up
 void deleteAndReinititlize()
          Deletes the index and re-initializes a new, empty one in its place.
 void doWithDocument(Callback cal, String field, String term)
          Calls the callback function of cal for each document matching the term in the given field
 void doWithDocument(Callback cal, String field, String[] terms)
          Calls the callback function of cal for each document matching the terms in the given field
static String encodeToTerm(String s)
          Encodes a String to an appropriate format that can be indexed as a single term using a StandardAnalyzer.
static String encodeToTerm(String s, boolean encodeWildCards)
          Encodes a String to an appropriate format that can be indexed as a single term using a StandardAnalyzer.
static String encodeToTerm(String s, boolean encodeWildCards, boolean encodeSpace)
          Encodes a String to an appropriate format that can be indexed as a single term or terms using a StandardAnalyzer.
static String escape(String term)
          Escapes all Lucene QueryParser reserved characters with a preceding \.
static String escape(String term, String preserveChars)
          Escapes the Lucene QueryParser reserved characters with a preceding \ except those included in preserveChars.
protected  void finalize()
          Override finalize to ensure resources are released...
 org.apache.lucene.analysis.Analyzer getAnalyzer()
          Gets the analyzer that has been configured for this index.
 Object getAttribute(String key)
          Gets an attribute from this SimpleLuceneIndex.
static String getDateStamp()
          Gets a datestamp of the current time formatted for display with logs and output.
 String getDefaultSearchField()
          Gets the name of the field that is searched by default if no field is indicated.
 org.apache.lucene.document.Document getDocument(int n)
          Gets the nth document in the index.
 List getFields()
          Gets a list of all fields in the index listed alphabetically.
 String getIndexLocation()
          Gets the absolute path to the directory where the index resides.
 long getLastModifiedCount()
          Gets the version number of the last time the index was modified by adding, deleting or changing a document.
 org.apache.lucene.queryParser.QueryParser.Operator getLuceneOperator()
          Gets the Lucene boolean operator that is currently being used for searches.
static org.apache.lucene.util.Version getLuceneVersion()
          Gets the version of Lucene.
 int getNumDocs()
          Gets the total number of documents in the index.
 int getNumDocs(org.apache.lucene.search.Query query)
          Gets the number of documents that match the given query.
 int getNumDocs(String query)
          Gets the number of documents that match the given query.
 int getOperator()
          Gets the boolean operator that is currently being used for searches.
 String getOperatorString()
          Gets the boolean operator that is currently being used for searches as a String (AND or OR).
 org.apache.lucene.queryParser.QueryParser getQueryParser()
          Gets a new instance of the QueryParser used by this SimpleLuceneIndex that uses its Analyzers, defaultField and boolean operator settings.
 org.apache.lucene.queryParser.QueryParser getQueryParser(String defaultSearchField)
          Gets a new instance of the QueryParser used by this SimpleLuceneIndex that uses its Analyzers and boolean operator settings, allowing one to specify the default search field.
 org.apache.lucene.index.IndexReader getReader()
          Gets the IndexReader.
 Map getTermAndDocCounts(String[] fields)
          Gets a Map of all terms that are in the index under the given fields.
 Map getTermCounts()
          Gets a Map of all terms that are in the index.
 Map getTermCounts(String field)
          Gets a Map of all terms that are in the index under the given field.
 Map getTermCounts(String[] fields)
          Gets a Map of all terms that are in the index under the given fields.
 int getTermFrequency(String term)
          Gets the termFrequency across all fields in the index
 int getTermFrequency(String field, String term)
          Gets the termFrequency of terms in the given field.
 Map getTermLists()
          Gets a Map of Lists that contain the terms for each field in the index.
 List getTerms(String field)
          Gets a list of all terms that are in the index under the given field name.
 boolean isIndexing()
          Indicates whether the index is currently being updated or modified.
 List listDocs()
          Gets a list of all Documents in the index.
 List listDocs(String field, String term)
          Gets a list of all Documents in the index that match the given term in the given field.
 List listDocs(String field, String[] terms)
          Gets a list of all Documents in the index that match the given terms in the given field.
 List listTerms()
          Gets a list of all terms in the index.
 boolean removeDocs(String field, String value)
          Removes all Documents that match the given term within the given field.
 boolean removeDocs(String field, String[] values)
          Removes all documents that match the given terms within the given field.
 boolean removeDocs(String field, String value, boolean block)
          See removeDocs(String,String) for description.
 ResultDocList searchDocs(org.apache.lucene.search.Query query)
          Performs a search over the index using the given Query and the pre-defined default field, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(org.apache.lucene.search.Query query, org.apache.lucene.search.Filter filter)
          Performs a search over the index using the given Query and Filter with the pre-defined default field, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(org.apache.lucene.search.Query query, org.apache.lucene.search.Filter filter, org.apache.lucene.search.Sort sortBy, HashMap docReaderAttributes)
          Performs a search over the index using the given Query object and Filter with the pre-defined default field, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(org.apache.lucene.search.Query query, HashMap docReaderAttributes)
          Performs a search over the index using the Query object, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query)
          Performs a search over the index using the given query String, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query, org.apache.lucene.analysis.Analyzer analyzer)
          Performs a search over the index using the given query String and Analyzer, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query, org.apache.lucene.search.Filter filter)
          Performs a search over the index using the given query String and Filter with the pre-defined default field, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query, org.apache.lucene.search.Filter filter, HashMap docReaderAttributes)
          Performs a search over the index using the given query String and Filter with the pre-defined default field, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query, org.apache.lucene.search.Filter filter, org.apache.lucene.search.Sort sortBy, HashMap docReaderAttributes, org.apache.lucene.analysis.Analyzer analyzer)
          Performs a search over the index using the given query String and Filter with the pre-defined default field, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query, HashMap docReaderAttributes)
          Performs a search over the index using the given query String, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query, HashMap docReaderAttributes, org.apache.lucene.analysis.Analyzer analyzer)
          Performs a search over the index using the given query String, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query, org.apache.lucene.search.Sort sortBy)
          Performs a search over the index using the given query String, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query, String defaultField)
          Performs a search over the index using the given query String, returning an ordered array of matching ranked results.
 ResultDocList searchDocs(String query, String defaultField, org.apache.lucene.search.Filter filter, org.apache.lucene.search.Sort sortBy)
          Performs a search over the index using the given query String, default field and Filter, returning an ordered array of matching ranked results.
 void setAttribute(String key, Object attribute)
          Sets an attribute that will be available for access in search results by calling DocReader.getAttribute(String) or ResultDoc.getAttribute(String).
static void setDebug(boolean db)
          Sets the debug attribute of the SimpleLuceneIndex object
 void setOperator(int operator)
          Sets the boolean operator used during searches.
 void stopIndexing()
          Instructs the indexer to stop processing updates.
 boolean update(String deleteField, ArrayList deleteValues, ArrayList addDocs)
          Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs.
 boolean update(String deleteField, ArrayList deleteValues, ArrayList addDocs, boolean block)
          Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs.
 boolean update(String deleteField, String[] deleteValues, org.apache.lucene.document.Document[] addDocs)
          Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs.
 boolean update(String deleteField, String[] deleteValues, org.apache.lucene.document.Document[] addDocs, boolean block)
          Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs.
 boolean update(String deleteField, String deleteValue, ArrayList addDocs, boolean block)
          See update(String, String[], Document[], boolean) for description.
 boolean update(String deleteField, String deleteValue, org.apache.lucene.document.Document[] addDocs, boolean block)
          See update(String, String[], Document[], boolean) for description.
 boolean update(String deleteField, String deleteValue, org.apache.lucene.document.Document addDoc, boolean block)
          See update(String, String[], Document[], boolean) for description.
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BLOCK

public static final boolean BLOCK
Indicates update operations will be blocked until the current one returns. When this is passed into a method, the method will not return until the update operation has completed.

See Also:
Constant Field Values

NO_BLOCK

public static final boolean NO_BLOCK
Indicates update operations will be allowed while others are still in progress. When this is passed into a method, the method will return immediately rather than waiting for the update operation to complete.

See Also:
Constant Field Values
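
The difference between BLOCK and NO_BLOCK can be illustrated with a standalone sketch. This is not the SimpleLuceneIndex implementation: the class UpdateQueueSketch, its submit method, the single-thread executor and the boolean values assigned to the two constants are all assumptions made for illustration. It shows only the behavior described above, where a blocking call returns after the update completes and a non-blocking call returns immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in, NOT the SimpleLuceneIndex implementation.
class UpdateQueueSketch {
    static final boolean BLOCK = true;      // assumed value, for illustration only
    static final boolean NO_BLOCK = false;  // assumed value, for illustration only

    // A single worker thread runs updates one at a time, in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Queues the update; waits for it to finish only when block is true.
    Future<?> submit(Runnable update, boolean block) throws Exception {
        Future<?> pending = worker.submit(update);
        if (block) {
            pending.get(); // do not return until the update has completed
        }
        return pending;
    }

    // Stops accepting updates and waits briefly for queued ones to finish.
    void close() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With this sketch, a caller that needs to search immediately after an update would pass BLOCK, while a caller that only needs the update to happen eventually would pass NO_BLOCK.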

DEFAULT_OR

public static final int DEFAULT_OR
Use to set the boolean search operator to OR.

See Also:
setOperator(int operator), getOperator(), Constant Field Values

DEFAULT_AND

public static final int DEFAULT_AND
Use to set the boolean search operator to AND.

See Also:
setOperator(int operator), getOperator(), Constant Field Values
Constructor Detail

SimpleLuceneIndex

public SimpleLuceneIndex(String indexDirPath)
Initializes or creates an index at the given location using a default search field named "default" and a StandardAnalyzer for index searching and creation.

Parameters:
indexDirPath - The directory where the index is located or will be created.

SimpleLuceneIndex

public SimpleLuceneIndex(String indexDirPath,
                         org.apache.lucene.analysis.Analyzer analyzer)
Initializes or creates an index at the given location using a default search field named "default" and the given Analyzer.

Parameters:
indexDirPath - The directory where the index is located or will be created.
analyzer - The default Analyzer to use for searching and index creation

SimpleLuceneIndex

public SimpleLuceneIndex(String indexDirPath,
                         String defaultField,
                         org.apache.lucene.analysis.Analyzer analyzer)
Initializes or creates an index at the given location using the given default search field and Analyzer.

Parameters:
indexDirPath - The directory where the index is located or will be created.
defaultField - The name of the field used for default searching, for example "default".
analyzer - The default Analyzer to use for searching and index creation
Method Detail

deleteAndReinititlize

public void deleteAndReinititlize()
Deletes the index and re-initializes a new, empty one in its place.


setAttribute

public void setAttribute(String key,
                         Object attribute)
Sets an attribute that will be available for access in search results by calling DocReader.getAttribute(String) or ResultDoc.getAttribute(String).

Parameters:
key - The key used to reference the attribute.
attribute - Any Java Object.
See Also:
ResultDoc.getAttribute(String), DocReader.getAttribute(String)

getAttribute

public Object getAttribute(String key)
Gets an attribute from this SimpleLuceneIndex. Note that these attributes are available for access in search results by calling DocReader.getAttribute(String) or ResultDoc.getAttribute(String). The key 'thisIndex' returns this index.

Parameters:
key - The key used to reference the attribute.
Returns:
The Java Object that is stored under the given key or null if none exists.
See Also:
ResultDoc.getAttribute(String), DocReader.getAttribute(String)

getIndexLocation

public String getIndexLocation()
Gets the absolute path to the directory where the index resides.

Returns:
The absolute path to the index.

searchDocs

public ResultDocList searchDocs(String query)
Performs a search over the index using the given query String, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(String query,
                                org.apache.lucene.search.Sort sortBy)
Performs a search over the index using the given query String, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
sortBy - A Sort to apply to the results or null to use relevancy ranking
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(String query,
                                org.apache.lucene.analysis.Analyzer analyzer)
Performs a search over the index using the given query String and Analyzer, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
analyzer - The Analyzer to use to determine the tokens in the query.
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(String query,
                                HashMap docReaderAttributes)
Performs a search over the index using the given query String, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
docReaderAttributes - Attributes that are included for use in DocReaders via the ResultDocConfig.
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(org.apache.lucene.search.Query query,
                                HashMap docReaderAttributes)
Performs a search over the index using the Query object, returning an ordered array of matching ranked results.

Parameters:
query - The Query to search over the index.
docReaderAttributes - Attributes that are included for use in DocReaders via the ResultDocConfig.
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(String query,
                                HashMap docReaderAttributes,
                                org.apache.lucene.analysis.Analyzer analyzer)
Performs a search over the index using the given query String, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
docReaderAttributes - Attributes that are included for use in DocReaders via the ResultDocConfig.
analyzer - The analyzer to use, or null to use the default
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(String query,
                                String defaultField)
Performs a search over the index using the given query String, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
defaultField - The default field to search in.
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(String query,
                                String defaultField,
                                org.apache.lucene.search.Filter filter,
                                org.apache.lucene.search.Sort sortBy)
Performs a search over the index using the given query String, default field and Filter, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
defaultField - The default field to search in, or null to use the pre-defined default field.
filter - A filter used for the search.
sortBy - A Sort to apply to the results or null to use relevancy ranking
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(String query,
                                org.apache.lucene.search.Filter filter)
Performs a search over the index using the given query String and Filter with the pre-defined default field, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
filter - A filter used for the search.
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(org.apache.lucene.search.Query query,
                                org.apache.lucene.search.Filter filter)
Performs a search over the index using the given Query and Filter with the pre-defined default field, returning an ordered array of matching ranked results.

Parameters:
query - The Query to perform over the index.
filter - A filter used for the search.
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(org.apache.lucene.search.Query query)
Performs a search over the index using the given Query and the pre-defined default field, returning an ordered array of matching ranked results.

Parameters:
query - The Query to perform over the index.
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(String query,
                                org.apache.lucene.search.Filter filter,
                                HashMap docReaderAttributes)
Performs a search over the index using the given query String and Filter with the pre-defined default field, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
filter - A filter used for the search.
docReaderAttributes - Attributes that are included for use in DocReaders via the ResultDocConfig.
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(org.apache.lucene.search.Query query,
                                org.apache.lucene.search.Filter filter,
                                org.apache.lucene.search.Sort sortBy,
                                HashMap docReaderAttributes)
Performs a search over the index using the given Query object and Filter with the pre-defined default field, returning an ordered array of matching ranked results.

Parameters:
query - The Query to perform over the index.
filter - A filter used for the search.
docReaderAttributes - Attributes that are included for use in DocReaders via the ResultDocConfig.
sortBy - A Sort to apply to the results or null to use relevancy ranking
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

searchDocs

public ResultDocList searchDocs(String query,
                                org.apache.lucene.search.Filter filter,
                                org.apache.lucene.search.Sort sortBy,
                                HashMap docReaderAttributes,
                                org.apache.lucene.analysis.Analyzer analyzer)
Performs a search over the index using the given query String and Filter with the pre-defined default field, returning an ordered array of matching ranked results.

Parameters:
query - The query to perform over the index.
filter - A Filter used for the search or null for none.
sortBy - A Sort to apply to the results or null to use relevancy ranking
docReaderAttributes - Attributes that are included for use in DocReaders via the ResultDocConfig or null for none
analyzer - The analyzer to use, or null to use the default
Returns:
An ordered array of ranked results matching the given query.
See Also:
setOperator(int operator), getOperator()

setOperator

public void setOperator(int operator)
Sets the boolean operator used during searches. Once set, the given boolean operator will be used for all subsequent searches. If this method is never called the boolean operator defaults to OR.

Parameters:
operator - The new boolean operator value.
See Also:
DEFAULT_OR, DEFAULT_AND
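
To make the effect of the boolean operator concrete, here is a minimal standalone sketch of how a multi-term query with no explicit operators matches a document under OR versus AND. It is not Lucene's QueryParser: the class OperatorSketch, the matches method, the whitespace tokenization and the int values given to the two constants are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; this is NOT Lucene's QueryParser.
class OperatorSketch {
    static final int DEFAULT_OR = 0;   // assumed value, for illustration only
    static final int DEFAULT_AND = 1;  // assumed value, for illustration only

    // Returns true if the document text satisfies the query under the given operator.
    static boolean matches(String query, String docText, int operator) {
        Set<String> docTerms =
                new HashSet<>(Arrays.asList(docText.toLowerCase().trim().split("\\s+")));
        String[] queryTerms = query.toLowerCase().trim().split("\\s+");
        if (operator == DEFAULT_AND) {
            // AND: every query term must appear in the document.
            return Arrays.stream(queryTerms).allMatch(docTerms::contains);
        }
        // OR: at least one query term must appear in the document.
        return Arrays.stream(queryTerms).anyMatch(docTerms::contains);
    }
}
```

Under these assumptions the query "ocean weather" matches a document containing only "ocean" when the operator is OR, but not when it is AND, which is why switching the operator generally narrows the result set.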

getOperator

public int getOperator()
Gets the boolean operator that is currently being used for searches.

Returns:
The boolean operator value.
See Also:
DEFAULT_OR, DEFAULT_AND

getLuceneOperator

public org.apache.lucene.queryParser.QueryParser.Operator getLuceneOperator()
Gets the Lucene boolean operator that is currently being used for searches.

Returns:
The boolean operator value.

getQueryParser

public final org.apache.lucene.queryParser.QueryParser getQueryParser()
Gets a new instance of the QueryParser used by this SimpleLuceneIndex that uses its Analyzers, defaultField and boolean operator settings.

Returns:
The QueryParser used by this SimpleLuceneIndex

getQueryParser

public final org.apache.lucene.queryParser.QueryParser getQueryParser(String defaultSearchField)
Gets a new instance of the QueryParser used by this SimpleLuceneIndex that uses its Analyzers and boolean operator settings, allowing one to specify the default search field.

Parameters:
defaultSearchField - The search field used as default when none is specified in the query
Returns:
The QueryParser used by this SimpleLuceneIndex with the given default search field

getOperatorString

public String getOperatorString()
Gets the boolean operator that is currently being used for searches as a String (AND or OR).

Returns:
The boolean operator value as a String (AND or OR).
See Also:
DEFAULT_OR, DEFAULT_AND

getDefaultSearchField

public String getDefaultSearchField()
Gets the name of the field that is searched by default if no field is indicated.

Returns:
The defaultSearchField value

getReader

public org.apache.lucene.index.IndexReader getReader()
Gets the IndexReader.

Returns:
The reader value

getNumDocs

public int getNumDocs(String query)
Gets the number of documents that match the given query.

Parameters:
query - The query to perform over the index.
Returns:
The number of matching documents.

getNumDocs

public int getNumDocs(org.apache.lucene.search.Query query)
Gets the number of documents that match the given query.

Parameters:
query - The query to perform over the index.
Returns:
The number of matching documents.

getNumDocs

public int getNumDocs()
Gets the total number of documents in the index.

Returns:
The number of documents in the index.

listDocs

public List listDocs()
Gets a list of all Documents in the index. Note: This method loads all Documents and requires a large amount of memory for large result sets (consider using search instead).

Returns:
A list of all documents in the index.

listDocs

public List listDocs(String field,
                     String term)
Gets a list of all Documents in the index that match the given term in the given field. Note: This method loads all Documents and requires a large amount of memory for large result sets (consider using search instead).

Parameters:
field - The field searched.
term - The term to match.
Returns:
A list of matching documents.

listDocs

public List listDocs(String field,
                     String[] terms)
Gets a list of all Documents in the index that match the given terms in the given field. Note: This method loads all Documents and requires a large amount of memory for large result sets (consider using search instead).

Parameters:
field - The field searched.
terms - The terms to match.
Returns:
A list of matching documents.

doWithDocument

public void doWithDocument(Callback cal,
                           String field,
                           String[] terms)
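Calls the callback function of cal for each document matching the terms in the given field.

Parameters:
cal - The Callback whose callback function is called for each matching document.
field - The field searched.
terms - The terms to match.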

doWithDocument

public void doWithDocument(Callback cal,
                           String field,
                           String term)
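Calls the callback function of cal for each document matching the term in the given field.

Parameters:
cal - The Callback whose callback function is called for each matching document.
field - The field searched.
term - The term to match.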
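
The callback style used by doWithDocument can be sketched in isolation. The Callback interface is not described on this page, so the DocCallbackSketch interface and its method name below are hypothetical, and documents are modeled as plain Maps of field name to value rather than Lucene Documents. The sketch shows only the general pattern of visiting each matching document once without first collecting all matches into a list.

```java
import java.util.List;
import java.util.Map;

// Hypothetical callback type; the real Callback interface may declare a different method.
interface DocCallbackSketch {
    void doWithDocument(Map<String, String> doc);
}

// Hypothetical stand-in that iterates over in-memory "documents".
class CallbackSketch {
    // Invokes the callback once for each document whose field value equals one of the terms.
    static void doWithDocument(List<Map<String, String>> docs, DocCallbackSketch cal,
                               String field, String... terms) {
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            for (String term : terms) {
                if (term.equals(value)) {
                    cal.doWithDocument(doc);
                    break; // call back at most once per document
                }
            }
        }
    }
}
```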

listTerms

public List listTerms()
Gets a list of all terms in the index.

Returns:
A list of all terms in the index.

getFields

public List getFields()
Gets a list of all fields in the index listed alphabetically. Depending on the state of the index, the list may contain fields that are empty, meaning all terms for the given field have been deleted and there are no possible matching queries within the field.

Returns:
A list of all fields in the index.

getTermLists

public Map getTermLists()
Gets a Map of Lists that contain the terms for each field in the index. The keys in the Map are Strings that represent all fields in the index. The List that is returned for each key contains all terms that are in the index for the given field.

Returns:
A Map of term Lists keyed by field Strings.

getTerms

public List getTerms(String field)
Gets a list of all terms that are in the index under the given field name. Implementation note: this method is not efficient. If you need to use this method frequently, consider caching the results and using getLastModifiedCount() to determine when to update the cache.

Parameters:
field - The indexed field name.
Returns:
List of terms in the index under the given field.
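
The caching advice in the implementation note can be sketched as a small helper that reloads a value only when a version number changes. VersionedCache is a hypothetical class, not part of this API. It takes the version source and the loader as functional interfaces, so in practice one might pass a lambda that calls getLastModifiedCount() and another that calls getTerms(String) with a field name of one's choosing.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: caches an expensive lookup until the reported version changes.
class VersionedCache<T> {
    private final LongSupplier version;  // e.g. () -> index.getLastModifiedCount()
    private final Supplier<T> loader;    // e.g. () -> index.getTerms("someField")
    private long cachedVersion;
    private T cachedValue;
    private boolean loaded = false;

    VersionedCache(LongSupplier version, Supplier<T> loader) {
        this.version = version;
        this.loader = loader;
    }

    // Reloads only when the version differs from the one seen at the last load.
    synchronized T get() {
        long current = version.getAsLong();
        if (!loaded || current != cachedVersion) {
            cachedValue = loader.get();
            cachedVersion = current;
            loaded = true;
        }
        return cachedValue;
    }
}
```

The version is read before the loader runs, so if the index changes while the value is loading, the next call to get() sees a newer version and reloads rather than serving stale data indefinitely.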

getTermCounts

public Map getTermCounts(String field)
Gets a Map of all terms that are in the index under the given field. The keys in the map are Strings that list the terms. The values in the Map are Integers that hold the total count of the terms in the given field across all documents.

Implementation note: this method is not efficient. If you need to use this method frequently, consider caching the results and using getLastModifiedCount() to determine when to update the cache.

Parameters:
field - The indexed field name.
Returns:
Map containing terms/counts for all terms in the index under the given field.
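
The shape of the returned Map, with term Strings as keys and Integer totals as values, can be illustrated with an in-memory sketch. TermCountSketch is hypothetical and does not read a Lucene index. It assumes each document's field value is a String tokenized on whitespace and lower-cased, which is a simplification of what an Analyzer does.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a term-to-count Map; does not read a Lucene index.
class TermCountSketch {
    // Counts every occurrence of each whitespace-separated term across all field values.
    static Map<String, Integer> termCounts(List<String> fieldValues) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps terms sorted
        for (String value : fieldValues) {
            for (String term : value.toLowerCase().trim().split("\\s+")) {
                if (!term.isEmpty()) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```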

getTermCounts

public Map getTermCounts()
Gets a Map of all terms that are in the index. The keys in the map are Strings that list the terms. The values in the Map are Integers that hold the total count of the terms across all documents.

Implementation note: this method is not efficient. If you need to use this method frequently, consider caching the results and using getLastModifiedCount() to determine when to update the cache.

Returns:
Map containing terms/counts for all terms in the index.

getTermCounts

public final Map getTermCounts(String[] fields)
Gets a Map of all terms that are in the index under the given fields. The keys in the map are Strings that list the terms. The values in the Map are Integers that hold the total count of the terms in the given fields across all documents.

Implementation note: this method is not efficient. If you need to use this method frequently, consider caching the results and using getLastModifiedCount() to determine when to update the cache.

Parameters:
fields - The indexed field names.
Returns:
Map containing terms/counts for all terms in the index under the given fields.

getTermAndDocCounts

public final Map getTermAndDocCounts(String[] fields)
Gets a Map of all terms that are in the index under the given fields. The keys in the map are Strings that list the terms. The values in the Map are TermDocCount Objects, which contain the term count, the total number of documents containing the term in one or more of the given field(s), and a list of fields in which the term appears.

Implementation note: this method is not efficient. If you need to use this method frequently, consider caching the results and using getLastModifiedCount() to determine when to update the cache. Also, this method is considerably slower when more than one field is requested, because an extra query is required for each term that is found.

Parameters:
fields - The indexed field names.
Returns:
Map containing a TermDocCount Object for all terms in the index under the given fields.
See Also:
TermDocCount

getTermFrequency

public int getTermFrequency(String term)
Gets the frequency of the given term across all fields in the index.

Parameters:
term - The term.
Returns:
The termFrequency value.

getTermFrequency

public int getTermFrequency(String field,
                            String term)
Gets the frequency of the given term in the given field.

Parameters:
field - The field.
term - The term.
Returns:
The termFrequency.

addDoc

public boolean addDoc(org.apache.lucene.document.Document doc)
Adds a Document to the index. Blocks all other update operations until complete.

Parameters:
doc - The Document to add.
Returns:
True if successful.

addDoc

public boolean addDoc(org.apache.lucene.document.Document doc,
                      boolean block)
Adds a Document to the index.

Parameters:
doc - The Document to add.
block - Indicates whether to block other updates until complete.
Returns:
True if successful.

addDocs

public boolean addDocs(org.apache.lucene.document.Document[] docs)
Adds a group of Documents to the index. Blocks all other update operations until complete.

Parameters:
docs - The Documents to add.
Returns:
True if successful.

addDocs

public boolean addDocs(org.apache.lucene.document.Document[] docs,
                       boolean block)
Adds a group of Documents to the index, optionally blocking all other update operations until complete.

Parameters:
docs - The Documents to add.
block - Indicates whether to block other updates until complete.
Returns:
True if successful.

removeDocs

public boolean removeDocs(String field,
                          String value)
Removes all Documents that match the given term within the given field. This is useful for removing a single document that is indexed with a unique ID field, or for removing a group of documents matching the same term for a given field. Blocks all other index update operations until this is complete.

Parameters:
field - The field that is searched.
value - The term that is matched for deletes.
Returns:
True if the delete was successful.

removeDocs

public boolean removeDocs(String field,
                          String value,
                          boolean block)
See removeDocs(String,String) for description. Adds the ability to control whether blocking occurs during the update.

Parameters:
field - The field that is searched.
value - The term that is matched for deletes.
block - Indicates whether or not to block other update operations.
Returns:
True if the delete was successful.

removeDocs

public boolean removeDocs(String field,
                          String[] values)
Removes all documents that match the given terms within the given field. This is useful for removing all individual documents that are indexed with a unique ID field. Blocks all other index update operations until this is complete.

Parameters:
field - The field that is searched.
values - The terms that are matched for deletes.
Returns:
True if the delete was successful.

update

public boolean update(String deleteField,
                      String[] deleteValues,
                      org.apache.lucene.document.Document[] addDocs,
                      boolean block)
Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. Assuming the deleteField contains a unique ID for the Document, the Document may be removed by indicating the ID in the deleteValues list. To replace an entry in the index for a single item, supply the item's ID in the deleteValues list and supply the new Document for the item in the addDocs list.

Parameters:
deleteField - The field searched for deleteValues.
deleteValues - The value matched in deleteField to indicate which document(s) to delete.
addDocs - An array of Documents to add to the index.
block - Indicates whether or not to block other threads or JVMs from reading from or writing to the index during the delete/add operation.
Returns:
True if no errors, otherwise false.
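The delete-then-add behavior described above can be modeled with a small in-memory sketch. It does not use Lucene or SimpleLuceneIndex: DeleteThenAddModel, its doc helper, and the "id" and "title" field names are all hypothetical, and documents are represented as plain field/value maps. It is meant only to show why supplying an item's ID in deleteValues together with its new Document in addDocs replaces the entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the update semantics; not the real index. */
final class DeleteThenAddModel {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Removes every doc whose deleteField equals one of deleteValues, then appends addDocs. */
    boolean update(String deleteField, String[] deleteValues, List<Map<String, String>> addDocs) {
        Set<String> toDelete = new HashSet<>(Arrays.asList(deleteValues));
        docs.removeIf(d -> toDelete.contains(d.get(deleteField)));
        docs.addAll(addDocs);
        return true;
    }

    int size() {
        return docs.size();
    }

    /** Returns the first doc whose field has the given value, or null if none matches. */
    Map<String, String> find(String field, String value) {
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                return d;
            }
        }
        return null;
    }

    /** Convenience builder for a two-field document (hypothetical field names). */
    static Map<String, String> doc(String id, String title) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("title", title);
        return d;
    }
}
```

Because the delete step runs first, calling update with the same ID in both arguments leaves exactly one entry for that ID, holding the new field values.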

update

public boolean update(String deleteField,
                      String[] deleteValues,
                      org.apache.lucene.document.Document[] addDocs)
Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. See update(String, String[], Document[], boolean) for description. Performs an update with blocking on.

Parameters:
deleteField - The field searched for deleteValues.
deleteValues - Array of Strings containing the value matched in deleteField to indicate which document(s) to delete
addDocs - Array containing Documents to add to the index
Returns:
True if no errors, otherwise false.

update

public boolean update(String deleteField,
                      String deleteValue,
                      org.apache.lucene.document.Document[] addDocs,
                      boolean block)
See update(String, String[], Document[], boolean) for description.

Parameters:
deleteField - The field searched for deleteValue.
deleteValue - Matching docs are deleted.
addDocs - These Docs are added to the index
block - Block or run in background.
Returns:
True if no errors.

update

public boolean update(String deleteField,
                      String deleteValue,
                      org.apache.lucene.document.Document addDoc,
                      boolean block)
See update(String, String[], Document[], boolean) for description.

Parameters:
deleteField - The field searched for deleteValue.
deleteValue - Matching docs are deleted.
addDoc - The Doc to be added to the index
block - Block or run in background.
Returns:
True if no errors.

update

public boolean update(String deleteField,
                      String deleteValue,
                      ArrayList addDocs,
                      boolean block)
See update(String, String[], Document[], boolean) for description.

Parameters:
deleteField - The field searched for deleteValue.
deleteValue - Matching docs are deleted.
addDocs - These Docs are added to the index
block - Block or run in background.
Returns:
True if no errors.

update

public boolean update(String deleteField,
                      ArrayList deleteValues,
                      ArrayList addDocs,
                      boolean block)
Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. See update(String, String[], Document[], boolean) for description.

Parameters:
deleteField - The field searched for deleteValues.
deleteValues - ArrayList of Strings containing the value matched in deleteField to indicate which document(s) to delete
addDocs - An ArrayList containing Documents to add to the index
block - Indicates whether or not to block other threads or JVMs from reading from or writing to the index during the delete/add operation.
Returns:
True if no errors, otherwise false.

update

public boolean update(String deleteField,
                      ArrayList deleteValues,
                      ArrayList addDocs)
Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. See update(String, String[], Document[], boolean) for description. Performs an update with blocking on.

Parameters:
deleteField - The field searched for deleteValues.
deleteValues - ArrayList of Strings containing the value matched in deleteField to indicate which document(s) to delete
addDocs - An ArrayList containing Documents to add to the index
Returns:
True if no errors, otherwise false.

getLastModifiedCount

public long getLastModifiedCount()
Gets the version number of the last time the index was modified by adding, deleting or changing a document. The version number counts the number of times the index was modified. If the index is deleted and rebuilt, the count will continue to be incremented until the next time the JVM is re-started. After the JVM has been re-started, the count will resume with the count of the new current index.

Returns:
The lastModifiedCount value

getDocument

public org.apache.lucene.document.Document getDocument(int n)
Gets the nth document in the index.

Parameters:
n - The document number
Returns:
The document value

isIndexing

public boolean isIndexing()
Indicates whether the index is currently being updated or modified. This means documents are in the process of being added or removed from the index.

Returns:
True if the index is in the process of being updated.
See Also:
stopIndexing()

stopIndexing

public void stopIndexing()
Instructs the indexer to stop processing updates. Once complete, the index will be ready for future updating and searching but any additions or deletions that had not been completed will be lost. This method may take several seconds to return.

See Also:
isIndexing()

getAnalyzer

public final org.apache.lucene.analysis.Analyzer getAnalyzer()
Gets the analyzer that has been configured for this index.

Returns:
The Analyzer

close

public void close()
Closes the writers and performs clean-up.


finalize

protected void finalize()
Overrides finalize to ensure resources are released.

Overrides:
finalize in class Object

escape

public static final String escape(String term)
Escapes all Lucene QueryParser reserved characters with a preceding \. The resulting String will be interpreted by the QueryParser as a single term.

Parameters:
term - The original String
Returns:
The escaped term
See Also:
QueryParser.escape(String)

escape

public static final String escape(String term,
                                  String preserveChars)
Escapes the Lucene QueryParser reserved characters with a preceding \ except those included in preserveChars.

Parameters:
term - The original String
preserveChars - List of characters NOT to escape
Returns:
The escaped term
See Also:
QueryParser.escape(String)
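As a rough illustration of the two escape variants, the sketch below prefixes each reserved character with a backslash unless it appears in preserveChars. QueryEscapeSketch is a hypothetical class, not the actual implementation, and the reserved-character set is an assumption taken from Lucene's query syntax; it varies between Lucene versions, and the real method may treat additional characters (such as white space) differently.

```java
/** Hypothetical sketch of backslash escaping; not the DLESE implementation. */
final class QueryEscapeSketch {
    // Assumed reserved set: \ + - ! ( ) : ^ [ ] " { } ~ * ? | & /
    private static final String RESERVED = "\\+-!():^[]\"{}~*?|&/";

    /** Escapes every reserved character in term. */
    static String escape(String term) {
        return escape(term, "");
    }

    /** Escapes reserved characters in term, except those listed in preserveChars. */
    static String escape(String term, String preserveChars) {
        StringBuilder out = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (RESERVED.indexOf(c) >= 0 && preserveChars.indexOf(c) < 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Passing "*" as preserveChars, for example, keeps a trailing wildcard usable in a query while other reserved characters in the same string are still escaped.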

encodeToTerm

public static final String encodeToTerm(String s)
Encodes a String to an appropriate format that can be indexed as a single term using a StandardAnalyzer. White-space is also encoded and incorporated into the single term. Note that the encoded value cannot be decoded. Save the value of the term in a separate field if it needs to be retrieved for display.

Specifically: each letter or number character is left unchanged. All other characters are encoded as the letter 'x' followed by the integer value of the character, for example '@' is encoded as 'x64'.

Parameters:
s - The string to encode.
Returns:
Encoded String that can be used as a single term.

encodeToTerm

public static final String encodeToTerm(String s,
                                        boolean encodeWildCards)
Encodes a String to an appropriate format that can be indexed as a single term using a StandardAnalyzer. White-space is also encoded and incorporated into the single term. Leaving the wild card '*' char un-encoded will produce a String that can be used to search encoded terms using wild cards. Note that the encoded value cannot be decoded. Save the value of the term in a separate field if it needs to be retrieved for display.

Specifically: each letter or number character is left unchanged. All other characters are encoded as the letter 'x' followed by the integer value of the character, for example '@' is encoded as 'x64'.

Parameters:
s - The string to encode.
encodeWildCards - True to have the '*' char encoded, false to leave it un-encoded.
Returns:
Encoded String that can be used as a single term.

encodeToTerm

public static final String encodeToTerm(String s,
                                        boolean encodeWildCards,
                                        boolean encodeSpace)
Encodes a String to an appropriate format that can be indexed as a single term or terms using a StandardAnalyzer. Leaving the space char un-encoded will produce a String that will be tokenized by the space char into individual terms. Leaving the wild card '*' char un-encoded will produce a String that can be used to search encoded terms using wild cards. Note that the encoded value cannot be decoded. Save the value of the term in a separate field if it needs to be retrieved for display.

Specifically: each letter or number character is left unchanged. All other characters are encoded as the letter 'x' followed by the integer value of the character, for example '@' is encoded as 'x64'.

Parameters:
s - The string to encode.
encodeWildCards - True to have the '*' char encoded, false to leave it un-encoded.
encodeSpace - True to have the space ' ' char encoded, false to leave it un-encoded.
Returns:
Encoded String that can be used as a single term or terms.
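The encoding rule described for the encodeToTerm methods can be sketched as follows. This is a minimal reading of the documented rule, not the actual implementation: TermEncoderSketch is a hypothetical class, and the use of Character.isLetterOrDigit to decide what counts as a "letter or number character" is an assumption.

```java
/** Hypothetical sketch of the documented encoding rule; not the DLESE implementation. */
final class TermEncoderSketch {

    /**
     * Leaves letters and digits unchanged and encodes every other character as
     * 'x' followed by its integer value. '*' and ' ' are left as-is when the
     * corresponding flag is false.
     */
    static String encodeToTerm(String s, boolean encodeWildCards, boolean encodeSpace) {
        StringBuilder out = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean keep = Character.isLetterOrDigit(c)
                    || (c == '*' && !encodeWildCards)
                    || (c == ' ' && !encodeSpace);
            if (keep) {
                out.append(c);
            } else {
                out.append('x').append((int) c);
            }
        }
        return out.toString();
    }
}
```

Under this reading, the inputs "a@b" and "ax64b" both produce "ax64b", since 'x' and the digits pass through unchanged. This ambiguity is one reason the encoded value cannot be decoded and the original should be stored in a separate field when it is needed for display.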

getLuceneVersion

public static org.apache.lucene.util.Version getLuceneVersion()
Gets the version of Lucene.

Returns:
The luceneVersion value

getDateStamp

public static final String getDateStamp()
Gets a datestamp of the current time formatted for display with logs and output.

Returns:
A datestamp for display purposes.

setDebug

public static void setDebug(boolean db)
Sets the debug attribute of the SimpleLuceneIndex object.

Parameters:
db - The new debug value
