DLESE Tools
v1.6.0

org.dlese.dpc.index.writer
Class IndexingTools

java.lang.Object
  extended by org.dlese.dpc.index.writer.IndexingTools

public class IndexingTools
extends Object

Tools to aid in indexing.

Author:
John Weatherley

Field Summary
static String adminDefaultFieldName
          Admin default field 'admindefault'
static String defaultFieldName
          Default field 'default'
static String PHRASE_SEPARATOR
          String used to separate and preserve phrases indexed as text, includes leading and trailing white space.
static String stemsFieldName
          Stems field 'stems'
 
Constructor Summary
IndexingTools()
           
 
Method Summary
static void addToAdminDefaultField(org.apache.lucene.document.Document myDoc, String content)
          Indexes the given text into the admin default field.
static void addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc, String content)
          Indexes the given text into the default and stems fields.
static String encodeToTerm(String text)
          Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String).
static String encodeToTerm(String text, boolean encodeWildCards)
          Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String,boolean).
static String[] extractSeparatePhrasesFromString(String separatedPhrases)
          Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings).
static String[] extractStringsFromString(String separatedWords)
          Extracts the words from a String that was created using the method makeStringFromNodes(List nodes).
static String[] getAnalyzedTerms(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
          Extracts all terms in any field from a Lucene query using the given Analyzer.
static org.apache.lucene.analysis.Token[] getAnalyzedTokens(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
          Extracts all Tokens from a Lucene query using the given Analyzer.
static StringBuffer getAnalyzerOutput(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
          Creates a StringBuffer to display the tokens created by a given analyzer.
static String makeSeparatePhrasesFromNodes(List nodes)
          Creates a String separated by the phrase separator term from the text of each of the dom4j Element or Attribute Nodes provided.
static String makeSeparatePhrasesFromStrings(List strings)
          Creates a String separated by the phrase separator term from each of the Strings provided.
static String makeSeparatePhrasesFromStrings(String[] strings)
          Creates a String separated by the phrase separator term from each of the Strings provided.
static String makeStringFromNodes(List nodes)
          Creates a String separated by spaces from the text of each of the dom4j Element or Attribute Nodes provided.
static String tokenizeID(String ID)
          Tokenizes a DLESE ID by replacing the char - with a blank space.
static String tokenizeURI(String uri)
          Tokenizes a URI by replacing the unindexable chars with a blank space.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

defaultFieldName

public static final String defaultFieldName
Default field 'default'

See Also:
Constant Field Values

stemsFieldName

public static final String stemsFieldName
Stems field 'stems'

See Also:
Constant Field Values

adminDefaultFieldName

public static final String adminDefaultFieldName
Admin default field 'admindefault'

See Also:
Constant Field Values

PHRASE_SEPARATOR

public static final String PHRASE_SEPARATOR
String used to separate and preserve phrases indexed as text, includes leading and trailing white space.

See Also:
Constant Field Values

Constructor Detail

IndexingTools

public IndexingTools()

Method Detail

addToDefaultAndStemsFields

public static final void addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc,
                                                    String content)
Indexes the given text into the default and stems fields.

Parameters:
myDoc - Document to add to
content - Content to add

addToAdminDefaultField

public static final void addToAdminDefaultField(org.apache.lucene.document.Document myDoc,
                                                String content)
Indexes the given text into the admin default field.

Parameters:
myDoc - Document to add to
content - Content to add

makeSeparatePhrasesFromNodes

public static final String makeSeparatePhrasesFromNodes(List nodes)
Creates a String separated by the phrase separator term from the text of each of the dom4j Element or Attribute Nodes provided. The input list may be null.

A call to this method might look like:
String value = makeSeparatePhrasesFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));

Parameters:
nodes - List of Elements or Attributes
Returns:
A String or null

makeSeparatePhrasesFromStrings

public static final String makeSeparatePhrasesFromStrings(List strings)
Creates a String separated by the phrase separator term from each of the Strings provided. The input list may be null.

Parameters:
strings - List of Strings or null
Returns:
A String or null

makeSeparatePhrasesFromStrings

public static final String makeSeparatePhrasesFromStrings(String[] strings)
Creates a String separated by the phrase separator term from each of the Strings provided. The input array may be null.

Parameters:
strings - Array of Strings or null
Returns:
A String or null

extractSeparatePhrasesFromString

public static final String[] extractSeparatePhrasesFromString(String separatedPhrases)
Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings).

Parameters:
separatedPhrases - String that contains the phrase separator to separate phrases
Returns:
An array of phrase Strings, or null if the input is null
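
The round trip between makeSeparatePhrasesFromStrings and extractSeparatePhrasesFromString can be pictured with the self-contained sketch below. It does not call IndexingTools. The class name PhraseSeparatorSketch, the helper names join and split, and the separator value " phrasesep " are hypothetical stand-ins, because the actual value of PHRASE_SEPARATOR is not stated on this page. Null handling follows the documented contract (null in, null out).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the DLESE implementation.
class PhraseSeparatorSketch {
    // Hypothetical separator. Like PHRASE_SEPARATOR, it has leading and
    // trailing white space so it is tokenized as a word of its own.
    static final String SEPARATOR = " phrasesep ";

    // Joins the phrases with the separator; returns null for null input.
    static String join(List<String> phrases) {
        if (phrases == null) {
            return null;
        }
        return String.join(SEPARATOR, phrases);
    }

    // Splits a joined String back into its phrases; returns null for null input.
    static String[] split(String separatedPhrases) {
        if (separatedPhrases == null) {
            return null;
        }
        List<String> phrases = new ArrayList<>();
        int start = 0;
        int idx;
        // Walk the String, cutting at each literal occurrence of the separator.
        while ((idx = separatedPhrases.indexOf(SEPARATOR, start)) >= 0) {
            phrases.add(separatedPhrases.substring(start, idx));
            start = idx + SEPARATOR.length();
        }
        phrases.add(separatedPhrases.substring(start));
        return phrases.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("plate tectonics", "ocean currents"));
        System.out.println(joined);
        System.out.println(Arrays.toString(split(joined)));
    }
}
```

In this sketch each multi-word phrase survives the round trip intact, which is the property the phrase separator is described as preserving.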

makeStringFromNodes

public static final String makeStringFromNodes(List nodes)
Creates a String separated by spaces from the text of each of the dom4j Element or Attribute Nodes provided. The input list may be null.

A call to this method might look like:
String value = makeStringFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));

Parameters:
nodes - List of dom4j Nodes of Elements or Attributes
Returns:
A String or null

extractStringsFromString

public static final String[] extractStringsFromString(String separatedWords)
Extracts the words from a String that was created using the method makeStringFromNodes(List nodes).

Parameters:
separatedWords - String of words separated by spaces, as created by makeStringFromNodes(List nodes)
Returns:
An array of word Strings
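
For contrast with the phrase-separator methods, the sketch below shows what a plain space-separated join and word extraction might look like. It is a stand-in, not the DLESE code: the class SpaceJoinSketch and its methods join and words are hypothetical, plain Strings are used in place of dom4j Nodes, and splitting on runs of white space is an assumption about how words are delimited.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the DLESE implementation.
class SpaceJoinSketch {
    // Joins the given texts with single spaces; returns null for null input.
    static String join(List<String> texts) {
        return texts == null ? null : String.join(" ", texts);
    }

    // Splits on runs of white space (an assumption about the word delimiter).
    static String[] words(String separatedWords) {
        String trimmed = separatedWords.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("geology", "plate tectonics"));
        System.out.println(Arrays.toString(words(joined)));
    }
}
```

Note that in this sketch a multi-word input such as "plate tectonics" comes back as two separate words, so phrase boundaries are not recoverable from a space-joined String.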

tokenizeID

public static final String tokenizeID(String ID)
Tokenizes a DLESE ID by replacing the char - with a blank space.

Parameters:
ID - The ID String
Returns:
The tokenized ID
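
The replacement described above can be sketched in one line of standard Java. The class TokenizeIdSketch and method tokenizeId are hypothetical names for illustration, the ID value is made up, and the sketch does not claim to match the library's handling of null or other edge cases.

```java
// Illustrative sketch only; not the DLESE implementation.
class TokenizeIdSketch {
    // Replaces every '-' with a blank space, as the description above states.
    static String tokenizeId(String id) {
        return id.replace('-', ' ');
    }

    public static void main(String[] args) {
        // The ID below is a made-up example value.
        System.out.println(tokenizeId("EXAMPLE-000-000-000-001"));
    }
}
```

Splitting the ID this way lets a tokenizing analyzer index each segment as its own term.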

tokenizeURI

public static final String tokenizeURI(String uri)
Tokenizes a URI by replacing the unindexable chars with a blank space.

Parameters:
uri - A URL or URI
Returns:
The tokenized URI
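
This page does not say which characters count as unindexable. The sketch below therefore assumes, purely for illustration, that every character other than an ASCII letter or digit is replaced; the class TokenizeUriSketch and method tokenizeUri are hypothetical names, and the real method may use a different character set.

```java
// Illustrative sketch only; not the DLESE implementation.
class TokenizeUriSketch {
    // Assumption: anything that is not an ASCII letter or digit is unindexable.
    // Each such character becomes one blank space, so the length is unchanged.
    static String tokenizeUri(String uri) {
        return uri.replaceAll("[^\\p{Alnum}]", " ");
    }

    public static void main(String[] args) {
        System.out.println(tokenizeUri("http://www.dlese.org/index.html"));
    }
}
```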

encodeToTerm

public static final String encodeToTerm(String text)
Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String).

Parameters:
text - Text
Returns:
Encoded text

encodeToTerm

public static final String encodeToTerm(String text,
                                        boolean encodeWildCards)
Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String,boolean).

Parameters:
text - Text
encodeWildCards - True to encode the '*' wildcard char, false to leave unencoded.
Returns:
Encoded text

getAnalyzedTokens

public static final org.apache.lucene.analysis.Token[] getAnalyzedTokens(String textToParse,
                                                                         String field,
                                                                         org.apache.lucene.analysis.Analyzer analyzer)
Extracts all Tokens from a Lucene query using the given Analyzer.

Parameters:
textToParse - The text to analyze with the analyzer
analyzer - The analyzer to use
field - The field this Analyzer should interpret the text as, or null to use 'default'
Returns:
The Tokens generated by the analyzer

getAnalyzedTerms

public static final String[] getAnalyzedTerms(String textToParse,
                                              String field,
                                              org.apache.lucene.analysis.Analyzer analyzer)
Extracts all terms in any field from a Lucene query using the given Analyzer.

Parameters:
textToParse - The text to analyze with the analyzer
analyzer - The analyzer to use
field - The field this Analyzer should interpret the text as, or null to use 'default'
Returns:
The terms generated by the analyzer

getAnalyzerOutput

public static final StringBuffer getAnalyzerOutput(String textToParse,
                                                   String field,
                                                   org.apache.lucene.analysis.Analyzer analyzer)
Creates a StringBuffer to display the tokens created by a given analyzer. Output is of the form: [token1] [token2].

Parameters:
textToParse - The text to analyze with the analyzer
analyzer - The analyzer to use
field - The lucene field name, or null to use default
Returns:
A StringBuffer containing the tokens produced by the analyzer, in the form [token1] [token2]
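
The bracketed output form can be illustrated without Lucene. In the sketch below, the class AnalyzerOutputSketch and its methods are hypothetical, and the analyze method is only a stand-in for a real Analyzer: it lowercases the text and splits on white space, which is an assumption and not how any particular Lucene analyzer is guaranteed to behave. The exact spacing of the real method's output may also differ.

```java
import java.util.Locale;

// Illustrative sketch only; not the DLESE implementation.
class AnalyzerOutputSketch {
    // Stand-in for an Analyzer: lowercases and splits on white space.
    static String[] analyze(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Renders the tokens in the form "[token1] [token2]".
    static StringBuffer format(String[] tokens) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append('[').append(tokens[i]).append(']');
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(format(analyze("Plate Tectonics")));
    }
}
```

A display like this is mainly useful for debugging, since it shows exactly which tokens a given analyzer would index for a piece of text.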
