DLESE Tools
v1.6.0

org.dlese.dpc.index.writer.xml
Class XMLIndexer

java.lang.Object
  extended by org.dlese.dpc.index.writer.xml.XMLIndexer

public class XMLIndexer
extends Object

Adds index fields to a Lucene Document from any well-formed XML. Individual field names are derived from the xPath to each element and attribute in the XML instance document. Fields are encoded to support text, keyword and stemmed search. Also creates standard fields for IDs, URLs, title, description and geospatial bounding box footprint. The 'default' and 'stems' fields are also indexed as text and stemmed text, respectively.

An XMLIndexerFieldsConfig may be supplied to configure specific search fields for given XML formats. If a field is defined in the XMLIndexerFieldsConfig and content is available at the given xPath, that content overrides the value set for ids, urls, title or description. In addition, field values configured by schema override those configured by xmlFormat.
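
The override order described above can be sketched as follows. This is only an illustration, assuming that a null or blank value means "no content was available at the configured xPath". The class and method names (FieldPrecedenceSketch, firstAvailable, resolve) are hypothetical and are not part of the DLESE API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the documented override order; not part of the DLESE API.
class FieldPrecedenceSketch {

    // Returns the first candidate that is neither null nor blank, or null if there is none.
    static String firstAvailable(List<String> candidatesInPriorityOrder) {
        for (String candidate : candidatesInPriorityOrder) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate;
            }
        }
        return null;
    }

    // A schema-configured value wins over an xmlFormat-configured value,
    // which in turn wins over the value that was set directly.
    static String resolve(String valueSetDirectly, String byXmlFormat, String bySchema) {
        return firstAvailable(Arrays.asList(bySchema, byXmlFormat, valueSetDirectly));
    }

    public static void main(String[] args) {
        System.out.println(resolve("Direct title", "Format title", "Schema title"));
        System.out.println(resolve("Direct title", "Format title", null));
        System.out.println(resolve("Direct title", "", null));
    }
}
```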

Author:
John Weatherley
See Also:
XMLIndexerFieldsConfig

Constructor Summary
XMLIndexer(Document localizedXmlDocument, String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig)
          Constructs an XMLIndexer from a localized XML Document.
XMLIndexer(String xmlString, String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig)
          Constructs an XMLIndexer from an XML string.
XMLIndexer(URL urlToXml, String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig)
          Constructs an XMLIndexer from the XML document at a URL.
 
Method Summary
 BoundingBox getBoundingBox()
          Returns the value of boundingBox.
 String getDescription()
          Returns the value of description.
 String getFullXmlAttributeContent()
          Gets the full content of each Attribute in the XML.
 String getFullXmlElementContent()
          Gets the full content of each Element in the XML.
 String[] getIds()
          Returns the value of ids.
 String[] getIdsEncoded()
          Returns unique IDs for the item being indexed encoded for indexing.
 List getRelatedIds()
          Gets the ids of related records.
 Map getRelatedIdsMap()
          Gets the ids of related records.
 List getRelatedUrls()
          Gets the urls of related records.
 Map getRelatedUrlsMap()
          Gets the urls of related records.
 String getTitle()
          Returns the value of title.
 String[] getUrls()
          Returns the value of urls.
 Document getXmlDocument()
          Gets the localized Dom4j Document for this XML instance.
 String getXPathFieldsPrefix()
          Returns the value of xPathFieldsPrefix, or null if none.
 void indexFields(org.apache.lucene.document.Document luceneDoc)
          Indexes the contents of the XML, adding fields to the Lucene Document that is supplied.
 boolean indexJavaBeanFields(org.apache.lucene.document.Document luceneDoc)
          Indexes Java Bean XML that was encoded with the java.beans.XMLEncoder class, using the bean properties as field names.
 void indexXpathFields(org.apache.lucene.document.Document luceneDoc)
          Indexes the content of each element and attribute in the source XML as individual search fields, using the xPath to the element or attribute as the field name.
 void setBoundingBox(BoundingBox boundingBox)
          Sets the value of boundingBox.
 void setDescription(String description)
          Sets the value of description.
 void setIds(String[] ids)
          Sets the value of ids.
 void setIndexDefaultAndStemsField(boolean indexDefaultAndStemsField)
          Sets whether to index the default, admindefault, and stems fields for this record.
 void setTitle(String title)
          Sets the value of title.
 void setUrls(String[] urls)
          Sets the value of urls.
 void setXPathFieldsPrefix(String xPathFieldsPrefix)
          Sets the value of xPathFieldsPrefix, which is prepended to the xPath fields when indexed.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XMLIndexer

public XMLIndexer(Document localizedXmlDocument,
                  String xmlFormat,
                  XMLIndexerFieldsConfig xmlIndexerFieldsConfig)
Constructs an XMLIndexer from a localized XML Document.

Parameters:
localizedXmlDocument - A localized XML Document
xmlFormat - The XML format being indexed, for example adn or oai_dc
xmlIndexerFieldsConfig - The config, or null if not used

XMLIndexer

public XMLIndexer(String xmlString,
                  String xmlFormat,
                  XMLIndexerFieldsConfig xmlIndexerFieldsConfig)
           throws Exception
Constructs an XMLIndexer from an XML string.

Parameters:
xmlString - A valid XML string
xmlFormat - The XML format being indexed, for example adn or oai_dc
xmlIndexerFieldsConfig - The config, or null if not used
Throws:
Exception - If an error occurs

XMLIndexer

public XMLIndexer(URL urlToXml,
                  String xmlFormat,
                  XMLIndexerFieldsConfig xmlIndexerFieldsConfig)
           throws Exception
Constructs an XMLIndexer from the XML document at a URL.

Parameters:
urlToXml - URL to an XML document
xmlFormat - The XML format being indexed, for example adn or oai_dc
xmlIndexerFieldsConfig - The config, or null if not used
Throws:
Exception - If an error occurs
Method Detail

setIndexDefaultAndStemsField

public void setIndexDefaultAndStemsField(boolean indexDefaultAndStemsField)
                                  throws IllegalStateException
Sets whether to index the default, admindefault, and stems fields for this record.

Parameters:
indexDefaultAndStemsField - The value to assign indexDefaultAndStemsField.
Throws:
IllegalStateException - If called after method #indexFields has been called

getTitle

public String getTitle()
                throws IllegalStateException
Returns the value of title.

Returns:
The title value
Throws:
IllegalStateException - If called prior to calling method #indexFields

setTitle

public void setTitle(String title)
              throws IllegalStateException
Sets the value of title.

Parameters:
title - The value to assign title.
Throws:
IllegalStateException - If called after method #indexFields has been called
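
The getter and setter contracts in this section imply a call order: values are assigned before indexFields is called and read back afterward. A minimal sketch of that guard pattern follows. It uses a hypothetical stand-in class (LifecycleSketch) rather than XMLIndexer itself, and its indexFields method takes no Lucene Document because it only models the state change.

```java
// Hypothetical stand-in that mimics the documented call-order contract; not the XMLIndexer source.
class LifecycleSketch {
    private boolean indexed = false;
    private String title;

    // Setters are only legal before indexing has happened.
    void setTitle(String title) {
        if (indexed) {
            throw new IllegalStateException("setTitle called after indexFields");
        }
        this.title = title;
    }

    // Getters are only legal once indexing has happened.
    String getTitle() {
        if (!indexed) {
            throw new IllegalStateException("getTitle called before indexFields");
        }
        return title;
    }

    // Stands in for indexFields(luceneDoc); here it only flips the state flag.
    void indexFields() {
        indexed = true;
    }

    public static void main(String[] args) {
        LifecycleSketch sketch = new LifecycleSketch();
        sketch.setTitle("Plate tectonics");   // allowed: not indexed yet
        sketch.indexFields();
        System.out.println(sketch.getTitle()); // allowed: already indexed
        try {
            sketch.setTitle("Too late");       // rejected: already indexed
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```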

getDescription

public String getDescription()
                      throws IllegalStateException
Returns the value of description.

Returns:
The description value
Throws:
IllegalStateException - If called prior to calling method #indexFields

setDescription

public void setDescription(String description)
                    throws IllegalStateException
Sets the value of description.

Parameters:
description - The value to assign description.
Throws:
IllegalStateException - If called after method #indexFields has been called

getUrls

public String[] getUrls()
                 throws IllegalStateException
Returns the value of urls.

Returns:
The urls value
Throws:
IllegalStateException - If called prior to calling method #indexFields

setUrls

public void setUrls(String[] urls)
             throws IllegalStateException
Sets the value of urls.

Parameters:
urls - The value to assign urls.
Throws:
IllegalStateException - If called after method #indexFields has been called

getIds

public String[] getIds()
                throws IllegalStateException
Returns the value of ids.

Returns:
The ids value
Throws:
IllegalStateException - If called prior to calling method #indexFields

setIds

public void setIds(String[] ids)
            throws IllegalStateException
Sets the value of ids.

Parameters:
ids - The value to assign ids.
Throws:
IllegalStateException - If called after method #indexFields has been called

getIdsEncoded

public String[] getIdsEncoded()
                       throws IllegalStateException
Returns unique IDs for the item being indexed encoded for indexing. If more than one ID is present, the first one is the primary.

Returns:
The id Strings encoded for indexing
Throws:
IllegalStateException - If called prior to calling method #indexFields
See Also:
getIds()

getRelatedIds

public List getRelatedIds()
                   throws IllegalStateException
Gets the ids of related records.

Returns:
The related ids
Throws:
IllegalStateException - If called prior to calling method #indexFields

getRelatedUrls

public List getRelatedUrls()
                    throws IllegalStateException
Gets the urls of related records.

Returns:
The related urls
Throws:
IllegalStateException - If called prior to calling method #indexFields

getRelatedIdsMap

public Map getRelatedIdsMap()
                     throws IllegalStateException
Gets the ids of related records. The Map key contains the relationship (isAnnotatedBy, etc.) and the Map value contains a List of Strings that indicate the ids of the target records.

Returns:
The related ids
Throws:
IllegalStateException - If called prior to calling method #indexFields

getRelatedUrlsMap

public Map getRelatedUrlsMap()
                      throws IllegalStateException
Gets the urls of related records. The Map key contains the relationship (isAnnotatedBy, etc.) and the Map value contains a List of Strings that indicate the urls of the target records.

Returns:
The related urls
Throws:
IllegalStateException - If called prior to calling method #indexFields
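
The relationship maps returned by getRelatedIdsMap and getRelatedUrlsMap are declared as raw Map types. As a sketch of how a caller might walk such a map, the example below assumes the documented shape (a String relationship key mapped to a List of String targets) and uses made-up sample data. RelatedMapSketch and flatten are hypothetical names, not part of the DLESE API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for walking a relationship map of the documented shape.
class RelatedMapSketch {

    // Flattens relationship -> targets into "relationship=target" strings, preserving map order.
    static List<String> flatten(Map<String, List<String>> related) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : related.entrySet()) {
            for (String target : entry.getValue()) {
                pairs.add(entry.getKey() + "=" + target);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample data only; the IDs are invented for illustration.
        Map<String, List<String>> related = new LinkedHashMap<>();
        related.put("isAnnotatedBy", Arrays.asList("SAMPLE-ID-1", "SAMPLE-ID-2"));
        for (String pair : flatten(related)) {
            System.out.println(pair);
        }
    }
}
```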

getXPathFieldsPrefix

public String getXPathFieldsPrefix()
Returns the value of xPathFieldsPrefix, or null if none.


setXPathFieldsPrefix

public void setXPathFieldsPrefix(String xPathFieldsPrefix)
                          throws IllegalStateException
Sets the value of xPathFieldsPrefix, which is prepended to the xPath fields when indexed. Set to null to use none (default).

Parameters:
xPathFieldsPrefix - The value to prepend to the xPath fields, or null for none
Throws:
IllegalStateException

getBoundingBox

public BoundingBox getBoundingBox()
Returns the value of boundingBox.


setBoundingBox

public void setBoundingBox(BoundingBox boundingBox)
Sets the value of boundingBox.

Parameters:
boundingBox - The value to assign boundingBox.

getFullXmlElementContent

public String getFullXmlElementContent()
                                throws IllegalStateException
Gets the full content of each Element in the XML. Attribute content is not included. If this is a Java Bean, gets the content of all Bean properties. Method #indexFields must be called prior to using this method.

Returns:
The full Element content
Throws:
IllegalStateException - If called prior to calling method #indexFields

getFullXmlAttributeContent

public String getFullXmlAttributeContent()
                                  throws IllegalStateException
Gets the full content of each Attribute in the XML. Element content is not included. Method #indexFields must be called prior to using this method.

Returns:
The full Attribute content
Throws:
IllegalStateException - If called prior to calling method #indexFields

getXmlDocument

public Document getXmlDocument()
Gets the localized Dom4j Document for this XML instance.

Returns:
The xml Document

indexFields

public void indexFields(org.apache.lucene.document.Document luceneDoc)
                 throws Exception
Indexes the contents of the XML, adding fields to the Lucene Document that is supplied.

Parameters:
luceneDoc - The Document to add fields to
Throws:
Exception - If an error occurs; the exception provides a message suitable for display in indexing reports.

indexXpathFields

public void indexXpathFields(org.apache.lucene.document.Document luceneDoc)
                      throws Exception
Indexes the content of each element and attribute in the source XML as individual search fields, using the xPath to the element or attribute as the field name. If an xPath field prefix has been set, it is inserted at the beginning of the field path.

Parameters:
luceneDoc - The Document to add fields to
Throws:
Exception - If an error occurs; the exception provides a message suitable for display in indexing reports.
See Also:
setXPathFieldsPrefix(java.lang.String)
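
To illustrate the idea of using the xPath to each element and attribute as a field name, here is a minimal sketch built on the JDK DOM parser. It is not the XMLIndexer implementation: the class name XPathFieldNamesSketch, the plain "/element/@attribute" naming, the direct concatenation of the prefix, and the joining of repeated paths with a space are all assumptions made for the example. The actual field-name encoding used for text, keyword and stemmed search is not shown here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: derives path-style field names from well-formed XML.
class XPathFieldNamesSketch {

    // Returns field name -> content for every attribute and every element with text.
    static Map<String, String> fields(String xml, String prefix) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> out = new LinkedHashMap<>();
            collect(doc.getDocumentElement(), "", prefix == null ? "" : prefix, out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML: " + e.getMessage(), e);
        }
    }

    private static void collect(Element element, String parentPath, String prefix,
                                Map<String, String> out) {
        String path = parentPath + "/" + element.getNodeName();

        // Each attribute becomes its own field, named by the path to the attribute.
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            add(out, prefix + path + "/@" + attribute.getNodeName(), attribute.getNodeValue());
        }

        // The element's own direct text becomes a field named by the path to the element.
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            add(out, prefix + path, value);
        }

        // Recurse into child elements.
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, path, prefix, out);
            }
        }
    }

    // Repeated paths are joined with a space (an assumption of this sketch).
    private static void add(Map<String, String> out, String name, String value) {
        out.merge(name, value, (a, b) -> a + " " + b);
    }

    public static void main(String[] args) {
        String xml = "<record id=\"r1\"><general><title>Plate tectonics</title></general></record>";
        fields(xml, null).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```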

indexJavaBeanFields

public boolean indexJavaBeanFields(org.apache.lucene.document.Document luceneDoc)
                            throws Exception
Indexes Java Bean XML that was encoded with the java.beans.XMLEncoder class, using the bean properties as field names. If this is not Java Bean encoded XML, nothing is done and false is returned.

Parameters:
luceneDoc - The Document to add fields to
Returns:
True if this is a Java Bean and property fields were indexed.
Throws:
Exception - If an error occurs; the exception provides a message suitable for display in indexing reports.
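
For readers unfamiliar with the XML that java.beans.XMLEncoder produces, the sketch below recognizes that format by its root element (a java element whose class attribute is java.beans.XMLDecoder) and reads the property names from the void elements. How XMLIndexer itself detects and walks Java Bean XML is not documented here, so the detection rule, the class name JavaBeanXmlSketch, and the sample bean class example.Record are assumptions made for illustration. The method returns false and collects nothing when the input is not Java Bean XML, mirroring the return contract above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: reads bean property names and values from XMLEncoder-style XML.
class JavaBeanXmlSketch {

    // Fills 'out' with property -> text content; returns true only if bean properties were found.
    static boolean collectBeanProperties(String xml, Map<String, String> out) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML: " + e.getMessage(), e);
        }

        // XMLEncoder output has a <java class="java.beans.XMLDecoder"> root element.
        Element root = doc.getDocumentElement();
        if (!"java".equals(root.getNodeName())
                || !"java.beans.XMLDecoder".equals(root.getAttribute("class"))) {
            return false;
        }

        // Each <void property="name"> element carries one bean property.
        NodeList voids = root.getElementsByTagName("void");
        for (int i = 0; i < voids.getLength(); i++) {
            Element propertyElement = (Element) voids.item(i);
            String property = propertyElement.getAttribute("property");
            if (!property.isEmpty()) {
                out.put(property, propertyElement.getTextContent().trim());
            }
        }
        return !out.isEmpty();
    }

    public static void main(String[] args) {
        // Sample in the XMLEncoder format; example.Record is a made-up bean class name.
        String beanXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<java version=\"1.8.0\" class=\"java.beans.XMLDecoder\">"
                + "<object class=\"example.Record\">"
                + "<void property=\"title\"><string>Plate tectonics</string></void>"
                + "</object></java>";
        Map<String, String> properties = new LinkedHashMap<>();
        System.out.println(collectBeanProperties(beanXml, properties) + " " + properties);
        System.out.println(collectBeanProperties("<record/>", new LinkedHashMap<>()));
    }
}
```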
