Frequently Asked Questions (FAQ)

This FAQ includes the following sections:

Data Provider FAQ

Can I create a custom search page for the items in the data provider?

Yes. The Search page can be customized and easily installed to another location or remote Web server as described in the ODL Search Specification page.

What is the difference between the 'Search' and 'Admin search' pages?

The Search page is intended to be accessed by the general public. It provides search over the full text found in all files that can be disseminated in oai_dc format. This includes files that reside natively in oai_dc format as well as files that can be converted to oai_dc as described in Providing files in multiple formats. In addition, only files that are enabled in the Metadata Files Configuration page are available for search.

The Admin search page resides in the administration portion of the software and its access is intended to be restricted to trusted users as described in the Configuring jOAI page. It provides search over the full text found in all XML files that are configured in the data provider. Files are searchable and viewable even when public access to them has been disabled in the Metadata Files Configuration page. Admin search also provides options to search by set, available format, and record attribute (not deleted, deleted, etc.).

If my records reside in a database, can I use jOAI to implement an OAI data provider for my repository?

The jOAI data provider allows XML files from a file system to be exposed as items in an OAI data repository. To expose records that reside in a database, write a routine to export the records to XML files at regular intervals such as once a day or once a week, depending on how often the records change. Then set up the data provider to monitor the directory or directories where the files are exported. See also preparing files for serving.

After the initial set of records has been exported from the database, files should be modified or deleted only when the corresponding database record has been updated or deleted. jOAI will monitor the files and provide them to harvesters according to the OAI-PMH.
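The export routine described above might look like the following minimal sketch. It assumes a hypothetical SQLite table named records with id and title columns, IDs that are safe to use as file names, and a made-up XML layout; none of these come from jOAI itself. Files are rewritten only when their content changes and are removed only when the database row is gone.

```python
import sqlite3
from pathlib import Path
from xml.sax.saxutils import escape


def record_to_xml(record_id, title):
    """Render one row as a small XML document (hypothetical layout)."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<record>\n"
        f"  <id>{escape(record_id)}</id>\n"
        f"  <title>{escape(title)}</title>\n"
        "</record>\n"
    )


def export_records(conn, out_dir):
    """Write changed rows to files and delete files whose rows are gone.

    Returns (written, removed) lists of file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    current = set()
    written = []
    # 'records', 'id' and 'title' are assumed names for illustration only.
    for record_id, title in conn.execute("SELECT id, title FROM records"):
        path = out_dir / f"{record_id}.xml"
        current.add(path.name)
        xml = record_to_xml(record_id, title)
        # Leave unchanged files untouched so they are not seen as modified.
        if not path.exists() or path.read_text(encoding="utf-8") != xml:
            path.write_text(xml, encoding="utf-8")
            written.append(path.name)
    removed = []
    for path in out_dir.glob("*.xml"):
        if path.name not in current:
            path.unlink()
            removed.append(path.name)
    return written, sorted(removed)
```

Such a routine could be run from a scheduler (for example once a day), with out_dir set to a directory that the data provider is configured to monitor.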

Does jOAI support selective or incremental harvesting?

Yes. After performing an initial full harvest of the repository, harvesters may use datestamps to request only those records that have changed or been deleted since the last harvest, which can greatly reduce the number of records transferred over the network over time. The data provider implements deleted records and datestamps in accordance with the OAI-PMH to support selective and incremental harvests.
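As an illustration, a selective harvest request is an ordinary OAI-PMH ListRecords request that carries a from datestamp. The sketch below only builds the request URL; the function name is hypothetical, and the base URL and date are example values.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL, optionally selective."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Only records changed or deleted on or after this datestamp.
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url("http://localhost:8080/oai/provider", from_date="2024-01-15"))
# http://localhost:8080/oai/provider?verb=ListRecords&metadataPrefix=oai_dc&from=2024-01-15
```

Omitting from_date yields a full harvest request; supplying the date of the previous harvest yields an incremental one.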

If I remove a file, can I add it back at a later time?

When a file is removed from a directory that is being monitored by the data provider, its record will be changed to status deleted, and harvesters will be notified that the record has been deleted the next time they harvest from the data provider. At a later time, if a file is added back with the same unique ID as a deleted record (regardless of the directory), the data provider will replace the deleted record with the new one, and its status will no longer be deleted. Harvesters will then receive the new record the next time they harvest from the data provider.

What happens if I accidentally create two files with the same ID?

When jOAI imports a new or modified file into its index, a check is performed to see if there is an existing record with the same unique ID. If the ID already exists, an error will be reported under 'Indexing Errors' in the Metadata Files Configuration page, and the file will not be imported into the repository index. To fix the problem, either remove the file from the directory or assign it a unique ID, as described under preparing files for serving.

My records have indexing errors when I put them in the data provider. What's wrong?

XML files that contain text that was copied and pasted from tools such as Microsoft Word often contain invalid characters such as dashes or copyright symbols that are improperly encoded. These 'bad characters' can trigger the XML processors in the software to issue an error. Files must contain well-formed XML and should use UTF-8 encoding. Character references, rather than entity references, should also be used for special characters, as required by the OAI protocol XML response format.
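Before placing files in the data provider, it can help to check them outside of jOAI. The following is a minimal sketch using Python's standard library, not the check jOAI itself performs: the first function reports whether a file's bytes parse as well-formed XML, and the second replaces non-ASCII characters with numeric character references. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return None if the bytes parse as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_bytes)
        return None
    except ET.ParseError as err:
        return str(err)


def to_character_references(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_character_references("\u00a9 2024 \u2013 draft"))
# &#169; 2024 &#8211; draft
```

An undeclared entity reference such as &copy;, or a byte sequence that is not valid UTF-8, causes check_well_formed to return an error message rather than None.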

How many records can the data provider scale to?

The jOAI data provider is designed for small- to medium-sized data repositories. The software has been tested successfully with repositories of up to 300,000 records. The number of records the software can support depends on the amount of memory available to the Java Virtual Machine (JVM), the speed of the host machine and the size of the individual records in the repository.

The baseURL that is shown uses the local machine name, but it should use the domain name for the server instead. How can I change it?

The base URL that is shown on the front page and Repository Information page of jOAI and elsewhere reflects the URL that was entered into the web browser when connecting to the software. For example, if a user accesses jOAI using the web address http://localhost:8080/oai, the baseURL will be shown as 'http://localhost:8080/oai/provider'. If the user connects to the same instance of jOAI using the Internet address http://myserver.somewhere.edu/oai, the baseURL will be shown as 'http://myserver.somewhere.edu/oai/provider'. To have the domain name appear in the baseURL, connect to jOAI using the server's domain name rather than the local machine name.

Harvester FAQ

Where are the harvested records and zip archives saved to?

The harvester saves the records that are harvested into individual files on the file system, one record per file. Files are saved to either a default directory (which is named based upon the name of the provider and optionally the set that is being harvested) or a specific directory that was specified when setting up the harvest. Each harvest is then packaged into a zip archive.

To determine where files and zip archives were saved to after a harvest has occurred, go to the Harvester Setup and Status page, then click on 'View harvest history' for a given harvest. This brings up the detailed history of harvests and shows the full directory path to the harvested files and zip archives.

Each time a harvest is performed for a given harvest configuration, files in the harvest directory may be added, updated or deleted by the harvester depending on the outcome of the harvest. If configured for zipping, at the conclusion of each harvest that results in a change to one or more files, a new zip archive is created, and a maximum of three zip archives are preserved at any given time. Each zip archive contains the exact time of the harvest in its name. The zip archives for each harvest may be downloaded directly from the Harvester Setup and Status page or accessed from the file system.

Can I use jOAI to harvest records into a database?

There are two ways in which the harvester may be used to import records into a database.

The first method, which uses the jOAI web application, has two parts. First, configure the jOAI harvester to save files to a convenient directory at regular intervals, such as once a day. Then, write a routine that monitors the file directory and adds, updates or deletes the corresponding records in the database when changes occur in the files.
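The monitoring routine in this first method could be sketched as follows. This is an illustration under stated assumptions, not part of jOAI: it uses a hypothetical SQLite table named harvested, assumes the harvested files sit directly in one directory, and detects changes by comparing file modification times.

```python
import sqlite3
from pathlib import Path


def sync_directory(conn, harvest_dir):
    """Mirror the .xml files in harvest_dir into a database table.

    Returns a dict counting added, updated and deleted records.
    """
    # 'harvested' is an assumed table name used only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS harvested "
        "(filename TEXT PRIMARY KEY, mtime REAL, content TEXT)"
    )
    known = dict(conn.execute("SELECT filename, mtime FROM harvested"))
    seen = set()
    changes = {"added": 0, "updated": 0, "deleted": 0}
    for path in Path(harvest_dir).glob("*.xml"):
        seen.add(path.name)
        mtime = path.stat().st_mtime
        if path.name not in known:
            changes["added"] += 1
        elif mtime != known[path.name]:
            changes["updated"] += 1
        else:
            continue  # File unchanged since the last sync.
        conn.execute(
            "INSERT OR REPLACE INTO harvested VALUES (?, ?, ?)",
            (path.name, mtime, path.read_text(encoding="utf-8")),
        )
    # Rows whose files have disappeared were deleted by the harvester.
    for name in set(known) - seen:
        conn.execute("DELETE FROM harvested WHERE filename = ?", (name,))
        changes["deleted"] += 1
    conn.commit()
    return changes
```

Running sync_directory after each scheduled harvest keeps the table in step with the harvest directory.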

Another method is to use the Harvester API from within native Java code to perform harvests and import metadata records directly into a database.

Does the harvester support selective or incremental harvesting?

Yes. When an automatic harvest is conducted at regular intervals, the harvester checks if the data provider supports deleted records. If deletions are supported, a selective harvest is performed by requesting and synchronizing only those records that have been added, modified or deleted since the previous harvest. If deletions are not supported, a full harvest is performed by deleting all previously harvested records and harvesting all records from scratch.

Similarly, when a manual harvest is performed, clicking 'New' performs a selective harvest, while clicking 'All' deletes all previously harvested files and performs a full harvest from scratch.

The files that are saved by the harvester include characters like '%3A' in their names. Why is that?

When the harvester saves records, it places each record in a single file, which is named using the OAI identifier associated with the record that was harvested. Reserved characters such as the colon ':' are encoded using hexadecimal values in order to ensure the file name is valid on the file system. For example, if the OAI identifier for a given harvested record is oai:dlese.org:123-ABC, the file will be named oai%3Adlese.org%3A123-ABC.xml. The hexadecimal characters can be converted back to the original form as needed.
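The conversion can be sketched with Python's standard URL-quoting functions. This assumes the harvester's encoding matches ordinary percent-encoding, which is consistent with the example above but is not confirmed here for every reserved character; the function names are hypothetical.

```python
from urllib.parse import quote, unquote


def identifier_to_filename(identifier):
    """Percent-encode an OAI identifier and append the .xml extension."""
    return quote(identifier, safe="") + ".xml"


def filename_to_identifier(filename):
    """Strip the .xml extension, if present, and decode the percent-encoding."""
    if filename.endswith(".xml"):
        filename = filename[: -len(".xml")]
    return unquote(filename)


print(identifier_to_filename("oai:dlese.org:123-ABC"))
# oai%3Adlese.org%3A123-ABC.xml
print(filename_to_identifier("oai%3Adlese.org%3A123-ABC.xml"))
# oai:dlese.org:123-ABC
```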

How can I provide records that I harvest?

Currently it is a two-step process to make harvested records available through the data provider. First, harvest the records to a convenient file directory. Second, configure the data provider to point to the same file directory. As records are added, modified or deleted by the harvester, these changes will be reflected and passed along in the data provider.

Can I search over and view the records I harvest?

The harvester portion of the software does not currently support searching and viewing harvested records directly. However, by configuring harvested records in the data provider as mentioned above, the records will become searchable and viewable in the 'Search' and 'Admin search' pages.

General FAQ

Is it possible to customize the CSS, HTML or other features of the jOAI user interface?

Yes. jOAI is rendered using CSS, HTML (which is generated by JSP), and JavaScript. Simply modify the .css, .jsp or .js files as desired to change the look-and-feel of the application. Tip: The struts-config.xml file found in WEB-INF describes what .jsp files are used to render certain portions of the application. The file head.jsp defines what items appear in the jOAI menus.

Can I configure jOAI to store my settings and data in a permanent location for backup or reinstallation purposes?

Yes. jOAI saves its configuration files and stored data inside file directories. By default, these are located inside the WEB-INF directory of the jOAI installation. To store these in a global directory, set the repositoryData and harvesterData configuration parameters to point to a directory of your choice. See the section titled 'Configure software settings' in the Configuring jOAI page for details.

When upgrading or reinstalling jOAI, how do I preserve the settings, indexes and files for the data provider and harvester?

If you have previously configured jOAI to store its configuration files and data in a global directory as described above, you can simply stop Tomcat, upgrade or reinstall the jOAI software (oai.war) and start Tomcat again. Then visit the jOAI admin and search pages to confirm that the settings and indexes have been preserved for the data provider and harvester. In some cases it may be necessary to re-index the files in the data provider for changes to be seen. Before upgrading or reinstalling, be sure to make a backup copy of your settings and data in case you need to revert for any reason.

If you have not configured jOAI to store its files in a global directory, follow these steps:

1. Stop Tomcat.

2. Move the current oai installation to a location outside the webapps directory, and save a backup copy.

3. Install the new version of jOAI (put oai.war in webapps, start Tomcat, etc.). Tomcat will unpack the new oai.war file.

Then, to restore the previous settings, indexes and files:

4. Stop Tomcat again.

5. In the new webapps/oai/WEB-INF directory, replace the two directories 'repository_settings_and_data' and 'harvester_settings_and_data' with the ones saved from the previous installation.

6. Start Tomcat.

7. Visit the jOAI admin and search pages to confirm that the settings and indexes have been preserved for the data provider and harvester.
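Step 5 above can be scripted. The following is a minimal sketch, to be run only while Tomcat is stopped; the function name and the two path arguments are hypothetical and must be replaced with the saved and newly unpacked WEB-INF locations on your system.

```python
import shutil
from pathlib import Path

# Directory names as given in step 5.
DATA_DIRS = ("repository_settings_and_data", "harvester_settings_and_data")


def restore_settings(saved_web_inf, new_web_inf):
    """Replace the data directories in the new install with the saved ones.

    Returns the names of the directories that were restored.
    """
    restored = []
    for name in DATA_DIRS:
        source = Path(saved_web_inf) / name
        target = Path(new_web_inf) / name
        if not source.is_dir():
            continue  # Nothing saved under this name; leave the new one alone.
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(source, target)
        restored.append(name)
    return restored
```

Because the saved installation is copied rather than moved, it remains available as a backup if the upgrade needs to be reverted.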

Can jOAI be configured to run through an Apache web server (httpd)?

Yes. Running jOAI through an Apache web server provides additional functionality that is not available through Tomcat alone. For example, Apache provides robust support for SSL, user authorization and authentication, access control by IP address, virtual host support, web logging, URL redirection, and other functionality. By configuring Tomcat to run through Apache, all of Apache's functionality becomes available. This may be especially convenient for web administrators who are already familiar with Apache.

One of two Apache modules may be used to connect Apache with Tomcat: mod_proxy or mod_jk. Choose one or the other:

mod_proxy - Information for setting up mod_proxy is provided in the Apache Module mod_proxy documentation, with additional configuration information specific to Tomcat provided in the Tomcat proxy how to documentation. Note that the proxyName and proxyPort attributes must be added to the non-SSL and SSL HTTP <Connector> elements in Tomcat's server.xml to ensure that URLs in jOAI and other Web applications that rely on the ServletRequest.getServerName() and related Java methods will resolve properly when using mod_proxy.
mod_jk - Information for setting up mod_jk is provided in the Apache Tomcat Connector documentation.

After setting up mod_proxy or mod_jk, a typical configuration scenario would be to use Apache to provide SSL encryption, user authorization and authentication for all pages that reside in the admin area of the software (e.g. https://oai.somewhere.edu/oai/admin*), while leaving all other public jOAI pages open. This scenario provides a relatively secure way to restrict access to the administrative functions of the software to trusted users while leaving access to the data provider, search and other public pages open.

Another scenario might be to restrict access to the software or portions of the software by the requestor's IP address.

See the Apache documentation for a list of available features and configuration information.

Can jOAI be integrated into an existing Web application?

Yes. It is recommended that jOAI be run as a stand-alone Web application; however, it is possible to integrate it into an existing Web application. Either the data provider, the harvester or both can be configured. Here is a general outline of how this may be done:

1. Copy the configuration from web.xml:

All <servlet> elements: OAIProviderServlet, OAIHarvesterServlet and action

All <context-param> elements

All <filter> and <filter-mapping> elements

All <servlet-mapping> elements

All <taglib> elements

Optionally, copy over the <welcome-file-list> and <error-page> elements

2. Copy over files struts-config.xml, users.xml, validation.xml, validator-rules.xml from /WEB-INF to your application.

3. Copy over all JAR files from /WEB-INF/lib (some may not be required).

4. Copy over directories /WEB-INF/classes, /WEB-INF/tlds, /WEB-INF/xsl_files and /WEB-INF/conf. Optionally /WEB-INF/error_pages (if configured in web.xml).

5. Copy over all .jsp, .js and .css files from the root and the /oai_requests, /admin, /docs (optional), /images directories.

Optionally edit, add and remove jsp and css files as needed. The OAI protocol is handled by the pages found in /oai_requests. Administration is handled by the pages found in /admin. The WEB-INF/struts-config.xml file is used to configure URL paths to the JSP pages that handle them, via the Action.