Managing the Search index

The Search application uses a Lucene 2.4.0 index, supplemented with social facet information. The location of the Search index is mapped to an IBM WebSphere Application Server (WAS) variable, SEARCH_INDEX_DIR. By default, the value of this variable is <CONNECTIONS_DATA_DIRECTORY>/search/index.
The index is generated by retrieving the necessary information from each IBM Connections application on a schedule that the administrator defines. Each scheduled task defines which applications to crawl and whether to optimize the index when the task finishes. The following applications can be indexed: Activities, Blogs, Bookmarks, Communities, Files, Forums, Profiles, and Wikis.
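To make the shape of a task definition concrete, the following Python sketch models one as a small data class: a task names the applications to crawl and carries a flag saying whether to optimize the index afterward. The `IndexingTask` class and its validation rules are hypothetical illustrations, not the format that SearchService actually stores in the Home page database.

```python
from dataclasses import dataclass, field
from typing import List

# Applications that can be indexed, as listed in this topic.
INDEXABLE_APPLICATIONS = frozenset({
    "activities", "blogs", "bookmarks", "communities",
    "files", "forums", "profiles", "wikis",
})


@dataclass
class IndexingTask:
    """Hypothetical model of a scheduled indexing task definition."""
    name: str
    applications: List[str] = field(default_factory=list)
    optimize: bool = False  # whether to optimize the index when the task ends

    def __post_init__(self) -> None:
        # Normalize application names, then reject anything not indexable.
        self.applications = [a.strip().lower() for a in self.applications]
        unknown = sorted(set(self.applications) - INDEXABLE_APPLICATIONS)
        if unknown:
            raise ValueError(f"cannot index unknown applications: {unknown}")
        if not self.applications:
            raise ValueError("a task must crawl at least one application")
```

For example, `IndexingTask("nightly", ["Activities", "Blogs"], optimize=True)` describes a task that crawls two applications and then optimizes the index.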
Search uses the WAS scheduling service to create and update the Search index. The scheduling service is based on the Cron calendar, which uses predefined date algorithms to determine when a task runs. Although the scheduling service also supports a Simple calendar, IBM Connections does not currently support it. For more information about the WAS scheduler, see Scheduling tasks.
IBM Connections applications retain information about deletions and access-control updates for a maximum of 30 days. If an index is not updated by an indexing task for 30 days, it is considered out of date and must be rebuilt: delete the index and re-create it to ensure data integrity.
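The 30-day rule reduces to a date comparison. This minimal sketch assumes you know when the index was last updated; the `needs_rebuild` helper is hypothetical, and reading "for 30 days" as "30 days or more" is an interpretation of the rule above.

```python
from datetime import datetime, timedelta

# Applications keep delete and access-control update records for at most
# 30 days, so an index not updated within that window cannot be caught up
# incrementally and must be deleted and re-created.
RETENTION_WINDOW = timedelta(days=30)


def needs_rebuild(last_indexed: datetime, now: datetime) -> bool:
    """Return True if the index is out of date (hypothetical helper)."""
    return now - last_indexed >= RETENTION_WINDOW
```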
As information is retrieved from each application, it is written to a temporary index. Writing to a temporary index leaves the existing index unmodified if a failure occurs during indexing. After all the applications listed in the task definition have been crawled, social information is consolidated and the temporary index is merged into the main index.
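The write-to-temporary-then-merge pattern can be sketched in Python with plain dictionaries standing in for Lucene indexes. This is an illustrative model only: `run_indexing_task` and the `crawl` callback are hypothetical, and social-information consolidation is reduced to a comment.

```python
from typing import Callable, Dict, Iterable, List

Document = Dict[str, str]


def run_indexing_task(
    main_index: Dict[str, Document],
    applications: Iterable[str],
    crawl: Callable[[str], List[Document]],
) -> None:
    """Crawl each application into a temporary index, then merge it.

    If any crawl raises, the exception propagates before the merge step,
    so main_index is left exactly as it was.
    """
    temp_index: Dict[str, Document] = {}
    for app in applications:
        for doc in crawl(app):
            temp_index[doc["id"]] = doc
    # Social information would be consolidated here (omitted in this sketch).
    main_index.update(temp_index)  # merge only after every crawl succeeded
```

Because the main index is touched only in the final line, a failed crawl costs the work of the current task but never corrupts the copy that users are searching.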
Note: When indexing on a Microsoft™ Windows™ 2008 deployment, you might see the following error: java.io.IOException: Access is denied. This error is caused by an underlying Lucene issue and prevents the index from being updated. To resolve the problem, restart all the machines in the cluster.

Configure scheduled tasks
The SearchService MBean provides an administrative interface for adding scheduled task definitions to the Home page database.
Running one-off tasks
The SearchService MBean provides commands for creating an indexing or optimization task that runs once only, 30 seconds after the command is called.
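A one-off task differs from a Cron-scheduled task in that it fires exactly once after a fixed delay. The following sketch models that behavior with Python's `threading.Timer`; `schedule_once` is a hypothetical helper, not a SearchService command, and the delay is a parameter (defaulting to 30 seconds) so the example can be exercised quickly.

```python
import threading
from typing import Callable

ONE_OFF_DELAY_SECONDS = 30.0


def schedule_once(task: Callable[[], None],
                  delay: float = ONE_OFF_DELAY_SECONDS) -> threading.Timer:
    """Run task exactly once, delay seconds from now (hypothetical helper)."""
    timer = threading.Timer(delay, task)
    timer.daemon = True  # a pending task does not keep the interpreter alive
    timer.start()
    return timer
```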
Retrieve file content
Use SearchService commands to perform file content retrieval tasks.
Purging content from the index
Use the deleteFeatureIndex command to purge content for a specific application from the Search index.
Delete the index
From time to time, you might need to delete and rebuild the Search index. For example, if you change the context root of one of the IBM Connections applications, you must delete the current index so that it can be rebuilt. The index is rebuilt automatically the next time the indexing task runs.
Create a stand-alone index
Use the SearchService.startBackgroundIndex command to create a stand-alone index. This command lets you remove inconsistencies from the Search index without downtime while the index is rebuilt.
Remove a node from the index management table
When you are removing a node from a cluster, use the SearchService.removeIndexingNode wsadmin command to remove the node from the index management table and ensure that content from the node is no longer indexed.
Backup and restore
Create a backup of the Search index and save it to a secure location so that you can restore the index if it is lost or corrupted.
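At the file level, a backup amounts to copying the index directory somewhere safe. The sketch below uses `shutil.copytree` and assumes that no indexing task is writing to the index while the copy runs and that a timestamped folder name is acceptable; `backup_index` and the naming scheme are hypothetical, not an IBM Connections command.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_index(index_dir: Path, backup_root: Path, now: datetime) -> Path:
    """Copy the index directory into a timestamped folder under backup_root.

    Hypothetical helper; assumes indexing is idle during the copy.
    """
    target = backup_root / f"search-index-{now:%Y%m%d-%H%M%S}"
    shutil.copytree(index_dir, target)  # creates parents; fails if target exists
    return target
```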
Configure file attachment indexing settings
Edit settings in the search-config.xml file to configure Search for file attachments.
Configure temporary directories for storing files
You can configure a temporary directory on each node in your deployment for storing files before they are indexed. Files are converted to plain text for indexing in the locations that you specify. If you do not specify a temporary directory for a node, the index directory is used.
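The fallback rule for temporary directories is a lookup with a default. In this sketch, the per-node mapping and the `conversion_dir` helper are hypothetical stand-ins for however the directories are actually configured.

```python
from typing import Dict


def conversion_dir(node: str, temp_dirs: Dict[str, str], index_dir: str) -> str:
    """Return where a node converts files to plain text (hypothetical).

    Falls back to the index directory when no temporary directory has been
    configured for the node.
    """
    return temp_dirs.get(node) or index_dir
```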
Configure the number of crawling threads
Edit settings in the search-config.xml file to specify the maximum number of seedlist threads used when crawling. Do not specify more threads than the number of applications installed in your deployment.
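The thread-count guidance can be read as a clamp: with one seedlist crawl per application, threads beyond the number of applications would presumably sit idle. This sketch, with hypothetical `effective_thread_count` and `crawl_seedlists` helpers and a caller-supplied `fetch` function, caps the pool size at the number of installed applications.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def effective_thread_count(requested: int, installed_apps: List[str]) -> int:
    """Clamp the seedlist thread count to [1, number of installed apps]."""
    return max(1, min(requested, len(installed_apps)))


def crawl_seedlists(installed_apps: List[str], requested_threads: int,
                    fetch: Callable[[str], int]) -> Dict[str, int]:
    """Fetch each application's seedlist in parallel (hypothetical fetch)."""
    workers = effective_thread_count(requested_threads, installed_apps)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fetch, installed_apps)  # preserves input order
        return dict(zip(installed_apps, results))
```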
SearchCellConfig commands
The SearchCellConfig commands are used to configure the location of the Search index and the IBM LanguageWare dictionaries used by Search, and to configure the file download and conversion service used when indexing file attachments.
SearchService commands
The SearchService commands are used to create, retrieve, update, and delete scheduled task definitions for the indexing and optimization Search operations.

Related concepts
Scheduling tasks