Track Status for Each Item Crawl

About

Tracking the status of individual items during a crawl lets you answer the following questions about each document:

  • Which documents were picked up by the crawl?
  • What is the status of those documents? (Success, Fail, Warning, Delete)
  • What was the reason for failures?
  • Where did the failure happen (Connector, Target, CEWS)?
  • What was the error message?

Tip: Knowing the answers to some or all of these questions enables you to solve problems, troubleshoot issues, and manually set an individual document or folder to be re-crawled on the next (scheduled) incremental crawl.

User Interface

The crawl log is displayed on a separate page and can be opened from two places:

  • The Actions menu on the Content Sources page.
  • The Actions menu on the Tasks page.

If the page is opened for a content source, it lists all items stored in the search index for that content source.

If the page is opened for a job, it lists only the items that were picked up by that crawl job.

Crawl Overview Information

  • Content: The selected content/job is shown in this field at the top of the page.
  • Crawl: Crawl status is shown in this field.
  • Statistics:
    • Documents with errors
    • Documents with warnings
    • Documents without messages

Table - Actions

  • View Items: Opens the details tab and filters the items to the specific message.
  • Recrawl All: Sets all items containing the message to be re-crawled on the next incremental crawl. This can include folders as well as documents.
  • Recrawl: Sets the selected document or folder to be re-crawled on the next incremental crawl.
  • Test: Starts the test bench for the selected item.

Summary Tab

An example of a successfully run crawl is shown below with the "Summary" tab open.

The table contains the following columns:

  • Message: The status of the items captured by the crawl. This can include folders as well as documents.
    Example status messages include:
    • (Success): "Item was successfully processed without any errors or warnings"
    • (Time Out): "A call to the source system API timed out"
    • (No content): "Document has no content"
    • (Server not responding): "Elastic Server is not responding"
    • (Unknown users/groups): "Unknown users or groups in ACL"
    • (Processing failed): "Processing of some metadata failed"
  • Untitled: Icons in this column indicate whether the status message applies to a folder or a file.
  • Count: The number of items that have this message status.
  • Actions: Actions available for the items with this message status. See "Table - Actions" above.
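
The way the Summary tab relates messages, item types, and counts can be pictured with a small sketch. This is only an illustration, not the product's implementation: the CrawlItem class, its fields, and the summarize function are hypothetical names, and it assumes that each table row corresponds to one combination of message and item type (folder or file).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlItem:
    """Hypothetical record for one crawled item (not a product API)."""
    url: str
    is_folder: bool
    message: str


def summarize(items):
    """Count items per (message, item type), most frequent first."""
    counts = Counter((item.message, item.is_folder) for item in items)
    return [
        {
            "message": message,
            "type": "folder" if is_folder else "file",
            "count": count,
        }
        for (message, is_folder), count in counts.most_common()
    ]


SUCCESS = "Item was successfully processed without any errors or warnings"
NO_CONTENT = "Document has no content"

# Made-up sample data, for illustration only.
items = [
    CrawlItem("/docs/a.pdf", False, SUCCESS),
    CrawlItem("/docs/b.pdf", False, SUCCESS),
    CrawlItem("/docs", True, SUCCESS),
    CrawlItem("/docs/c.pdf", False, NO_CONTENT),
]

summary = summarize(items)
for row in summary:
    print(row["count"], row["type"], row["message"])
```

With this sample data, the two successfully processed files form one row with a count of 2, while the folder and the document without content each form a row of their own.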

Details Tab

An example of a successfully run crawl is shown below with the "Details" tab open.

The table contains the following columns:

  • Status: Icon indicating success or failure.
  • Timestamp: Start of processing in date/time format.
  • Type: Indicated by an icon, such as document, folder, etc.
  • Url: Address or path of the item.
  • Change Type: The change made to the index, for example whether the document was added.
  • Duration: How much time it took to process the item.
  • Actions: Actions available for the item. See "Table - Actions" above.
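
As a rough mental model, each row of the Details tab can be thought of as a record with the columns listed above, and the "View Items" and "Recrawl All" actions as a filter over those records. The sketch below is an assumption-laden illustration rather than the product's data model: DetailRow, view_items, and recrawl_all are hypothetical names, and the message field is assumed to be attached to each row because "View Items" filters by message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DetailRow:
    """Hypothetical record mirroring the Details tab columns."""
    status: str          # "Success", "Fail", "Warning" or "Delete"
    timestamp: datetime  # start of processing
    item_type: str       # "document" or "folder"
    url: str             # address or path of the item
    change_type: str     # change to the index, e.g. "added"
    duration: timedelta  # time spent processing the item
    message: str         # status message shown on the Summary tab


def view_items(rows, message):
    """Like 'View Items': keep only the rows with the given message."""
    return [row for row in rows if row.message == message]


def recrawl_all(rows, message):
    """Like 'Recrawl All': collect the URLs to mark for the next incremental crawl."""
    return sorted({row.url for row in view_items(rows, message)})


TIMEOUT = "A call to the source system API timed out"
SUCCESS = "Item was successfully processed without any errors or warnings"
start = datetime(2024, 1, 1, 8, 0, 0)

# Made-up sample rows, for illustration only.
rows = [
    DetailRow("Fail", start, "document", "/docs/b.pdf", "added", timedelta(seconds=30), TIMEOUT),
    DetailRow("Success", start, "document", "/docs/a.pdf", "added", timedelta(seconds=1), SUCCESS),
    DetailRow("Fail", start, "folder", "/archive", "added", timedelta(seconds=30), TIMEOUT),
]

to_recrawl = recrawl_all(rows, TIMEOUT)
print(to_recrawl)
```

Here the folder and the document that timed out are both selected for re-crawling, which matches the note that "Recrawl All" can include folders as well as documents.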