Entity Extraction extracts the information required to accomplish a task from user input. The SearchAssist application uses NLP techniques to find named entities in the source field.

The Entity Extraction stage in the Index Pipeline identifies a set of entities in a source field and saves them in a target field, so that they can be used to better identify the intent of the search user.

You can:

  • Add multiple entities to be extracted from the source field.
  • Define a condition for the entity extraction stage. The entities are extracted only from the documents that satisfy the given condition.
  • Re-order or delete entity extraction rules.
  • Simulate the changes before saving them.

Make sure to Train your app each time you change any index configuration. Training rebuilds the index based on the updated configuration.

Configuration

To configure entity extraction, follow these steps:

  1. Click the Indices tab at the top.
  2. On the left pane, under the Index Configuration section, click Workbench.
  3. On the Workbench (Index Configuration) page, in the Stages column, click the + icon.
  4. In the right column, select Entity Extraction from the Stage Type drop-down list.
  5. Enter a name in the Stage Name field.
  6. Enter a condition in the Condition field. You can add multiple conditions using the AND/OR connectors. The stage is executed only for documents that satisfy the condition. See the Conditions section below for details.
  7. Select the field you want to extract entities from as the Source Field.
  8. Define where you want to store the extracted entities as the Target Field. This field is created by the application.
  9. Select an entity from the Add Entities field. These are the entity types that need to be identified from the source field. The supported entities are Date, Time, Date-Time, Date Period, URL, Email, Location, City, Country, Color, Company Name, Currency, Person Name, Number, Percentage, Phone Number, Zip Code, Quantity, Address, and Airport.
  10. Click Simulate to verify the configuration. The simulator displays the Source, the number of documents to which the stage was applied, and the result. You can change the Source (if it is not specified in the condition) and the number of documents.
  11. Once done, click Save Configuration on the top-right.
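
The behavior configured in the steps above can be pictured with a minimal Python sketch. It is not SearchAssist code: the function run_entity_stage, the field names content and extracted_entities, and the regular expressions are all hypothetical, and simple patterns stand in for the NLP-based extractors the application uses. The sketch only shows how a condition, a source field, a target field, and a list of entity types fit together.

```python
import re

# Hypothetical regex stand-ins for three of the supported entity types.
# SearchAssist's own NLP-based extractors are not described in this article.
ENTITY_PATTERNS = {
    "Email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://[^\s,;]+"),
    "Percentage": re.compile(r"\d+(?:\.\d+)?\s?%"),
}


def run_entity_stage(doc, source_field, target_field, entities, condition=None):
    """Return a copy of doc with the entities found in source_field
    stored under target_field (hypothetical helper, for illustration only)."""
    if condition is not None and not condition(doc):
        # Condition not satisfied: the document passes through unchanged.
        return dict(doc)
    text = doc.get(source_field, "")
    found = {}
    for name in entities:
        matches = ENTITY_PATTERNS[name].findall(text)
        if matches:
            found[name] = matches
    result = dict(doc)
    result[target_field] = found
    return result


doc = {
    "contentType": "web",
    "content": "Write to help@example.com or visit https://example.com/docs for a 15% discount.",
}
out = run_entity_stage(
    doc,
    source_field="content",
    target_field="extracted_entities",
    entities=["Email", "URL", "Percentage"],
    condition=lambda d: d.get("contentType") == "web",
)
print(out["extracted_entities"])
```

A document whose contentType is not "web" would be returned without the target field, which mirrors the rule that entities are extracted only from documents that satisfy the condition.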

Conditions

A condition has the following format: ctx.field_name==value or ctx.field_name!=value. The field_name can be obtained from the Fields table under Index Configuration.

For example, use ctx.contentType=="web" to restrict the extraction to content from a web source.
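
To make the condition format concrete, here is a small Python sketch that evaluates a single condition against a document. The parser, the function name matches_condition, and the sample documents are assumptions made for illustration; the sketch handles only one quoted-string comparison and does not cover the AND/OR connectors or reflect how SearchAssist evaluates conditions internally.

```python
import re

# Hypothetical parser for one condition of the form
# ctx.<field_name>=="value" or ctx.<field_name>!="value".
CONDITION_RE = re.compile(r'^\s*ctx\.(\w+)\s*(==|!=)\s*"([^"]*)"\s*$')


def matches_condition(condition, doc):
    """Return True if doc satisfies the single comparison in condition."""
    m = CONDITION_RE.match(condition)
    if m is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    field, op, value = m.groups()
    equal = doc.get(field) == value
    return equal if op == "==" else not equal


docs = [
    {"id": 1, "contentType": "web"},
    {"id": 2, "contentType": "file"},
]
# Keep only the documents the stage would run on.
selected = [d["id"] for d in docs if matches_condition('ctx.contentType=="web"', d)]
print(selected)  # [1]
```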
