Entity Extraction extracts the required information from user input to accomplish the task. The SearchAssist application uses NLP techniques to find named entities in the source field.
The Entity Extraction stage in the Index Pipeline identifies a set of entities in a source field and saves them in a target field, so that they can be used to better identify the intent of the search user.
Configuration
To configure entity extraction, follow these steps:
- Click the Indices tab on the top.
- On the left pane, under the Index Configuration section, click Workbench.
- On the Workbench (Index Configuration) page, on the Stages column, click the + icon.
- On the right column, select Entity Extraction from the Stage Type drop-down list.
- Enter a name in the Stage Name field.
- Enter a condition in the Condition field. You can add multiple conditions using the AND/OR connectors. The stage is applied only to documents that satisfy the condition. See the Conditions section for details.
- In Source Field, select the field you want to extract entities from.
- In Target Field, define where you want to store the extracted entities. This field is created by the application.
- Select an entity from the Add Entities field. These are the entity types that need to be identified from the source field. The supported entities are Date, Time, Date-Time, Date Period, URL, Email, Location, City, Country, Color, Company Name, Currency, Person Name, Number, Percentage, Phone Number, Zip Code, Quantity, Address, and Airport.
- Click Simulate to verify the configuration. The simulator displays the Source, the number of documents to which the mapping was applied, and the result. You can change the Source (if not mentioned in the condition) and the number of documents.
- Once done, click Save Configuration on the top-right.
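To make the roles of the Source Field, Target Field, and selected entities concrete, here is a minimal Python sketch of what a stage of this kind does conceptually. It is not SearchAssist's implementation: the regular expressions stand in for the application's NLP extractors, only two of the supported entity types are covered, and the names extract_entities, page_body, and extracted_entities are hypothetical.

```python
import re

# Hypothetical stand-ins for two of the supported entity types. The product
# uses its own NLP techniques; these simple patterns are for illustration only.
ENTITY_PATTERNS = {
    "Email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://[^\s,;]+"),
}


def extract_entities(doc, source_field, target_field, entity_types):
    """Return a copy of doc with entities found in source_field stored in target_field."""
    text = doc.get(source_field, "")
    found = {}
    for entity_type in entity_types:
        matches = ENTITY_PATTERNS[entity_type].findall(text)
        if matches:
            found[entity_type] = matches
    result = dict(doc)  # leave the input document unchanged
    result[target_field] = found
    return result


# Hypothetical document and field names.
doc = {
    "contentType": "web",
    "page_body": "Contact sales@example.com or visit https://example.com/pricing today.",
}
enriched = extract_entities(doc, "page_body", "extracted_entities", ["Email", "URL"])
print(enriched["extracted_entities"])
```

The sketch writes the result to a new field rather than overwriting the source, which mirrors the way the stage stores extracted entities in a separate target field.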
Conditions
A condition has the following format: ctx.field_name==value or ctx.field_name!=value. The field_name can be obtained from the Fields table under Index Configuration.
For example, use ctx.contentType=="web" to restrict the extraction to content from a web source.
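The effect of a condition on which documents reach the stage can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's evaluator: it handles only a single comparison with a double-quoted string value, does not cover the AND/OR connectors, and the helper matches_condition and the sample documents are hypothetical.

```python
import re

# Assumed shape of a single condition: ctx.<field_name> (== or !=) "<value>".
CONDITION_RE = re.compile(r'^ctx\.(\w+)\s*(==|!=)\s*"([^"]*)"$')


def matches_condition(condition, doc):
    """Return True if doc satisfies a single ctx.field_name==value style condition."""
    m = CONDITION_RE.match(condition.strip())
    if m is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    field_name, operator, value = m.groups()
    is_equal = doc.get(field_name) == value
    return is_equal if operator == "==" else not is_equal


# Hypothetical documents: only the web document satisfies the condition.
docs = [
    {"contentType": "web", "title": "Pricing"},
    {"contentType": "file", "title": "Handbook"},
]
selected = [d["title"] for d in docs if matches_condition('ctx.contentType=="web"', d)]
print(selected)
```

With the sample documents above, only the document whose contentType is "web" is selected, so the stage would skip the other one.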