Entities are the pieces of information a user is expected to supply as input to accomplish a task. The SearchAssist application uses NLP techniques to find named entities in the source field.
The Entity Extraction stage in the Indexing Pipeline identifies a set of entities in a source field and saves them in a target field. SearchAssist uses the target field to identify the intent of the search user more accurately. With this stage, you can:
- Add multiple entities to be extracted from the source field.
- Define a condition for the entity extraction stage. The entities are extracted only from the documents that satisfy the given condition.
- Re-order or delete entity extraction rules.
- Simulate the changes before saving them.
To configure entity extraction, follow these steps:
- Click the Indices tab on the top.
- On the left pane, under the Index Configuration section, click Workbench.
- On the Workbench (Index Configuration) page, on the Stages column, click the + icon.
- Select Entity Extraction from the Stage Type dropdown.
- Enter a name in the Stage Name field.
- Enter a condition in the Condition field. You can combine multiple conditions using the AND/OR connectors. SearchAssist considers only the documents that satisfy the conditions as part of the stage. For the condition format, see Conditions to Extract Entities.
- Select the field you want to extract the entity from as Source Field.
- Name the Target Field where you want to store the extracted entity. The application creates the target field.
- Select an entity from the Add Entities field. These are the entity types that need to be identified from the source field. The supported entities are Date, Time, Date-Time, Date Period, URL, Email, Location, City, Country, Color, Company Name, Currency, Person Name, Number, Percentage, Phone Number, Zip Code, Quantity, Address, and Airport.
- Click Simulate to verify the configuration. The simulator displays the Source, the number of documents to which the mapping was applied, and the result. You can change the Source and the number of documents if the conditions do not specify them.
- Once ready, click Save Configuration in the upper-right corner.
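The effect of a configured stage can be pictured with a small Python sketch. This is only an illustration of the idea (condition, source field, target field, selected entities), not how SearchAssist implements it: the function run_entity_stage, the regex patterns, the field names content and content_entities, and the sample documents are all hypothetical, and simple regular expressions stand in for the NLP techniques the application actually uses.

```python
import re

# Hypothetical regex stand-ins for three of the supported entity types.
# SearchAssist's real NLP-based extraction is not shown here.
ENTITY_PATTERNS = {
    "Email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://[^\s,;]+"),
    "Percentage": re.compile(r"\d+(?:\.\d+)?%"),
}


def run_entity_stage(doc, source_field, target_field, entities, condition=None):
    """Return a copy of doc with the extracted entities stored in target_field.

    Documents that do not satisfy the condition are returned unchanged.
    """
    if condition is not None and not condition(doc):
        return dict(doc)
    text = str(doc.get(source_field, ""))
    found = {}
    for name in entities:
        hits = ENTITY_PATTERNS[name].findall(text)
        if hits:
            found[name] = hits
    result = dict(doc)
    result[target_field] = found
    return result


# Sample documents (made up for illustration).
docs = [
    {"contentType": "web",
     "content": "Write to help@example.com or see https://example.com/faq for details."},
    {"contentType": "file", "content": "Contact sales@example.com."},
]


def is_web(doc):
    """Condition: only documents from a web source are processed."""
    return doc.get("contentType") == "web"


processed = [
    run_entity_stage(d, "content", "content_entities", ["Email", "URL"], is_web)
    for d in docs
]
for d in processed:
    print(d.get("content_entities"))
```

Only the first document satisfies the condition, so only it receives the new target field; the second passes through untouched, which mirrors how the stage skips documents that fail the condition.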
Conditions to Extract Entities
A condition has the following format: ctx.field_name==value or ctx.field_name!=value. Obtain the field_name from the Fields table under Index Configuration.
For example, use ctx.contentType=="web" to restrict the extraction to content from a web source.
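To make the condition format concrete, here is a minimal Python sketch that evaluates such a condition against a document. It is an assumption-laden illustration, not the SearchAssist evaluator: the helper names eval_clause and matches are hypothetical, AND is assumed to bind tighter than OR, parentheses are not handled, and the language field in the sample document is made up.

```python
import re

# One clause: ctx.<field_name>, then == or !=, then a quoted or bare value.
CLAUSE = re.compile(r'^\s*ctx\.(\w+)\s*(==|!=)\s*"?([^"]*?)"?\s*$')


def eval_clause(clause, doc):
    """Evaluate a single ctx.field_name==value or ctx.field_name!=value clause."""
    m = CLAUSE.match(clause)
    if m is None:
        raise ValueError(f"Unsupported condition: {clause!r}")
    field, op, value = m.groups()
    actual = str(doc.get(field, ""))  # a missing field compares as an empty string
    return (actual == value) if op == "==" else (actual != value)


def matches(condition, doc):
    """Evaluate clauses joined by AND/OR.

    Assumption: AND binds tighter than OR. Values containing the words
    AND or OR surrounded by spaces are not supported by this simple split.
    """
    return any(
        all(eval_clause(c, doc) for c in re.split(r"\s+AND\s+", group))
        for group in re.split(r"\s+OR\s+", condition)
    )


doc = {"contentType": "web", "language": "en"}
print(matches('ctx.contentType=="web"', doc))
print(matches('ctx.contentType=="web" AND ctx.language=="fr"', doc))
```

Splitting on OR first and AND second keeps the sketch short while still honoring the usual precedence of the two connectors.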