Keyword extraction is a technique for automatically identifying the most important and relevant words in the text stored in a field.
Use the Keyword Extraction stage in the indexing pipeline to identify a set of keywords in a source field and save them in a target field. You can then use the target field to better understand the intent of the search user.
SearchAssist applies various NLP algorithms to extract keywords. With this stage, you can:
- Define a condition for the keyword extraction stage. SearchAssist extracts keywords only from documents that satisfy the condition.
- Reorder or delete keyword extraction stages.
- Simulate the changes before saving them.
To configure keyword extraction, follow these steps:
- Click the Indices tab at the top.
- On the left pane, under the Index Configuration section, click Workbench.
- On the Workbench (Index Configuration) page, in the Stages column, click the + icon.
- Select Keyword Extraction from the Stage Type dropdown.
- Enter a name in the Stage Name field.
- Enter a condition in the Condition field. You can add multiple conditions using the AND/OR connectors. Only documents that satisfy the conditions are processed by this stage. For the condition syntax, refer to Conditions for Keyword Extraction.
- In the Source Field, select the field from which you want to extract keywords.
- In the Target Field, enter a name for the field in which to store the extracted keywords. SearchAssist creates this field.
- SearchAssist supports the following keyword extraction models:
  - Topic Rank – extracts keyphrases from the most important topics of a document.
  - Position Rank – takes into account both how frequently words or phrases occur and where they occur in a document, favoring those that appear early.
  - Multipartite Rank – encodes topical information within a multipartite graph structure to extract keyphrases.
- Click Simulate to verify the configuration. The simulator displays the source, the number of documents the stage was applied to, and the results. If the conditions do not specify a source or a number of documents, you can change these values in the simulator.
- When you are done, click Save Configuration in the upper-right corner.
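To give a feel for what a model such as Position Rank computes, here is a simplified Python sketch of the idea. It is not SearchAssist's implementation, and several parts are assumptions: candidate words are chosen with a small hypothetical stopword list (the published method uses part-of-speech tags), words are linked when they co-occur within a window of 3 tokens, and scores come from a PageRank-style iteration (damping 0.85) whose random jumps are biased toward words that appear early and often. Phrase scores are the sum of their word scores.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small stopword list. The published Position Rank
# selects candidate words by part of speech (nouns and adjectives) instead.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def is_candidate(tok):
    # Punctuation tokens and stopwords are not candidates; they break phrases.
    return tok.isalnum() and tok not in STOPWORDS


def position_rank(text, window=3, damping=0.85, iterations=50, top_n=5):
    tokens = re.findall(r"[a-z0-9]+|[.,;:!?]", text.lower())

    # Position bias: sum of 1/position over every occurrence of a word,
    # so words that appear early and often receive a larger share.
    bias = defaultdict(float)
    position = 0
    for tok in tokens:
        if tok.isalnum():
            position += 1
            if tok not in STOPWORDS:
                bias[tok] += 1.0 / position
    total = sum(bias.values())
    if total == 0:
        return []
    bias = {w: b / total for w, b in bias.items()}

    # Weighted, undirected co-occurrence graph: candidate words that appear
    # within `window` consecutive tokens of each other are linked.
    graph = defaultdict(lambda: defaultdict(float))
    for i, tok in enumerate(tokens):
        if not is_candidate(tok):
            continue
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if is_candidate(other) and other != tok:
                graph[tok][other] += 1.0
                graph[other][tok] += 1.0

    # Biased PageRank: the random-jump term follows the position bias
    # instead of a uniform distribution.
    out_weight = {w: sum(graph[w].values()) for w in bias}
    scores = dict(bias)
    for _ in range(iterations):
        scores = {
            v: (1 - damping) * bias[v]
            + damping * sum(graph[u][v] / out_weight[u] * scores[u] for u in graph[v])
            for v in bias
        }

    # Candidate phrases are maximal runs of candidate words; a phrase's
    # score is the sum of its word scores.
    phrases = {}
    run = []
    for tok in tokens + [None]:  # None acts as a sentinel that flushes the last run
        if tok is not None and is_candidate(tok):
            run.append(tok)
        elif run:
            phrases[" ".join(run)] = sum(scores[w] for w in run)
            run = []
    ranked = sorted(phrases.items(), key=lambda kv: kv[1], reverse=True)
    return [phrase for phrase, _ in ranked[:top_n]]


if __name__ == "__main__":
    sample = "Keyword extraction is useful. Keyword extraction finds keywords in text."
    print(position_rank(sample))
```

Because candidates are chosen with a stopword list rather than part-of-speech tags, phrases can include verbs. For the sample text, the top-ranked phrase is "keyword extraction finds keywords". A part-of-speech-based candidate filter, as in the published method, avoids this.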
Conditions for Keyword Extraction
A condition has the following format: ctx.field_name==value or ctx.field_name!=value.
You can find the field_name in the Fields table under Index Configuration.
For example, use ctx.contentType=="web" to restrict extraction to content from a web source.
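The condition semantics described above can be illustrated with a small Python sketch. It is hypothetical and not SearchAssist's API: `parse_condition`, `matches`, `apply_stage`, and `placeholder_extract` are names invented for illustration. It assumes that a document is a plain dictionary, that values are compared as strings, that a missing field never equals a value, and that one AND or OR connector applies to the whole list of conditions. The extractor is a trivial placeholder standing in for Topic Rank, Position Rank, or Multipartite Rank.

```python
import re

# Hypothetical parser for the ctx.field_name==value / ctx.field_name!=value format.
CONDITION_RE = re.compile(r'^ctx\.(\w+)\s*(==|!=)\s*"?([^"]*)"?$')


def parse_condition(condition):
    match = CONDITION_RE.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    return match.groups()  # (field_name, operator, value)


def matches(doc, condition):
    field, op, value = parse_condition(condition)
    # Assumption: a missing field never equals the value.
    equal = field in doc and str(doc[field]) == value
    return equal if op == "==" else not equal


def placeholder_extract(text):
    # Stand-in for a real model: unique words longer than three letters, sorted.
    return sorted({w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3})


def apply_stage(docs, conditions, source_field, target_field, extract, connector="AND"):
    """Write extracted keywords to target_field for matching docs; return how many were processed."""
    combine = all if connector == "AND" else any
    processed = 0
    for doc in docs:
        if conditions and not combine(matches(doc, c) for c in conditions):
            continue  # the document does not satisfy the stage's conditions
        if source_field in doc:
            doc[target_field] = extract(str(doc[source_field]))
            processed += 1
    return processed


if __name__ == "__main__":
    docs = [
        {"contentType": "web", "content": "Keyword extraction improves search relevance"},
        {"contentType": "file", "content": "Quarterly report for finance"},
    ]
    count = apply_stage(docs, ['ctx.contentType=="web"'], "content", "keywords", placeholder_extract)
    print(count, docs[0]["keywords"])
```

Returning the number of processed documents mirrors what the simulator reports. Only the first document receives a `keywords` field, because the second does not satisfy the condition.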