Utterance Testing

To make sure your bot responds to user utterances with related tasks, it is important that you test the bot with a variety of user inputs. Evaluating a bot with a large sample of expected user inputs not only provides insights into bot responses but also gives you a great opportunity to train the bot in interpreting diverse human expressions.

You can perform all the training-related activities for a bot from the Utterance Testing module. The examples throughout this article use a sample Flight Booking bot.

Testing the Bot

Simply put, testing a bot refers to checking if the bot can respond to a user utterance with the most relevant task. Given the flexibility of language, users will use a wide range of phrases to express the same intent.

For example, you can rephrase “I want to change my ticket from San Francisco to Los Angeles on Jan 1” as “Please change my travel date. Can’t make it on Jan 1.” The trick is to train the bot to map both of these utterances to the same intent, in this case the Modify Booking task.

So, the first step in testing a bot is to identify a representative sample of user utterances against which to test the bot responses. Look for sources of data that reflect real-world usage of the language, such as support chat logs, online communities, and FAQ pages of relevant portals.

How to test the bot

Follow these steps to test a bot:

Open the bot that you want to test.
Select the Build tab from the top menu.
From the left menu, click Testing -> Utterance Testing.
In the case of a multiple intent model, you can select the Intent Model against which you want to test the utterance. The ML engine detects intents only from the selected model.
In the Type a user utterance field, enter the utterance that you want to test. Example: Book a flight.
The result appears with a single, multiple, or no matching intents.

Types of Test Results

When you test a user utterance against a bot, the NLP engine tries to find the bot tasks that match the intent. The NLP engine takes a hybrid approach, combining the Machine Learning, Fundamental Meaning, and Knowledge Graph (if the bot has one) models to score the matching intents on relevance. It classifies each match for a user utterance as either a Possible Match or a Definitive Match.

Definitive Matches get high confidence scores and are assumed to be perfect matches for the user utterance. In published bots, if user input matches with a single Definitive Match, the bot directly executes the task. If the utterances match with multiple Definitive Matches, they are sent as options for the end-user to choose one.

On the other hand, Possible Matches are intents that score reasonably well against the user input but do not inspire enough confidence to be termed exact matches. Internally, the system further classifies possible matches into good and unsure matches based on their scores. If an end-user utterance generates possible matches in a published bot, the bot sends these matches to the end user as “Did you mean?” suggestions.

Below are the possible outcomes of a user utterance test:

- Single Match (Possible or Definitive): The NLP engine finds a match for the user utterance with a single intent or task. The intent is displayed below the User Utterance field. If it is a correct match, you can move on to test the next utterance or you can also further train the task to improve its score. If it is an incorrect match, you can mark it as incorrect and select the appropriate intent.
- Multiple Matches (Possible or Definitive or Both): The NLP engine identifies multiple intents that match the user utterance. From the results, select the radio button for the matching task and train it.
- Unidentified Intent: The user input did not match any task in any of the linked bots. Select an intent and train it to match the user utterance.
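
The following Python sketch shows one way to read the match types and outcomes described above. It is an illustration only, not the platform's implementation: the `IntentScore` type, the two cut-off values, and the assumption that scores lie between 0 and 1 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-offs; the real thresholds are configured per engine on the platform.
DEFINITIVE_MIN = 0.9
POSSIBLE_MIN = 0.3


@dataclass
class IntentScore:
    intent: str
    score: float  # assumed to be normalised to the range 0..1


def classify(match: IntentScore) -> str:
    """Label one scored intent as definitive, possible, or none."""
    if match.score >= DEFINITIVE_MIN:
        return "definitive"
    if match.score >= POSSIBLE_MIN:
        return "possible"
    return "none"


def published_bot_action(matches: List[IntentScore]) -> str:
    """Describe what a published bot would do with the scored intents."""
    definitive = [m.intent for m in matches if classify(m) == "definitive"]
    possible = [m.intent for m in matches if classify(m) == "possible"]
    if len(definitive) == 1:
        return f"execute: {definitive[0]}"            # single Definitive Match
    if len(definitive) > 1:
        return "choose one: " + ", ".join(definitive)  # options for the end user
    if possible:
        return "did you mean: " + ", ".join(possible)  # Possible Matches only
    return "unidentified intent"
```

With these assumed cut-offs, a score of 0.95 for Modify Booking alongside 0.4 for another task results in Modify Booking being executed directly, while two intents that both score in the possible range are offered as suggestions.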

Entity Match

During testing of the bot, the matched entities are displayed. The entities from the utterance are processed in the following order – first NER and pattern entities and then the remaining entities.
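
The processing order can be pictured as a stable two-group sort. In this minimal sketch the entity records and the `training` labels are hypothetical and do not reflect the platform's data model.

```python
from typing import Dict, List

# Assumption: each entity records how it was trained; these labels are illustrative.
FIRST_PASS = {"ner", "pattern"}


def processing_order(entities: List[Dict[str, str]]) -> List[str]:
    """Return entity names with NER and pattern entities first, the rest after.

    sorted() is stable, so entities keep their original relative order
    within each of the two groups.
    """
    ordered = sorted(entities, key=lambda e: e["training"] not in FIRST_PASS)
    return [e["name"] for e in ordered]
```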

From release 8.0 of the platform onward, details of how each entity was matched, and with what confidence score, are also displayed. The details include:

Identification Engine: Machine Learning or Fundamental Meaning.
Training Type: The match can come from NER, pattern training, entity name, system concept, and so on. For a pattern match, click the row to see its details.
Confidence Score: The score identified by the ML engine using NER training (shown only when Conditional Random Field is selected as the NER model).

Analyzing the Test Results

When you test a user utterance, in addition to the matching intents you also see an NLP Analysis box that gives a quick overview of the shortlisted intents, the NLP models that shortlisted them, the corresponding scores, and the final winner.

Under the Fundamental Meaning tab, you can see the scores of all the intents even if they aren’t shortlisted.

As mentioned above, the Kore.ai NLP engine uses Machine Learning, Fundamental Meaning, and Knowledge Graph (if any) models to match intents. If the NLP engine finds a single Definitive Match through one of the underlying models, you will see the task as the matching intent. If the test identifies more than one definitive match, you will receive them as options to pick the right intent.

If the models shortlist more than one possible match, all the shortlisted intents are re-scored by the Ranking and Resolver using the Fundamental Meaning model to determine the final winner.

Sometimes, multiple Possible Matches secure the same score even after rescoring, in which case they are presented to the developer as multiple matches from which to select one. You can click the tab with the name of the learning model in the NLP Analysis box to view the intent scores.

Note: The NLP score is an absolute value and can only be used to compare against other tasks with the same input. Task scores cannot be compared across different utterances.

From each model dialog, clicking the icon on the top right will display the configurations and thresholds in place for the corresponding engines.

ML Model

The ML model tries to match the user input with the task label and the training utterances of each task. If the user input consists of multiple sentences, each sentence is run separately against the task name as well as the task utterances.

Click the Machine Learning Model button to open the Machine Learning Model section of NLP Analysis. This section shows only the names of the tasks that secure a positive score. In general, the more training utterances you add to a task, the greater its chances of being discovered. For more information, read Machine Learning.
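
As a rough illustration of running each sentence separately against the task name and the training utterances, the sketch below uses plain word overlap (Jaccard similarity) as a stand-in for the ML model. The real model is not a word-overlap measure, and the function names and scoring here are assumptions made for the example.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Split the user input into sentences on ., ! and ?."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a toy replacement for the real classifier."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def best_scores(text: str, tasks: Dict[str, List[str]]) -> Dict[str, float]:
    """Score every sentence against each task name and its training utterances."""
    scores: Dict[str, float] = {}
    for task, utterances in tasks.items():
        references = [task] + utterances
        scores[task] = max(
            (jaccard(s, r) for s in split_sentences(text) for r in references),
            default=0.0,
        )
    # Only tasks with a positive score are reported.
    return {task: score for task, score in scores.items() if score > 0}
```

Adding more training utterances to a task gives each sentence more references to match, which is why a well-trained task is more likely to appear in the results.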

FM Model

Apart from the ML model, each task in the bot is also scored against the user input using a comprehensive custom NLP algorithm that involves different combinations of task names, synonyms, and patterns.

The Fundamental Meaning (FM) Model tab shows the analysis for all the intents in the bot. Click the tab to view the scores of each task.

Clicking the Processed Utterance shows how the user utterance was analyzed and processed.

From release 7.2 onward, the FM engine generates the model in one of two ways, depending on the language of the bot.

Approach 1: Supported for German and French.

The word analysis section details the Original Word, Universal Parts of Speech, Dependency Relation, and Related Word for each word.

Next, the score breakup for each of the processed intents is displayed. Selecting a scored intent (matched or eliminated) displays the scoring details for each word. These include the words from the utterance and the score assigned to each, based on dependency parsing.

Approach 2: Supported for all languages other than German and French.

The word analysis section details the Original Word, Role in the sentence, and Processed Word (in case of spell correction) for each word.

Next, the score breakup for each of the processed intents is displayed. Selecting a scored intent (matched or eliminated) displays the scoring details for each word. The breakup includes the following:

Words Matched: The score given for the number of words in the user input that matched words in the task name or a trained utterance for the task.
Word Coverage: The score given for the ratio of the words matched with that of the overall words in the task, including task name, field names, utterances, and synonyms.
Exact Words: The score given for the number of words that matched exactly and not by synonyms.
Bonus
- Sentence Structure: Bonus for the sentence structure match to the user input.
- Word Position: Score given to a word based on its position in the sentence. Words towards the start of the sentence are given higher preference, and a word near the sentence start earns extra credit.
- Order Bonus: Bonus for the number of words in the same order as the task label.
- Role Bonus: Bonus for the number of primary and secondary roles (subject/verb/object) matched.
- Spread Bonus: Bonus for the difference between the position of first and last matched words in a pattern. The higher the difference, the greater the score.
Penalty: Penalty if there are several phrases before the task name or if there is a conjunction in the middle of the task label.
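
To make the first three components concrete, here is a minimal sketch that computes Words Matched, Word Coverage, and Exact Words for one task. The function, its inputs, and the way the values are counted are assumptions for illustration. The platform's actual formulas, bonuses, and penalties are not reproduced here.

```python
from typing import Dict, Set


def fm_components(utterance: str, task_words: Set[str],
                  synonyms: Dict[str, str]) -> Dict[str, float]:
    """Compute illustrative Words Matched, Word Coverage, and Exact Words values.

    task_words: words from the task name, field names, and utterances.
    synonyms:   maps a synonym to the task word it stands for.
    """
    words = utterance.lower().split()
    # Words that appear in the task vocabulary as typed.
    exact = {w for w in words if w in task_words}
    # Task words reached only through a synonym.
    via_synonym = {synonyms[w] for w in words
                   if w in synonyms and synonyms[w] in task_words}
    matched = exact | via_synonym
    return {
        "words_matched": len(matched),
        "word_coverage": len(matched) / len(task_words) if task_words else 0.0,
        "exact_words": len(exact),
    }
```

For a task whose vocabulary is `{"modify", "booking"}` with the synonyms `change` and `reservation`, the utterance "modify my reservation" matches both task words but only one of them exactly, so an exact match can be told apart from a synonym match.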

Knowledge Graph

If the bot includes a Knowledge Graph, the user utterance is processed to extract terms, which are then mapped to the Knowledge Graph to fetch the relevant paths. All paths that cover more than a preset threshold of terms are shortlisted for further screening. A path with 100% of its terms covered and a similar FAQ in the path is considered a perfect match.
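
The shortlisting step can be sketched as follows. This is a simplified illustration: the representation of paths as lists of terms, the `min_coverage` default, and both function names are assumptions, and the check for a similar FAQ is reduced to a flag supplied by the caller.

```python
from typing import List, Set, Tuple


def shortlist_paths(utterance_terms: Set[str], paths: List[List[str]],
                    min_coverage: float = 0.5) -> List[Tuple[List[str], float]]:
    """Return (path, coverage) pairs whose term coverage exceeds min_coverage."""
    shortlisted = []
    for path in paths:
        covered = sum(1 for term in path if term in utterance_terms)
        coverage = covered / len(path) if path else 0.0
        if coverage > min_coverage:
            shortlisted.append((path, coverage))
    return shortlisted


def is_perfect_match(coverage: float, has_similar_faq: bool) -> bool:
    """A path is a perfect match when every term is covered and a similar FAQ exists."""
    return coverage == 1.0 and has_similar_faq
```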

If the utterance triggers a dialog (as per the Run a Dialog option in the Knowledge Graph), the dialog is displayed as both the matched intent and the matched utterance. You can further train the bot as you would for an intent from the ML or FM engine, as detailed in the training section below.

For more information, read Knowledge Graph Training.

Ranking and Resolver

The Ranking and Resolver determines the final winner of the entire NLP computation. If either the ML model or the Knowledge Graph finds a perfect match, the Ranking and Resolver doesn’t re-score the intent and presents it as a matched intent. If there are multiple perfect matches, they are presented as options from which the developer can choose.

The Ranking and Resolver re-scores all the other good and unsure matches identified by the three models using the Fundamental Meaning model. After re-scoring, if the final score of an intent crosses a certain threshold, it too is considered as a match.

Selecting the Ranking and Resolver tab gives the details.

The ranking and details for each match can be viewed by selecting the matched utterance.

Depending upon the Bot language the scoring model can be:

based on a mixture of word roles, sentence/word positions, and word order; or
based on dependency parsing (supported for German and French languages)

When the three engines return different definitive or possible matches, the Ranking and Resolver eliminates intents on the following basis:

Intents that the Machine Learning Model matched based on entity values such as date or number are eliminated.
All possible matches identified by any of the three engines are eliminated if a definitive match was found.
A definitive match is eliminated if another definitive match was found before it in the user utterance, which happens when the utterance includes two intents. For example, “Book me a flight and then book a cab” would match “Book Flight” and “Book Cab”, but “Book Cab” is eliminated in favor of “Book Flight”.
Intent pattern matches that follow a definitive intent match are eliminated. For example, the user utterance “create a task to send an email” can match the intents “create task” and “send email”. In such cases “send email” is eliminated because it follows the intent “create task”.
Intents with scores below the minimum value set in the Threshold and Configurations section are eliminated.
Definitive matches that match a Negative Pattern are eliminated.
Intents for which the pre-conditions, if defined, are not met are eliminated.
A definitive match that the Knowledge Graph engine found through Search In Answer is eliminated if there is another matched intent.
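
Three of these rules (the minimum score, possible matches losing to a definitive match, and a later definitive match losing to an earlier one) can be sketched as below. The `Candidate` type, its fields, and the `resolve` function are hypothetical, and the remaining rules are left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    intent: str
    kind: str       # "definitive" or "possible"
    score: float
    position: int   # index in the utterance where the intent was found


def resolve(candidates: List[Candidate], min_score: float) -> List[Candidate]:
    """Apply three of the elimination rules and return the surviving intents."""
    # Rule: intents scoring below the configured minimum are eliminated.
    kept = [c for c in candidates if c.score >= min_score]
    definitive = [c for c in kept if c.kind == "definitive"]
    if not definitive:
        return kept
    # Rule: possible matches are eliminated once a definitive match exists.
    # Rule: a definitive match is eliminated if another one occurs earlier
    # in the utterance; matches sharing the earliest position all survive.
    first = min(c.position for c in definitive)
    return [c for c in definitive if c.position == first]
```

For “Book me a flight and then book a cab”, a definitive Book Flight match at the start of the utterance survives, while the later Book Cab match and any possible matches are dropped.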

Training the Bot

Training is how you enhance the performance of the NLP engine to prioritize one bot task or user intent over another based on the user input. You should test and, if needed, train your bot for all possible user utterances and inputs.

Train the bot

After you enter a User Utterance, depending on the test result do one of the following to open the training options:
- For an unmatched intent: From the Select an Intent drop-down list, select the intent that you want to match with the user utterance.
- For multiple matched intents: Select the radio button for the intent you want to match.
- For a single matched intent: Click the name of the matched intent.
The user utterance that you entered gets displayed in the field under the ML Utterances section. To add the utterance to the intent, click Save. You can add as many utterances as you want, one after another. For more information, read Machine Learning.
Under the Intent Synonyms section, each word in the task name appears as a separate line item. Enter the synonyms for the words to optimize the NLP interpreter accuracy to recognize the correct task. For more information, read Managing Synonyms.
Under the Intent Patterns section, enter task patterns for the intent. For more information, read Managing Patterns.
When you have finished making the relevant training entries, click Re-Run Utterance to check whether the intent now receives a higher confidence score.

Train with FAQ

If you want the bot to respond to a user utterance with an FAQ, there are two ways to do it:

Set the terms, term configuration, or classes from the FAQ page, train the Knowledge Graph, and retest the utterance.
Add the utterance as an alternate question to the selected FAQ from the Knowledge Graph page, train the Knowledge Graph, and retest the utterance.

For more information, read Knowledge Graph Training.

Mark an Incorrect Match

When a user input matches an incorrect task, do the following to match it with the right intent:

Above the matched intent name, click the Mark as incorrect match link.
The Matched Intent drop-down list opens so that you can select another intent.
Select the corresponding intent for the user input and train the bot.
