Testing Conversational AI

Parts of this guide have been published in the book ACCELERATING SOFTWARE QUALITY - Machine Learning & Artificial Intelligence in the Age of DevOps by Eran Kinsbruner.

Conversational Flow Testing

This section starts with some technical background on Botium and then demonstrates a methodology to identify and formalize test cases for the conversational flow of a chatbot. The conversational flow, often called “user stories”, can be visualized in a flow chart.

Hello, World! The Botium Basics

The most basic test case in Botium consists of

submitting a phrase possibly entered by a real user to the chatbot
checking the response of the chatbot with the expected outcome

BotiumScript

In Botium, the test cases are described by conversational flows the chatbot is supposed to follow. For a sample “greeting” scenario, the Botium test case looks like this — also known as “BotiumScript”:

#me
hello bot!

#bot
Hello, humanoid! How can I help you ?

You can write BotiumScript in several file formats

plain text file with Notepad or any other text editor
Excel file
CSV file (comma separated values)
JSON
YAML

Convos and Utterances

So, let’s elaborate the “Hello, World!”-example from above. While some users will say “hello”, others maybe prefer “hi”:

#me
hi bot!

#bot
Hello, humanoid! How can I help you ?

Another user may enter the conversation with “hey dude!”:

#me
hey dude

#bot
Hello, humanoid! How can I help you ?

And there are plenty of other phrases we can think of. For this most simple use case, there are now at least three or more BotiumScripts to write. So let’s rewrite it. We name this file hello.convo.txt:

TC01 - Greeting

#me
HELLO_UTT

#bot
Hello, humanoid! How can I help you ?

You may have noticed the additional lines at the beginning of the BotiumScript. The first line contains a reference name for the test case to make it easier for you to locate the failing conversation within your test case library. And we add another file hello_utt.utterances.txt:

HELLO_UTT
hello bot!
hi bot!
hey dude
good evening
hey are you here
anyone at home ?

The first BotiumScript is a convo file — it holds the structure of the conversation you expect the chatbot to follow.
The second BotiumScript is an utterances file — it holds several phrases for greeting someone, and you expect your chatbot to be able to recognize every single one of them as a nice greeting from the user.

Botium will take care that the convo and utterances files are combined to verify every response of your chatbot to every greeting phrase. So now let’s assume that your chatbot uses several phrases for greeting the user back. In the morning it is:

#me
HELLO_UTT

#bot
Good morning, humanoid! How can I help you this early ?

And in the evening it is:

#me
HELLO_UTT

#bot
Good evening, humanoid! How can I help you at this late hour ?

Let’s extract the bot responses to another utterances file:

BOT_GREETING_UTT
Good evening
Good morning
Hello
Hi

And now comes the magic, we change the convo file to:

#me
HELLO_UTT

#bot
BOT_GREETING_UTT

Utterances files can be used to verify chatbot responses as well. To summarize:

An utterance referenced in a #me-section means: Botium, send every single phrase to the chatbot and check the response
An utterance referenced in a #bot-section means: Botium, my chatbot may use any of these answers, all of them are fine

Identification of Test Cases

If the flow chart is available, identification of the test cases is actually straightforward: Each path through the flow chart from top to bottom is a test case. Here is the path for the user story “User composes customized bouquet”.

And here is the path for “User selects anniversary bouquet”.

Writing Test Cases for a conversational flow

In BotiumScript, the conversational flow for user story “User composes customized bouquet” can be expressed like this:

#me
I want to buy a bouquet

#bot
OK, do you want to compose a bouquet yourself ?

#me
Yes

#bot
OK, what kind of flowers would you like to add first ?

#me
Please add 5 red roses

#bot
Alright, I put 5 red roses in your lovely bouquet. Should I add anything else ?

#me
No, thanks.

#bot
Super!

As soon as a chatbot doesn’t respond as expected, the test case is considered as failed and reported.

Writing Utterance Lists

What the flow charts don’t show are the endless possibilities for a user to express an intent. For each node in the flow chart, there are various input and output utterances to consider. The flow chart typically pictures a “happy path” in the conversation, in a real-world scenario the same conversation path and test case should be satisfied with most usual utterances and utterance combinations.

For the “I want to buy a bouquet”, there are plenty of other ways for a user to express this intent:

“Give me some flowers”
“To the flower shop, please”
“purchase a bouquet”
…

All of these user examples are valid input for the same test case, and in Botium these user examples are collected within an utterance list in a text file:

UTT_USER_ORDER_FLOWERS
I want to buy a bouquet
Give me some flowers
To the flower shop, please purchase a bouquet

What the flow charts don’t show as well are the utterances used on the other side, by the chatbot itself: a well-designed chatbot provides some variance in conversation responses.

For example instead of “Okay! Would you like to compose a bouquet yourself” the chatbot might as well respond with:

“Do you want me to suggest a composition?”
“Is it for a special occasion” ?
…

These utterances can be collected in another utterance list and used in the test cases to allow the chatbot all responses matching one in this list. The first part of the user story “User composes customized bouquet” would then look like this:

#me
UTT_USER_ORDER_FLOWERS

#bot
UTT_BOT_COMPOSE_YN

The conversational flow remains the same, but there are many user examples and chatbot responses allowed now.

Dealing with Uncertainty

When using Botium, there are many options for asserting the chatbots behaviour - the most simple one, assertion of the text response, is shown above.

Asserting the presence of user interface elements, such as quick response buttons, media attachments, form input elements
Asserting with regular expressions and utterance lists
Asserting tone with a tone analyzer
- Validation that the chatbot tone matches the intended brand communication style
Asserting availability of hyperlinks presented to the user
Asserting custom message payload with JSONPath queries
Asserting business logic with API and data storage queries

Generating a Test Report

There are several frontends available for generating a test report with Botium.

Option 1: Botium CLI

Run Botium CLI like this:

botium-cli run

Botium CLI will build up a communication channel with your chatbot and run all of your test cases. Status information and a summary are displayed in the command line window.

Option 2: Botium Bindings

With Botium Bindings an established test runner like Mocha, Jest or Jasmine can be used for running Botium test cases.:

mocha ./spec

Option 3: Botium Box

Use the Quickstart Wizard to connect your chatbot to your test sets and run them.

Utterance/Convo Expansion

The process of merging the utterance files with the convo files is called Expansion. See this convo file:

#me
HELLO_UTT

And this utterances file:

HELLO_UTT
hello
hi
what's up

In case the convo expansion is disabled, the literal text HELLO_UTT is sent to the bot (which is most likely not what you want).
In case the convo expansion is enabled, the user examples from the utterances file are sent to the bot (which is most likely what you want).

Enabling Convo Expansion

In Botium CLI, it is enabled by default
In Botium Bindings, set the expandConvos option (see Botium Bindings)
In Botium Box, use the Scripting Settings of the test set

Utterance Expansion

There are cases when it makes sense to have utterance files only, without any convo files.

Testing for incomprehension (chatbot does not understand)
NLU/NLP Testing

Botium can then create convo test cases out of utterance files dynamically:

For incomprehension testing, it is possible to define a special INCOMPREHENSION utterance file - see SCRIPTING_UTTEXPANSION_INCOMPREHENSION
For NLU/NLP testing, Botium can check the returned NLU/NLP intent - see SCRIPTING_UTTEXPANSION_USENAMEASINTENT

Enabling Utterance Expansion:

In Botium CLI, use the –expandutterances yes command line switch
In Botium Bindings, set the expandUtterancesToConvos option (see Botium Bindings)
In Botium Box, use the Scripting Settings of the test set

End-2-End Testing (Chatbot User Interfaces)

Testing the user experience end-to-end has to be part of every test strategy. Apart from the conversation flow, which is best tested on API level, it has to be verified that a chatbot published on a company website works on most used end user devices.

The special challenges when doing E2E tests for a chatbot are the high amount of test data needed (> 100.000 utterances for a medium sizes chatbot) and the slow execution time - in an E2E scenario tests are running in real time. The good news are that for testing device compatibility, a small subset of test cases is sufficient.

Safe Assumptions when testing a chatbot user interface

When testing a chatbot with Selenium, there are some safe assumptions you can rely on to reduce effort when coding test cases:

The chatbot is accessible on a website and there maybe is some kind of click-through to actually open the chatbot window. The procedure to navigate and open the chatbot window is always the same for all test cases.
Somewhere in the chatbot window there is an input field for text messages. When hitting “Enter” or clicking on a button besides the input field the text will be sent to the chatbot.
Somewhere in the window the chatbot responds in some kind of list view. The text sent from the user is mirrored there as well.
The chatbot response contains text, pictures, hyperlinks and maybe quick response buttons to click

Based on these assumptions an experienced Selenium developer will build a page object model to reuse for all of the chatbot test cases.

Botium Webdriver Connector

If you ever worked with Selenium, you are aware that writing an automation script usually is a time-consuming task. Botium helps you in writing automation scripts for a chatbot widget embedded on a website and speeds up the development process by providing a parameterizable, default configuration for adapting it to your actual chatbot website with Selenium selectors and pluggable code snippets:

Website address to launch for accessing the chatbot
Selenium selector for identification of the input text field
Selenium selector for identification of the “Send”-Button (if present, otherwise message to the chatbot is sent with “Enter” key)
Selenium selector for identification of the chatbot output elements
Selenium capabilities for device or browser selection or any other Selenium specific settings

Note: Botium can work with any Selenium or Appium endpoint available - either with a virtual browser like PhantomJS, an integrated standalone Selenium service, your own custom Selenium grid, or with cloud providers like Perfecto Labs.

If there are additional steps (mouse clicks) to do on the website before the chatbot is accessible, you will have to extend the pre-defined Selenium scripts with custom behaviour as Javascript code.:

module.exports = async (container, browser) => {
  const ccBtn = await browser.$('#onetrust-accept-btn-handler')
  await ccBtn.waitForClickable({ timeout: 20000 })
  await ccBtn.click()

  const startChat = await browser.$('#StartChat')
  await startChat.waitForClickable({ timeout: 20000 })
  await startChat.click()
}

This code snippet does the following:

Waiting for a “Cookie Consent” button to appear on the website
Clicking this button to make the website usable
Waiting for a “Start Chat” button to appear and clicking it when available
Waiting until the basic chatbot interaction elements are visible

The full Botium configuration for this scenario looks like this:

{
  "botium": {
    "Capabilities": {
      "PROJECTNAME": "WebdriverIO Plugin Sample",
      "CONTAINERMODE": "webdriverio",
      "WEBDRIVERIO_OPTIONS": {
        "capabilities": {
          "browserName": "chrome"
        }
      },
      "WEBDRIVERIO_URL": "https://www.my-company.com",
      "WEBDRIVERIO_OPENBOT": "./snippets/openbot",
      "WEBDRIVERIO_INPUT_ELEMENT": "//input[@id='textInput']",
      "WEBDRIVERIO_INPUT_ELEMENT_SENDBUTTON": "//button[contains(@class,'bot__send')]",
      "WEBDRIVERIO_OUTPUT_ELEMENT": "//div[contains(@class,'from-watson')]"
    }
  }
}

With this configuration, all of your convo and utterances files can be used to run test cases with Botium and Selenium.

Voice Testing (Voice-Enabled Chatbots)

When testing voice apps, all of the principles from the previous sections apply as well. Some of the available voice-enabled chatbot technologies natively support both text and voice input and output, such as Google Dialogflow or Amazon Lex. Others are working exclusively with voice input and output, such as Alexa Voice Service. And all the other technologies can be extended with voice capabilities by inserting speech-to-text and text-to-speech engines in the processing pipeline.

For doing serious tests at least the chatbot response has to be available as text to use text assertions. Botium supports several text-to-speech and speech-to-text engines for doing the translations.

In addition to the well-known cloud services from Google and Amazon, Botium also has its own free and open source speech service included - Botium Speech Processing.

There is one good reason for using voice instead of text as input to your test cases, if there are historic recordings available when transitioning from a legacy IVR system. Such libraries often are a valuable resource for test data.

Continue to read about Voice App Testing in the Botium Wiki.