DO's and DON'Ts for Conversational Design

Creating a robust set of intents for a successful skill requires a lot of attention. Here are some best practices to keep in mind.

Intent Design and Training

DO plan to add utterances until you get the results you expect. Generally speaking, models perform better as you add more quality training utterances. The number of utterances you need depends on the model, the training data, and the level of accuracy that is realistic for your model.
DON'T over-train individual intents. Don't add excessive training data to some intents to make them work "perfectly". If intent resolution is not behaving as expected, evaluate your intent structure for overlap between intents. Intent resolution will NEVER be 100% accurate.

DO use real-world data. Using the actual language that your skill is most likely to encounter is critical. Fabricated utterances can only take you so far and will not prepare your skill for real-world engagement.
DON'T use just keywords in training data. While it is acceptable to use single words or short phrases for training, the training data should have the same structure as the user's inputs. The fewer the words in an utterance, the less successful classification will be.

DO use whole sentences to train intents. While it's OK to use short training utterances, be sure to match the conversational style of your users as closely as possible.
DON'T inadvertently skew intents. Be careful with words that add no specific meaning (e.g., "please" and "thanks") and with entity values within utterances, as they can inadvertently skew intent resolution if they are heavily used in one intent but not in another.

DO use similar numbers of utterances per intent. Some intents (e.g., "hello", "goodbye") may have fewer utterances in their training sets. However, ensure that your main intents have a similar number of utterances to avoid biasing your model.
DON'T rely ONLY on intent resolution. If there's linguistic overlap between common intents, consider using entities to disambiguate the user's intentions (and the corresponding unique conversational path).

DO handle small talk. Users will make requests that are not relevant to the skill's purpose, such as for jokes and weather reports. They may also do things like ask if the skill is human. Ensure that you have a small talk strategy and aggressively test how the skill responds at all steps of your conversational flow.
DON'T overuse unresolvedIntent. Create "out-of-scope" intents for the things you know you don't know (that you may or may not enable the skill to do later).

DO consider multiple intents for a single use case. Customers may express the same need in multiple ways, e.g., in terms of the solution they desire OR the symptom of their problem. Use multiple intents that all resolve to the same "answer".
DON'T ignore abusive interactions. Similar to small talk, have a plan for abuse. This plan may need to include measures to ensure any abusive input from the user is not reflected back by the skill, as well as provisions for immediate escalation.
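
The advice on keeping utterance counts similar and on avoiding filler-word skew can be checked mechanically before training. The following is a minimal Python sketch, not part of any product API: the function names, the FILLER_WORDS list, the max_ratio and gap thresholds, and the sample intents are all assumptions made for illustration. Training data is assumed to be a dictionary that maps each intent name to a list of utterances.

```python
import re

# Hypothetical filler words; adjust for your own domain and language.
FILLER_WORDS = {"please", "thanks", "hello", "hi"}


def imbalanced_intents(training_data, max_ratio=2.0, ignore=()):
    """Return intents with more than max_ratio times the smallest utterance count.

    Intents listed in `ignore` (for example, greetings) are left out of the check.
    """
    counts = {i: len(u) for i, u in training_data.items() if i not in ignore}
    if not counts:
        return []
    smallest = min(counts.values())
    return sorted(i for i, n in counts.items() if n > max_ratio * smallest)


def filler_usage(training_data, fillers=FILLER_WORDS):
    """For each filler word, return the share of utterances per intent containing it."""
    usage = {word: {} for word in fillers}
    for intent, utterances in training_data.items():
        tokenized = [set(re.findall(r"[a-z']+", u.lower())) for u in utterances]
        for word in fillers:
            hits = sum(1 for tokens in tokenized if word in tokens)
            usage[word][intent] = hits / len(utterances) if utterances else 0.0
    return usage


def skewed_fillers(training_data, fillers=FILLER_WORDS, gap=0.5):
    """Return filler words whose usage differs between intents by at least `gap`."""
    skewed = []
    for word, per_intent in filler_usage(training_data, fillers).items():
        if per_intent and max(per_intent.values()) - min(per_intent.values()) >= gap:
            skewed.append(word)
    return sorted(skewed)


# Made-up sample data: "please" appears in every OrderPizza utterance
# and in no CancelOrder utterance.
sample = {
    "OrderPizza": [
        "Please order a pizza",
        "I want a pizza please",
        "please get me a large pizza",
    ],
    "CancelOrder": [
        "cancel my order",
        "I need to cancel the pizza",
        "stop my order",
    ],
}

print(imbalanced_intents(sample))  # []
print(skewed_fillers(sample))      # ['please']
```

A flagged filler word is only a hint. The usual remedy is to spread such words evenly across intents or to remove them from the training utterances.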

Conversational User Experience

DO give indications of most likely responses (including help and exit). For example, "Hey, I'm Bob the Bot. Ask me about X, Y, or Z. If you run into any problems, just type 'help'."
DON'T delay conversational design until "later in the project". For all but the simplest skills, conversational design must be given the same priority and urgency as other development work. It should start early and proceed in parallel with other tasks.

DO consider a personality for your bot. You should consider the personality and tone of your bot. However, be careful of overdoing human-like interaction (humor and sympathy often don't resonate well from a bot) and never try to fool your users into thinking that they are interacting with a human.
DON'T say that the skill "is still learning". While well-intended, this bad practice signals to the user (consciously or subconsciously) that the skill is not up to the task.

DO guide the user on what is expected from them. The skill should try to guide the user toward an appropriate response and not leave questions open-ended. Open-ended questions make the user more likely to fall off the happy path.
DON'T use "cute" or "filler" responses. See "DO guide the user on what is expected from them".

DO break up long responses into individual chat bubbles and/or use line breaks. Large blobs of text without visual breaks are hard to read and can lead to confusion.
DON'T say "I'm sorry, I don't understand. Would you please rephrase your question?" This lazy error-handling approach is, more often than not, inaccurate. No matter how many times a user rephrases an out-of-scope question, the skill will NEVER have anything intelligent to say.

DON'T overuse "confirmation" phrases. Confirmation phrases have their place. However, don't overuse them. Consider dialog flows that are able to take confidence levels into account before asking users to confirm.
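
One way to take confidence levels into account before asking for confirmation is to use two thresholds: act directly when confidence is high, confirm only in the uncertain middle band, and otherwise guide the user toward what the skill can do. The sketch below is illustrative only. The next_step function, its return labels, and the threshold values 0.8 and 0.5 are assumptions, not settings taken from any platform, and suitable values would have to come from testing your own model.

```python
def next_step(confidence, act_threshold=0.8, confirm_threshold=0.5):
    """Decide how a dialog flow proceeds for a resolved intent.

    `confidence` is assumed to be the model's score between 0.0 and 1.0
    for the top-ranked intent. Both thresholds are hypothetical defaults.
    """
    if confidence >= act_threshold:
        return "proceed"   # confident enough: no confirmation phrase needed
    if confidence >= confirm_threshold:
        return "confirm"   # uncertain: ask the user to confirm the intent
    return "guide"         # too low: tell the user what the skill can help with


print(next_step(0.92))  # proceed
print(next_step(0.61))  # confirm
print(next_step(0.20))  # guide
```

With this shape, confirmation phrases appear only for borderline resolutions rather than on every turn.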

Test Strategies

DO develop utterances cyclically. Developing a robust training corpus requires multiple iterations, testing cycles, and ongoing monitoring and tuning. Use a cyclical "build, test, deploy, monitor, update" approach.
DON'T neglect the need for a performance measurement and improvement plan. Lacking a plan for measuring and improving your skill, you'll have no way of knowing whether it's really working.

DO test utterances using the 80/20 rule. Always test the robustness of your intents against one another by conducting multiple 80/20 tests, where 80% of newly harvested utterances are used to train the model and 20% are added to your testing data.
DON'T test only the happy path. "Getting it working" is 20% of the work. The remaining 80% is testing and adjusting how the skill responds to incorrect input and user actions.

DO test skill failure. Aggressively try to break your skill to see what happens. Don't rely solely on positive testing.
DON'T ignore out-of-order messages. Users will scroll back in the conversation history and click on past buttons. Testing the results needs to be part of your 80% of the work (as noted in "DON'T test only the happy path").

DON'T forget to re-test as you update your intents. If you add more training data (e.g., as your bot gets more real-world usage) and/or you add new intents for new use cases, don't forget to re-test your model.
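
The 80/20 rule described above can be sketched as a per-intent random split. This is a minimal Python illustration under stated assumptions: the split_80_20 and repeated_splits names, the seed parameter, and the dictionary-of-lists data shape are hypothetical and are not part of any product tooling. Splitting per intent keeps every intent represented in both sets, and changing the seed gives the "multiple 80/20 tests" the guideline calls for.

```python
import random


def split_80_20(utterances_by_intent, test_fraction=0.2, seed=0):
    """Split each intent's utterances into training and test sets.

    Returns (train, test), two dicts keyed by intent name. Roughly
    `test_fraction` of each intent's utterances go to the test set.
    """
    rng = random.Random(seed)  # seeded so that a split can be reproduced
    train, test = {}, {}
    for intent, utterances in utterances_by_intent.items():
        shuffled = list(utterances)  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test[intent] = shuffled[:n_test]
        train[intent] = shuffled[n_test:]
    return train, test


def repeated_splits(utterances_by_intent, runs=5):
    """Yield several different 80/20 splits, one per seed."""
    for seed in range(runs):
        yield split_80_20(utterances_by_intent, seed=seed)


# Made-up harvested utterances: ten per intent.
harvested = {
    "CheckBalance": [f"balance question {n}" for n in range(10)],
    "SendMoney": [f"payment request {n}" for n in range(10)],
}

train, test = split_80_20(harvested)
print(len(train["CheckBalance"]), len(test["CheckBalance"]))  # 8 2
```

Each split would then be used to train on the 80% portion and to measure intent resolution on the held-out 20%, with results compared across runs.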

Project Considerations

DO select use cases that are enhanced by conversational UI (CUI). Enabling conversational UI (via skills and digital assistants) is work. Make sure that the use case will be truly enhanced by adding CUI.
DON'T fail to have an escalation path. Even if you don't plan on allowing escalation to a human, you must have a strategy for those interactions where the skill can't help.

DO anticipate the first day being the worst day. Even the best-tested skills and digital assistants require tuning on day 1.
DON'T disband the project team immediately after launch. When scheduling your skill project, ensure that you keep the skill's creators (Conversational Designer, Project Manager, Tech Lead, etc.) on the project long enough for adequate tuning and, ultimately, knowledge transfer.