eWyzard Inc. - Developer News

Developer View contd...

The Server Side

An informal poll (of 6 people) revealed that 100 percent of all Americans despise the voice menus heard on typical telephone systems. OK, so our sample is small, but really, have you ever heard a computerized telephone response system that you liked?

Several companies have just introduced software that finally makes a telephone conversation with a computer the kind of easy interchange we would all like it to be. The software processes sentences and parses "meaning" out of them. The "meaning" can then be passed to an application, which uses it as a command to run some program for the telephone user's benefit. The dictation engine macros described elsewhere on this site perform a single task, triggered by a single phrase, for a desktop user. The scope of server side "meaning" software is wider: it can perform multiple tasks derived from arbitrary phrases and sentences. And, typically, it can do so for many simultaneous users who are accessing an application remotely over communication lines.

This type of software offers extraordinary flexibility for constructing elaborate computer-to-user dialogues. These dialogues need not follow a strictly linear, fill-in-the-blank sequence. They may have branches, loops and random meanderings, just as a normal conversation would.

Consider the following computer/person dialogue:
Computer: Would you like a fixed point mortgage or a variable rate loan?
Person: What's a point?
Computer: A point is (remainder of banking explanation...). Would you like a fixed point mortgage or a variable rate loan?
Person: How about variable?
Computer: We have variable rate loans starting at 10 thousand dollars, in increments of 10 thousand, all the way up to 100 thousand dollars. What amount would you like?
Person: As much as I can get.
Computer: Your credit rating would permit 70 thousand dollars maximum.
Person: Make it so.

In the above conversation the human's responses were reasonable and would be easily understood by a native English speaker. However, even though the computer asked questions that were essentially of the form "fill in the blank, please," if you examine the responses carefully you will notice that the person never really answered any of the computer's questions directly and seldom filled in the blank completely. In response to the first question, in fact, the person posed a question back to the computer! Yet the computer landed on its feet and recovered. Imagine trying that on a typical touch tone menu.
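
To make the branching and looping concrete, here is a minimal sketch in Python of how such a dialogue might be tracked. It is only an illustration under our own assumptions and has nothing to do with how any vendor's product works internally: the names new_state and respond, the keyword tests, and the closing reply are all hypothetical, and the credit limit is simply handed in rather than looked up.

```python
# Hypothetical prompts, taken from the sample conversation above.
ASK_TYPE = "Would you like a fixed point mortgage or a variable rate loan?"
ASK_AMOUNT = "What amount would you like?"


def new_state(credit_limit):
    # credit_limit stands in for the result of a real credit check.
    return {"step": "ask_type", "loan_type": None, "amount": None,
            "credit_limit": credit_limit}


def respond(state, utterance):
    """Advance the dialogue by one caller turn and return the computer's reply."""
    text = utterance.lower()
    step = state["step"]

    if step == "ask_type":
        if "what" in text and "point" in text:
            # Digression: answer the caller's question, then loop back
            # to the same prompt without changing the step.
            return "A point is (banking explanation). " + ASK_TYPE
        if "variable" in text or "fixed" in text:
            state["loan_type"] = "variable" if "variable" in text else "fixed"
            state["step"] = "ask_amount"
            return ASK_AMOUNT

    elif step == "ask_amount":
        if "as much as" in text or "maximum" in text:
            # The caller gave no number, so fall back on the credit limit.
            state["amount"] = state["credit_limit"]
            state["step"] = "confirm"
            return ("Your credit rating would permit %d dollars maximum."
                    % state["amount"])

    elif step == "confirm":
        if "make it so" in text or "yes" in text:
            state["step"] = "done"
            return ("Your loan request for %d dollars has been recorded."
                    % state["amount"])

    # Anything else leaves the step unchanged so the caller can try again.
    return "Sorry, I did not follow that."
```

Feeding the four caller lines from the sample conversation through respond, starting from new_state(70000), walks the state from "ask_type" to "done". The first line ("What's a point?") leaves the step where it was, which is the loop that a touch tone menu cannot offer.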

Software from UNISYS (Natural Language Speech Assistant) allows construction of exactly this kind of robust, flexible dialogue. From the developer's perspective, some new skills, or at least some new development activities, are required to get this flexibility.

Although the software will map responses to the correct application command, it is the developer's responsibility to define two things: the range of possible responses that a human might give (called "grammars") and the command equivalents that are to be associated with those responses.
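
As a rough illustration of those two definitions, the Python sketch below pairs a handful of possible caller phrasings with command names. This is not the grammar format of any real product. The table GRAMMARS, the function interpret, and the command names are hypothetical, and a commercial grammar would need to cover far more phrasings than the few regular expressions shown here.

```python
import re

# Hypothetical grammar table: each entry pairs a pattern covering several
# possible caller phrasings with the command the application should receive.
GRAMMARS = [
    (re.compile(r"\b(what is|what's)\s+a\s+point\b"), "EXPLAIN_POINT"),
    (re.compile(r"\bvariable\b"), "SELECT_VARIABLE_RATE"),
    (re.compile(r"\bfixed\b"), "SELECT_FIXED"),
    (re.compile(r"\b(as much as i can get|the maximum|the most)\b"),
     "REQUEST_MAX_AMOUNT"),
    (re.compile(r"\b(make it so|yes|ok|okay|go ahead)\b"), "CONFIRM"),
]


def interpret(utterance):
    """Return the command for the first grammar that matches the utterance."""
    text = utterance.lower().strip()
    for pattern, command in GRAMMARS:
        if pattern.search(text):
            return command
    # No grammar matched: the application should re-prompt the caller.
    return "NOT_UNDERSTOOD"
```

With this table, interpret("How about variable?") yields "SELECT_VARIABLE_RATE" and interpret("Make it so.") yields "CONFIRM". Every phrasing that is missing from the table falls through to "NOT_UNDERSTOOD", which gives some feel for why defining the grammars takes so much effort.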

It turns out that these two activities are far from trivial. A complete set of dialogues for a slightly fuller version of the above mortgage conversation, a version suitable for commercial use, might take something close to six months to design, approve, code, and test. And the designer will need to know the subtleties of English at the level of a sophisticated native speaker, something of a surprise to programmers whose computer science degrees were shortchanged by substituting FORTRAN compiler writing for Creative Writing Workshop 101. The time estimate of six months presumes that the underlying database connectivity for credit checking and the hardware for telephone interaction already exist. In other words, the estimate covers only the conversation design and does not include the resources needed for application, network, or telephony support.
