The Server Side
An informal poll (of six people) revealed that 100 percent of all
Americans despise voice menus as heard on typical telephone systems.
OK, so our sample is small, but really, have you ever heard a
computerized telephone response system that you liked?
Several companies have just introduced software that finally
makes telephone conversations with a computer the kind of easy
interchange we would all like them to be. The software processes
sentences and parses out "meaning" from them. The "meaning" can
then be passed to an application, which can use it
as a command to run some program for the telephone user's benefit.
The dictation engine macros described elsewhere on this site each
perform a single task, triggered by a single phrase, for a desktop user.
The scope of server side "meaning" software is wider: it can do
multiple things derived from arbitrary phrases and sentences.
And, typically, it can do these things for simultaneous
users who are accessing an application remotely over communication
lines.
There is extraordinary flexibility in this type of software
for constructing elaborate computer to user dialogues. These dialogues
need not follow a strictly linear, fill in the blank sequence.
The dialogues may have branches, loops and random meanderings,
just as a normal conversation would.
Consider the following computer/person dialogue:
Computer: Would you like a fixed point mortgage or a variable rate loan?
Person: What's a point?
Computer: A point is (remainder of banking explanation...). Would you
like a fixed point mortgage or a variable rate loan?
Person: How about variable?
Computer: We have variable rate loans starting at 10,000 dollars in
increments of 10,000 all the way up to 100 thousand dollars. What
amount would you like?
Person: As much as I can get.
Computer: Your credit rating would permit 70 thousand dollars maximum.
Person: Make it so.
In the above conversation the human's responses were reasonable
and would be easily understood by a native English listener.
However, even though the computer asked questions that were essentially
of the form "fill in the blank please," if you examine the responses
carefully you will notice that the person never really answered
any of the computer's questions directly and seldom filled the
blank completely. To the first question, in fact, the person posed
a question back to the computer! Yet the computer landed on its
feet and recovered. Imagine trying that on a typical touch-tone
menu.
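As a rough illustration of how one step of such a dialogue can loop
back on itself instead of marching straight ahead, here is a minimal
Python sketch. Everything in it (the function names, the keyword
matching, the wording of the fallback prompt) is an assumption made
up for this example; it is not how any particular product works.

```python
# Hypothetical dialogue step. The prompt text comes from the sample
# conversation; the keyword matching is an illustrative assumption.
LOAN_PROMPT = "Would you like a fixed point mortgage or a variable rate loan?"


def loan_type_turn(reply):
    """Return (what the computer says next, loan type or None)."""
    text = reply.lower()
    if "what" in text and "point" in text:
        # Side question: answer it, then loop back to the same prompt.
        return "A point is (banking explanation). " + LOAN_PROMPT, None
    if "variable" in text:
        return "What amount would you like?", "variable"
    if "fixed" in text:
        return "What amount would you like?", "fixed"
    # Unrecognized reply: stay on this step rather than failing.
    return "Sorry, I did not catch that. " + LOAN_PROMPT, None


def run_dialogue(replies):
    """Feed scripted replies until the loan type blank is filled."""
    transcript = []
    for reply in replies:
        said, loan_type = loan_type_turn(reply)
        transcript.append(said)
        if loan_type is not None:
            return loan_type, transcript
    return None, transcript


loan_type, transcript = run_dialogue(["What's a point?", "How about variable?"])
```

The first reply never fills the blank, so the step answers the
question and asks again; only the second reply moves the
conversation forward.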
Software from UNISYS (Natural Language Speech Assistant) allows
construction of exactly this kind of robust, flexible dialogue. From
the developer's perspective, some new skills, or at least some new
development activities, are required to get this flexibility.
Although the software will map responses to the correct application
command, it is the developer's responsibility to define two things:
the range of possible responses that a human might give (called
"grammars") and the command equivalents that are to be associated
with those responses.
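To make the two activities concrete, here is a small Python sketch
of a "grammar" paired with command equivalents. The table layout,
the command names and the use of regular expressions are all
assumptions chosen for illustration; this is not the format the
UNISYS product uses.

```python
import re

# Hypothetical grammar: each command name is paired with patterns for
# the phrasings a caller might use. Illustration only, not a vendor format.
GRAMMAR = {
    "EXPLAIN_POINT": [r"\bwhat('s| is) a point\b"],
    "CHOOSE_VARIABLE": [r"\bvariable\b", r"\badjustable\b"],
    "CHOOSE_FIXED": [r"\bfixed\b"],
    "REQUEST_MAXIMUM": [r"\bas much as\b", r"\bmaximum\b", r"\bthe most\b"],
    "CONFIRM": [r"\bmake it so\b", r"\byes\b", r"\bok(ay)?\b"],
}


def interpret(utterance):
    """Return the first command whose patterns match, or None."""
    text = utterance.lower()
    for command, patterns in GRAMMAR.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return command
    return None


commands = [
    interpret(reply)
    for reply in ("What's a point?", "How about variable?",
                  "As much as I can get.", "Make it so.")
]
```

Even this toy table hints at the effort involved: every extra way a
caller might phrase a reply is another pattern that someone has to
think of in advance.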
It turns out that these two activities are far from trivial.
A complete set of dialogues for a slightly fuller version of the
above mortgage conversation, a version suitable for commercial
use, might take something close to six months to design, approve,
code, and test. The designer will also need to know the subtleties
of English at the level of a sophisticated native speaker, something
of a surprise to programmers whose computer science degrees were
shortchanged by substituting FORTRAN compiler writing for Creative
Writing Workshop 101. The time estimate of six months presumes
that the underlying database connectivity for credit checking
and the hardware for telephone interaction already exist. In other
words, the estimate is only for the conversation design, and does
not include resources needed for application, network, or telephony
support.