The developer working in the speech technology arena has a
formidable body of knowledge to master. We'll divide it into two
important categories: the client side and the server side.
The client side is the one most visible in magazine ads and
consumer software stores. It consists of what is known as
continuous speech dictation software. The server side is what
has historically been called telephony.
The Client Side
While dictation currently attracts the most attention, the modern
developer should be aware that the "command and control" aspect
of these packages is where the startling new user productivity
benefits are found. "Command and control" is the ability to
navigate the Windows environment by voice commands alone, without
any mouse clicks or keystrokes. It is implemented to some degree
in all of the major dictation products.
The startling benefits come from using the "macro" facility,
also incorporated in these products, to extend both the dictation
and command/control aspects in significant ways. The macro facility
is the primary avenue of new user utility for the developer.
The macro facility allows the developer to encapsulate an arbitrary
collection of mouse clicks and keystrokes into one audio phrase.
The phrase can have any number of syllables, although it should
be kept short for human factors reasons.
Take e-mail preparation as an example. Suppose the developer
observes that a user frequently sends e-mail to a certain group
of people. The developer could attach to the audio phrase "e-mail
the closing report to the East Coast reps" all of the following:
the Windows menu bar operations that retrieve the report, the
mouse clicks that start the Internet connection, and the arrow
key movements and menu-closing keystrokes needed to compose and
send the message. Once the developer has attached these sequences
to the macro, saying the phrase in the specified application
context gets the operation done.
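The e-mail example can be sketched in code. What follows is only a
minimal illustration of the idea, not the API of any dictation
product: the names Action, MacroRegistry, attach and say, the
"mail_client" context label, and the menu and key strings are all
hypothetical, and a plain list stands in for the part that would
actually drive the user interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Hypothetical types for illustration; no vendor SDK is assumed.
@dataclass(frozen=True)
class Action:
    kind: str    # "menu", "click", or "keys"
    target: str  # what the action operates on


@dataclass
class MacroRegistry:
    # Maps (application context, normalized phrase) to an action sequence.
    macros: Dict[Tuple[str, str], List[Action]] = field(default_factory=dict)

    @staticmethod
    def normalize(phrase: str) -> str:
        # Lowercase and collapse whitespace so recognition output matches.
        return " ".join(phrase.lower().split())

    def attach(self, context: str, phrase: str, actions: List[Action]) -> None:
        # Bind a sequence of UI actions to one spoken phrase in one context.
        self.macros[(context, self.normalize(phrase))] = list(actions)

    def say(self, context: str, phrase: str,
            perform: Callable[[Action], None]) -> bool:
        # Replay the attached actions in order; report whether a macro matched.
        actions = self.macros.get((context, self.normalize(phrase)))
        if actions is None:
            return False
        for action in actions:
            perform(action)
        return True


registry = MacroRegistry()
registry.attach(
    "mail_client",  # assumed context name
    "E-mail the closing report to the East Coast reps",
    [
        Action("menu", "File > Open > closing report"),
        Action("click", "Connect"),
        Action("keys", "{DOWN}{DOWN}{ENTER}"),
        Action("menu", "Message > Send"),
    ],
)

log: List[Action] = []  # stand-in for a real UI driver
handled = registry.say(
    "mail_client", "e-mail the closing report to the east coast reps", log.append
)
```

Keying the table on both the context and the phrase reflects the
point made above: the same words spoken in a different application
would not trigger this macro.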
The macro facility can capture the graphical user interface state
changes which we have all become used to, but which are
nevertheless complicated, onerous and time-consuming.