Please have a look at this clip (1) of a speaker who produces a sequence of three syllables, one of which is made more prominent. Note that even when the sound is removed, as in this example (2), one can still identify the more prominent syllable. The task becomes somewhat more difficult with incongruent stimuli, such as this one (3) or this exaggerated one (4), i.e., artificially created stimuli in which the visual and auditory cues to prominence do not occur on the same syllable. We have also experimented with a synthetic talking head (in collaboration with CWI, Amsterdam) that speaks Dutch (5) or Italian (6), to test the cue value of eyebrow movements for the perception of accents.
We show you two clips of the same speaker uttering the word "Shakespeare", one in which he is certain (1) and one in which he is rather uncertain (2). These clips were taken from an experiment in which subjects were asked to respond to a series of factual questions; see this example (3).
Here we show clips of users interacting with a spoken dialogue system that is not always very helpful. You can tell from the following realizations of a city name whether a person is giving this information for the first time (1) or after a recognition error (2). You can also see a user's increasing frustration with the system's performance in this sequence of productions of the word nee ("no") (3). The feedback is also clear from a user who is not talking at all, but is just listening to a system prompt, which may give correct (4) or incorrect (5) information. Finally, this type of negative (6) or positive (7) feedback can be synthesized in a synthetic talking head. These clips were generated in a joint project with the CTT competence center at KTH (Sweden), using the Visualplayer software.
© 2003 - 2006 FOAP, Tilburg University. All rights reserved.