Building on existing knowledge of auditory communicative signals, the goal of the FOAP research project is to create an empirically based computational model of audiovisual prosody that can be used for both production and perception. In a series of substudies, we address specific instances of one or more of the following general questions:
- Speaker-related: Which factors determine how cues are distributed across modalities to signal different communicative functions? Is visual prosody as flexible a parameter as verbal prosody, or do particular factors, e.g. physiological constraints, limit the variation space in the visual domain?
- Listener-related: How do listeners integrate and interpret cues coming from the auditory and visual modalities? To what extent are these cues complementary, reinforcing, or contradictory? How natural and/or functional are specific cue combinations?
- Relation to other linguistic features: Are there specific dependencies between audiovisual prosody and other linguistic devices (e.g. word order, lexical cues) for marking various functions? How effective are audiovisual cues in larger discourse contexts?
- Crosslinguistic: Taking observations on Dutch as a starting point, are there important differences in the way languages exploit audiovisual prosody to signal various communicative functions, given that languages may also differ considerably in other linguistic features?
The computational model to be built from the empirical results will compute the communicative function of an utterance by integrating the contribution of cues from different sources (visual, verbal and lexico-syntactic).
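The project description does not specify how the model combines cues, so the following is only a minimal sketch under one common assumption: each cue (verbal, visual, or lexico-syntactic) contributes a likelihood ratio for one communicative function versus another (say, question versus statement), and cues are treated as conditionally independent, so their log-likelihood ratios add (a naive-Bayes combination). The function name `integrate_cues`, the cue labels, and all numeric values are hypothetical placeholders, not project results.

```python
import math
from typing import Dict


def integrate_cues(prior_odds: float, likelihood_ratios: Dict[str, float]) -> float:
    """Return the posterior probability of function A (versus function B).

    Hypothetical sketch: assumes cues are conditionally independent given
    the function (naive-Bayes assumption), so their log-likelihood ratios
    simply add to the prior log-odds.
    """
    if prior_odds <= 0:
        raise ValueError("prior_odds must be positive")
    log_odds = math.log(prior_odds)
    for cue, ratio in likelihood_ratios.items():
        if ratio <= 0:
            raise ValueError(f"likelihood ratio for {cue!r} must be positive")
        # A ratio > 1 supports function A, < 1 supports function B,
        # and exactly 1 leaves the estimate unchanged.
        log_odds += math.log(ratio)
    # Convert log-odds back to a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Illustrative placeholder values only, not empirical findings.
    cues = {
        "verbal: final rise": 3.0,                        # reinforcing cue
        "visual: eyebrow raise": 2.0,                     # reinforcing cue
        "lexico-syntactic: declarative word order": 0.5,  # contradictory cue
    }
    print(round(integrate_cues(1.0, cues), 3))  # 0.75
```

In this formulation, complementary or reinforcing cues push the log-odds in the same direction, while a contradictory cue partly cancels them, which maps loosely onto the listener-related questions above. A model fitted to perception data might instead weight each source differently; the additive form is just the simplest starting point.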
To address these questions, the studies use the following techniques, either alone or in combination:
Analysis-by-observation: One set of investigations is based on an analysis-by-observation technique, in which we explore speaker and listener behaviour in real interactions. To this end, we use an experimental paradigm that we have successfully applied in previous studies of verbal prosody: task-oriented forms of descriptive language use, such as dialogue games in which participants need to give each other verbal instructions. The technique often yields relatively spontaneous speech data while giving the experimenter explicit control over both the content of the spoken messages and the order in which a speaker deals with the information.
Analysis-by-synthesis: The results of the analysis-by-observation technique, in combination with findings reported in the literature, serve as input for an analysis-by-synthesis technique, in which we perceptually test the functional validity of audiovisual prosody using synthetic stimuli whose verbal and visual parameters are explicitly controlled in an orthogonal design. Perceptual tests are run either in an experimental room or over the internet via a web browser. As stimuli, we use either dynamically changing talking heads whose visual and auditory parameters can be set explicitly, or artificially manipulated video recordings of real human speakers.
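An orthogonal design here means that every level of each controlled parameter is crossed with every level of every other parameter, so the perceptual effect of each cue can be estimated independently. The sketch below builds such a full factorial stimulus set and a reproducible randomized presentation order per participant. The factor names and levels (`pitch_accent`, `eyebrow`, `head_nod`) and the helper names are hypothetical examples, not the parameters actually used in the project.

```python
import itertools
import random
from typing import Dict, List, Sequence


def full_factorial(factors: Dict[str, Sequence[str]]) -> List[Dict[str, str]]:
    """Cross every level of every factor (hypothetical helper).

    Each returned dict describes one stimulus; every combination of
    levels occurs exactly once, so the factors are orthogonal.
    """
    names = list(factors)
    return [
        dict(zip(names, levels))
        for levels in itertools.product(*(factors[name] for name in names))
    ]


def presentation_order(stimuli: List[Dict[str, str]], seed: int) -> List[Dict[str, str]]:
    """Return a shuffled copy of the stimuli, reproducible for a given seed."""
    order = list(stimuli)
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    # Hypothetical auditory and visual parameters for a talking-head stimulus.
    design = full_factorial({
        "pitch_accent": ["present", "absent"],  # auditory parameter
        "eyebrow": ["raise", "neutral"],        # visual parameter
        "head_nod": ["nod", "none"],            # visual parameter
    })
    print(len(design))  # 8 = 2 x 2 x 2 combinations
    for stimulus in presentation_order(design, seed=1):
        print(stimulus)
```

Seeding the shuffle per participant keeps each session reproducible while still counterbalancing order effects across participants; whether the project randomizes in this way is an assumption of this sketch.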