Hand-annotated data collection

Volunteers were given a printed dialogue transcript containing only the spoken text, with each utterance labelled by a speaker codename of the form $s_1, s_2, \dots, s_n$. They were asked to draw a line on the page wherever they felt the topic of the dialogue had changed, separating the text of one topic from the text of the next. The line could be placed anywhere, but it had to divide the text unambiguously at a single point, even if that point fell in the middle of a speaker turn or a sentence. This requirement reflects early trials run with insufficient instructions, in which some volunteers marked only the approximate area of a topic change (for example, somewhere within a turn) without indicating its exact location. These hand-annotated breaks were then re-integrated into a version of the original dialogue XML file for automatic processing, as described in chapter 3.
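Because a hand-drawn break may fall in the middle of a turn, it has to be recorded at a precise word position when it is merged back into the XML. The sketch below shows one way this could be done with Python's standard `xml.etree.ElementTree`, assuming a hypothetical schema in which a `<dialogue>` element holds plain-text `<turn speaker="s1">` elements and a break is stored as an empty `<topicbreak/>` element placed before a given word. The element names and the helper `insert_topic_break` are illustrative assumptions only; the actual file format is the one described in chapter 3.

```python
import xml.etree.ElementTree as ET


def insert_topic_break(root, turn_index, word_offset):
    """Insert a hypothetical <topicbreak/> marker before word `word_offset`
    of turn `turn_index`.

    word_offset == 0 puts the break at the start of the turn, i.e. between turns.
    Assumes each <turn> holds plain text only (no child elements yet);
    whitespace inside the turn is normalised to single spaces.
    """
    turn = root.findall("turn")[turn_index]
    if len(turn):
        raise ValueError("turn already contains markup")
    words = (turn.text or "").split()
    if not 0 <= word_offset <= len(words):
        raise ValueError("word offset lies outside the turn")
    before, after = words[:word_offset], words[word_offset:]
    marker = ET.Element("topicbreak")
    # Text before the break stays as the turn's text; text after it becomes the
    # marker's tail, keeping one space between the two halves.
    turn.text = " ".join(before)
    marker.tail = (" " if before and after else "") + " ".join(after)
    turn.append(marker)
    return root
```

For a turn reading `right now about the trip` and `word_offset=1`, the turn serialises as `right<topicbreak /> now about the trip`; a break that falls between turns is simply `word_offset=0` on the following turn.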

The volunteers were given no further information. In particular, they were not told how many topic changes to expect in the document, whether these would appear at regular or irregular intervals, or even whether the document they had received contained any topic changes at all. Nor were they given a definition of a topic; its interpretation was left entirely to them. This was considered the best way to avoid the problem of `self-fulfilling prophecy', in which subjects provide the answers they expect the system to agree with. It also better preserves the purpose of the research, which is to investigate methods of segmenting topics in a meaningful way.

The ten test dialogues are as follows (apart from the MICASE dialogues, these come without an explicit description, so their content is summarised informally here for the convenience of the reader):

This sample of dialogues is biased towards university recording environments (students and staff) but is otherwise fairly broad. The MICASE dialogues take place between students and advisers. The rest are discussions between two or more peers in a somewhat `guided' (that is, purposeful) meeting-type setting.

Each of these dialogues was marked up by one of the volunteers to show topic changes. Where a single volunteer marked up more than one dialogue, this was recorded so that commonalities in personal annotation style could be analysed. These data were entered into the graphing system to produce plots showing the correlation between the hand-marked topic breaks and the breaks detected automatically by the system; these graphs are presented in section 5.
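This section does not state how the graphing system quantified the correspondence between hand-marked and automatic breaks. Purely as a hedged illustration, the sketch below scores agreement as the fraction of hand-marked breaks lying within a tolerance window of some automatically detected break, assuming breaks are expressed as utterance indices, and groups the per-dialogue scores by annotator so that individual annotation styles can be compared. The function names, the record layout and the default tolerance of 2 are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def nearest_distance(position, detected):
    """Distance from `position` to the closest break in `detected` (sorted, non-empty)."""
    i = bisect_left(detected, position)
    # The closest break is either just before or at/after the insertion point.
    neighbours = detected[max(i - 1, 0): i + 1]
    return min(abs(position - d) for d in neighbours)


def agreement(hand, detected, tolerance=2):
    """Fraction of hand-marked breaks within `tolerance` positions of a detected break."""
    if not hand or not detected:
        return 0.0
    detected = sorted(detected)
    hits = sum(1 for h in hand if nearest_distance(h, detected) <= tolerance)
    return hits / len(hand)


def agreement_by_annotator(records, tolerance=2):
    """Group per-dialogue agreement scores by annotator.

    `records` is an iterable of (annotator, hand_breaks, detected_breaks)
    tuples, one per annotated dialogue (a hypothetical layout).
    """
    scores = defaultdict(list)
    for annotator, hand, detected in records:
        scores[annotator].append(agreement(hand, detected, tolerance))
    return dict(scores)
```

For example, `agreement([3, 10, 20], [4, 15])` returns 1/3, since only the hand-marked break at utterance 3 has a detected break (at 4) within two utterances. The measure is deliberately one-sided: it asks how many human judgements the system reproduces, and does not penalise detected breaks that no volunteer marked.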

James Ballantine 2005-02-19