Cue phrases

Within the literature on automatic topic segmentation, the most widely cited research is Hirschberg and Litman's Empirical Studies on the Disambiguation of Cue Phrases [7].

Cue phrases are words and phrases such as now and well (along with many others described in section 2.1.3) which serve primarily to indicate document structure or flow, rather than to impart semantic information about the current topic. Of particular interest to this literature review is their capacity to indicate imminent or recent topic change, especially in dialogue, where speakers may use cue phrases to signal a desire to change topic.

Hirschberg's research aims to address the problem that most cue phrases are ambiguous: depending on context, a given phrase may act as a true cue phrase or may play a purely sentence-semantic role. Consider the two examples in figure 2.2:

Figure 2.2: Ambiguous cue-phrases
Nobody really expects it to work. Incidentally, the last t...
...er arrives incidentally. It's not the sort of thing you can plan.

In the first sentence, the phrase ``incidentally'' is used to introduce a digression in the story--clearly, here it is used as a cue phrase. In contrast, the second sentence uses ``incidentally'' as an adverb; in Hirschberg's terminology, this use is sentential, meaning it carries only semantic information and has no structural relevance.

Another example, provided by Hirschberg, shows the word ``now'' used as a cue phrase and sententially in the same utterance:
Now now that we have all been welcomed here it's time to get on
with the business of the conference.

Hirschberg performs an empirical analysis of the appearance of both cue-phrase and sentential forms of ambiguous phrases to determine methods of disambiguation. She concludes that prosodic information--pitch curvature and pause duration--is the most useful feature for disambiguating cue phrases wherever it is available. Of more relevance to this literature review, however, she also considers the case of distinguishing cue phrase use in transcriptions of speech. Taking her extensive collection of phrases, already disambiguated by her best prosodic analysis technique (which will not be examined here), as the reference, she examines the textual transcriptions of the same speech to discover orthographic disambiguation cues.

Making use of simple orthographic information--the presence of punctuation immediately before the phrase, punctuation immediately after it, a turn-change indicator immediately before the phrase (that is, the phrase occurring at the start of a turn), or a combination of these--Hirschberg correlates these orthographic features with the classifications already obtained from prosodic information. Taking the phrase ``now'' as an example, of the occurrences where it was in the first position intonationally, it was preceded by punctuation 56.7% of the time, and preceded by a speaker turn 28.3% of the time. This means that in 85% of cases where prosodic information determines that ``now'' is introducing an intonational phrase, this is indicated by punctuation or other information available in a (properly punctuated) transcript. More interestingly, in the test set there was a 100% precision rate--no instance of ``now'' that was preceded by punctuation or a speaker turn failed to be prosodically marked as the start of an intonational phrase. Given that in Hirschberg's corpus, 93.7% of true cue-phrase (discourse) occurrences of ``now'' were in fact first in their intonational phrase, this simple metric seems highly effective in disambiguating at least the cue phrase ``now'' using only transcription. Hirschberg deduces that in total, 80% of discourse uses of ``now'' may be determined through orthographic means alone. Again, this assumes a correctly punctuated transcript--a feature lacking in many transcripts in practice.
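The orthographic test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Hirschberg's implementation: the function name, the punctuation set and the assumption that each turn arrives as a pre-tokenised list are all hypothetical. The final two lines check the quoted percentages; reading the 80% figure as the product of 93.7% and 85% is an assumption that happens to be consistent with the numbers, not something stated in the source.

```python
# Hypothetical sketch: names and the punctuation set are illustrative assumptions.
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def orthographically_marked(tokens, index):
    """True if tokens[index] starts a turn or directly follows punctuation.

    `tokens` is assumed to hold a single speaker turn, tokenised so that
    punctuation marks are separate items.
    """
    if index == 0:  # first token of the turn, i.e. preceded by a turn change
        return True
    return tokens[index - 1] in PUNCTUATION


# The utterance quoted earlier, treated as the start of one turn.
turn = "Now now that we have all been welcomed here it's time to get on".split()
flags = [
    orthographically_marked(turn, i)
    for i, token in enumerate(turn)
    if token.lower() == "now"
]

# Arithmetic on the figures quoted for "now".
marked_share = 0.567 + 0.283          # preceded by punctuation + preceded by a turn
overall_share = 0.937 * marked_share  # assumed derivation of the "80%" figure

print(flags, round(marked_share, 3), round(overall_share, 3))
```

On the quoted utterance the rule marks the first ``now'' (it opens the turn) and leaves the second unmarked, so the two occurrences are at least separated by the orthographic test.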

Secondarily, Hirschberg examined whether the status of a potential cue phrase correlates with that of an adjacent potential cue phrase. Although she admits the data is sparse, of 26 discourse uses preceded by another potential cue phrase, 20 (76.9%) were preceded by a discourse use. Correspondingly, of 29 sentential uses preceded by a potential cue phrase, 21 (72.4%) were preceded by a sentential use.
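The percentages follow directly from the counts, as the short check below shows. The helper name is hypothetical, and the pooled figure at the end is not reported in the source; it is derived here only to illustrate how often a ``copy the neighbour's label'' heuristic would have agreed on these 55 pairs.

```python
def agreement_rate(agreeing, total):
    """Percentage of adjacent pairs whose members share a label, to one decimal."""
    return round(100 * agreeing / total, 1)


discourse_rate = agreement_rate(20, 26)   # discourse use preceded by a discourse use
sentential_rate = agreement_rate(21, 29)  # sentential use preceded by a sentential use

# Derived for illustration only; this pooled value is not given in the source.
pooled_rate = agreement_rate(20 + 21, 26 + 29)

print(discourse_rate, sentential_rate, pooled_rate)
```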

A small boost to accuracy is given by Hirschberg's use of a part-of-speech tagger: based on a human-judged corpus, potential cue phrases that take a part of speech usually judged sentential are designated sentential by the algorithm, and vice versa. This simple metric alone correctly tags 63.9% of potential cue phrases as sentential or discourse.
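One plausible reading of this part-of-speech rule is a majority-label lookup, sketched below. It should be taken as an assumption rather than as Hirschberg's algorithm: the function names are hypothetical, the tags are Penn Treebank tags chosen for illustration (the source does not name a tag set), and the toy data is invented and does not come from her corpus.

```python
from collections import Counter, defaultdict


def train_pos_baseline(labelled):
    """Map each part-of-speech tag to the label it most often carries."""
    counts = defaultdict(Counter)
    for pos, label in labelled:
        counts[pos][label] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def classify(pos, model, default="sentential"):
    """Look up the majority label for a tag; unseen tags fall back to `default`."""
    return model.get(pos, default)


# Invented toy judgements, NOT taken from Hirschberg's corpus.
toy_corpus = [
    ("RB", "sentential"), ("RB", "sentential"), ("RB", "discourse"),
    ("UH", "discourse"), ("UH", "discourse"),
    ("CC", "discourse"), ("CC", "sentential"), ("CC", "discourse"),
]
model = train_pos_baseline(toy_corpus)

print(classify("RB", model), classify("UH", model), classify("XX", model))
```

The fallback label for unseen tags is a design choice of this sketch; the source does not say how such cases are handled.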

Hirschberg's research shows that certain cue phrases can be identified with reasonable accuracy from transcriptions of speech alone. This does not directly translate to a topic segmentation solution, but it means that certain aspects of discourse structure, a necessary element of true topic understanding as shown in [9], can be determined from transcripts.

James Ballantine 2005-02-19