Audio topic segmentation (as in [14], which works on broadcast news recordings) often makes use of information not available in a transcript, such as prosodic and pitch-change cues in the recorded voice signal. This is an alternative approach to topic segmentation in spoken dialogue, and while it is complementary to the present research, it is outside its scope.
Multimedia segmentation looks for different kinds of segments entirely; in [21], Wilcox and Boreczky work on audio and video data, using Hidden Markov Models to detect camera shot changes, transitions, fades, and dissolves. While useful, this is a different kind of segmentation, analogous to using paragraph breaks as topic markers in text: it relies on the original document containing markers that are hoped to coincide with topic changes, and does not attempt to find topic change in the content itself.