TextTiling for visual clarity

In [17], Hearst works with Rao et al. to implement an interactive, query-based information retrieval system referred to as an "information workspace". This system uses TextTiling to make text easier to read at a glance.

The information workspace is designed as a workstation that gives easy and fast access to heterogeneous data. Users locate pertinent data from multiple sources through iterative query refinement. The system also improves the visualisation of interlinked documents, and supports parallel tasks within a framework that accommodates different kinds of activity (such as research and subsequent analysis of the data located) without the user having to change working environment.

Large bodies of text on a screen are intimidating and difficult to navigate quickly. Once the system has returned a document, the TextTiling algorithm is used to give a visual indication of the topics within it by placing a box around each topic on the page. In theory, this allows the user to skip ahead a topic at a time when trying to locate the section of the document he or she is interested in. It also serves as a visual aid when reading a particular topic of interest, showing the reader how long the current topic is and where it ends.
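The idea of boxing each topic can be illustrated with a minimal sketch. This is not the implementation from [17], nor the full TextTiling algorithm: it swaps in a much simpler technique, comparing whole adjacent paragraphs by cosine similarity and placing a boundary wherever the similarity falls below a fixed threshold, where TextTiling proper compares blocks of token sequences and uses smoothed depth scores. The function names, the threshold value, and the ASCII box rendering are all assumptions made for illustration.

```python
import math
import re
import textwrap
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for real tokenisation and stop-word removal."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def segment(paragraphs, threshold=0.1):
    """Group consecutive paragraphs into topics.

    Simplification (assumption): a boundary is placed wherever adjacent
    paragraphs have similarity below `threshold`, an arbitrary illustrative
    value. TextTiling itself uses depth scores over a smoothed similarity curve.
    """
    if not paragraphs:
        return []
    bags = [Counter(tokenize(p)) for p in paragraphs]
    topics = [[paragraphs[0]]]
    for prev, cur, para in zip(bags, bags[1:], paragraphs[1:]):
        if cosine(prev, cur) < threshold:
            topics.append([para])      # low similarity: start a new topic
        else:
            topics[-1].append(para)    # otherwise continue the current topic
    return topics


def boxed(topics, width=60):
    """Render each topic inside its own ASCII box (hypothetical display format)."""
    border = "+" + "-" * (width + 2) + "+"
    lines = []
    for topic in topics:
        lines.append(border)
        for para in topic:
            for row in textwrap.wrap(para, width):
                lines.append("| " + row.ljust(width) + " |")
        lines.append(border)
    return "\n".join(lines)


if __name__ == "__main__":
    doc = [
        "the cat sat on the mat",
        "the cat chased the mouse",
        "stock markets fell sharply",
        "markets and stock prices fell",
    ]
    print(boxed(segment(doc)))
```

Even in this reduced form, the sketch shows why the boxes help with navigation: the length of each box tells the reader how much text a topic spans, and the borders mark where a reader can jump to the next one.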

It could be argued, however, that this application of TextTiling is inappropriate: had the author intended the document to carry topic markings purely for the sake of clarity, would he or she not have inserted subject headings or section numbers in the first place? Automatically inserting metadata in this way may distort the meaning of certain documents. Topic segmentation is a subjective affair, with no absolute inter-annotator agreement [6], and so authoritative-seeming topic breaks inserted into a document for the sake of clarity may not be appropriate.

James Ballantine 2005-02-19