Once all internal data structures have been populated, the system can output the following information:
- Similarity values
- Smoothed similarity values
- Detected topic breaks (integers representing pseudosentence numbers)
- Human-annotated topic breaks
- Dialogue in XML format, with pseudosentence numbers marked, and optionally with automatic or hand-transcribed topic breaks marked
Each data type (except the XML format) is output as a tab-separated flat file, with a comment line (introduced by "#") naming the columns. For example, a smoothed-similarity-values output begins:
#Pseudosentence SmoothSimilarity
5 0.622067
6 0.634649
7 0.615261
8 0.566583
9 0.501763
...
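
A file in this format can be read back with a few lines of code. The following is a minimal sketch in Python, not part of the system itself; the function name read_flat_file is hypothetical, and it assumes a two-column file whose first column is an integer pseudosentence number and whose second is a floating-point value, as in the example above.

```python
import io


def read_flat_file(stream):
    """Parse a tab-separated flat file whose '#' comment line names the columns.

    Hypothetical helper: assumes two columns, an integer pseudosentence
    number followed by a floating-point value.
    """
    columns = []
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # The comment line carries the column names.
            columns = line[1:].split("\t")
            continue
        fields = line.split("\t")
        rows.append((int(fields[0]), float(fields[1])))
    return columns, rows


# Usage with the first two rows of the example output above.
sample = "#Pseudosentence\tSmoothSimilarity\n5\t0.622067\n6\t0.634649\n"
columns, rows = read_flat_file(io.StringIO(sample))
```

Keeping the column names in a comment line means that tools which ignore "#" lines can consume the file directly, while a reader like this one can still recover the names.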
James Ballantine
2005-02-19