next up previous
Next: System Overview Up: The Haiku Visualisation System Previous: Abstract

Introduction

Real-world knowledge discovery takes place in databases containing high-dimensional, structured data of mixed types. Previous systems for visualising such high-dimensional data have concentrated on numeric data, or have made use of domain knowledge to generate specialised visualisations of discoveries.

The case for integrating visualisation and knowledge discovery has been made, both implicitly and explicitly, in many papers [2] [3]. This is because the human visual processing system is highly parallel and possesses good pattern detection abilities when data is presented in an appropriate format. However, the insights gained through visualisation are normally expressed informally and are not verifiable. Machine-based systems for pattern detection can be thorough and make statistically verifiable discoveries, but are generally serial and lack the flexibility of humans. For these reasons, encouraging a fusion of human and machine discovery abilities is very important. To achieve this, two-way communication between visualisation and machine discovery (data mining) techniques needs to be provided.

It is important to allow both the raw data and the discoveries made about that data to be visualised concurrently. Within such a visualisation, the relationships between discovered knowledge and the original data become clear.

It is also necessary for the human data miner to be able to pass information about their visual discoveries on to their computer-based tools. This allows the tools to focus on areas of particular interest, provide statistical verification of visual discoveries, or generate formal descriptions of observed patterns. One way to achieve this is to select one or more subsets of the data in the visualisation (known as "brushing") and pass these subsets back to the machine discovery component. The subsets would be accompanied by information about the type of response required, for example, production of a discriminatory rule.
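The brushing exchange can be sketched as a small message type: one or more brushed subsets of record indices, plus the kind of response the user wants. This is a minimal illustration only, assuming records are held as a list of dictionaries; the names `ResponseType`, `BrushRequest` and `brush`, and the toy records, are hypothetical and not taken from Haiku.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Callable, Dict, List, Set

Record = Dict[str, Any]


class ResponseType(Enum):
    """Hypothetical kinds of response the user may request (mirrors the uses listed in the text)."""
    FOCUS_SEARCH = auto()         # concentrate machine search on the brushed area
    VERIFY = auto()               # statistically verify a visually observed pattern
    DISCRIMINATORY_RULE = auto()  # produce a rule separating the brushed subsets


@dataclass
class BrushRequest:
    """Hypothetical message from the visualisation to the machine discovery component."""
    subsets: List[Set[int]]  # each subset holds the indices of brushed records
    response: ResponseType   # the type of response required


def brush(records: List[Record], predicate: Callable[[Record], bool]) -> Set[int]:
    """Return the indices of the records selected by a brushing predicate."""
    return {i for i, r in enumerate(records) if predicate(r)}


records: List[Record] = [
    {"age": 25, "owns_car": True},
    {"age": 62, "owns_car": False},
    {"age": 41, "owns_car": True},
    {"age": 19, "owns_car": False},
]

# Brush the young customers and contrast them with everyone else.
young = brush(records, lambda r: r["age"] < 30)
rest = set(range(len(records))) - young
request = BrushRequest(subsets=[young, rest], response=ResponseType.DISCRIMINATORY_RULE)
```

In an interactive tool the predicate would stand in for whatever region the user selects on screen. Sending record indices rather than copies of the data keeps the request independent of how the discovery component stores the data set.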

Interactivity, and hence speed of visualisation, is an important factor in symbiotic knowledge discovery. Informative visualisations which take several minutes to render discourage the investigation of different views and user-led exploration of the data set. Long rendering times also rule out real-time animation, which can add a valuable extra dimension to all visualisation techniques.

This paper describes the Haiku (Hybrid Assistant for Interactive KDD and User-lead-discovery) system for the visualisation of data, discoveries and their relationships. The system makes use of a machine discovery system but is independent of the discovery method used. It is designed for use in an iterative, exploratory discovery process involving both machine-based and human-led discovery.

Much previous work has concentrated on the visualisation of either raw data [4] [5] [6] [2] or discovered knowledge [3], rather than on the relationships between the two. One of the key aims of KDD is that the discoveries made should be understandable. However, the textual presentation of discoveries, often with associated statistics, omits valuable information about the interaction between data and discovered knowledge. This includes information about overlapping discoveries, in which multiple discoveries refer to the same data. It is this information that the Haiku system aims to make visible. Haiku is intended to complement traditional representations of discoveries, not to replace them.
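To make "overlapping discoveries" concrete, the following sketch treats each discovery as a rule whose condition selects a set of records, then reports which pairs of rules share records and which records are referred to by more than one rule. The rule names, the helpers `coverage`, `overlaps` and `multiply_covered`, and the toy records are all hypothetical; the text does not specify how Haiku represents discoveries internally.

```python
from collections import Counter
from itertools import combinations
from typing import Any, Callable, Dict, List, Set, Tuple

Record = Dict[str, Any]
Rule = Callable[[Record], bool]


def coverage(records: List[Record], rules: Dict[str, Rule]) -> Dict[str, Set[int]]:
    """Map each discovery (rule) name to the indices of the records it refers to."""
    return {name: {i for i, r in enumerate(records) if cond(r)} for name, cond in rules.items()}


def overlaps(cov: Dict[str, Set[int]]) -> Dict[Tuple[str, str], Set[int]]:
    """Return every pair of discoveries sharing at least one record, with the shared records."""
    pairs: Dict[Tuple[str, str], Set[int]] = {}
    for a, b in combinations(sorted(cov), 2):
        shared = cov[a] & cov[b]
        if shared:
            pairs[(a, b)] = shared
    return pairs


def multiply_covered(cov: Dict[str, Set[int]]) -> Set[int]:
    """Return the records referred to by more than one discovery."""
    counts = Counter(i for covered in cov.values() for i in covered)
    return {i for i, n in counts.items() if n > 1}


records: List[Record] = [
    {"age": 25, "owns_car": True},
    {"age": 62, "owns_car": False},
    {"age": 41, "owns_car": True},
    {"age": 19, "owns_car": False},
]

# Two hypothetical discoveries that happen to overlap on one record.
rules: Dict[str, Rule] = {
    "older": lambda r: r["age"] > 40,
    "car_owner": lambda r: r["owns_car"],
}

cov = coverage(records, rules)
```

A textual list of the two rules would not reveal that both refer to the same record; the overlap sets are exactly the kind of data-to-discovery relationship that a visualisation could display alongside the raw data.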






Andy N Pryke
Tue May 14 17:02:46 BST 1996