Definition: Knowledge Discovery

Knowledge discovery is a technique for mining data held in databases. The term ‘data mining’ is used here in a broad sense: it includes the selection and preparation of data, the cleaning of data, the application of prior knowledge to the data sets and the interpretation of the observed results. Knowledge discovery is therefore essentially a process of finding hidden knowledge in large volumes of data. This knowledge can be used to improve decision making and, through it, the operational processes of the organization.

The primary goal of knowledge discovery is to extract high-level knowledge from low-level data. A variety of methods are used for this discovery, such as semantic query optimization, inductive learning, information theory and the acquisition of knowledge for expert systems. Knowledge discovery in databases is also supported by artificial intelligence, which discovers empirical laws from observations and experiments. The patterns discovered in the data are called new knowledge; they must also be valid on new data, allowing for a certain degree of uncertainty.
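The requirement that a discovered pattern stay valid on new data can be illustrated with a small Python sketch. It assumes a toy pattern (a single threshold rule) and invented data; the names DISCOVERY, NEW_DATA, accuracy and discover_threshold are hypothetical and chosen only for illustration. The rule is found on one data set and then checked on data it has not seen.

```python
# Hypothetical labelled observations: (measurement, is_defective).
# All values are invented for illustration.
DISCOVERY = [(0.2, False), (0.4, False), (0.5, False),
             (0.7, True), (0.9, True), (1.1, True)]
NEW_DATA = [(0.1, False), (0.3, False), (0.65, True),
            (0.8, True), (0.55, True), (1.0, True)]


def accuracy(threshold, rows):
    """Fraction of rows where the rule 'x >= threshold' matches the label."""
    hits = sum((x >= threshold) == label for x, label in rows)
    return hits / len(rows)


def discover_threshold(rows):
    """Pick the observed value that works best as a threshold on these rows."""
    candidates = sorted(x for x, _ in rows)
    return max(candidates, key=lambda t: accuracy(t, rows))


# Discover the pattern on one data set...
threshold = discover_threshold(DISCOVERY)
train_acc = accuracy(threshold, DISCOVERY)

# ...then check whether it still holds on data it has not seen.
new_acc = accuracy(threshold, NEW_DATA)

print(f"rule: x >= {threshold}")
print(f"accuracy on discovery data: {train_acc:.2f}")
print(f"accuracy on new data:       {new_acc:.2f}")
```

In this sketch the rule fits the discovery data perfectly but is less accurate on the new data, which is the kind of uncertainty the definition refers to.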

Knowledge discovery in databases (KDD) proceeds through the following steps:

a) Identification of the goal of the KDD process from the end user’s perspective.

b) Understanding of domains and the knowledge required for the process.

c) Selection of a sample data set.

d) Cleaning of data, formulation of a strategy to handle missing information, and alteration of data as per requirement.

e) Simplification of data and analysis of useful features.

f) Matching of the goal with the data mining techniques to discover patterns.

g) Searching for patterns of interest in a particular representational form (e.g. regression, clustering, summarization, classification).

h) Interpreting the mined patterns to extract useful knowledge.

i) Incorporation of the knowledge extracted into another system.

j) Documentation and report making for the clients.
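Some of the steps above (c, d, f, g and h) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records are invented, missing values are filled with the mean, and ordinary least-squares regression stands in for the data mining technique. The names RAW, select, clean, mine, interpret and run_kdd are hypothetical.

```python
from statistics import mean

# Hypothetical raw records: (advertising_spend, sales).
# None marks a missing value; all figures are invented for illustration.
RAW = [(1.0, 2.1), (2.0, 3.9), (None, 5.0),
       (3.0, 6.2), (4.0, None), (5.0, 9.8)]


def select(records):
    """Step c: select the sample (here, keep rows that have a sales figure)."""
    return [r for r in records if r[1] is not None]


def clean(records):
    """Step d: fill missing spend values with the mean of the observed ones."""
    observed = [x for x, _ in records if x is not None]
    fill = mean(observed)
    return [(x if x is not None else fill, y) for x, y in records]


def mine(records):
    """Steps f-g: fit sales = slope * spend + intercept by least squares."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in records)
    slope = sxy / sxx
    return slope, my - slope * mx


def interpret(model):
    """Step h: turn the fitted pattern into a readable statement."""
    slope, _ = model
    trend = "rises" if slope > 0 else "falls"
    return f"sales {trend} by about {abs(slope):.2f} per unit of spend"


def run_kdd(records):
    """Chain the steps; in practice the process is repeated and refined."""
    model = mine(clean(select(records)))
    return model, interpret(model)


model, summary = run_kdd(RAW)
print(summary)
```

Each function maps to one step so that a step can be revisited without touching the others.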

KDD is iterative and inherently interactive. Knowledge discovery has been applied in many areas, such as manufacturing (network management, controlling and scheduling), marketing (sales data analysis) and scientific information (sky survey cataloging).



