RSDM represents the architecture of a system that adds KDD capabilities to an RDBMS. It also provides an API for adding new Data Mining capabilities to the system in a way that is transparent to the end user. As a result, RSDM acts as a generic engine for Data Mining algorithms. Three main goals guided the design and development of the system:
As already mentioned, RSDM keeps the power of the relational system while adding Data Mining capabilities. Rough Set methodology, generalization and relational database techniques, among others, have been integrated to provide the following features:
In order to provide the functionality just mentioned, the following architecture has been developed.
The third goal of our design is to obtain an architecture that allows efficient management of extra large volumes of data, to help companies handle and analyse their data. Parallelism techniques have been used so that the system can deal efficiently with such volumes.
In particular, the Light Weight Process (LWP) technique has been used in the implementation of RSDM. This technique allows different processes to execute concurrently in a shared-memory environment. Applying LWP makes it possible to execute in parallel the set of atomic operations into which each algorithm is structured. Each of these operators can be applied to a different set of target data.
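The idea of applying one atomic operator to several sets of target data concurrently can be illustrated with a minimal sketch. It is not the RSDM implementation: Python threads stand in for light weight processes because they also share one address space, and the operator `count_values`, the driver `parallel_count` and the sample partitions are hypothetical names chosen for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_values(partition, attribute):
    """Hypothetical atomic operator: count one attribute's values in one partition."""
    return Counter(row[attribute] for row in partition)


def parallel_count(partitions, attribute, workers=4):
    """Apply the same operator to every partition concurrently, then merge."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each partition is handled by a thread; all threads share memory.
        partials = pool.map(lambda p: count_values(p, attribute), partitions)
        for partial in partials:
            total.update(partial)
    return total


# Two illustrative sets of target data (assumed, not from the paper).
partitions = [
    [{"outlook": "sunny"}, {"outlook": "rain"}],
    [{"outlook": "sunny"}, {"outlook": "overcast"}],
]
counts = parallel_count(partitions, "outlook")
```

In CPython, threads running pure Python code do not execute simultaneously on several processors, so the sketch shows the structure of the decomposition rather than the speed-up an LWP implementation would obtain.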
A prototype of the generic engine is already working. Algorithms for the calculation of positive regions and reducts are also working, and with their aid it is possible to extract characteristic rules. Association rule algorithms are under development, as is a graphical user interface.
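For readers unfamiliar with the two notions, the following sketch uses the standard rough set definitions: the positive region contains the objects whose condition-attribute equivalence class maps to a single decision value, and a reduct is a minimal subset of condition attributes that preserves the positive region. The function names, the exhaustive search and the small decision table are assumptions made for illustration; they are not the algorithms implemented in RSDM.

```python
from collections import defaultdict
from itertools import combinations


def positive_region(rows, condition_attrs, decision_attr):
    """Return indices of rows whose condition class has a unique decision."""
    classes = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[a] for a in condition_attrs)
        classes[key].append(index)
    positive = set()
    for members in classes.values():
        decisions = {rows[i][decision_attr] for i in members}
        if len(decisions) == 1:  # the class is consistent
            positive.update(members)
    return positive


def smallest_reduct(rows, condition_attrs, decision_attr):
    """Exhaustive search for one minimum-size subset preserving the region."""
    target = positive_region(rows, condition_attrs, decision_attr)
    for size in range(1, len(condition_attrs) + 1):
        for subset in combinations(condition_attrs, size):
            if positive_region(rows, subset, decision_attr) == target:
                return subset
    return tuple(condition_attrs)


# Illustrative decision table; "cough" duplicates "headache" on purpose.
rows = [
    {"headache": "yes", "temp": "high", "cough": "yes", "flu": "yes"},
    {"headache": "yes", "temp": "high", "cough": "yes", "flu": "yes"},
    {"headache": "no", "temp": "high", "cough": "no", "flu": "yes"},
    {"headache": "no", "temp": "normal", "cough": "no", "flu": "no"},
    {"headache": "yes", "temp": "normal", "cough": "yes", "flu": "no"},
    {"headache": "yes", "temp": "normal", "cough": "yes", "flu": "yes"},
]
region = positive_region(rows, ["headache", "temp", "cough"], "flu")
reduct = smallest_reduct(rows, ["headache", "temp", "cough"], "flu")
```

The exhaustive search is exponential in the number of attributes and is only meant to make the definition concrete on a toy table.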
RSDM has been conceived as an engine of KDD algorithms rather than as a system that adds some particular capabilities. This approach has advantages as well as disadvantages. On the one hand, building an engine of algorithms, in contrast to existing Data Mining systems, makes it possible to add a new capability with the sole task of building the module that executes it, without having to code, in addition to the algorithm itself, the communication, the storage of intermediate results and so on. On the other hand, the construction of the architecture is more complex, which is why some Data Mining capabilities, such as association rules, prediction and generalization tasks, to name a few, are not yet available. However, it must be remarked once again that adding any of these capabilities is a straightforward task once the architecture has been finished.
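The division of labour described here, in which a module supplies only the algorithm while the engine takes care of data access and the storage of results, might be sketched as follows. The class `MiningEngine`, its methods and the in-memory `TABLES` dictionary are hypothetical and do not reproduce the RSDM API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Module = Callable[[List[Row]], object]


class MiningEngine:
    """Hypothetical engine: owns data access and result storage."""

    def __init__(self, fetch_rows: Callable[[str], List[Row]]):
        self._fetch_rows = fetch_rows
        self._modules: Dict[str, Module] = {}
        self.results: Dict[Tuple[str, str], object] = {}

    def register(self, name: str, module: Module) -> None:
        """Adding a capability only requires supplying the module."""
        self._modules[name] = module

    def run(self, name: str, table: str) -> object:
        rows = self._fetch_rows(table)       # engine handles communication
        result = self._modules[name](rows)   # module handles the algorithm
        self.results[(name, table)] = result  # engine stores the result
        return result


# Stand-in for a database, assumed for the example only.
TABLES = {"weather": [{"outlook": "sunny"}, {"outlook": "rain"}]}

engine = MiningEngine(lambda table: TABLES[table])
engine.register("row_count", len)
row_count = engine.run("row_count", "weather")
```

A real module would of course implement a discovery algorithm rather than `len`; the point of the sketch is that registering it requires no knowledge of how rows are fetched or where results are kept.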
Integration with RDBMS: RSDM provides an API to integrate different commercial and non-commercial database management systems. At present the system interfaces with Postgres and Oracle.
Methodologies applied in the algorithms: Rough set theory as well as relational theory have been integrated in the discovery algorithms already implemented in the system. Rough set operations have been translated first to relational algebra (when possible) and then to SQL in order to improve efficiency. For a detailed description of the algorithms see.
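As an illustration of how a rough set operation can be expressed in SQL, the sketch below computes a positive region with a grouping query: condition-attribute groups with exactly one distinct decision value are kept, and the original table is joined back to them. This is one possible formulation, not necessarily the translation used in RSDM. The table `patients`, its columns and its rows are invented for the example, and SQLite (through Python's standard `sqlite3` module) is used only so that the sketch runs without a Postgres or Oracle server.

```python
import sqlite3

# In-memory database with an assumed decision table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (id INTEGER PRIMARY KEY, headache TEXT, temp TEXT, flu TEXT)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [
        (1, "yes", "high", "yes"),
        (2, "yes", "high", "yes"),
        (3, "no", "high", "yes"),
        (4, "no", "normal", "no"),
        (5, "yes", "normal", "no"),
        (6, "yes", "normal", "yes"),
    ],
)

# Positive region of {headache, temp} with respect to flu:
# keep the groups whose rows all share one decision value.
POSITIVE_REGION_SQL = """
SELECT t.id
FROM patients AS t
JOIN (
    SELECT headache, temp
    FROM patients
    GROUP BY headache, temp
    HAVING COUNT(DISTINCT flu) = 1
) AS c
  ON t.headache = c.headache AND t.temp = c.temp
ORDER BY t.id
"""

positive_ids = [row[0] for row in conn.execute(POSITIVE_REGION_SQL)]
conn.close()
```

Rows 5 and 6 agree on both condition attributes but differ in the decision, so they fall outside the positive region; the grouping and the join are left to the database engine, which is the efficiency argument made above.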
The architecture and the main properties of the RSDM system have been explained. The methodology used, as well as its advantages and disadvantages in comparison with other systems, has been discussed. We are currently working on the implementation and testing of a tightly-coupled release of the algorithms, as well as on the implementation of association rules, extraction and discretization modules. The design of a proper Data Warehouse to help the mining tasks is under development. As a result, the data dictionary will be enhanced to hold the metadata needed for efficient mining of the database.
We are very much indebted for inspiration to Dr. Ziarko, Dr. Pawlak and Dr. Skowron. Thanks are due to Dr. Wasilewska and Dr. Hadjimichael for several helpful comments.