ROSETTA is a software system for knowledge discovery and data mining within the framework of rough set theory. More than a flexible collection of algorithms, ROSETTA also offers a user-friendly GUI environment in which objects can be interactively manipulated and processed. The system is designed to support the overall knowledge discovery process - from initial browsing and preprocessing of the data, via reduct computation and rule generation, to validation and analysis of the extracted rules.
As with all fields concerned with empirical modeling, knowledge discovery and data mining have a high experimental content. The modeling process thus necessitates a set of tools that are both very flexible and user-friendly. Generally available software systems for this have been scarce, and rough set oriented ones even more so. In response to this, ROSETTA is a toolkit for knowledge discovery and data mining within the framework of rough set theory. Using tables with historical data, its basic purpose is to compute relevant feature subsets and generate classification rules. An extensive support environment is included around this - both in the form of a large base of algorithms, and by setting the tools in a highly intuitive GUI environment such that intermediate results can be viewed and analyzed. Using the system requires some knowledge of rough set concepts, although the user-friendly GUI lowers this threshold. Also, the system can be configured to cater for less experienced users by allowing scripts to be run that partially automate the modeling process.
ROSETTA is not tied to any particular application domain, but has already served as a research tool in different fields. A restricted version of the system is publicly available on the Internet for non-commercial use. ROSETTA runs on 32-bit Windows platforms.
As its basic input, ROSETTA takes flat data tables. Integration with a diverse range of data sources is possible, as ROSETTA can interface with them directly by means of ODBC. This means that tables and/or views in, e.g., a spreadsheet or a relational DBMS may be analyzed directly.
Since a fundamental premise of rough set theory is that objects are perceived only through the information that is available about them, any background knowledge that is to be used is assumed to be incorporated into the tables being analyzed. In the current version, ROSETTA does not support type hierarchies, although some simple metadata can be supplied.
ROSETTA outputs many structural objects that are presented in the GUI, e.g. tables, reducts, rules, confusion matrices, partitions and set approximations. Also, very detailed output may be generated and written to ASCII log files and HTML documents.
Most structural objects are exportable to alien formats, e.g. to Prolog. This opens up a connection to other, more advanced inference engines, where any available domain theories can also be utilized.
ROSETTA was designed for extensibility, and the list of features given is likely to grow.
Knowledge discovery and data mining within the framework of rough sets covers several issues, most of which are implemented in ROSETTA. Features currently offered by the computational kernel include algorithms for:
The ROSETTA GUI is a user-friendly environment for interactively manipulating data and triggering computations. With it, the user may control the flow of structures in the knowledge discovery pipeline: from selection of target data, preprocessing and transformation, through the actual data mining step, to interpretation and evaluation of discovered patterns. Some of the features currently offered by the ROSETTA GUI include:
In its present form, the ROSETTA GUI does not offer support for advanced graphical presentation and other visual techniques for knowledge discovery and data mining.
The space of rules considered by ROSETTA consists of if-then rules with a conjunctive antecedent and a disjunctive consequent. As rules are trivially generated from a table once a suitable attribute (feature) subset is found, the major computational effort lies in calculating reducts, or approximations of such. A reduct is a minimal attribute (feature) subset that preserves an indiscernibility relation. Such a relation may be formulated either for the full system or relative to a particular class of objects.
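To make the definitions above concrete, the following is a minimal sketch in Python, not ROSETTA's own code. It assumes a small hypothetical decision table with condition attributes `a`, `b`, `c` and decision `d`; the function names `partition`, `reducts` and `rules` are illustrative only. It enumerates reducts for the full system by exhaustive search and then reads rules off the table for one reduct.

```python
from itertools import combinations

# Hypothetical toy decision table; the last two objects are indiscernible
# on the condition attributes but carry different decisions.
TABLE = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
CONDITIONS = ("a", "b", "c")
DECISION = "d"


def partition(table, attrs):
    """Indiscernibility classes: objects equal on all attrs share a class."""
    classes = {}
    for idx, row in enumerate(table):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(idx)
    return {frozenset(c) for c in classes.values()}


def reducts(table, conditions):
    """Exhaustively list minimal subsets that preserve the full partition."""
    target = partition(table, conditions)
    found = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # a smaller subset already suffices, so not minimal
            if partition(table, subset) == target:
                found.append(subset)
    return found


def rules(table, attrs, decision):
    """One rule per distinct antecedent: conjunction of attribute values
    implies the disjunction of all decisions observed for that antecedent."""
    grouped = {}
    for row in table:
        key = tuple((a, row[a]) for a in attrs)
        grouped.setdefault(key, set()).add(row[decision])
    return [(dict(key), sorted(vals)) for key, vals in grouped.items()]


if __name__ == "__main__":
    for reduct in reducts(TABLE, CONDITIONS):
        print("reduct:", reduct)
    for antecedent, consequent in rules(TABLE, ("a", "b"), DECISION):
        lhs = " AND ".join(f"{k}={v}" for k, v in antecedent.items())
        rhs = " OR ".join(f"{DECISION}={v}" for v in consequent)
        print(f"IF {lhs} THEN {rhs}")
```

In this toy table any two of the three attributes form a reduct, and the two indiscernible objects give rise to a rule with a disjunctive consequent. The exhaustive loop is exponential in the number of attributes, so it only illustrates the definition.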
Computing reducts is equivalent to computing prime implicants of a Boolean function, an NP-hard problem. An exhaustive search is thus not suitable for large tables. ROSETTA therefore offers heuristics for search and approximation based on both resampling techniques and genetic algorithms. Also, one may view discretization as a preprocessing step that may significantly ease this search.
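The link to Boolean functions can be illustrated with a short sketch. It assumes a hypothetical toy table and builds the clauses of a discernibility function: one disjunction of differing attributes per pair of discernible objects. It then applies a simple greedy covering heuristic. This greedy step is a stand-in chosen for brevity; it is not one of the resampling or genetic algorithms that ROSETTA offers, and in general it may return a superset of a reduct rather than a minimal one.

```python
from itertools import combinations

# Hypothetical toy table (condition attributes only).
TABLE = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 0},
]
ATTRS = ("a", "b", "c")


def discernibility_clauses(table, attrs):
    """One clause per discernible object pair: the attributes on which they differ.
    The conjunction of these clauses is the discernibility function."""
    clauses = set()
    for x, y in combinations(table, 2):
        differing = frozenset(a for a in attrs if x[a] != y[a])
        if differing:  # identical objects contribute no clause
            clauses.add(differing)
    return clauses


def greedy_cover(clauses, attrs):
    """Repeatedly pick the attribute that satisfies the most remaining clauses."""
    remaining = set(clauses)
    chosen = []
    while remaining:
        best = max(attrs, key=lambda a: sum(a in c for c in remaining))
        chosen.append(best)
        remaining = {c for c in remaining if best not in c}
    return chosen


if __name__ == "__main__":
    clauses = discernibility_clauses(TABLE, ATTRS)
    print(sorted(sorted(c) for c in clauses))
    print(greedy_cover(clauses, ATTRS))
```

For this table the discernibility function is (a OR b) AND (a OR c) AND (b OR c), whose prime implicants are the three attribute pairs; the greedy pass finds one of them without enumerating all subsets.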
Discovered patterns should also be interesting and useful, and filtering of generated structures may be performed based on quantities such as support counts, probabilities and user-supplied information about attribute costs.
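As a rough illustration of such filtering, the sketch below keeps only rules that meet thresholds on support, probability and total attribute cost. The `Rule` record, its field names, the `filter_rules` function and the example values are all hypothetical and do not reflect ROSETTA's internal structures or file formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: tuple   # ((attribute, value), ...), read as a conjunction
    consequent: str     # predicted decision value
    support: int        # objects matching both antecedent and consequent
    probability: float  # share of antecedent matches that match the consequent


def filter_rules(rules, min_support=1, min_probability=0.0,
                 costs=None, max_cost=None):
    """Keep rules meeting the support and probability thresholds and,
    if max_cost is given, whose summed attribute cost does not exceed it."""
    costs = costs or {}
    kept = []
    for rule in rules:
        if rule.support < min_support or rule.probability < min_probability:
            continue
        total_cost = sum(costs.get(attr, 0.0) for attr, _ in rule.antecedent)
        if max_cost is not None and total_cost > max_cost:
            continue
        kept.append(rule)
    return kept


# Hypothetical example rules and user-supplied attribute costs.
RULES = [
    Rule((("a", 0), ("b", 1)), "yes", support=12, probability=0.92),
    Rule((("a", 1), ("c", 0)), "no", support=2, probability=1.00),
    Rule((("b", 0), ("c", 1)), "yes", support=9, probability=0.55),
]
COSTS = {"a": 1.0, "b": 5.0, "c": 20.0}

if __name__ == "__main__":
    for rule in filter_rules(RULES, min_support=5, min_probability=0.5,
                             costs=COSTS, max_cost=10.0):
        print(rule)
```

With these example thresholds only the first rule survives: the second has too little support, and the third relies on an attribute whose assumed cost exceeds the budget.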
The development of ROSETTA was supported in part by the European Union 4th Framework Telematics project CARDIASSIST, by the Human Capital and Mobility Norwegian Research Council (NFR) contract #101341/410, by NFR grant #74467/410, by an NFR grant for Cooperation with Central Europe, by the National Committee for Scientific Research in Poland under grant #8T11C01011 and by the ESPRIT project 20288 CRIT-2.