Rosetta
Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, NORWAY
Institute of Mathematics, Warsaw University, Warsaw, POLAND
Polish-Japanese Institute of Computer Techniques, Warsaw, POLAND


Abstract

ROSETTA is a software system for knowledge discovery and data mining within the framework of rough set theory. More than a flexible collection of algorithms, ROSETTA also offers a user-friendly GUI environment in which objects can be interactively manipulated and processed. The system is designed to support the overall knowledge discovery process - from initial browsing and preprocessing of the data, via reduct computation and rule generation, to validation and analysis of the extracted rules.

Introduction

As with all fields concerning themselves with empirical modeling, knowledge discovery and data mining have a high experimental content. The modeling process thus necessitates a set of tools that are both very flexible and user-friendly. Generally available software systems for this have been scarce, and rough set oriented ones even more so. In response to this, ROSETTA is a toolkit for knowledge discovery and data mining within the framework of rough set theory. Using tables with historical data, its basic purpose is to compute relevant feature subsets and generate classification rules. An extensive support environment is included around this - both in the form of a large base of algorithms, and by setting the tools in a highly intuitive GUI environment such that intermediate results can be viewed and analyzed. Using the system presumes some knowledge of rough set concepts, although the user-friendly GUI lowers this threshold. Also, the system can be configured to cater for less experienced users by allowing scripts to be run that partially automate the modeling process.

ROSETTA is not tied to any particular application domain, but has already served as a research tool in different fields. A restricted version of the system is made publicly available on the Internet for non-commercial use. ROSETTA runs on 32-bit Windows platforms.

Input and output

As its basic input, ROSETTA takes flat data tables. Integration with a diverse range of data sources is possible, as ROSETTA can interface directly with them by means of ODBC. This means that tables and/or views in e.g. a spreadsheet or a relational DBMS may be analyzed directly.

Since a fundamental premise of rough set theory is that objects are perceived only through the information that is available about them, any background knowledge is assumed to be incorporated into the tables to be analyzed if it is to be used. In the current version ROSETTA does not support type hierarchies, although some simple metadata can be supplied.

Many structural objects are output from ROSETTA and presented in the GUI, e.g. tables, reducts, rules, confusion matrices, partitions and set approximations. Also, very detailed output may be generated and written to ASCII log files and HTML documents.

Most structural objects are exportable to alien formats, e.g. to Prolog. This opens up a connection to other, more advanced inference engines, where any available domain theories can also be utilized.

System features

ROSETTA was designed for extensibility, and the list of features given is likely to grow.

Knowledge discovery and data mining within the framework of rough sets covers several issues, most of which are implemented in ROSETTA. Features currently offered by the computational kernel include algorithms for:
  • Preprocessing of data tables with missing values
  • Discretization of numerical attributes
  • Computation of (approximate or absolute) reducts and rules
  • Filtering of reducts and rules according to specified evaluation criteria
  • Classification of new objects with synthesized rules using voting schemes
  • Computing rough set approximations
Also, ROSETTA can execute command scripts, hence enabling automation of lengthy and repetitive user-specified command sequences without use of the GUI. Examples of such are algorithmic pipelines and n-fold cross-validation.
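
To make the rough set approximations listed among the kernel features concrete, the following Python sketch computes the lower and upper approximation of a set of objects. It is an illustration of the standard definitions only, not ROSETTA code; the function names, the toy table and its attribute names are assumptions made for this example.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object indices that agree on every attribute in `attributes`."""
    classes = defaultdict(set)
    for idx, row in enumerate(table):
        key = tuple(row[a] for a in attributes)
        classes[key].add(idx)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


# Hypothetical toy table; objects 0 and 1 are indiscernible on (temp, cough).
TABLE = [
    {"temp": "high", "cough": "yes", "flu": "yes"},
    {"temp": "high", "cough": "yes", "flu": "no"},
    {"temp": "normal", "cough": "no", "flu": "no"},
    {"temp": "high", "cough": "no", "flu": "yes"},
]
FLU = {i for i, row in enumerate(TABLE) if row["flu"] == "yes"}
LOWER, UPPER = approximations(TABLE, ["temp", "cough"], FLU)
print(sorted(LOWER), sorted(UPPER))
```

Objects in the lower approximation certainly belong to the target set given the chosen attributes, while objects in the upper approximation possibly belong to it; the difference between the two is the boundary region.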

The ROSETTA GUI is a user-friendly environment for interactively manipulating data and triggering computations. With it, the user may control the flow of structures in the knowledge discovery pipeline: from selection of target data, preprocessing and transformation, through the actual data mining step, to interpretation and evaluation of discovered patterns. Some of the features currently offered by the ROSETTA GUI include:
  • Full Windows GUI conformance
  • Organization of project items in trees in order to retain data-navigation abilities
  • Viewing of all structures in intuitive grid environments using terms from the modeling domain
  • Context-sensitive pop-up menus and drag-and-drop functionality
  • Automatic generation of annotations that document the steps taken in a modeling session
The tree organization of a project and the automatic generation of annotations facilitate experimenting with steps and parameter settings in the knowledge discovery process. For every step it is straightforward to create an alternative development, represented by a new branch in the tree. By allowing branching, more flexibility is offered than with a traditional line-oriented modeling paradigm.

In its present form, the ROSETTA GUI does not offer support for advanced graphical presentation and other visual techniques for knowledge discovery and data mining.

Search for knowledge

The space of rules considered by ROSETTA consists of if-then rules with a conjunctive antecedent and a disjunctive consequent. As rules are trivially generated from a table once a suitable attribute (feature) subset is found, the major computational effort lies in calculating reducts, or approximations of such. A reduct is a minimal attribute (feature) subset that preserves an indiscernibility relation. Such a relation may be formulated either for the full system or relative to a particular class of objects.
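
The definition of a reduct can be illustrated with a small exhaustive search. The sketch below covers only the full-system case (it ignores any decision attribute) and enumerates every attribute subset, so it is meant for toy tables only; the function names and the example table are assumptions for illustration and do not reflect how ROSETTA is implemented.

```python
from itertools import combinations


def partition(table, attributes):
    """Partition object indices into indiscernibility classes."""
    groups = {}
    for idx, row in enumerate(table):
        key = tuple(row[a] for a in attributes)
        groups.setdefault(key, set()).add(idx)
    return frozenset(frozenset(g) for g in groups.values())


def reducts(table, attributes):
    """Find all minimal attribute subsets that induce the same partition
    as the full attribute set, by exhaustive search in order of size."""
    full = partition(table, attributes)
    found = []
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            # Skip supersets of a reduct already found: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if partition(table, subset) == full:
                found.append(subset)
    return found


# Hypothetical toy table in which attributes b and c carry the same information.
TABLE = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 1},
]
REDUCTS = reducts(TABLE, ["a", "b", "c"])
print(REDUCTS)
```

Because b and c are interchangeable in this table, either can be dropped without merging any indiscernibility classes, which is why two reducts exist.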

Computing reducts is equivalent to computing prime implicants of a Boolean function, an NP-hard problem. An exhaustive search is thus not suitable for large tables. ROSETTA therefore offers heuristics for search and approximation based on both resampling techniques and genetic algorithms. Also, one may view discretization as a preprocessing step that can significantly ease this search.
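
The Boolean function in question can be built from the attribute sets on which pairs of objects differ: an attribute subset preserves discernibility exactly when it intersects every such set. As a rough illustration of heuristic search, the sketch below applies a simple greedy covering strategy to these sets. This is a swapped-in technique chosen for brevity, not the resampling or genetic algorithms mentioned above, and all names and data are assumptions for this example.

```python
from collections import Counter
from itertools import combinations


def discernibility_sets(table, attributes):
    """For each pair of objects, collect the attributes on which they differ."""
    sets = []
    for x, y in combinations(table, 2):
        diff = frozenset(a for a in attributes if x[a] != y[a])
        if diff:  # identical objects impose no constraint
            sets.append(diff)
    return sets


def greedy_attribute_cover(table, attributes):
    """Greedily pick attributes until every discernibility set is hit.

    The result preserves discernibility but is not guaranteed to be minimal.
    """
    remaining = discernibility_sets(table, attributes)
    chosen = []
    while remaining:
        counts = Counter(a for s in remaining for a in s)
        # Most frequent attribute first; ties resolved by attribute order.
        best = max(attributes, key=lambda a: counts[a])
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


# Hypothetical toy table in which attributes b and c carry the same information.
TABLE = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 1},
]
COVER = greedy_attribute_cover(TABLE, ["a", "b", "c"])
print(COVER)
```

The greedy pass runs in polynomial time, at the price of possibly returning a superset of a reduct rather than a reduct itself.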

Discovered patterns should also be interesting and useful, and filtering of generated structures may be performed based on quantities such as support counts, probabilities and user-supplied information about attribute costs.
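
A filtering step of this kind might look as follows in Python. The `Rule` structure, the cost model (summing the cost of each attribute used in the antecedent) and all values are hypothetical and chosen only to illustrate filtering on support and attribute cost; ROSETTA's actual rule representation and criteria may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: tuple  # ((attribute, value), ...) combined with AND
    consequent: tuple  # decision values combined with OR
    support: int       # number of table objects matching the rule


def rule_cost(rule, costs):
    """Sum the user-supplied cost of every attribute used in the antecedent."""
    return sum(costs.get(attr, 0.0) for attr, _ in rule.antecedent)


def filter_rules(rules, min_support=1, costs=None, max_cost=float("inf")):
    """Keep rules with enough support whose total attribute cost is acceptable."""
    costs = costs or {}
    return [
        r for r in rules
        if r.support >= min_support and rule_cost(r, costs) <= max_cost
    ]


# Hypothetical rules and attribute costs.
RULES = [
    Rule((("temp", "high"),), ("flu",), 12),
    Rule((("temp", "high"), ("scan", "positive")), ("flu", "cold"), 3),
    Rule((("cough", "yes"),), ("cold",), 1),
]
COSTS = {"scan": 50.0}

well_supported = filter_rules(RULES, min_support=2)
cheap = filter_rules(RULES, costs=COSTS, max_cost=10.0)
print(len(well_supported), len(cheap))
```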

Acknowledgements

The development of ROSETTA was supported in part by the European Union 4th Framework Telematics project CARDIASSIST, by the Human Capital and Mobility Norwegian Research Council (NFR) contract #101341/410, by NFR grant #74467/410, by an NFR grant for Cooperation with Central Europe, by the National Committee for Scientific Research in Poland under grant #8T11C01011 and by the ESPRIT project 20288 CRIT-2.