RoughFuzzyLab is a data mining and knowledge discovery software system based on rough set and fuzzy set theory. The system is particularly suited to processing databases that contain images and decision tables. It has been used for image recognition, handwritten character recognition, time-series prediction, and, in the medical area, breast cancer detection.
The RoughFuzzyLab software system was developed in 1995 in Roman Swiniarski's research group at San Diego State University. It was designed for data mining and knowledge discovery on the basis of rough set theory (Pawlak, 1991; Skowron, 1990) and fuzzy sets (Zadeh, 1965). Two of the major functions of the system are the extraction of important features and the design of classifiers from a given data set. RoughFuzzyLab provides two different approaches to designing classification rules from data. The first technique uses rough set based rule design (built on the idea of a minimum concept description). The second technique applies fuzzy set methodology to rule design (using features selected by rough sets). The system is especially effective for image recognition.
Input data to RoughFuzzyLab can be an ASCII data set containing:
The system can extract important features from images for use in compression and recognition. For decision tables and classification tasks, it finds the set of strongly relevant attributes, called the core. Additionally, RoughFuzzyLab finds sets of weakly relevant attributes, called reducts. A reduct is a minimal set of attributes that describes all concepts in a decision table. Finally, the system derives rough set based and fuzzy set based classification rules.
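The notions of reduct and core can be illustrated with a minimal Python sketch. It is not RoughFuzzyLab's own algorithm (the text does not describe one); it is an exhaustive search over a hypothetical toy decision table, and it assumes the table is consistent when all condition attributes are used. The names TABLE, CONDITIONS, is_consistent, find_reducts, and find_core are invented for this illustration.

```python
from itertools import combinations

# Hypothetical toy decision table: condition attributes a, b, c and decision d.
TABLE = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
CONDITIONS = ("a", "b", "c")


def is_consistent(rows, attrs, decision="d"):
    """True if objects that agree on attrs always share the same decision."""
    seen = {}
    for row in rows:
        key = tuple(row[x] for x in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def find_reducts(rows, conditions, decision="d"):
    """Exhaustively list minimal attribute subsets that keep the table consistent."""
    reducts = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # a smaller reduct is already contained in this subset
            if is_consistent(rows, subset, decision):
                reducts.append(subset)
    return reducts


def find_core(reducts):
    """The core is the set of attributes shared by every reduct."""
    return set.intersection(*(set(r) for r in reducts)) if reducts else set()


reducts = find_reducts(TABLE, CONDITIONS)
core = find_core(reducts)
print(reducts, core)
```

In this toy table attributes a and c carry the same information, so either can be dropped but not both, while b appears in every reduct and therefore forms the core. Exhaustive search grows exponentially with the number of attributes, so it is only practical for small tables.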
The software system was designed to be user friendly, effective, and efficient. The graphical user interface is a PC Windows interface that is easy to use. Results can be shown graphically and statically. A decision table is presented in the form of a spreadsheet, so the user can understand and modify the data quickly.
There are five main functions provided in the system:
Data editing and basic preprocessing provides a variety of operations on data sets, including powerful image preprocessing functions. Images can be preprocessed, displayed, and extensively edited, for example by adding noise or thinning.
For input data with continuous (real-valued) attributes, RoughFuzzyLab provides several methods of attribute discretization, including cluster analysis and statistical methods.
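The text does not say which discretization methods are implemented, so the following Python sketch only illustrates the general idea with one simple statistical scheme, equal-frequency binning. The function names equal_frequency_cuts and discretize, and the sample values, are assumptions made for this example.

```python
from bisect import bisect_right


def equal_frequency_cuts(values, n_bins):
    """Return up to n_bins - 1 cut points that put roughly equal counts in each bin."""
    if n_bins < 2 or len(values) < n_bins:
        raise ValueError("need n_bins >= 2 and at least n_bins values")
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        idx = k * n // n_bins
        # Place the cut midway between two neighbouring sorted values.
        cut = (ordered[idx - 1] + ordered[idx]) / 2.0
        if not cuts or cut > cuts[-1]:  # skip duplicate cuts caused by repeated values
            cuts.append(cut)
    return cuts


def discretize(value, cuts):
    """Map a real value to the index of the interval it falls in."""
    return bisect_right(cuts, value)


# Hypothetical real-valued attribute.
samples = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 20.0, 21.0]
cuts = equal_frequency_cuts(samples, 3)
codes = [discretize(v, cuts) for v in samples]
print(cuts, codes)
```

After this step every real-valued attribute is replaced by a small integer code, which is the form a rough set decision table requires.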
The major Feature extraction function extracts invariant features from images based on the theory of complex Zernike moments (Swiniarski, 1993). This function also provides feature editing, pattern forming, and labeling of patterns by class.
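For orientation, the usual textbook definition of complex Zernike moments of an image f(x, y) mapped onto the unit disk is given here. The discretization and normalization actually used in RoughFuzzyLab are not stated in the text, so the notation is an assumption. Here n is a non-negative integer, |m| <= n with n - |m| even, (rho, theta) are the polar coordinates of (x, y), and alpha is the angle by which the image is rotated.

```latex
\begin{align}
A_{nm} &= \frac{n+1}{\pi} \sum_{x}\sum_{y} f(x,y)\, V_{nm}^{*}(\rho,\theta),
          \qquad x^{2}+y^{2} \le 1, \\
V_{nm}(\rho,\theta) &= R_{nm}(\rho)\, e^{jm\theta}, \\
R_{nm}(\rho) &= \sum_{s=0}^{(n-|m|)/2} (-1)^{s}\,
          \frac{(n-s)!}{s!\,\left(\frac{n+|m|}{2}-s\right)!\,\left(\frac{n-|m|}{2}-s\right)!}\,
          \rho^{\,n-2s}, \\
A'_{nm} &= A_{nm}\, e^{-jm\alpha}
          \;\Longrightarrow\; \lvert A'_{nm}\rvert = \lvert A_{nm}\rvert .
\end{align}
```

The last line shows why the moments yield invariant features: rotating the image changes only the phase of each moment, so the magnitudes |A_nm| can serve as rotation-invariant features.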
Basic Rough sets functions are provided for discovering dependencies, set approximations, classification accuracy, feature importance, etc. (Swiniarski, 1995). Rough set based classifier design and testing are also implemented in this main function. A user can build a rough set classification rule base with Build Rough Rules, then load a test data file and classify it with either Classify Current Case or Classify All Cases.
The Fuzzy Sets function provides users with a fuzzy recognition system. To build one, a user creates a fuzzy rule base and membership functions with Build Fuzzy Rule, which generates a fuzzy sets file. A fuzzy rule base is built from a chosen reduct of the decision table, so the user must select a reduct first. Membership functions are created automatically by the provided methods. Fuzzy rules can be shown with Show Rules, and membership functions with Show Membership Functions. Fuzzy rules can also be shown graphically with the graph of Rules and Classification. This function can also show the degrees of membership when an object is classified.
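How a fuzzy rule base classifies an object and reports degrees of membership can be sketched in Python as follows. This is an illustration under stated assumptions, not the system's implementation: triangular membership functions, the min operator for combining conditions, and a winner-takes-all decision are common choices that the text does not confirm, and FUZZY_SETS, RULES, and the feature names f1 and f2 are hypothetical.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak (left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for two features kept by a reduct: (left, peak, right).
FUZZY_SETS = {
    "f1": {"low": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)},
    "f2": {"low": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)},
}

# Each rule pairs an antecedent (feature -> linguistic term) with a class label.
RULES = [
    ({"f1": "low", "f2": "low"}, "A"),
    ({"f1": "high", "f2": "high"}, "B"),
]


def firing_strength(sample, antecedent):
    """Combine the antecedent's membership degrees with the min operator (fuzzy AND)."""
    return min(
        triangular(sample[feature], *FUZZY_SETS[feature][term])
        for feature, term in antecedent.items()
    )


def classify(sample):
    """Return (label, strength) of the rule that fires most strongly."""
    scored = [(firing_strength(sample, ant), label) for ant, label in RULES]
    strength, label = max(scored)
    return label, strength


print(classify({"f1": 0.25, "f2": 0.5}))
```

The returned strength plays the role of the degree of membership shown when an object is classified: the sample above belongs to class A with degree 0.5 and to class B with degree 0.25.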
The software system was written in the “ANSI C” programming language and consists of a Windows interface, several “engine” programs, and several include files. The interface and programs were compiled with Borland C++ 3.1 (or Microsoft C/C++ 7.0) and run on PCs under the Microsoft Windows 3.1 operating system. Although RoughFuzzyLab is a fully interactive system with an advanced graphical user interface, some batch processing capabilities are also available.
In RoughFuzzyLab it is assumed that the classification rules are to be applied automatically, so every new input object can be classified by the system. The constructed rules are also available to the user in a comprehensible form.
Advantages of the system are its family of image preprocessing functions, its discretization methods, and its invariant feature extraction from images. Another advantage is the merging of rough set and fuzzy set functionality in classifier design. An extensive interactive graphical user interface is also an advantage. A disadvantage of this version of the system is the lack of tools for preprocessing raw time series. RoughFuzzyLab has been used in a variety of applications (Swiniarski, 1996a, 1996b), including image recognition, handwritten character recognition, breast cancer detection, and prediction.