The Templates system is a software system that supports the analysis of temporal data. During the analysis, temporal templates are discovered in the temporal input data, together with regularities between them expressed as IF THEN production rules. Both unsupervised and supervised learning processes can be simulated. A classifier is implemented to estimate the quality of the discovered knowledge.
One aspect of the data mining process is the analysis of data that change over time, so-called temporal data. The need to discover knowledge from such data sets arises in many fields of human activity, e.g., computer science, economics, and medicine. The Templates system supports this kind of analysis by finding temporal templates and reasoning from them in the form of IF THEN production rules. The system is designed to allow new algorithms to be implemented and their relevance to be verified on real-life data. The Templates system is implemented in Delphi 7.0 and runs on the IBM PC platform under the MS Windows operating system.
The Templates system supports the analysis of data collected in the form of an information table or a decision table, including temporal information tables and temporal decision tables. A temporal table is a table whose rows (containing the attribute values of one or more objects) are ordered in time. The main use of the Templates system is the exploration of temporal tables and the discovery of temporal templates and of sequential dependencies between those templates. A sequential dependency is understood as a description of the order of occurrence. The system also makes the analysis of static data possible. The dependencies discovered by the system take the form of IF THEN production rules. A rule-based classifier is implemented to check the usefulness of the rules in predicting new cases. The system presents the outcome of rule testing in the form of a confusion matrix. Generally, the system allows the following types of data exploration to be executed automatically:
This kind of data exploration is auxiliary in the system, so only a modest set of tools is available. They are limited to an algorithm for generating decision rules that draws on the rough set methodology. It is an exhaustive algorithm based on Boolean reasoning. Furthermore, it is possible to estimate the usefulness of the rules in predicting new objects or describing known ones. The outcome is presented in the form of a confusion matrix. Additionally, decision rules can be generated for incomplete decision systems; missing values are then treated as not belonging to the attribute domains, so they do not occur in the discovered rules. The rule induction algorithm for this type of data works correctly if the number of columns in the input file does not exceed 31. Its advantage is speed compared with the algorithms generating the same set of rules that are implemented in the Rosetta system or the RSES system.
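The treatment of missing (lacking) values and the confusion-matrix check described above can be illustrated with a minimal Python sketch. It is not the system's Delphi implementation: the `MISSING` marker, the rule representation, and the first-match classification strategy are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

MISSING = None  # assumed marker for a missing value in a row

# Hypothetical rule format: ({column index: required value}, decision)
Rule = Tuple[Dict[int, int], int]


def rule_matches(conditions: Dict[int, int], row: List[Optional[int]]) -> bool:
    """A missing value lies outside every attribute domain, so it never
    satisfies a condition."""
    return all(
        row[col] is not MISSING and row[col] == value
        for col, value in conditions.items()
    )


def classify(rules: List[Rule], row: List[Optional[int]]) -> Optional[int]:
    """Return the decision of the first matching rule (an assumed strategy),
    or None when no rule matches."""
    for conditions, decision in rules:
        if rule_matches(conditions, row):
            return decision
    return None


def confusion_matrix(rules, rows, actual):
    """Count (actual decision, predicted decision) pairs."""
    counts: Dict[Tuple[int, Optional[int]], int] = {}
    for row, truth in zip(rows, actual):
        key = (truth, classify(rules, row))
        counts[key] = counts.get(key, 0) + 1
    return counts


rules = [({0: 1}, 1), ({1: 2}, 0)]
rows = [[1, 5], [MISSING, 2], [MISSING, 7]]
matrix = confusion_matrix(rules, rows, actual=[1, 0, 1])
```

In this sketch the third row matches no rule, so it is counted under the predicted value `None`, which plays the role of an "unclassified" column in the matrix.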
Process of unsupervised learning from temporal data
The kind of data analysis considered here is appropriate for data describing attribute values of one or more objects changing in time. When there are several objects, the recorded states of the individual objects have to be ordered according to global time, i.e., time common to all objects. That is, for data collected in the time interval [ts, te] and concerning m objects, the first m rows of the input file contain information about each of the m objects recorded at time ts. Similarly, the next m rows contain information about the m objects recorded at time ts+1, and so on until te. As implemented in the system, the process of unsupervised learning from temporal data has two stages: discovering temporal templates in the input data and discovering knowledge about the discovered templates. The result of the first stage is a time series of templates discovered by the system in the relevant parts of the input data. The user influences the outcome of this stage by setting the values of parameters that characterize the discovered templates. During the second stage, IF THEN rules are generated. They reflect regularities that occur in the sequence of temporal templates. At this stage the user has to decide how many consecutive terms of the time series of templates should be considered during rule induction. The system makes it possible to check the quality of the induced rules. The quality coefficient expresses their usefulness in predicting the type of template that follows a given sequence of templates. It is estimated using test data chosen by the user. The usefulness is presented in the form of a confusion matrix. Rules generated during an unsupervised learning process may predict template occurrence correctly, incorrectly, or partly correctly. A prediction is regarded as partly correct if the predicted template is included in the actual template.
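The global-time row layout and the choice of how many consecutive templates to consider can be sketched in Python as follows. The function names and the use of string labels for templates are hypothetical. Treating the k consecutive templates as the rule premise and the next template as the conclusion is an assumption based on the description above, not a statement about the system's internals.

```python
from typing import List, Sequence, Tuple


def snapshots_by_global_time(rows: Sequence[Sequence[int]], m: int) -> List[List[Sequence[int]]]:
    """Split rows ordered by global time into snapshots of m objects each:
    rows 0..m-1 belong to time ts, rows m..2m-1 to ts+1, and so on."""
    if m <= 0 or len(rows) % m != 0:
        raise ValueError("row count must be a positive multiple of m")
    return [list(rows[i:i + m]) for i in range(0, len(rows), m)]


def template_windows(series: Sequence[str], k: int) -> List[Tuple[Tuple[str, ...], str]]:
    """Pair every run of k consecutive templates with the template that follows it."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [(tuple(series[i:i + k]), series[i + k]) for i in range(len(series) - k)]


# Two objects observed at three time points (values are illustrative only).
rows = [[1], [2], [3], [4], [5], [6]]
snapshots = snapshots_by_global_time(rows, m=2)

# A time series of template labels and windows of length 2.
windows = template_windows(["A", "B", "A", "C"], k=2)
```

Each window produced this way is a candidate training example for a rule of the form IF (template sequence) THEN (next template).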
The confusion matrix implemented for this kind of data analysis presents both correct and partly correct predictions. If the need arises, conflicts in template prediction are resolved using the support and match coefficients that characterize every rule found.
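The grading of predictions and the resolution of conflicts might look roughly like the Python sketch below. Several points are assumptions, since the text does not specify them: a template is modeled as a set of (attribute, value) descriptors, "included in" is read as a proper subset, and conflicting rules are ranked by the product of support and match. The actual way the system combines the two coefficients is not stated in the text.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

# Assumed representation: a template is a set of (attribute, value) descriptors.
Template = FrozenSet[Tuple[str, int]]


class FiredRule(NamedTuple):
    predicted: Template
    support: float  # coefficient named in the text; its scale is assumed
    match: float    # coefficient named in the text; its scale is assumed


def grade_prediction(predicted: Template, actual: Template) -> str:
    """Grade a prediction as correct, partly correct, or incorrect."""
    if predicted == actual:
        return "correct"
    if predicted and predicted < actual:  # non-empty proper subset
        return "partly correct"
    return "incorrect"


def resolve_conflict(fired: List[FiredRule]) -> Template:
    """Pick one prediction among conflicting rules.
    Ranking by support * match is a hypothetical choice."""
    return max(fired, key=lambda rule: rule.support * rule.match).predicted


actual = frozenset({("a", 1), ("b", 2)})
partial = frozenset({("a", 1)})
wrong = frozenset({("a", 3)})

winner = resolve_conflict([FiredRule(partial, 10, 0.5), FiredRule(wrong, 4, 1.0)])
```

A confusion matrix that distinguishes the three grades can then be built by counting the labels returned by `grade_prediction` for each test case.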
Process of supervised learning from temporal data
The input file has to contain information about objects from different decision classes. In contrast to the input file format for the many-object version of the unsupervised learning process, here the states of the objects (changing in time) should be ordered in the decision table according to the local time of each object. This means that the data concerning an individual object have to be placed in consecutive rows, one object after another. The penultimate column of the input file must contain the object numbers and the last column must contain the decision values. The learning process starts in the same way as in unsupervised learning, i.e., with discovering temporal templates in the input data and ordering them as a time series. The system abbreviates the templates according to the following rule:
Advantages of the Templates system include an intuitive and friendly interface patterned on the popular Rosetta system.
The system loads data from .txt files containing an information table or a decision table. The data have to be integers. Moreover, the input files have to contain the following information, which enables the system to choose the proper method of analysis:
The Templates system is primarily intended for research and educational purposes.
The Templates system will be extended with rule and template filtering to make it more interactive. Moreover, an off-line mode is planned, in which the user designs a tree of consecutive algorithms and then executes it in a single run.
Development of the Templates system has been partially supported by grant No. 3 T11C 005 28 from the Ministry of Scientific Research and Information Technology of the Republic of Poland.