Optimized feature construction methods for data summarizations of relational data
| Main Author: | |
|---|---|
| Format: | Thesis |
| Language: | en |
| Published: | 2014 |
| Subjects: | |
| Online Access: | https://eprints.ums.edu.my/id/eprint/42534/1/24%20PAGES.pdf https://eprints.ums.edu.my/id/eprint/42534/2/FULLTEXT.pdf https://eprints.ums.edu.my/id/eprint/42534/ |
| Summary: | Many approaches have been developed to discover knowledge (i.e. useful information) from data stored in multiple tables of a relational database. The Dynamic Aggregation of Relational Attributes (DARA) algorithm is one approach to summarizing data stored in a target table that has a one-to-many relationship with data stored in a non-target table. DARA transforms the relational representation of the data into a vector space representation, and a clustering process then groups the data by the similarity of their characteristics. The summarized data can then be fed to any classification algorithm. A classification task is commonly performed to discover frequent patterns in the data that can be used to classify new, unseen data. In DARA, the predictive accuracy of the classification task can be affected by the descriptive accuracy of the summarized data, which in turn is highly influenced by how non-target records are represented in the vector space model. Feature construction has been shown to enrich the representation of non-target records and thus to improve the descriptive accuracy of the summarized data. However, the existing feature construction method does not explore all possible representations of records. This thesis introduces novel feature construction methods and investigates whether the descriptive accuracy of the summarized data benefits from them. The proposed framework applies a genetic algorithm that incorporates several feature scoring measures to optimize the feature construction process. The thesis also studies a method for improving the descriptive accuracy of the DARA algorithm by generating multiple instances of the summarized data. The empirical results show that predictive accuracy can be improved and, by extension, that the descriptive accuracy of the summarized data benefits from the proposed methods. The proposed methods provide a wider search space of useful ways to represent records in the non-target table. |
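
The pipeline the summary describes (relational records mapped to a vector space, then clustered so that each target record receives a summary label) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `(target_id, attribute, value)` row layout, the raw count weighting, the Euclidean k-means with first-k seeding, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def build_vectors(rows):
    """Turn (target_id, attribute, value) rows into one count vector per target id."""
    bags = defaultdict(Counter)
    for target_id, attribute, value in rows:
        # Each non-target record contributes one "term", e.g. "item=milk".
        bags[target_id][f"{attribute}={value}"] += 1
    ids = sorted(bags)
    vocab = sorted({term for bag in bags.values() for term in bag})
    vectors = [[bags[i][term] for term in vocab] for i in ids]
    return ids, vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means with deterministic seeding (assumption: the first k vectors)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize(rows, k):
    """Map each target id to the cluster label of its aggregated non-target rows."""
    ids, _, vectors = build_vectors(rows)
    return dict(zip(ids, kmeans(vectors, k)))


# Hypothetical non-target table: several rows per target id (one-to-many).
rows = [
    (1, "item", "milk"), (1, "item", "bread"),
    (2, "item", "laptop"), (2, "item", "mouse"),
    (3, "item", "milk"), (3, "item", "bread"), (3, "item", "milk"),
    (4, "item", "laptop"),
]
summary = summarize(rows, k=2)
print(summary)
```

In this sketch the resulting cluster label would be appended to the target table as a single summarizing attribute before a classifier is trained. Feature construction, as discussed in the summary, would change which terms are generated in `build_vectors` (for example by combining attributes), which is the search space a genetic algorithm could explore.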
