Fuzzy C-Means-based Isolation Forest
Journal article
MNiSW
200
2021 list
Status: | |
Authors: | Karczmarek Paweł, Kiersztyn Adam, Pedrycz Witold, Czerwiński Dariusz |
Year of publication: | 2021 |
Document version: | Print | Electronic |
Language: | English |
Volume: | 106 |
Article number: | 107354 |
Pages: | 1 - 10 |
Impact Factor: | 8.263 |
Web of Science® Times Cited: | 23 |
Scopus® Citations: | 32 |
Databases: | Web of Science | Scopus | Google Scholar |
Result of statutory research | NO |
Funding: | Funded by the National Science Centre, Poland under CHIST-ERA programme (Grant no. 2018/28/Z/ST6/00563). |
Conference material: | NO |
Open-access publication: | NO |
Abstracts: | English |
The problem of finding anomalies (outliers) in databases is one of the most important issues in modern data analysis. One of the reasons is the occurrence of this issue in almost every type of database, including numerical, categorical, time, mixed, or graphic data. There are currently many methods often dedicated to specific data analysis. Finally, this topic is extremely interesting per se, as a research problem that intrigues researchers. One of the classic methods of data analysis dedicated to finding the anomalies in the data is Isolation Forest. However, this method, with a few exceptions, has not been modified from the time of its first publication, and, in particular, it has not yet appeared in combination with the typical fuzzy methods used for grouping such as Fuzzy C-Means (FCM) clustering. In this study, we thoroughly analyze this approach, as well as several related ones. We examine the possibilities of this technique and analyze it in detail for characteristics of data (database size, number of attributes, records, their type, etc.). It is worth noting that FCM allows to obtain membership grades of elements forming Isolation Forest nodes to clusters on the basis of which these nodes are built. Hence, at the stage of calculating the anomaly scores, this information is effectively used, in particular to express how much a given element may belong to a group of similar elements, which can be inferred from the characteristics of the cluster in which it lies. In this study, we propose a set of methods enhancing the Isolation Forest on a basis of Fuzzy C-Means. The results of numerical experiments carried using 27 various datasets and reported in this paper lead us to the conclusion that FCM can play a pivotal role in an enhancement of Isolation Forest approach and raises up the values of particular measures of effectiveness of the anomaly detection methods. |
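The membership grades mentioned in the abstract can be illustrated with the textbook Fuzzy C-Means membership formula. The sketch below is not the authors' algorithm: the function name `fcm_memberships`, the fixed cluster centers, and the fuzzifier value `m=2.0` are assumptions made for illustration, and the abstract does not specify how the grades enter the anomaly score.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Textbook FCM membership grades of one point for fixed cluster centers.

    Hypothetical helper for illustration; m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide with it).
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); the grades sum to 1.
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]


# Assumed toy data: two cluster centers on a line.
centers = [(0.0, 0.0), (10.0, 0.0)]
typical = fcm_memberships((1.0, 0.0), centers)    # close to the first center
ambiguous = fcm_memberships((5.0, 0.0), centers)  # halfway between both
```

A point near one center receives a grade close to 1 for that cluster, while a point midway between the centers receives equal grades for both.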