Deterministic attribute selection for isolation forest
Journal article
MNiSW points: 140 (2024 list)
Status: | |
Authors: | Gałka Łukasz, Karczmarek Paweł |
Disciplines: | |
Year of publication: | 2024 |
Document version: | Print | Electronic |
Language: | English |
Volume: | 151 |
Article number: | 110395 |
Pages: | 1 - 18 |
Impact Factor: | 7.5 |
Web of Science® Times Cited: | 0 |
Scopus® Citations: | 0 |
Databases: | Web of Science | Scopus | Google Scholar |
Result of statutory research | NO |
Finansowanie: | This work has been supported by the internal grants FD-20/IT-3/047 and FD-20/IT-3/004. |
Conference material: | NO |
OA publication: | NO |
Abstracts: | English |
Modern data mining techniques have gained importance in recent years. In particular, anomaly detection algorithms, applied in key sectors of information technology, have been growing in popularity. One efficient and fast algorithm is Isolation Forest. The method consists of two separate stages: forest formation and evaluation of elements. The first stage relies on forming a forest of isolation trees. Each tree is built in the same manner from drawn samples and random divisions of data attributes. In this study, an innovative deterministic attribute selection method is proposed in which the split value of the selected attribute remains random. New ideas based on imbalance, clustering, and dispersion of values through non-linear transformation of elements are introduced and thoroughly analyzed. These novel anomaly detection approaches are applied to 25 real datasets, as well as our own artificially generated databases. The Area Under the ROC Curve and the Area Under the PR Curve are used as measures of outlier classification quality. The results of the numerical experiments demonstrate high efficiency and competitive evaluation speed of the proposals in comparison with other Isolation Forest-based approaches, as well as several other popular techniques. |
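
To make the two stages in the abstract concrete, the following is a minimal Python sketch of an isolation forest whose trees pick the split attribute deterministically and draw the split value at random, which is one reading of the abstract. The record does not state the authors' actual selection rules (imbalance, clustering, dispersion), so the largest-variance rule used here is a hypothetical stand-in, and all function names and parameters (`build_tree`, `anomaly_scores`, `sample_size`, and so on) are illustrative assumptions.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def build_tree(rows, rng, depth, max_depth):
    """Stage 1: grow one isolation tree (nested dicts)."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    n_attrs = len(rows[0])
    # ASSUMPTION: deterministic rule = attribute with the largest variance.
    # This is a placeholder, not the rule proposed in the paper.
    attr = max(range(n_attrs), key=lambda j: variance([r[j] for r in rows]))
    lo = min(r[attr] for r in rows)
    hi = max(r[attr] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)  # the split value itself stays random
    left = [r for r in rows if r[attr] < split]
    right = [r for r in rows if r[attr] >= split]
    if not left or not right:  # degenerate split, stop here
        return {"size": len(rows)}
    return {
        "attr": attr,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(row, node, depth=0):
    """Depth at which `row` lands, plus the usual leaf-size correction."""
    if "attr" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["attr"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(data, n_trees=50, sample_size=64, seed=0):
    """Stage 2: score every element; values near 1 suggest anomalies.

    Assumes `data` is a list of at least two equal-length numeric tuples.
    """
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), rng, 0, max_depth)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm)
            for x in data]


if __name__ == "__main__":
    gen = random.Random(1)
    points = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(60)]
    points.append((10.0, 10.0))  # one obvious outlier
    scores = anomaly_scores(points)
    print("index of highest score:", scores.index(max(scores)))
```

Swapping the body of the attribute choice for a different deterministic criterion leaves the rest of the sketch unchanged, which is the design point: only the attribute selection step differs from the standard, fully random isolation tree.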