Improving Anomaly Detection Methods Through Attribute Exclusion Using Isolation Forest
Book fragment (Chapter in a post-conference monograph)
MNiSW
20
Level I
| Status: | |
| Authors: | Rachwał Albert, Karczmarek Paweł |
| Disciplines: | |
| Document version: | Print | Electronic |
| Language: | English |
| Pages: | 339-350 |
| Web of Science® Times Cited: | 0 |
| Scopus® Citations: | 0 |
| Databases: | Web of Science | Scopus |
| Result of statutory research | NO |
| Conference material: | YES |
| Conference name: | 23rd International Conference on Artificial Intelligence and Soft Computing |
| Conference short name: | ICAISC 2024 |
| Conference series URL: | LINK |
| Conference dates: | 16 June 2024 to 20 June 2024 |
| Conference city: | Zakopane |
| Conference country: | POLAND |
| OA publication: | NO |
| Abstracts: | English |
| The study presents a novel anomaly detection method that extends the contemporary Isolation Forest algorithm. The proposed method aggregates the results obtained by excluding single attributes, and pairs of attributes, from the datasets. The experiments compare the results of the original method, the average result when excluding single attributes, and the average result when excluding two attributes, each computed over one hundred iterations. The method is tested on various anomaly detection datasets and yields mostly positive results as judged by five different metrics. The calculations use Isolation Forest, a popular unsupervised anomaly detection algorithm, but it is plausible that applying the proposed method to other anomaly detection algorithms could also improve the metrics. Examining datasets under attribute exclusion shows that individual features affect the classification result to different degrees. The method can be expected to yield improvements especially when the datasets are unbalanced and most of the impact on the result is concentrated in a small number of features. |
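
The attribute-exclusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "aggregation" means averaging per-point anomaly scores over all reduced datasets (the abstract does not specify this), and it uses a simplified pure-Python isolation forest so that the example has no dependencies. The names `iforest_scores`, `exclusion_scores`, and `make_demo_data`, and all parameter values, are hypothetical.

```python
import math
import random
from itertools import combinations


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split rows on a random attribute at a random threshold (simplified)."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    attr = rng.randrange(len(rows[0]))
    lo = min(r[attr] for r in rows)
    hi = max(r[attr] for r in rows)
    if lo == hi:  # simplification: stop instead of trying another attribute
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[attr] < split]
    right = [r for r in rows if r[attr] >= split]
    return ("node", attr, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, row, depth=0):
    """Depth at which row lands, plus the correction for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, attr, split, left, right = tree
    return path_length(left if row[attr] < split else right, row, depth + 1)


def iforest_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1) per row; higher means more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))  # assumes at least 2 rows
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for row in rows:
        mean_path = sum(path_length(t, row) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / c_factor(psi)))
    return scores


def exclusion_scores(rows, k, **kwargs):
    """Average scores over every dataset with k attributes removed.

    k=0 reproduces the plain forest; k must be smaller than the number
    of attributes. Averaging scores is an assumption of this sketch.
    """
    subsets = list(combinations(range(len(rows[0])), k))
    totals = [0.0] * len(rows)
    for dropped in subsets:
        reduced = [tuple(v for j, v in enumerate(r) if j not in dropped)
                   for r in rows]
        for i, s in enumerate(iforest_scores(reduced, **kwargs)):
            totals[i] += s
    return [t / len(subsets) for t in totals]


def make_demo_data(n_inliers=200, n_attrs=4, seed=42):
    """Synthetic data: Gaussian inliers plus two far-away outliers."""
    rng = random.Random(seed)
    rows = [tuple(rng.gauss(0.0, 1.0) for _ in range(n_attrs))
            for _ in range(n_inliers)]
    rows.append(tuple(6.0 for _ in range(n_attrs)))
    rows.append(tuple(-6.0 for _ in range(n_attrs)))
    return rows, {n_inliers, n_inliers + 1}  # rows, indices of outliers


if __name__ == "__main__":
    data, outlier_idx = make_demo_data()
    for k in (0, 1, 2):  # original, single exclusions, pair exclusions
        scores = exclusion_scores(data, k, n_trees=50)
        top = sorted(range(len(data)), key=lambda i: -scores[i])[:2]
        print(f"k={k}: top-2 indices {top}, outliers {sorted(outlier_idx)}")
```

Comparing the three settings on labelled data with whichever evaluation metrics are of interest would mirror the comparison described in the abstract; the synthetic data here only shows the mechanics and says nothing about the reported results.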