Modern data mining techniques have gained importance in recent years. In particular, anomaly detection algorithms, applied in key sectors of information technology, have been growing in popularity. One efficient and fast algorithm is Isolation Forest. The method consists of two separate stages: forest formation and evaluation of elements. The first stage forms a forest of isolation trees. Each tree is built in the same manner from drawn samples and random divisions of data attributes. In this study, an innovative deterministic attribute selection method is proposed, while the attribute's split value remains random. New ideas based on imbalance, clustering, and dispersion of values through non-linear transformation of elements are introduced and thoroughly analyzed. These novel anomaly detection approaches are applied to 25 real datasets, as well as our own artificially generated databases. The Area Under the ROC Curve and the Area Under the PR Curve are used as measures of outlier classification quality. The results of the numerical experiment demonstrate the high efficiency and competitive evaluation speed of the proposals in comparison with other Isolation Forest-based approaches, as well as several other popular techniques.
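The two stages described above can be illustrated with a minimal sketch of the baseline Isolation Forest, in which both the attribute and the split value are drawn at random. It is not the deterministic attribute selection proposed in this study. All function names (`build_tree`, `build_forest`, `path_length`, `anomaly_score`) and the defaults of 100 trees and a sample size of 256 are assumptions chosen for illustration, and the sketch assumes the data is a list of at least two numeric tuples.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Stage 1: split the sample on a random attribute at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    attr = rng.randrange(len(points[0]))  # baseline: attribute chosen at random
    lo = min(p[attr] for p in points)
    hi = max(p[attr] for p in points)
    if lo == hi:  # simplification: stop if the chosen attribute is constant
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[attr] < split]
    right = [p for p in points if p[attr] >= split]
    return {
        "attr": attr,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Stage 1: build each tree from its own random subsample of the data."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def path_length(tree, x):
    """Depth at which x reaches a leaf, plus a correction for unsplit points."""
    depth = 0
    while "attr" in tree:
        tree = tree["left"] if x[tree["attr"]] < tree["split"] else tree["right"]
        depth += 1
    return depth + c_factor(tree["size"])


def anomaly_score(forest, x):
    """Stage 2: shorter average paths give scores closer to 1 (more anomalous)."""
    trees, sample_size = forest
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))
```

Calling `build_forest(data)` once and then `anomaly_score(forest, x)` for each element mirrors the split between forest formation and evaluation. A deterministic attribute selection rule would replace only the `rng.randrange` line in `build_tree`, leaving the random split value untouched.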