2020.ICDM.LP-Explain: Local Pictorial Explanation for Outliers
- paper
- [main idea](#main idea)
- contribution
- method
- [step 1:X, SQ Generation](#step 1:X, SQ Generation)
- [step 2:Outlier Clustering](#step 2:Outlier Clustering)
- [Relation Quantification of Feature Pairs](#Relation Quantification of Feature Pairs)
- [Similarity of Outliers](#Similarity of Outliers)
- [Spectral Clustering on Outliers](#Spectral Clustering on Outliers)
- [Feature Pair Selection](#Feature Pair Selection)
- experiment
paper
main idea
LP-Explain tries to identify the set of best Local Pictorial explanations (defined as scatter plots in the 2-D spaces of feature pairs) that can Explain the behavior of each cluster of outliers.
Difference from LookOut:
LookOut: the top subspaces it chooses represent a compromise among all the outliers, so they may not include the optimal subspace for each individual outlier.
Ours: cluster the outliers first, then explain the behavior of each cluster.
contribution
1. We propose a new pictorial explanation method to provide visualized descriptions for clusters of outliers.
2. We design an outlier clustering method specifically for our pictorial explanation task. The method first quantifies the relationship among feature pairs, then leverages a proposed rank similarity method to measure the distance between the top feature pairs of outliers.
3. We formulate the feature pair selection problem as a multi-task learning problem, where a hyperparameter indicating the localization level is adopted to provide explanations for an individual cluster or for all the outliers.
4. We conduct experiments on six public datasets and demonstrate the effectiveness of the proposed LP-Explain by its explanation performance.
method
1. Define an effective measure to quantify the similarity between outliers, then cluster the outliers into different groups based on their abnormal feature pairs.
2. Weigh the importance of feature pairs within each cluster through a multi-task learning framework to select the set of top feature pairs that best explain the various outlier clusters.
step 1:X, SQ Generation
Row i of X holds the anomaly scores of outlier i on each feature pair. For example, the first entry in the first row is the anomaly score of outlier 1 in feature pair 1 (i.e. fp1).
Then, for each outlier, the scores are sorted to obtain its ranked feature pair sequence (SQ).
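A minimal sketch of this step, assuming the score matrix X has already been produced by some outlier detector run on every 2-D subspace. The score values are made up for illustration, the function name `feature_pair_sequences` is hypothetical, and descending order (most anomalous feature pair first) is an assumption, since the notes do not state the sort direction.

```python
import numpy as np

def feature_pair_sequences(X):
    """Return SQ, where SQ[i] lists the feature-pair indices of outlier i
    ordered from highest to lowest anomaly score (assumed ordering)."""
    X = np.asarray(X, dtype=float)
    # Sorting the negated scores gives descending order; a stable sort
    # keeps the lower feature-pair index first when two scores tie.
    return np.argsort(-X, axis=1, kind="stable")

# Illustrative scores: 3 outliers (rows) x 4 feature pairs (columns).
X = np.array([
    [0.9, 0.1, 0.4, 0.2],
    [0.2, 0.8, 0.1, 0.5],
    [0.7, 0.3, 0.6, 0.1],
])
SQ = feature_pair_sequences(X)
print(SQ)
```

Indices here are 0-based, so feature pair 1 (fp1) in the notes corresponds to column 0.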
step 2:Outlier Clustering
Relation Quantification of Feature Pairs
Construct a fully connected graph G = (V, E) whose nodes are the feature pairs and whose edges indicate the relationships between them.
The weight of the edge between two feature pairs is large when most of the outliers obtain similar outlier scores in the two corresponding 2-D spaces.
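These notes do not reproduce the paper's edge-weight formula, so the sketch below uses a stand-in that only has the stated property: the weight grows as the outliers' scores on two feature pairs become more similar. The choice `exp(-mean |score difference|)`, the function name `edge_weights`, and the toy matrix are all assumptions for illustration, not the authors' definition.

```python
import numpy as np

def edge_weights(X):
    """Stand-in edge weights between feature pairs (columns of X).
    NOT the paper's formula: w[a, b] = exp(-mean_i |X[i, a] - X[i, b]|)."""
    X = np.asarray(X, dtype=float)
    # diff[i, a, b] = |X[i, a] - X[i, b]| for outlier i and feature pairs a, b.
    diff = np.abs(X[:, :, None] - X[:, None, :])
    W = np.exp(-diff.mean(axis=0))
    np.fill_diagonal(W, 0.0)  # the graph has no self-loops
    return W

# Toy scores: feature pairs 0 and 1 score the outliers almost identically,
# while feature pair 2 behaves differently.
X = np.array([
    [0.9, 0.8, 0.1],
    [0.7, 0.6, 0.2],
    [0.2, 0.3, 0.9],
])
W = edge_weights(X)
print(np.round(W, 3))
```

With these toy scores, the edge between feature pairs 0 and 1 gets the largest weight, which matches the intended behavior.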
To measure the structural similarity between two feature pairs, we need to learn a vector representation V = (v1, v2, ..., vn) of each node in the graph G.
Similarity of Outliers
A rank similarity method quantitatively measures the relation between two outliers according to their ranked feature pair sequences.
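The notes do not give the details of the proposed rank similarity, so the sketch below substitutes a generic measure, average overlap of the top-k prefixes, to show what comparing two ranked feature pair sequences can look like. It is a stand-in, not the paper's method: in particular it ignores the relations between feature pairs quantified in the previous step. The function name `average_overlap` and the sequences are hypothetical.

```python
def average_overlap(seq_a, seq_b, k):
    """Average, over depths d = 1..k, of the fraction of items shared by
    the top-d prefixes of two ranked sequences (items assumed unique)."""
    if k < 1 or k > min(len(seq_a), len(seq_b)):
        raise ValueError("k must be between 1 and the shorter sequence length")
    seen_a, seen_b = set(), set()
    total = 0.0
    for d in range(1, k + 1):
        seen_a.add(seq_a[d - 1])
        seen_b.add(seq_b[d - 1])
        total += len(seen_a & seen_b) / d
    return total / k

# Ranked feature pair sequences of three outliers (most anomalous first).
sq1 = [0, 2, 3, 1]
sq2 = [0, 2, 1, 3]
sq3 = [1, 3, 0, 2]

sim_12 = average_overlap(sq1, sq2, k=3)
sim_13 = average_overlap(sq1, sq3, k=3)
print(sim_12, sim_13)
```

Outliers 1 and 2 agree on their two most anomalous feature pairs and therefore score higher than outliers 1 and 3, whose top-ranked pairs differ.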
Spectral Clustering on Outliers
Use the Self-Tuning Spectral Clustering method to produce the clusters C.
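A rough sketch of the local-scaling idea behind self-tuning spectral clustering: each point i gets its own scale sigma_i (the distance to its K-th nearest neighbor), and the affinity is exp(-d_ij^2 / (sigma_i * sigma_j)). The sketch assumes a pairwise distance matrix between outliers is already available (for LP-Explain it would presumably be derived from the rank similarity; here toy 1-D positions stand in for it). It only performs a two-way split by the sign of the second eigenvector of the normalized Laplacian, and it omits the automatic selection of the number of clusters that the full method also provides. The function names are hypothetical.

```python
import numpy as np

def local_scaling_affinity(D, K=7):
    """A[i, j] = exp(-D[i, j]^2 / (sigma_i * sigma_j)), where sigma_i is the
    distance from point i to its K-th nearest neighbor (requires K < n and
    no duplicate points, so that every sigma_i is positive)."""
    D = np.asarray(D, dtype=float)
    # Column 0 of each sorted row is the point itself (distance 0),
    # so column K is the K-th nearest neighbor.
    sigma = np.sort(D, axis=1)[:, K]
    A = np.exp(-(D ** 2) / np.outer(sigma, sigma))
    np.fill_diagonal(A, 0.0)
    return A

def spectral_bipartition(A):
    """Two-way split using the sign of the eigenvector that belongs to the
    second-smallest eigenvalue of I - D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Toy stand-in for outlier distances: two well-separated groups on a line.
pos = np.array([0.0, 0.1, 0.2, 1.0, 1.1, 1.2])
D = np.abs(pos[:, None] - pos[None, :])

A = local_scaling_affinity(D, K=2)  # small K because the toy set is tiny
labels = spectral_bipartition(A)
print(labels)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector; only the grouping itself is meaningful. For more than two clusters, the usual approach is to run k-means on the rows of the leading eigenvectors instead of taking a sign.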