Personnel: Yutong Sun, Mohit Prabhushankar
Goal: To assess the implicit saliency of trained discriminative neural network models in a top-down fashion
Challenges: Neural networks implicitly learn to attend only to the relevant portions of an image. However, this implicit attention is driven by the task they are trained for. Hence, existing attention-based explanatory methods such as Grad-CAM are insufficient to mimic human saliency.
Our Work: In this work, we show that existing recognition and localization deep architectures, which have not been exposed to eye-tracking data or any saliency datasets, are capable of predicting human visual saliency. We term this implicit saliency in deep neural networks. We calculate this implicit saliency in an unsupervised fashion using the expectancy-mismatch hypothesis. Our experiments show that extracting saliency in this fashion provides performance comparable to state-of-the-art supervised algorithms. We also show that semantic features contribute more than low-level features to human visual saliency detection. Based on these properties and results, our method greatly lowers the barrier to saliency detection in terms of required data and bridges the gap between human visual saliency and model saliency.
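The summary above does not spell out how the expectancy-mismatch computation is carried out, so the following is only an illustrative sketch of one plausible reading, not the procedure used in this work. It assumes that "mismatch" can be approximated by the cross-entropy between a model's prediction and each candidate class in turn, and that saliency is the accumulated magnitude of the gradient of that mismatch with respect to the input. The linear classifier, the function name `mismatch_saliency`, and the toy data are all hypothetical stand-ins for a trained network and a real image.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mismatch_saliency(image, weights):
    """Toy gradient-based saliency from prediction/expectation mismatch.

    image:   (H, W) array.
    weights: (num_classes, H*W) array, a hypothetical linear classifier
             standing in for a trained discriminative network.
    Returns an (H, W) map scaled to [0, 1]. No saliency labels are used.
    """
    x = image.reshape(-1)
    probs = softmax(weights @ x)
    num_classes = weights.shape[0]
    saliency = np.zeros_like(x, dtype=float)
    for c in range(num_classes):
        # Assumed "expectation": the model should have predicted class c.
        expected = np.zeros(num_classes)
        expected[c] = 1.0
        # Gradient of cross-entropy w.r.t. logits is (probs - expected);
        # the chain rule through the linear layer gives the input gradient.
        grad_input = weights.T @ (probs - expected)
        saliency += np.abs(grad_input)
    saliency = saliency.reshape(image.shape)
    span = saliency.max() - saliency.min()
    if span == 0:
        return np.zeros_like(saliency)
    return (saliency - saliency.min()) / span

# Toy usage: the classifier only "looks at" a central 4x4 patch.
rng = np.random.default_rng(0)
H = W = 8
mask = np.zeros((H, W))
mask[2:6, 2:6] = 1.0
weights = rng.normal(size=(3, H * W)) * mask.reshape(-1)
image = rng.random((H, W))
saliency = mismatch_saliency(image, weights)
```

In this toy setup the resulting map is nonzero only inside the patch the classifier depends on, which is the behavior one would expect from any gradient-based attribution. A real experiment would replace the linear model with a pretrained recognition or localization network and compare the map against human fixation data.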
References: