What EXACTLY are We Looking at?: Investigating for Discriminance in Ultra-Fine-Grained Visual Categorization Tasks

Published in 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

Comparing discriminative features at different levels of granularity is an inherent part of the human object recognition process, and AI-based models mimic this behaviour with significant success in a variety of applications. Beyond fine-grained visual categorization (FGVC), ultra-fine-grained visual categorization (Ultra-FGVC) has attracted early interest despite overfitting issues; however, human interpretability issues still remain. This study explores Ultra-FGVC images and, for the first time, draws qualitative insights through saliency-based explanation methods that provide intuitive hints on where in the images the models are looking. We present empirical evidence on the accuracy of CNN models on three Ultra-FGVC datasets, whereby ResNet50 achieves the highest performance with a top-1 accuracy of 50.3% on one of the datasets while AlexNet achieves the lowest performance with a top-1 accuracy of 22.5% on the same dataset. The findings in this study are likely to spur further research in the domain towards standardizing the acceptance of AI tools for assisted object recognition at ultra-fine granularities.
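As a rough illustration of the kind of saliency-based explanation referenced in the abstract, the sketch below computes a vanilla gradient saliency map (Simonyan et al.) for a pretrained ResNet50 in PyTorch. This is an assumption-laden sketch, not the paper's exact method: the saliency technique, ImageNet weights, preprocessing, and the hypothetical input file `leaf.jpg` (standing in for an Ultra-FGVC sample such as a cultivar leaf image) are all illustrative.

```python
# Minimal sketch of a gradient-based saliency map for a pretrained ResNet50.
# Illustrative only: the paper's exact explanation method, preprocessing, and
# fine-tuned weights are not reproduced here.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "leaf.jpg" is a hypothetical Ultra-FGVC sample, not from the paper's datasets.
img = preprocess(Image.open("leaf.jpg").convert("RGB")).unsqueeze(0)
img.requires_grad_(True)

# Forward pass; back-propagate the top predicted class score to the input pixels.
logits = model(img)
score = logits[0, logits[0].argmax()]
score.backward()

# Saliency: per-pixel maximum of absolute input gradients across colour channels,
# normalized to [0, 1] so it can be overlaid on the image as a heat map.
saliency = img.grad.abs().max(dim=1).values.squeeze(0)  # shape (224, 224)
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
```

Visualizing `saliency` over the input gives the intuitive "where is the model looking" hint the abstract describes; stronger attribution methods (e.g., Grad-CAM) follow the same forward/backward pattern but aggregate gradients at a convolutional layer instead of the raw input.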

Recommended citation: U. E. Akpudo, X. Yu, J. Zhou and Y. Gao, "What EXACTLY are We Looking at?: Investigating for Discriminance in Ultra-Fine-Grained Visual Categorization Tasks," 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Port Macquarie, Australia, 2023, pp. 129-136, doi: 10.1109/DICTA60407.2023.00026.
Download Paper