A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
dc.contributor.author | Paul, Dipanjyoti | en |
dc.contributor.author | Chowdhury, Arpita | en |
dc.contributor.author | Xiong, Xinqi | en |
dc.contributor.author | Chang, Feng-Ju | en |
dc.contributor.author | Carlyn, David | en |
dc.contributor.author | Stevens, Samuel | en |
dc.contributor.author | Provost, Kaiya | en |
dc.contributor.author | Karpatne, Anuj | en |
dc.contributor.author | Carstens, Bryan | en |
dc.contributor.author | Rubenstein, Daniel I. | en |
dc.contributor.author | Stewart, Charles V. | en |
dc.contributor.author | Berger-Wolf, Tanya Y. | en |
dc.contributor.author | Su, Yu | en |
dc.contributor.author | Chao, Wei-Lun | en |
dc.date.accessioned | 2024-02-27T13:16:22Z | en |
dc.date.available | 2024-02-27T13:16:22Z | en |
dc.date.issued | 2023 | en |
dc.description.abstract | We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully-connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn “class-specific” queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via “multi-head” cross-attention, INTR could identify different “attributes” of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained model are publicly accessible at https://github.com/Imageomics/INTR. | en |
dc.description.version | Submitted version | en |
dc.format.mimetype | application/pdf | en |
dc.identifier.orcid | Karpatne, Anuj [0000-0003-1647-3534] | en |
dc.identifier.uri | https://hdl.handle.net/10919/118170 | en |
dc.identifier.volume | abs/2311.04157 | en |
dc.language.iso | en | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.title | A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis | en |
dc.title.serial | CoRR | en |
dc.type | Article | en |
dc.type.dcmitype | Text | en |
pubs.organisational-group | /Virginia Tech | en |
pubs.organisational-group | /Virginia Tech/Engineering | en |
pubs.organisational-group | /Virginia Tech/Engineering/Computer Science | en |
pubs.organisational-group | /Virginia Tech/All T&R Faculty | en |
pubs.organisational-group | /Virginia Tech/Engineering/COE T&R Faculty | en |
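
The abstract describes learning one query per class and letting each query localize its own patterns in the image via cross-attention. The following is a minimal sketch of that idea in plain Python, not the authors' implementation (which is available at the repository linked in the abstract). It assumes a single attention head, no learned key/value projections, and a shared scoring vector that turns each class's attended summary into a logit. The names `class_query_attention`, `queries`, `patches`, and `scorer`, the scoring step itself, and the toy values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def class_query_attention(queries, patches, scorer):
    """Single-head cross-attention with one query per class (illustrative only).

    queries: one vector per class (hypothetical stand-ins for learned queries)
    patches: one feature vector per image patch (stand-ins for encoder outputs)
    scorer:  shared vector mapping an attended summary to a logit (assumption)
    """
    dim = len(scorer)
    attention, logits = [], []
    for query in queries:
        # Scaled dot-product attention of this class query over all patches.
        weights = softmax([dot(query, p) / math.sqrt(dim) for p in patches])
        # Attention-weighted sum of patch features: what this class "found".
        summary = [sum(w * p[i] for w, p in zip(weights, patches))
                   for i in range(dim)]
        attention.append(weights)
        logits.append(dot(summary, scorer))
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return predicted, attention, logits


# Toy input: three patches with two-dimensional features, two classes.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[4.0, 0.0], [0.0, 4.0]]
scorer = [1.0, 0.0]

predicted, attention, logits = class_query_attention(queries, patches, scorer)
```

In this toy setup, each row of `attention` is the per-patch weight distribution for one class, which is the quantity the abstract proposes to read as an interpretation of the prediction. A multi-head variant would repeat the attention step with separate projections per head, which this sketch omits.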