A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

dc.contributor.author: Paul, Dipanjyoti
dc.contributor.author: Chowdhury, Arpita
dc.contributor.author: Xiong, Xinqi
dc.contributor.author: Chang, Feng-Ju
dc.contributor.author: Carlyn, David
dc.contributor.author: Stevens, Samuel
dc.contributor.author: Provost, Kaiya
dc.contributor.author: Karpatne, Anuj
dc.contributor.author: Carstens, Bryan
dc.contributor.author: Rubenstein, Daniel I.
dc.contributor.author: Stewart, Charles V.
dc.contributor.author: Berger-Wolf, Tanya Y.
dc.contributor.author: Su, Yu
dc.contributor.author: Chao, Wei-Lun
dc.date.accessioned: 2024-02-27T13:16:22Z
dc.date.available: 2024-02-27T13:16:22Z
dc.date.issued: 2023
dc.description.abstract: We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully-connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn “class-specific” queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via “multi-head” cross-attention, INTR could identify different “attributes” of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained model are publicly accessible at https://github.com/Imageomics/INTR.
dc.description.version: Submitted version
dc.format.mimetype: application/pdf
dc.identifier.orcid: Karpatne, Anuj [0000-0003-1647-3534]
dc.identifier.uri: https://hdl.handle.net/10919/118170
dc.identifier.volume: abs/2311.04157
dc.language.iso: en
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.title: A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
dc.title.serial: CoRR
dc.type: Article
dc.type.dcmitype: Text
pubs.organisational-group: /Virginia Tech
pubs.organisational-group: /Virginia Tech/Engineering
pubs.organisational-group: /Virginia Tech/Engineering/Computer Science
pubs.organisational-group: /Virginia Tech/All T&R Faculty
pubs.organisational-group: /Virginia Tech/Engineering/COE T&R Faculty
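
The abstract above describes INTR's core mechanism: one learnable query per class is fed to a DETR-inspired decoder, each class localizes its own evidence via cross-attention, and the per-class scores double as the prediction while the attention maps serve as the explanation. Below is a minimal, illustrative PyTorch sketch of that idea. It is not the authors' implementation (see https://github.com/Imageomics/INTR); the module name ClassQueryDecoderSketch, the single linear score_head, and all dimension choices are assumptions made only to show the query-per-class pattern.

# Minimal sketch of the class-specific-query idea, assuming a generic
# patch-feature encoder; NOT the INTR reference implementation.
import torch
import torch.nn as nn

class ClassQueryDecoderSketch(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        # One learnable query per class: "each class searches for itself."
        self.class_queries = nn.Parameter(torch.randn(num_classes, embed_dim))
        # Multi-head cross-attention from class queries to image patch features;
        # different heads can highlight different "attributes" of a class.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Illustrative shared head mapping each class's attended feature to one logit.
        self.score_head = nn.Linear(embed_dim, 1)

    def forward(self, patch_features: torch.Tensor):
        # patch_features: (batch, num_patches, embed_dim) from any image encoder.
        batch = patch_features.size(0)
        queries = self.class_queries.unsqueeze(0).expand(batch, -1, -1)
        # attended: (batch, num_classes, embed_dim); attn_weights (averaged over
        # heads by default) are the per-class attention maps used for interpretation.
        attended, attn_weights = self.cross_attn(queries, patch_features, patch_features)
        logits = self.score_head(attended).squeeze(-1)  # (batch, num_classes)
        return logits, attn_weights

# Usage: classify a batch of 4 images with 200 fine-grained classes.
features = torch.randn(4, 196, 256)        # stand-in for encoder patch features
model = ClassQueryDecoderSketch(num_classes=200)
logits, attn = model(features)
pred = logits.argmax(dim=-1)               # predicted class; attn[b, pred[b]] explains it

Because each class owns its query, the argmax over per-query scores selects the prediction, and the corresponding cross-attention map shows where that class found its evidence, which is the interpretability property the abstract emphasizes.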

Files

Original bundle
Name: 2311.04157.pdf
Size: 7.96 MB
Format: Adobe Portable Document Format
Description: Submitted version
License bundle
Name: license.txt
Size: 1.5 KB
Format: Plain Text
Description: