Hypergraph-based Zero-shot Multi-modal Product Attribute Value Extraction

dc.contributor.authorHu, Jiazhenen
dc.contributor.authorGong, Jiayingen
dc.contributor.authorShen, Hongdaen
dc.contributor.authorEldardiry, Hodaen
dc.date.accessioned2025-08-13T11:51:02Zen
dc.date.available2025-08-13T11:51:02Zen
dc.date.issued2025-04-28en
dc.date.updated2025-08-01T07:49:08Zen
dc.description.abstractIt is essential for e-commerce platforms to provide accurate, complete, and timely product attribute values, in order to improve the search and recommendation experience for both customers and sellers. In the real-world scenario, it is difficult for these platforms to identify attribute values for the newly introduced products given no similar product history records for training or retrieval. Besides, how to jointly learn the product representation given various product information in multiple modalities, such as textual modality (e.g., product titles and descriptions) and visual modality (e.g., product images), is also a challenging task. To address these limitations, we propose a novel method for extracting multi-label product attribute-value pairs from multiple modalities in the zero-shot scenario, where labeled data is absent during training. Specifically, our method constructs heterogeneous hypergraphs, where product information from different modalities is represented by different types of nodes, and the text and image nodes are embedded and learned through CLIP encoders to effectively capture and integrate multi-modal product information. Then, the complex interrelations among these nodes are modeled through the hyperedges. By learning informative node representations, our method can accurately predict links between unseen product nodes and attribute-value nodes, enabling zero-shot attribute value extraction. We conduct extensive experiments and ablation studies on several categories of the public MAVE dataset and the results demonstrate that our proposed method significantly outperforms several state-of-theart generative model baselines in multi-label, multi-modal product attribute value extraction in the zero-shot setting.en
dc.description.versionPublished versionen
dc.format.mimetypeapplication/pdfen
dc.identifier.doihttps://doi.org/10.1145/3696410.3714714en
dc.identifier.urihttps://hdl.handle.net/10919/137485en
dc.language.isoenen
dc.publisherACMen
dc.rightsCreative Commons Attribution 4.0 Internationalen
dc.rights.holderThe author(s)en
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/en
dc.titleHypergraph-based Zero-shot Multi-modal Product Attribute Value Extractionen
dc.typeArticle - Refereeden
dc.type.dcmitypeTexten

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
3696410.3714714.pdf
Size:
2.18 MB
Format:
Adobe Portable Document Format
Description:
Published version
License bundle
Now showing 1 - 1 of 1
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission
Description: