Hypergraph-based Zero-shot Multi-modal Product Attribute Value Extraction

Hu, Jiazhen; Gong, Jiaying; Shen, Hongda; Eldardiry, Hoda

dc.contributor.author	Hu, Jiazhen	en
dc.contributor.author	Gong, Jiaying	en
dc.contributor.author	Shen, Hongda	en
dc.contributor.author	Eldardiry, Hoda	en
dc.date.accessioned	2025-08-13T11:51:02Z	en
dc.date.available	2025-08-13T11:51:02Z	en
dc.date.issued	2025-04-28	en
dc.date.updated	2025-08-01T07:49:08Z	en
dc.description.abstract	It is essential for e-commerce platforms to provide accurate, complete, and timely product attribute values, in order to improve the search and recommendation experience for both customers and sellers. In the real-world scenario, it is difficult for these platforms to identify attribute values for the newly introduced products given no similar product history records for training or retrieval. Besides, how to jointly learn the product representation given various product information in multiple modalities, such as textual modality (e.g., product titles and descriptions) and visual modality (e.g., product images), is also a challenging task. To address these limitations, we propose a novel method for extracting multi-label product attribute-value pairs from multiple modalities in the zero-shot scenario, where labeled data is absent during training. Specifically, our method constructs heterogeneous hypergraphs, where product information from different modalities is represented by different types of nodes, and the text and image nodes are embedded and learned through CLIP encoders to effectively capture and integrate multi-modal product information. Then, the complex interrelations among these nodes are modeled through the hyperedges. By learning informative node representations, our method can accurately predict links between unseen product nodes and attribute-value nodes, enabling zero-shot attribute value extraction. We conduct extensive experiments and ablation studies on several categories of the public MAVE dataset and the results demonstrate that our proposed method significantly outperforms several state-of-the-art generative model baselines in multi-label, multi-modal product attribute value extraction in the zero-shot setting.	en
dc.description.version	Published version	en
dc.format.mimetype	application/pdf	en
dc.identifier.doi	https://doi.org/10.1145/3696410.3714714	en
dc.identifier.uri	https://hdl.handle.net/10919/137485	en
dc.language.iso	en	en
dc.publisher	ACM	en
dc.rights	Creative Commons Attribution 4.0 International	en
dc.rights.holder	The author(s)	en
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	en
dc.title	Hypergraph-based Zero-shot Multi-modal Product Attribute Value Extraction	en
dc.type	Article - Refereed	en
dc.type.dcmitype	Text	en
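
The pipeline outlined in the abstract above (typed nodes joined by hyperedges, learned node representations, then link prediction between an unseen product node and attribute-value nodes) can be illustrated with a small sketch. This is not the authors' implementation: the node names, the hyperedges, and the random vectors standing in for CLIP embeddings are all hypothetical, and the propagation step is a generic degree-normalised hypergraph smoothing operator chosen only for illustration.

```python
import numpy as np

# Hypothetical node set: two products, their text/image nodes,
# and two attribute-value nodes. Names are illustrative only.
names = [
    "product_seen", "product_new",
    "text_seen", "text_new",
    "image_seen", "image_new",
    "color:red", "color:blue",
]
idx = {name: i for i, name in enumerate(names)}

# Hypothetical hyperedges; each one groups an arbitrary number of nodes.
hyperedges = [
    ["product_seen", "text_seen", "image_seen", "color:red"],  # labelled product
    ["product_new", "text_new", "image_new"],                  # unseen product, no label
    ["text_seen", "text_new"],                                 # assumed text similarity
    ["color:red", "color:blue"],                               # values of one attribute
]

# Incidence matrix H: H[v, e] = 1 if node v belongs to hyperedge e.
H = np.zeros((len(names), len(hyperedges)))
for e, members in enumerate(hyperedges):
    for name in members:
        H[idx[name], e] = 1.0


def hypergraph_propagate(H, X):
    """One smoothing step: Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X (unit edge weights)."""
    node_deg = H.sum(axis=1)
    edge_deg = H.sum(axis=0)
    dv = 1.0 / np.sqrt(np.maximum(node_deg, 1e-12))
    de = 1.0 / np.maximum(edge_deg, 1e-12)
    left = (dv[:, None] * H) * de[None, :]   # Dv^{-1/2} H De^{-1}
    right = H.T * dv[None, :]                # H^T Dv^{-1/2}
    return (left @ right) @ X


def link_scores(Z, product, candidates):
    """Dot-product score between one product node and candidate nodes."""
    return Z[candidates] @ Z[product]


# Random features stand in for CLIP text/image embeddings (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(len(names), 16))

Z = hypergraph_propagate(H, X)
scores = link_scores(Z, idx["product_new"], [idx["color:red"], idx["color:blue"]])
```

In a trained model the features and propagation weights would be learned and the scores thresholded per attribute-value node to produce multi-label predictions; here the scores only show how a link between an unseen product and an attribute value could be ranked.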

Files

Original bundle

Name: 3696410.3714714.pdf
Size: 2.18 MB
Format: Adobe Portable Document Format
Description: Published version

Collections

Journal Articles, Association for Computing Machinery (ACM)
Scholarly Works, Computer Science