Handle the problem of ample label space by using the Image-guided Feature Extractor on the MUSTI dataset

Feb 1, 2024

Duc-Le Ngoc

Hung-Le Minh

Dinh-Vinh Quang

Architecture

Abstract

Among multimodal tasks, olfactory perception remains largely unexplored. Two difficulties stand out. The first is that the label space is large while the dataset is generally too small. The second is that the labels in the dataset are imbalanced. In this paper, we develop and evaluate our model on the task of predicting the congruence of olfactory experiences between an image and a corresponding text passage on the MUSTI dataset. To address the label imbalance problem and to optimize feature extraction from images and text with large feature spaces, we propose a model that selects text features based on image features. By selecting the text that needs attention, our model outperforms existing baselines on both the training and testing sets.
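
The idea of selecting text features based on image features can be illustrated with a small attention-style sketch. This is not the architecture from the paper, whose details the abstract does not give. It is a minimal example that assumes one pooled image embedding and one embedding per text token, both of the same dimension. The function name `image_guided_text_pooling` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def image_guided_text_pooling(image_vec, text_tokens, temperature=1.0):
    """Weight each text token feature by its similarity to the image feature.

    Assumptions (not taken from the paper):
      image_vec:   shape (d,),   a single pooled image embedding
      text_tokens: shape (n, d), one embedding per text token
    Returns (pooled, weights): pooled has shape (d,), weights has shape (n,)
    and sums to 1.
    """
    image_vec = np.asarray(image_vec, dtype=float)
    text_tokens = np.asarray(text_tokens, dtype=float)
    d = image_vec.shape[0]

    # Scaled dot-product score between the image (query) and each token (key).
    scores = text_tokens @ image_vec / (np.sqrt(d) * temperature)

    # Numerically stable softmax over the tokens.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()

    # Tokens that align with the image contribute most to the pooled feature.
    pooled = weights @ text_tokens
    return pooled, weights


if __name__ == "__main__":
    image = [1.0, 0.0]
    tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    pooled, weights = image_guided_text_pooling(image, tokens)
    print("token weights:", weights)
    print("pooled text feature:", pooled)
```

In this toy input, the token whose embedding points in the same direction as the image embedding receives the largest weight, so it dominates the pooled text feature that a downstream classifier would see.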

Type

Conference paper

Publication

In Multimedia Evaluation Workshop

Last updated on Feb 1, 2024

Vision Language
