Gated Fusion Transformer for English-Hindi Multimodal Translation

Suram, Priyanka; Patro, Pramoda

doi:10.66224/jist.50948.14.1.1

Manuscript ID: 2025072650948 Pages: 1-8

Article Type: Original Research

Subject Areas : Natural Language Processing

Received: 2025-07-26 Accepted : 2025-11-02 Published : 2026-05-05

Keywords: Machine Translation, Domain Specific Translation, Multimodal Machine Translation, Multi Modal Fusion Mechanisms, Gated Fusion Transformer, Agricultural Translation

Abstract :

Machine translation is fundamental to bridging languages, especially in specialized domains such as agriculture. As digital tools become more common in agricultural practice, accurate and context-sensitive translation is increasingly important. Farmers, farm laborers, policymakers, and researchers all depend on the reliable delivery of agricultural information, including farming methods, weather advisories, and crop recommendations. However, conventional text-only translation systems often perform suboptimally because of ambiguity and limited contextual knowledge. To address these shortcomings, this study applies Multimodal Machine Translation (MMT), which combines textual and visual information to improve accuracy. A Gated Fusion Transformer (GFT) model is adapted to the agricultural domain to resolve contextual ambiguity and translation inconsistencies. The model was trained and evaluated on the multilingual benchmark dataset FLORES-200, using two widely used metrics, BLEU and METEOR. The proposed system achieved a BLEU score of 58.2 and a METEOR score of 0.71, indicating high-quality, contextually relevant translations. Beyond benchmarking the GFT model on agricultural content, this work offers a basis for the future development of multimodal translation systems for low-resource, domain-specific settings.
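The abstract does not give the gating formulation, so the following is only a minimal sketch of one common form of gated fusion, not the authors' implementation. It assumes the image feature has already been projected to the same dimension as the text feature, and that the gate is computed as sigmoid(W [text; image] + b). The function names `sigmoid` and `gated_fusion`, the weight layout, and the residual form `text + gate * image` are all illustrative assumptions.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, image_vec, weights, bias):
    """Fuse one text feature vector with one image feature vector.

    Assumed (hypothetical) formulation:
        gate  = sigmoid(W @ [text; image] + b)
        fused = text + gate * image
    `weights` has one row per output dimension, each of length 2 * d.
    """
    concat = list(text_vec) + list(image_vec)
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(weights, bias)
    ]
    fused = [t + g * v for t, g, v in zip(text_vec, gate, image_vec)]
    return fused, gate


# Toy inputs: a 3-dimensional text feature and image feature.
d = 3
text = [0.5, -1.0, 0.25]
image = [1.0, 1.0, 1.0]
zero_weights = [[0.0] * (2 * d) for _ in range(d)]

# With zero weights and zero bias, every gate value is sigmoid(0) = 0.5.
fused_half, gate_half = gated_fusion(text, image, zero_weights, [0.0] * d)

# A strongly negative bias closes the gate, so the output stays close to the text feature.
fused_closed, gate_closed = gated_fusion(text, image, zero_weights, [-50.0] * d)
```

In this sketch the gate decides, per dimension, how much visual information is added to the text representation. A gate near zero leaves the text feature almost unchanged, which is the behaviour one would want when the image is uninformative for a given sentence.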
