Fusion-Based AI for Sentiment and Emotion Understanding in Social Media

Ravi Shanker Shanker

Abstract


Analyzing user sentiment and emotions in digital conversations is essential for understanding online behavior. This study introduces a novel AI-driven framework that integrates multimodal deep learning techniques to enhance sentiment, emotion, and desire classification from social media content. By fusing text- and image-based features with transformer-based architectures, our approach outperforms traditional unimodal models in both accuracy and robustness. Extensive evaluations on diverse datasets demonstrate the effectiveness of the fusion strategy, paving the way for improved sentiment analytics and real-time emotion tracking in social media research.
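
As a rough illustration of the fusion described above (only the abstract is available on this page), the sketch below pairs a pretrained BERT text encoder with a ViT image encoder, concatenates their pooled features, and feeds the result to a small classification head. The specific checkpoints, concatenation-based fusion, head sizes, and three-class output are illustrative assumptions, not the paper's reported configuration.

import numpy as np
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer, ViTImageProcessor, ViTModel


class TextImageFusionClassifier(nn.Module):
    """Late fusion of BERT text features and ViT image features (illustrative)."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.image_encoder = ViTModel.from_pretrained(
            "google/vit-base-patch16-224-in21k"
        )
        fused_dim = (
            self.text_encoder.config.hidden_size
            + self.image_encoder.config.hidden_size
        )
        # Concatenation fusion followed by a small MLP classification head.
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(512, num_classes),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        # Pooled [CLS]-style representation from each modality.
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output
        image_feat = self.image_encoder(pixel_values=pixel_values).pooler_output
        # Fuse by concatenation and classify.
        fused = torch.cat([text_feat, image_feat], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    processor = ViTImageProcessor.from_pretrained(
        "google/vit-base-patch16-224-in21k"
    )
    model = TextImageFusionClassifier(num_classes=3).eval()

    text = tokenizer(["so excited about this!"], return_tensors="pt")
    image = processor(images=np.zeros((224, 224, 3), dtype=np.uint8),
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(text["input_ids"], text["attention_mask"],
                       image["pixel_values"])
    print(logits.shape)  # torch.Size([1, 3])

Concatenation is the simplest fusion strategy; cross-attention encoders such as ViLBERT or LXMERT (cited below) let the two modalities interact earlier and more deeply.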


Keywords


Multimodal learning; Sentiment analysis; Emotion recognition; Transformer models; Deep learning; Social media.


References


Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, 79–86. DOI: 10.3115/1118693.1118704

Kim, Y. (2014). Convolutional neural networks for sentence classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1746–1751. DOI: 10.3115/v1/D14-1181

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019, 4171–4186. DOI: 10.18653/v1/N19-1423

Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in Neural Information Processing Systems, 32. DOI: 10.48550/arXiv.1908.02265

Tan, H., & Bansal, M. (2019). LXMERT: Learning cross-modality encoder representations from transformers. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 5100–5111. DOI: 10.18653/v1/D19-1514

Kim, W., Son, B., & Kim, I. (2021). ViLT: Vision-and-language transformer without convolution or region supervision. Proceedings of the 38th International Conference on Machine Learning, 5583–5594. DOI: 10.48550/arXiv.2102.03334

Xu, P., Zhu, X., & Clifton, D. A. (2021). Multimodal learning with transformers: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2), 121–. DOI: 10.1109/TPAMI.2022.3227357

Inoue, H. (2019). Multi-sample dropout for accelerated training and better generalization. arXiv preprint arXiv:1905.09788. DOI: 10.48550/arXiv.1905.09788




DOI: http://dx.doi.org/10.52155/ijpsat.v55.2.7843



Copyright (c) 2026 Ravi Shanker Shanker

This work is licensed under a Creative Commons Attribution 4.0 International License.