Fusion-Based AI for Sentiment and Emotion Understanding in Social Media
Abstract
Analyzing user sentiment and emotions in digital conversations is essential for understanding online behavior. This study introduces
a novel AI-driven framework that integrates multimodal deep learning techniques to enhance sentiment, emotion, and desire
classification from social media content. By fusing text and image-based features using transformer-based architectures, our approach
outperforms traditional unimodal models in accuracy and robustness. Extensive evaluations on diverse datasets demonstrate
the effectiveness of our fusion strategy, paving the way for improved sentiment analytics in social media research and real-time
emotion tracking.
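To make the fusion strategy concrete, the sketch below shows one plausible late-fusion design in PyTorch: pre-extracted text token embeddings (for example from BERT) and image patch embeddings (for example from ViT) are projected into a shared space, jointly attended over by a small transformer encoder, and pooled into separate heads for sentiment, emotion, and desire prediction. This is an illustrative assumption, not the paper's implementation; the encoder choices, dimensions, class counts, and mean-pooling step are placeholders.

import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Illustrative text+image fusion model (hypothetical, not the authors' code)."""
    def __init__(self, text_dim=768, image_dim=768, fused_dim=512,
                 n_sentiment=3, n_emotion=6, n_desire=2):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.image_proj = nn.Linear(image_dim, fused_dim)
        # A small transformer encoder attends jointly over text tokens
        # and image patches (cross-modal fusion).
        layer = nn.TransformerEncoderLayer(d_model=fused_dim, nhead=8,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # Separate heads for the three prediction tasks.
        self.sentiment_head = nn.Linear(fused_dim, n_sentiment)
        self.emotion_head = nn.Linear(fused_dim, n_emotion)
        self.desire_head = nn.Linear(fused_dim, n_desire)

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, n_tokens,  text_dim)  e.g. BERT token embeddings
        # image_feats: (batch, n_patches, image_dim) e.g. ViT patch embeddings
        tokens = torch.cat([self.text_proj(text_feats),
                            self.image_proj(image_feats)], dim=1)
        fused = self.fusion(tokens)
        pooled = fused.mean(dim=1)  # simple mean pooling over all positions
        return (self.sentiment_head(pooled),
                self.emotion_head(pooled),
                self.desire_head(pooled))

# Example with random features standing in for pretrained encoder outputs.
model = FusionClassifier()
text = torch.randn(4, 32, 768)    # 4 posts, 32 text tokens each
image = torch.randn(4, 197, 768)  # 4 images, 197 ViT-style patch embeddings
sent_logits, emo_logits, des_logits = model(text, image)
print(sent_logits.shape, emo_logits.shape, des_logits.shape)

Letting self-attention operate over the concatenated token and patch sequence is one common way to realize cross-modal fusion; simpler alternatives, such as concatenating pooled unimodal vectors before a classifier, trade some expressiveness for speed.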
DOI: http://dx.doi.org/10.52155/ijpsat.v55.2.7843
Copyright (c) 2026 Ravi Shanker Shanker

This work is licensed under a Creative Commons Attribution 4.0 International License.