MasonTigers at SemEval-2024 Task 10: Emotion Discovery and Flip Reasoning in Conversation with Ensemble of Transformers and Prompting
Abstract
In this paper, we present MasonTigers’ participation in SemEval-2024 Task 10, a shared task aimed at identifying emotions and understanding the rationale behind their flips within monolingual English and Hindi-English code-mixed dialogues. This task comprises three distinct subtasks – emotion recognition in conversation for Hindi-English code-mixed dialogues, emotion flip reasoning for Hindi-English code-mixed dialogues, and emotion flip reasoning for English dialogues. Our team, MasonTigers, contributed to each subtask, focusing on developing methods for accurate emotion recognition and reasoning. Our approaches attained F1-scores of 0.78 on the first subtask and 0.79 on both the second and third subtasks. This performance not only underscores the effectiveness of our methods across different aspects of the task but also secured us the top rank in the first and third subtasks, and the second rank in the second subtask. Through extensive experimentation and analysis, we provide insights into our system’s performance and contributions to each subtask.
Al Nahian Bin Emran, Amrita Ganguly, Sadiya Sayara Chowdhury Puspo, Nishat Raihan, Dhiman Goswami
George Mason University, USA
abinemra@gmu.edu
1 Introduction
Emotion Recognition in Conversation (ERC) has emerged as a crucial area of research within Natural Language Processing (NLP). Its primary objective is to understand and replicate human emotions, placing it at the forefront of NLP research endeavors. The focus on ERC arises from the central role of understanding human emotions in the broader field of artificial intelligence. The notable contributions of researchers like Ekman (1992); Picard (1997); Hazarika et al. (2018a, b); Zhong et al. (2019); Ghosal et al. (2019); Jiao et al. (2019) have played a pivotal role in advancing this field. Their work has shaped the landscape of ERC research, underscoring its importance and providing valuable insights into the intricate mechanisms of human emotion recognition within conversational contexts. Recognizing emotions in conversations becomes especially important when emotions suddenly change. However, just spotting these changes is not enough; we also need to understand what causes them, for instance in customer service interactions. This understanding helps make dialogue systems better at handling emotions, which improves the experience for users. Emotion-Flip Reasoning (EFR), as described by Kumar et al. (2022b), is a groundbreaking effort to figure out what triggers emotional flips in group conversations. This task is not just about spotting emotional changes but also about understanding the utterances that lead up to those changes.
This shared task is designed to advance research in Emotion Recognition in Conversation (ERC) and Emotion-Flip Reasoning (EFR). It consists of three distinct subtasks aimed at exploring these areas comprehensively.
The goal of the first subtask is to perform ERC in Hindi-English code-mixed conversations. Participants are tasked with tagging each utterance in a multiparty code-mixed conversation with one of the eight emotion labels: anger, disgust, fear, sadness, surprise, joy, contempt, and neutral. This task is challenging due to the nature of code-mixed language, which combines linguistic elements from both Hindi and English, leading to unique syntactic and semantic structures. The ability to accurately recognize emotions in such mixed-language settings is critical for applications in social media monitoring, customer service, and human-computer interaction, where code-mixing is prevalent. The dataset for this task includes dialogues from various contexts, reflecting the spontaneous and informal use of language, which adds another layer of complexity to emotion recognition.
The second subtask focuses on identifying trigger utterances for emotion flips in Hindi-English code-mixed conversations, which involves understanding speaker interactions and context. This task is vital for applications like customer service and user experience enhancement, and the dataset reflects the cultural and linguistic dynamics of code-mixed speech. The third subtask is similar but in monolingual English, emphasizing the identification of triggers in English dialogues. Our approach secured first place in Subtasks A and C, and second place in Subtask B, with F1-scores of 0.83, 0.81, and 0.86 on the evaluation sets, respectively, demonstrating our methodology’s effectiveness.
2 Related Work
Research into identifying emotions has been an ongoing endeavor, initially focusing on understanding emotions in standalone contexts. Multiple lines of research paved the way in this domain by exploring basic emotion theories and affective computing. Ekman (1992)’s foundational work on facial expressions and basic emotions laid the groundwork for emotion recognition systems, while Picard (1997)’s introduction of affective computing highlighted the importance of emotional interaction between humans and machines. Moreover, Cui et al. (2020) extended these concepts by integrating EEG signals into emotion recognition, demonstrating the potential of multimodal approaches.
Recognizing the importance of contextual cues, the spotlight shifted towards emotion detection within conversations, particularly in Emotion Recognition in Conversation (ERC). Early attempts at ERC involved heuristic methods and traditional machine learning techniques. Li et al. (2007) focused on rule-based systems to identify emotions from textual cues, while Fitrianie et al. (2003) used pattern recognition techniques for emotion classification in spoken dialogues. However, these methods had limitations in capturing the complexities of human emotions in conversations.
The advent of deep learning has revolutionized ERC, with various models leveraging neural networks to enhance emotion detection. Zhong et al. (2019) introduced knowledge-enriched attention networks for ERC, incorporating external knowledge to improve emotion classification. Hazarika et al. (2018a) proposed conversational memory networks, which utilize hierarchical attention mechanisms to model speaker-specific information. Recent studies have explored transformer-based architectures and graph neural networks to capture contextual dependencies and speaker interactions in dialogues Li et al. (2022); Yang et al. (2022); Tu et al. (2022); Ma et al. (2022).
While recent studies have examined emotion analysis in code-mixed language, the focus remains primarily on social media texts Ilyas et al. (2023); Wadhawan and Aggarwal (2021); Sasidhar et al. (2020) and reviews Zhu et al. (2022); Suciati and Budi (2020). These works have highlighted the unique challenges of code-mixed text, such as language switching and informal expressions. Despite some exploration into aspects such as sarcasm Kumar et al. (2022a, 2023b), offense Madhu et al. (2023), and humor Bedi et al. (2021) within code-mixed conversations, the field of emotion analysis remains largely unexplored, lacking sufficient literature. This shared task aims to bridge this gap by delving into the under-explored domain of ERC, specifically within Hindi-English code-mixed dialogues. Additionally, recent developments have introduced valuable code-mixed datasets such as OffMix-3L Goswami et al. (2023a), SentMix-3L Raihan et al. (2023a), EmoMix-3L Raihan et al. (2024) and TB-OLID Raihan et al. (2023c), facilitating advancements in this area.
The exploration of the causes of emotions within linguistic contexts remains a relatively under-studied area. Few studies have ventured into this domain, aiming to unravel the underlying causes of expressed emotions in text, often referred to as Emotion-Cause Analysis. Lee et al. (2010) focused on identifying textual cues that trigger specific emotions, while Wang et al. (2022) integrated multimodal data to enhance emotion-cause analysis. While the concept of emotion-cause analysis and emotion-flip reasoning tasks may appear interconnected in theory, they diverge significantly in practice. Emotion-cause analysis focuses on identifying phrases within text that trigger specific emotions, whereas the Emotion Flip Reasoning (EFR) task involves extracting causes or triggers behind emotional flips in conversational dialogues involving multiple speakers Kumar et al. (2023a). These triggers comprise one or more utterances from the dialogue history, highlighting the dynamic nature of emotions in multi-party interactions.
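To make the EFR setting concrete, the following minimal sketch (with a hypothetical dialogue and helper names of our own, not from the shared task release) marks the points where a speaker's emotion flips; the task then asks systems to go one step further and identify which earlier utterances triggered each flip.

```python
# Hypothetical illustration of the EFR setting: each utterance carries an
# emotion label, and a flip occurs when a speaker's emotion differs from
# that same speaker's previous utterance.
dialogue = [
    {"speaker": "A", "text": "We got the grant!", "emotion": "joy"},
    {"speaker": "B", "text": "That's wonderful news.", "emotion": "joy"},
    {"speaker": "A", "text": "But the budget was cut in half.", "emotion": "sadness"},
]

def find_emotion_flips(utterances):
    """Return indices where a speaker's emotion differs from that speaker's
    previous utterance (a candidate flip for trigger annotation)."""
    last_emotion = {}  # speaker -> most recent emotion
    flips = []
    for i, utt in enumerate(utterances):
        prev = last_emotion.get(utt["speaker"])
        if prev is not None and prev != utt["emotion"]:
            flips.append(i)
        last_emotion[utt["speaker"]] = utt["emotion"]
    return flips
```

Here speaker A flips from joy to sadness at the third utterance; EFR would additionally require pointing to the utterance(s) in the history that caused the flip.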
3 Datasets
| | Disgust | Joy | Surprise | Anger | Fear | Neutral | Sadness | Contempt | # of Utterances | # of Dialogues |
|---|---|---|---|---|---|---|---|---|---|---|
| Train | 127 | 1596 | 441 | 819 | 514 | 3909 | 558 | 542 | 8506 | 343 |
| Dev | 21 | 228 | 66 | 118 | 88 | 633 | 126 | 74 | 1354 | 46 |
| Test | 17 | 349 | 57 | 142 | 122 | 656 | 155 | 82 | 1580 | 57 |

Table 1: Emotion label distribution for Subtask 1 (ERC in Hindi-English code-mixed dialogues).
| | Utterances w/o Trigger (0) | Utterances with Trigger (1) | # of Utterances | # of Dialogues |
|---|---|---|---|---|
| Train | 92235 | 6542 | 98777 | 4893 |
| Dev | 7028 | 434 | 7462 | 389 |
| Test | 7274 | 416 | 7690 | 385 |

Table 2: Trigger label distribution for Subtask 2 (EFR in Hindi-English code-mixed dialogues).
| | Utterances w/o Trigger (0) | Utterances with Trigger (1) | # of Utterances | # of Dialogues |
|---|---|---|---|---|
| Train | 29423 | 5577 | 35000 | 4000 |
| Dev | 3028 | 494 | 3522 | 389 |
| Test | 7473 | 1169 | 8642 | 1002 |

Table 3: Trigger label distribution for Subtask 3 (EFR in English dialogues).
The EDiReF shared task at SemEval 2024 comprises three subtasks: (i) Emotion Recognition in Conversation (ERC) in Hindi-English code-mixed conversations, (ii) Emotion Flip Reasoning (EFR) in Hindi-English code-mixed conversations, and (iii) EFR in English conversations.
The dataset for subtask 1, Emotion Recognition in Conversation (ERC) for code-mixed dialogues, consists of 11,440 utterances drawn from 446 dialogues. It is segmented into training, development (dev), and test sets. Specifically, the training set comprises 8,506 utterances from 343 dialogues, the development set includes 1,354 utterances from 46 dialogues, and the test set contains 1,580 utterances from 57 dialogues. Each utterance is annotated with one of eight emotions: anger, disgust, fear, sadness, surprise, joy, contempt, and neutral. The specifics of the dataset for this track can be found in Table 1.
Subtask 2, Emotion Flip Reasoning (EFR) in Hindi-English code-mixed conversations, focuses on identifying emotion triggers at the utterance level. The dataset includes a total of 113,929 utterances across 5,667 dialogues, split into training, development, and test sets. The training set contains 98,777 utterances from 4,893 dialogues, of which 6,542 are emotion triggers (labeled as 1) and 92,235 are not (labeled as 0). The development set includes 7,462 utterances from 389 dialogues, of which 434 are triggers and 7,028 are not. Similarly, the test set consists of 7,690 utterances from 385 dialogues, with 416 triggers and 7,274 non-triggers. The details of the dataset for this track are available in Table 2.
Subtask 3, Emotion Flip Reasoning (EFR) in English conversations, aims to pinpoint specific utterances triggering emotional shifts in multi-party dialogues. The dataset comprises a total of 47,164 utterances across 5,391 dialogues, split into training, development, and test sets. The training set includes 35,000 utterances from 4,000 dialogues, with 5,577 trigger utterances (labeled as 1) and 29,423 non-triggers (labeled as 0). The development set contains 3,522 utterances from 389 dialogues, with 494 triggers and 3,028 non-triggers. The test set consists of 8,642 utterances from 1,002 dialogues, with 1,169 triggers and 7,473 non-triggers. This dataset supports the investigation of emotional triggers in English conversations, contributing to advancements in emotion reasoning within dialogue analysis. The details of the dataset for this track are available in Table 3.
4 Experiments
In this section, we describe the experimental setup and the results obtained for the three subtasks: Emotion Recognition in Conversation (ERC) in Hindi-English code-mixed dialogues, Emotion Flip Reasoning (EFR) in Hindi-English code-mixed dialogues, and EFR in English dialogues.
The dataset for Subtask A consists of dialogues annotated with eight emotion categories: disgust, joy, surprise, anger, fear, neutral, sadness, and contempt. The training set contains 8,506 utterances from 343 dialogues, the development set includes 1,354 utterances from 46 dialogues, and the test set comprises 1,580 utterances from 57 dialogues (Table 1). We experimented with several models, including MuRIL Khanuja et al. (2021), XLM-R Conneau et al. (2019), mBERT Devlin et al. (2018), HingBERT Jain et al. (2021), and IndicBERT Kakwani et al. (2020). Our approach utilized a weighted ensemble of these models to improve performance. The results, as shown in Table 4, indicate that the weighted ensemble achieved the highest F1-scores on both the evaluation (0.83) and test (0.78) sets, outperforming individual models.
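The weighted ensemble can be sketched as a weighted average of per-model class probabilities followed by an argmax. The sketch below is illustrative rather than our exact configuration: the choice of weights (here proportional to hypothetical dev-set scores, normalized to sum to one) is an assumption.

```python
import numpy as np

def weighted_ensemble(prob_matrices, weights):
    """Combine per-model class probabilities with a weighted average.
    prob_matrices: list of (n_examples, n_classes) arrays, one per model.
    weights: one non-negative weight per model (e.g. dev-set F1-scores)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                  # normalize to sum to 1
    stacked = np.stack(prob_matrices)                  # (n_models, n_examples, n_classes)
    combined = np.tensordot(weights, stacked, axes=1)  # weighted average over models
    return combined.argmax(axis=1)                     # predicted class per example

# Toy example: two "models" scoring three examples over two classes.
m1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
m2 = np.array([[0.6, 0.4], [0.7, 0.3], [0.3, 0.7]])
preds = weighted_ensemble([m1, m2], weights=[0.83, 0.81])  # -> array([0, 0, 1])
```

Weighting by dev-set performance lets stronger models dominate ties while still letting weaker models break near-even cases, which is one common rationale for weighted over uniform ensembling.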
For Subtask B, the dataset consists of utterances labeled with the presence or absence of triggers for emotion flips. The training set includes 98,777 utterances, the development set comprises 7,462 utterances, and the test set contains 7,690 utterances (Table 2). We evaluated multiple models, including MuRIL Khanuja et al. (2021), XLM-R Conneau et al. (2019), mBERT Devlin et al. (2018), FLAN-T5 Raffel et al. (2020), and GPT-4 Turbo (Zero-Shot) OpenAI (2023). The weighted ensemble of these models achieved the best performance, with F1-scores of 0.81 on the evaluation set and 0.79 on the test set, as presented in Table 5.
For Subtask C, the dataset, derived from the MELD dataset Poria et al. (2019), includes dialogues annotated for emotion-flip reasoning. The training set consists of 35,000 utterances, the development set contains 3,522 utterances, and the test set has 8,642 utterances (Table 3). Our experiments involved several models such as DeBERTa He et al. (2021), ELECTRA Clark et al. (2020), EmoBERTa Bao et al. (2021), FLAN-T5 Raffel et al. (2020), and GPT-4 Turbo (Zero-Shot) OpenAI (2023). As with the other subtasks, a weighted ensemble approach yielded the best results, achieving F1-scores of 0.86 on the evaluation set and 0.79 on the test set, as shown in Table 6.
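For the zero-shot runs, a prompt must present the dialogue, its per-utterance emotions, and the flip position. The template below is a hypothetical reconstruction of such a prompt builder (the exact wording we used is not reproduced in this paper), kept as a pure function so it is independent of any particular LLM API.

```python
def build_efr_prompt(utterances, target_index):
    """Build a zero-shot EFR prompt asking which earlier utterances trigger
    the emotion flip at `target_index`. Wording is a hypothetical sketch.
    utterances: list of dicts with 'speaker', 'text', and 'emotion' keys."""
    lines = [
        f"[{i}] {u['speaker']}: {u['text']} (emotion: {u['emotion']})"
        for i, u in enumerate(utterances)
    ]
    return (
        "Below is a multi-party conversation with per-utterance emotions.\n"
        + "\n".join(lines)
        + f"\nUtterance [{target_index}] marks an emotion flip. "
        "List the indices of the utterances that trigger this flip."
    )
```

The resulting string would then be sent as the user message of a chat-completion request; indexing utterances explicitly makes the model's answer easy to parse back into trigger labels.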
The experimental results across all three subtasks demonstrate the effectiveness of ensemble models in improving performance. By integrating diverse models such as MuRIL, XLM-R, mBERT, FLAN-T5, and GPT-4 Turbo, we achieved higher F1-scores compared to individual models, showing robustness across Hindi-English code-mixed and English dialogues Goswami et al. (2023b); Raihan et al. (2023b); Ganguly et al. (2024). Previous works have also highlighted the benefits of ensemble models. The nlpBDpatriots and MasonPerplexity teams demonstrated significant improvements in emotion detection using ensemble techniques Raihan et al. (2023b); Bin Emran et al. (2024); Goswami et al. (2024). Our results reinforce the potential of ensemble methods in complex NLP tasks, advancing the state-of-the-art in emotion recognition and reasoning.
| Model | Eval F1 | Test F1 |
|---|---|---|
| MuRIL | 0.82 | 0.76 |
| XLM-R | 0.81 | 0.75 |
| mBERT | 0.78 | 0.72 |
| HingBERT | 0.77 | 0.69 |
| IndicBERT | 0.74 | 0.67 |
| Wt. Ensemble | 0.83 | 0.78 |

Table 4: Results for Subtask A (ERC in Hindi-English code-mixed dialogues).
| Model | Eval F1 | Test F1 |
|---|---|---|
| MuRIL | 0.78 | 0.77 |
| XLM-R | 0.77 | 0.75 |
| mBERT | 0.75 | 0.74 |
| FLAN-T5 | 0.76 | 0.76 |
| GPT4-Turbo (Zero-Shot) | 0.79 | 0.78 |
| Wt. Ensemble | 0.81 | 0.79 |

Table 5: Results for Subtask B (EFR in Hindi-English code-mixed dialogues).
| Model | Eval F1 | Test F1 |
|---|---|---|
| DeBERTa | 0.79 | 0.72 |
| ELECTRA | 0.76 | 0.70 |
| EmoBERTa | 0.72 | 0.69 |
| FLAN-T5 | 0.81 | 0.74 |
| GPT4-Turbo (Zero-Shot) | 0.83 | 0.77 |
| Wt. Ensemble | 0.86 | 0.79 |

Table 6: Results for Subtask C (EFR in English dialogues).
5 Error Analysis
In our evaluation of the models across the three subtasks, we observed several patterns and areas for potential improvement.
For Subtask A (ERC in Hindi-English code-mixed dialogues), despite achieving a high F1-score with the weighted ensemble model, the models often failed to correctly identify emotions such as surprise and contempt, consistent with the low frequencies of these emotions in the training set (Table 1). The imbalanced distribution of emotions likely contributed to the difficulty of accurate classification, as models like MuRIL and XLM-R showed higher variance in their predictions for less frequent emotions. Furthermore, code-mixed sentences often posed a challenge due to the nuances of mixed linguistic features, leading to misclassifications between similar emotional contexts.
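One standard remedy for this imbalance, noted here as a possible direction rather than our exact setup, is inverse-frequency class weighting (e.g. as loss weights during fine-tuning) computed from the Table 1 training counts:

```python
# Inverse-frequency class weights from the Subtask A training counts in
# Table 1. Rare classes (disgust, surprise) get proportionally larger
# weights than the dominant neutral class.
train_counts = {
    "disgust": 127, "joy": 1596, "surprise": 441, "anger": 819,
    "fear": 514, "neutral": 3909, "sadness": 558, "contempt": 542,
}
total = sum(train_counts.values())     # 8506 training utterances
n_classes = len(train_counts)          # 8 emotion labels
weights = {label: total / (n_classes * count)
           for label, count in train_counts.items()}
```

Under this scheme, disgust receives a weight of about 8.4 while neutral receives about 0.27, pushing the loss to penalize errors on rare emotions more heavily.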
In Subtask B (EFR in Hindi-English code-mixed dialogues), the identification of emotion triggers demonstrated robustness with a weighted ensemble model, yet the high variance in dialogue contexts occasionally led to false positives and negatives, particularly in dialogues with subtle or implicit emotional shifts. The relatively lower number of dialogues with triggers in the development and test sets (Table 2) highlighted the challenge of generalizing from a limited set of examples. Additionally, the presence of code-switching within dialogues sometimes confused the models, leading to incorrect trigger identifications.
Subtask C (EFR in English dialogues) exhibited the best performance overall, with the ensemble model achieving the highest F1-scores (Table 6). However, similar to Subtask B, errors were often associated with dialogues where the emotional triggers were subtle or context-dependent. Models like DeBERTa and ELECTRA underperformed compared to the ensemble, suggesting that leveraging multiple model architectures helps capture diverse linguistic features and contextual cues.
Overall, our error analysis indicates that while ensemble models significantly enhance performance, addressing data imbalance and improving the handling of nuanced and implicit emotional cues are crucial for further advancement.
6 Conclusion
In this paper, we presented MasonTigers’ participation in the SemEval-2024 Task 10, focusing on emotion recognition and emotion flip reasoning within both Hindi-English code-mixed and English dialogues. Through extensive experimentation, we employed various models including MuRIL, XLM-R, mBERT, HingBERT, IndicBERT, FLAN-T5, GPT-4 Turbo, DeBERTa, ELECTRA, and EmoBERTa, achieving competitive results across all subtasks.
Our weighted ensemble approach consistently outperformed individual models, achieving F1-scores of 0.83 on the evaluation and 0.78 on the test set for Subtask A, 0.81 and 0.79 for Subtask B, and 0.86 and 0.79 for Subtask C, respectively (Tables 4, 5, 6). These results underscore the effectiveness of ensemble learning in capturing diverse linguistic and contextual features, leading to improved performance in complex emotion recognition tasks.
The error analysis revealed that data imbalance and the challenges of code-mixed linguistic features contribute to misclassifications, highlighting areas for future research. Addressing these issues, along with enhancing the models’ ability to detect subtle and implicit emotional cues, will be pivotal for advancing the state-of-the-art in emotion recognition and reasoning.
Our contributions to EDiReF demonstrate the potential of combining multiple model architectures to tackle nuanced and diverse emotional contexts in conversations. We believe that our findings will serve as a foundation for further research in emotion recognition and emotion flip reasoning, ultimately contributing to more robust and contextually aware natural language processing systems.
Limitations
The task involved extensive datasets in each phase of all subtasks, leading to prolonged execution times and increased GPU usage. Additionally, the dialogues themselves were often lengthy, and the prohibition of additional data augmentation added to the complexity of the task. Exploring more recent LLMs may yield better performance on these challenges, offering insights into optimization strategies and improving overall model robustness and efficiency for future tasks.
Acknowledgements
We express our gratitude to the organizers for orchestrating this task and to the individuals who diligently annotated datasets across various languages. Your dedication has played a crucial role in the triumph of this undertaking. The meticulously designed task underscores the organizers’ dedication to advancing research, and we commend the collaborative endeavors that have enhanced the diversity and comprehensiveness of the datasets, ensuring a substantial and positive influence.
References
- Bao et al. (2021) Jerry Bao, Devamanyu Hazarika, Roger Zimmermann, and Soujanya Poria. 2021. Emoberta: Speaker-aware emotion recognition in conversation with roberta. arXiv preprint arXiv:2010.14730.
- Bedi et al. (2021) Manjot Bedi, Shivani Kumar, Md Shad Akhtar, and Tanmoy Chakraborty. 2021. Multi-modal sarcasm detection and humor classification in code-mixed conversations. IEEE Transactions on Affective Computing, 14.
- Bin Emran et al. (2024) Al Nahian Bin Emran, Amrita Ganguly, Sadiya Sayara Chowdhury Puspo, Dhiman Goswami, and Md Nishat Raihan. 2024. MasonPerplexity at ClimateActivism 2024: Integrating advanced ensemble techniques and data augmentation for climate activism stance and hate event identification. In Proceedings of CASE.
- Clark et al. (2020) Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555.
- Conneau et al. (2019) Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.
- Cui et al. (2020) Heng Cui, Aiping Liu, Xu Zhang, Xiang Chen, Kongqiao Wang, and Xun Chen. 2020. Eeg-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network. Knowledge-Based Systems, 205.
- Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Ekman (1992) Paul Ekman. 1992. An argument for basic emotions. Cognition and Emotion, 6.
- Fitrianie et al. (2003) Siska Fitrianie, Pascal Wiggers, and Leon JM Rothkrantz. 2003. A multi-modal eliza using natural language processing and emotion recognition. In Proceedings of TSD.
- Ganguly et al. (2024) Amrita Ganguly, Al Nahian Bin Emran, Sadiya Sayara Chowdhury Puspo, Md Nishat Raihan, Dhiman Goswami, and Marcos Zampieri. 2024. MasonPerplexity at multimodal hate speech event detection 2024: Hate speech and target detection using transformer ensembles. In Proceedings of CASE.
- Ghosal et al. (2019) Deepanway Ghosal, Navonil Majumder, Soujanya Poria, Niyati Chhaya, and Alexander Gelbukh. 2019. Dialoguegcn: A graph convolutional neural network for emotion recognition in conversation. arXiv preprint arXiv:1908.11540.
- Goswami et al. (2024) Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, and Md Nishat Raihan. 2024. MasonTigers@ LT-EDI-2024: An ensemble approach towards detecting homophobia and transphobia in social media comments. LT-EDI.
- Goswami et al. (2023a) Dhiman Goswami, Md Nishat Raihan, Antara Mahmud, Antonios Anastasopoulos, and Marcos Zampieri. 2023a. OffMix-3L: A novel code-mixed test dataset in bangla-english-hindi for offensive language identification. In Proceedings of SocialNLP.
- Goswami et al. (2023b) Dhiman Goswami, Md Nishat Raihan, Sadiya Sayara Chowdhury Puspo, and Marcos Zampieri. 2023b. nlpBDpatriots at BLP-2023 task 2: A transfer learning approach towards bangla sentiment analysis. In Proceedings of BLP-2023.
- Hazarika et al. (2018a) Devamanyu Hazarika, Soujanya Poria, Rada Mihalcea, Erik Cambria, and Roger Zimmermann. 2018a. ICON: Interactive conversational memory network for multimodal emotion detection. In Proceedings of EMNLP.
- Hazarika et al. (2018b) Devamanyu Hazarika, Soujanya Poria, Amir Zadeh, Erik Cambria, Louis-Philippe Morency, and Roger Zimmermann. 2018b. Conversational memory network for emotion recognition in dyadic dialogue videos. In Proceedings of NAACL (HLT).
- He et al. (2021) Pengcheng He, Xiaodong Liu, and Jianfeng Gao. 2021. Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654.
- Ilyas et al. (2023) Abdullah Ilyas, Khurram Shahzad, and Muhammad Kamran Malik. 2023. Emotion detection in code-mixed roman urdu-english text. ACM Transactions on Asian and Low-Resource Language Information Processing, 22.
- Jain et al. (2021) Rajesh Jain, Shivani Ahuja, Pratik Yadav, Ajay Kumar, and Anupam Bhattacharya. 2021. Hingbert: A pre-trained language model for hindi-english code-switched nlp. arXiv preprint arXiv:2104.03470.
- Jiao et al. (2019) Wenxiang Jiao, Haiqin Yang, Irwin King, and Michael R Lyu. 2019. Higru: Hierarchical gated recurrent units for utterance-level emotion recognition. arXiv preprint arXiv:1904.04446.
- Kakwani et al. (2020) Diptesh Kakwani, Anoop Kunchukuttan, Vinit Ravishankar Golla, Kiran Gopalakrishnan, Pushpak Bhattacharyya, and Pratyush Kumar. 2020. Indicnlp suite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for indian languages. arXiv preprint arXiv:2004.05592.
- Khanuja et al. (2021) Simran Khanuja, Sandipan Dandapat, Tania Sikka, Anoop Kunchukuttan, Vishrav Kumar, and Sunayana Sharma. 2021. Muril: Multilingual representations for indian languages. arXiv preprint arXiv:2103.10730.
- Kumar et al. (2023a) Shivani Kumar, Shubham Dudeja, Md Shad Akhtar, and Tanmoy Chakraborty. 2023a. Emotion flip reasoning in multiparty conversations. IEEE Transactions on Artificial Intelligence.
- Kumar et al. (2022a) Shivani Kumar, Atharva Kulkarni, Md Shad Akhtar, and Tanmoy Chakraborty. 2022a. When did you become so smart, oh wise one?! sarcasm explanation in multi-modal multi-party dialogues. arXiv preprint arXiv:2203.06419.
- Kumar et al. (2023b) Shivani Kumar, Ishani Mondal, Md Shad Akhtar, and Tanmoy Chakraborty. 2023b. Explaining (sarcastic) utterances to enhance affect understanding in multimodal dialogues. In Proceedings of the AAAI.
- Kumar et al. (2022b) Shivani Kumar, Anubhav Shrimal, Md Shad Akhtar, and Tanmoy Chakraborty. 2022b. Discovering emotion and reasoning its flip in multi-party conversations using masked memory network and transformer. Knowledge-Based Systems, 240.
- Lee et al. (2010) Sophia Yat Mei Lee, Ying Chen, and Chu-Ren Huang. 2010. A text-driven rule-based system for emotion cause detection. In Proceedings of the NAACL (HLT).
- Li et al. (2007) Haifang Li, Na Pang, Shangbo Guo, and Heping Wang. 2007. Research on textual emotion recognition incorporating personality factor. In Proceedings of ROBIO.
- Li et al. (2022) Wei Li, Wei Shao, Shaoxiong Ji, and Erik Cambria. 2022. Bieru: Bidirectional emotional recurrent unit for conversational sentiment analysis. Neurocomputing, 467.
- Ma et al. (2022) Hui Ma, Jian Wang, Hongfei Lin, Xuejun Pan, Yijia Zhang, and Zhihao Yang. 2022. A multi-view network for real-time emotion recognition in conversations. Knowledge-Based Systems, 236.
- Madhu et al. (2023) Hiren Madhu, Shrey Satapara, Sandip Modha, Thomas Mandl, and Prasenjit Majumder. 2023. Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments. Expert Systems with Applications, 215.
- OpenAI (2023) OpenAI. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
- Picard (1997) Rosalind W. Picard. 1997. Affective Computing. MIT Press.
- Poria et al. (2019) Soujanya Poria, Navonil Majumder, Devamanyu Hazarika, Amir Zadeh, Erik Cambria, and Louis-Philippe Morency. 2019. Meld: A multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint arXiv:1810.02508.
- Raffel et al. (2020) Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21.
- Raihan et al. (2023a) Md Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, and Marcos Zampieri. 2023a. SentMix-3L: A novel code-mixed test dataset in bangla-english-hindi for sentiment analysis. In Proceedings of SEALP.
- Raihan et al. (2023b) Md Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, and Marcos Zampieri. 2023b. nlpBDpatriots at BLP-2023 task 1: A two-step classification for violence inciting text detection in bangla. In Proceedings of BLP.
- Raihan et al. (2023c) Md Nishat Raihan, Umma Hani Tanmoy, Anika Binte Islam, Kai North, Tharindu Ranasinghe, Antonios Anastasopoulos, and Marcos Zampieri. 2023c. Offensive language identification in transliterated and code-mixed bangla. In Proceedings of BLP.
- Raihan et al. (2024) Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, and Marcos Zampieri. 2024. Emomix-3L: A code-mixed dataset for bangla-english-hindi emotion detection. LREC-COLING.
- Sasidhar et al. (2020) T Tulasi Sasidhar, B Premjith, and KP Soman. 2020. Emotion detection in hinglish (hindi+ english) code-mixed social media text. Procedia Computer Science, 171.
- Suciati and Budi (2020) Andi Suciati and Indra Budi. 2020. Aspect-based sentiment analysis and emotion detection for code-mixed review. International Journal of Advanced Computer Science and Applications, 11.
- Tu et al. (2022) Geng Tu, Jintao Wen, Cheng Liu, Dazhi Jiang, and Erik Cambria. 2022. Context-and sentiment-aware networks for emotion recognition in conversation. IEEE Transactions on Artificial Intelligence, 3.
- Wadhawan and Aggarwal (2021) Anshul Wadhawan and Akshita Aggarwal. 2021. Towards emotion recognition in hindi-english code-mixed data: A transformer based approach. arXiv preprint arXiv:2102.09943.
- Wang et al. (2022) Fanfan Wang, Zixiang Ding, Rui Xia, Zhaoyu Li, and Jianfei Yu. 2022. Multimodal emotion-cause pair extraction in conversations. IEEE Transactions on Affective Computing.
- Yang et al. (2022) Lin Yang, Yi Shen, Yue Mao, and Longjun Cai. 2022. Hybrid curriculum learning for emotion recognition in conversation. In Proceedings of the AAAI, volume 36.
- Zhong et al. (2019) Peixiang Zhong, Di Wang, and Chunyan Miao. 2019. Knowledge-enriched transformer for emotion detection in textual conversations. arXiv preprint arXiv:1909.10681.
- Zhu et al. (2022) Xun Zhu, Yinxia Lou, Hongtao Deng, and Donghong Ji. 2022. Leveraging bilingual-view parallel translation for code-switched emotion detection with adversarial dual-channel encoder. Knowledge-based systems, 235.