
CRAB: Class Representation Attentive BERT for Hate Speech Identification in Social Media

Sayyed M. Zahiri
Georgia Institute of Technology
mzahiri@gatech.edu
Ali Ahmadvand
Emory University
aahmadv@emory.edu
Abstract

In recent years, social media platforms have hosted an explosion of hate speech and objectionable content. The urgent need for effective automatic hate speech detection models has drawn remarkable investment from companies and researchers. Social media posts are generally short, and their semantics can be drastically altered by even a single token. Thus, it is crucial for this task to learn context-aware input representations and to consider relevancy scores between input embeddings and class representations as an additional signal. To accommodate these needs, this paper introduces CRAB (Class Representation Attentive BERT), a neural model for detecting hate speech in social media. The model benefits from two semantic representations: (i) trainable token-wise and sentence-wise class representations, and (ii) contextualized input embeddings from a state-of-the-art BERT encoder. To investigate the effectiveness of CRAB, we train our model on Twitter data and compare it against strong baselines. Our results show that CRAB achieves a 1.89% relative improvement in macro-averaged F1 over the state-of-the-art baseline. The results of this research open opportunities for future research on automated abusive behavior detection in social media.

1 Introduction and Related Work

Twitter is one of the most popular social media platforms, hosting several hundred million tweets daily. Like other social networks, Twitter suffers greatly from violence, hate speech, and human rights abuse directed at specific groups or individuals Founta et al. (2018). Hence, it is imperative to protect users by taking proactive steps and developing algorithms that automatically identify hateful messages and prevent them from spreading.

Essentially, there are two steps associated with the automatic hate speech detection task: (i) annotated data collection and (ii) model development. For the first step, crowd-sourcing is one of the most common approaches; for the second, researchers have leveraged a variety of Natural Language Processing (NLP) techniques. Pereira-Kohatsu et al. gathered annotated tweets through crowd-sourcing and introduced a social network analyzer that allows researchers to monitor hate speech in tweets. The authors formulated abusive tweet identification as a text classification problem and developed several NLP techniques to accomplish this goal. Similarly, in this paper, we tackle hate speech detection as a text classification task.

Text classification is one of the fundamental NLP tasks used in social media analysis. Traditional text classification mainly relies on vector space models built from hand-crafted features such as Term Frequency-Inverse Document Frequency (TF-IDF) and n-grams Zhang et al. (2011); Wang and Manning (2012). Gaydhani et al. applied TF-IDF feature extraction followed by traditional machine learning models to detect hate speech in tweets. Although these techniques have been effective in social media mining, they suffer from vocabulary mismatch and ambiguity Croft et al. (2010). Later, deep neural models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) Kim (2014); Zhang et al. (2015); Zahiri and Choi (2017); Ravuri and Stolcke (2015) mitigated these shortcomings by learning dense text representations with minimal hyper-parameter tuning. In the context of social media analysis, Gambäck and Sikdar utilized several variations of CNN-based models to assign each tweet one of the predefined labels. One crucial limitation of these classifiers is that they do not take class representations into consideration. Du et al. addressed this limitation by introducing an interaction mechanism that computes matching scores between the encoded input Qiao et al. (2018) and the classes; the calculated scores are then used to predict the class.

More recently, Vaswani et al. developed the Transformer model using stacked self-attention and fully connected layers. The authors demonstrated that this model captures long-term dependencies in text sequences as effectively as recurrent networks; however, owing to its feed-forward architecture, it can be trained more efficiently than typical RNNs. Inspired by this work, Devlin et al. proposed the BERT language model. BERT is a multi-layer bidirectional Transformer trained on very large unlabeled corpora to learn text representations. A fine-tuned BERT encoder has improved text classification performance by a substantial margin on benchmark datasets Sun et al. (2019).

Figure 1: Class Representation Attentive BERT (CRAB) overall architecture. L-T: linear transformation.

Although state-of-the-art transformer-based models have shown promising results in text classification, similar to CNN- and RNN-based models they do not incorporate the information embedded in class representations. Inspired by Du et al., we introduce CRAB, an interaction-based classifier that relies on similarity scores between the encoded input and class representations. Our framework comprises three parts: an input representation layer, class representation layers, and an aggregation layer. The input representation layer projects tweets into token-level and sentence-level dense embedding spaces. The class representation layers map classes into latent representations and let the network interact with the encoded input to determine similarity scores between them. In other words, these layers learn the matching scores between the classes and each part of the input in an end-to-end fashion. Finally, the aggregation layer combines the matching scores computed in the previous layers and infers the class label. In this paper, we use the annotated Twitter data gathered by Founta et al. (2018). In summary, the contributions of this paper are as follows:

  • A new model that leverages matching scores between trainable class representations and the encoded input to detect hate speech.

  • Extensive experiments on Twitter data showing that the proposed model outperforms several strong baselines.

2 Model Overview

In this section, we introduce our proposed model, CRAB. The overall architecture of CRAB is illustrated in Figure 1. The objective of this model is to take tweets and classify them into one of the predefined classes (multi-class classification). More specifically, given the training set $D=\{(X_{n},Y_{n})\}^{N-1}_{0}$, where $X_{n}$ is the n-th training example and $Y_{n}$ is its corresponding label encoded as a one-hot vector, the goal of the classifier is to learn $f:X\to Y$ such that the empirical risk over the $N$ observations is minimized:

\min_{f\in F}\frac{1}{N}\sum_{i=1}^{N}L(Y_{i},f(X_{i}))   (1)

The loss function $L$ is a continuous function that penalizes training error; in this work, we employ the cross-entropy loss. CRAB comprises: (i) an input representation layer (Section 2.1), (ii) a token-wise class representation layer (Section 2.2), (iii) a sentence-wise class representation layer (Section 2.3), and (iv) an aggregation layer (Section 2.4).
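
For concreteness, the following is a minimal PyTorch sketch of the objective in Equation 1 with the cross-entropy loss; `model`, `inputs`, and `labels` are hypothetical placeholders rather than our exact implementation:

```python
import torch
import torch.nn as nn

# Sketch of Equation 1 with L as the cross-entropy loss.
# `model` stands in for CRAB; `labels` are class indices
# (the argmax of the one-hot Y_n), as nn.CrossEntropyLoss expects.
criterion = nn.CrossEntropyLoss()

def empirical_risk(model: nn.Module, inputs, labels: torch.Tensor) -> torch.Tensor:
    logits = model(inputs)            # f(X_i): (batch, c) class scores
    return criterion(logits, labels)  # mean loss over the batch
```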

2.1 Input Representation Layer

The purpose of the input representation layer is to generate contextualized, fixed-size embedding vectors for the tweets. We utilize a BERT encoder to vectorize the input data. BERT provides a sophisticated text representation by learning from both the left and right context of each token in all layers. CRAB takes all the BERT embeddings generated by the final block of the transformer. More precisely, this layer encodes an input tweet into a matrix $B=(e_{0},\ldots,e_{N-1})$, $B\in R^{|k|\times N}$, where $e_{i}$, $i\in\{0,\ldots,N-1\}$, is the embedding vector of the i-th token, $N$ is the input text length, and $|k|$ is the embedding dimension.
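
As an illustration, the sketch below obtains such a matrix with the HuggingFace transformers library; this library choice and the bert-base-uncased checkpoint are assumptions, since the paper only states that a pre-trained BERT encoder is used:

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumption: HuggingFace transformers and bert-base-uncased; the paper
# does not name a specific BERT implementation or checkpoint.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

tweets = ["an example tweet"]
batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():  # fine-tuning would keep gradients on
    B = encoder(**batch).last_hidden_state  # final block: (batch, N, |k|)
```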

2.2 Token-wise Class Representation Layer

This layer is devised to allow the neural model to learn how the encoded classes should attend to every single token in the input. To this end, we introduce a multi-head token-wise class representation network $A$. Each head in this block learns the interactions between the encoded input tokens and the classes independently. This layer takes $B^{\prime}=(e_{1},\ldots,e_{N-1})$ as input, where $B^{\prime}$ is the concatenation of all final-layer hidden states except the one corresponding to the first token (the special token [CLS]). Similar to previous studies Du et al. (2019), this layer calculates the matching scores between the classes and the input data using the dot product:

T_{i}=A_{i}\times B^{\prime},\quad i\in\{0,\ldots,m-1\}   (2)

where $A=(A_{0},\ldots,A_{m-1})$, $A_{i}\in R^{c\times|k|}$, and $T=(T_{0},\ldots,T_{m-1})$, $T_{i}\in R^{c\times(N-1)}$; $c$ denotes the number of classes and $m$ the number of class representation heads.

Class | Normal | Abusive | Spam | Hateful | Total
Count | 53851 | 27150 | 14030 | 4965 | 99996
Table 1: The total number of tweets per class.
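A minimal PyTorch sketch of Equation 2, treating each head $A_{i}$ as a trainable parameter matrix scored against the token embeddings by a dot product (the tensor layout and initialization scale are our assumptions):

```python
import torch
import torch.nn as nn

class TokenWiseClassRepr(nn.Module):
    """Sketch of Equation 2: m trainable class-representation heads."""
    def __init__(self, m: int, c: int, k: int):
        super().__init__()
        # A_i in R^{c x |k|} for each of the m heads; init scale is an assumption
        self.A = nn.Parameter(torch.randn(m, c, k) * 0.02)

    def forward(self, b_prime: torch.Tensor) -> torch.Tensor:
        # b_prime: (batch, N-1, |k|) token embeddings without [CLS]
        # returns T: (batch, m, c, N-1) token-wise matching scores
        return torch.einsum("mck,bnk->bmcn", self.A, b_prime)
```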

2.3 Sentence-wise Class Representation Layer

The sentence-wise trainable class representation matrix is depicted as $S\in R^{c\times|k|}$ in Figure 1. Given the sentence embedding $e\in R^{|k|\times 1}$ as input to this layer, $S$ is tuned to learn sentence-level class representations during training. As in Section 2.2, we apply the dot product to compute the matching scores $w$ between the sentence embedding $e$ and the sentence-wise class representation $S$:

w=S\times e,\quad w\in R^{c\times 1}   (3)
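
A corresponding sketch of Equation 3; taking the [CLS] hidden state as the sentence embedding $e$ is our assumption, since the paper does not specify how $e$ is produced:

```python
import torch
import torch.nn as nn

class SentenceWiseClassRepr(nn.Module):
    """Sketch of Equation 3: trainable sentence-level class matrix S."""
    def __init__(self, c: int, k: int):
        super().__init__()
        self.S = nn.Parameter(torch.randn(c, k) * 0.02)  # S in R^{c x |k|}

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        # e: (batch, |k|) sentence embedding, e.g. the [CLS] vector (assumption)
        return e @ self.S.t()  # w: (batch, c) sentence-wise matching scores
```
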
Model | Accuracy | F1 | R | P
Naive Bayes | 74.38 | 63.74 | 65.18 | 63.30
RNN Lai et al. (2015) | 77.10 | 64.40 | 63.90 | 64.84
CNN Kim (2014) | 77.17 | 64.04 | 62.59 | 65.95
EXAM Du et al. (2019) | 77.34 | 62.75 | 60.80 | 66.36
BERT-CLS | 81.30 | 68.82 | 67.26 | 71.11
BERT-Avg-P | 81.14 | 68.90 | 67.50 | 71.25
CRAB-1 (Ours) | 81.51 | 68.85 | 66.98 | 71.80
CRAB-2 (Ours) | 81.70 | 69.75 | 68.00 | 72.16
CRAB-4 w/o SA (Ours) | 81.86 | 69.97 | 68.31 | 72.27
CRAB-4 (Ours) | 82.03 (RI: +0.9%) | 70.20 (RI: +1.89%) | 68.56 (RI: +1.57%) | 72.54 (RI: +1.81%)
Table 2: Performance of different models (in %) on the test set. F1: macro-averaged F1 score, P: macro-averaged precision, R: macro-averaged recall, RI: relative improvement of the best overall score (CRAB-4) over the best baseline score. All improvements are statistically significant under a one-tailed Student's t-test with p-value < 0.05.

2.4 Aggregation Layer

This layer aims to fuse the information learned in the previous layers. Equations 4-7 describe how we aggregate the token-level interaction signals and compute the matching scores $z$:

T^{\prime}_{i} = \sigma(T_{i}W_{fc1}+b_{fc1}),\quad i\in\{0,\ldots,m-1\}   (4)
T^{\prime\prime} = [T^{\prime}_{0}\oplus\ldots\oplus T^{\prime}_{m-1}]   (5)
V = \sigma(T^{\prime\prime}W_{fc2}+b_{fc2})   (6)
z = VW_{lin},\quad z\in R^{c\times 1}   (7)

where $\sigma$ is the LeakyReLU activation function Xu et al. (2015) and $\oplus$ denotes the concatenation operator. The multi-dimensional array $(T^{\prime}_{0},\ldots,T^{\prime}_{m-1})$ is shown as $T^{\prime}$ in Figure 1. In the above equations, $W_{fc1}$, $W_{fc2}$, and $W_{lin}$ are trainable weights. Finally, as denoted in Equation 8, the token-level and sentence-level matching scores are combined and passed to a softmax layer to predict the class label:

o=\textit{softmax}\left(\frac{z}{|z|}+\frac{w}{|w|}\right)   (8)
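
A PyTorch sketch of Equations 4-8 follows; the axes over which $W_{fc1}$ and $W_{fc2}$ act, and reading $|\cdot|$ as the Euclidean norm, are our assumptions (the layer widths 64 and 128 follow Section 3.3):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AggregationLayer(nn.Module):
    """Sketch of Equations 4-8; shape conventions are assumptions."""
    def __init__(self, m: int, n_tokens: int, d1: int = 64, d2: int = 128):
        super().__init__()
        # assumes fixed-length (padded) inputs of n_tokens = N-1
        self.fc1 = nn.Linear(n_tokens, d1)       # W_fc1, b_fc1 (Eq. 4)
        self.fc2 = nn.Linear(m * d1, d2)         # W_fc2, b_fc2 (Eq. 6)
        self.lin = nn.Linear(d2, 1, bias=False)  # W_lin (Eq. 7)
        self.act = nn.LeakyReLU()

    def forward(self, T: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # T: (batch, m, c, N-1) token-wise scores; w: (batch, c) sentence-wise scores
        t1 = self.act(self.fc1(T))              # Eq. 4: (batch, m, c, d1)
        t2 = t1.permute(0, 2, 1, 3).flatten(2)  # Eq. 5: concat heads -> (batch, c, m*d1)
        v = self.act(self.fc2(t2))              # Eq. 6: (batch, c, d2)
        z = self.lin(v).squeeze(-1)             # Eq. 7: (batch, c)
        # Eq. 8: |.| read as the L2 norm (assumption), then softmax over classes
        z = z / z.norm(dim=-1, keepdim=True)
        w = w / w.norm(dim=-1, keepdim=True)
        return F.softmax(z + w, dim=-1)
```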

3 Experiments

In this section, we describe our dataset, pre-processing steps, metrics, baseline models, hyper-parameter settings, and experimental procedures utilized to evaluate CRAB.

3.1 Dataset

We train our model on tweets collected by Founta et al. (2018) (https://sites.google.com/view/icwsm2020datachallenge). The corpus comprises four classes: Normal, Abusive, Hateful, and Spam. The class distribution is shown in Table 1; the classes are heavily imbalanced. We apply stratified sampling to create train, validation, and test sets with ratios of 80%, 10%, and 10%, respectively.
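
For reproducibility, a sketch of the stratified 80/10/10 split; the use of scikit-learn and the fixed seed are assumptions, as the paper only reports the ratios:

```python
from sklearn.model_selection import train_test_split

# `tweets` and `labels` stand for the full corpus (assumed parallel lists).
# First carve off a stratified 20%, then halve it into validation and test.
train_x, rest_x, train_y, rest_y = train_test_split(
    tweets, labels, test_size=0.2, stratify=labels, random_state=42)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, stratify=rest_y, random_state=42)
```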

3.2 Data Pre-processing

Tweets are full of emojis, emoticons, hashtags, and website links. To clean the tweets while preserving as much useful information as possible, we developed a pipeline that maps emotionally similar emojis and emoticons to the same special tokens. Likewise, website links are replaced with a special token. Tweets are generally short and often ungrammatical; therefore, we do not apply stop-word removal, stemming, or lemmatization, as these techniques are often imperfect and could lead to information loss.
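
A hypothetical sketch of this cleaning step; the actual emoji groupings and special-token names are not specified in the paper:

```python
import re

# Illustrative emoji-to-token map; the real grouping of emotionally
# similar emojis and emoticons is not published with the paper.
EMOJI_MAP = {"😀": "<happy>", "😂": "<happy>", ":)": "<happy>", "😢": "<sad>"}
URL_RE = re.compile(r"https?://\S+")

def clean_tweet(text: str) -> str:
    text = URL_RE.sub("<url>", text)         # links -> one special token
    for symbol, token in EMOJI_MAP.items():  # similar emojis/emoticons -> same token
        text = text.replace(symbol, token)
    return text
```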

3.3 Setup Details

BERT was initialized with pre-trained weights, which we fine-tuned for our downstream task during training. We chose a batch size of 32, and the numbers of neurons in the transformations $W_{fc1}$ and $W_{fc2}$ were 64 and 128, respectively. The model was implemented in PyTorch Paszke et al. (2017) and trained on a single NVIDIA P100 GPU.

3.4 Baselines and Metrics

As listed in Table 2, we compare our approach against several baselines to evaluate the effectiveness of our proposed model. The input to the Naive Bayes classifier is TF-IDF Zhang et al. (2011) feature vectors. As neural baselines, we employ a CNN, an RNN, and EXAM Du et al. (2019); the word embedding size for all of these models is set to 200. We also report the performance of feeding BERT's special-token [CLS] embedding (BERT-CLS) and the average-pooled BERT embedding vectors (BERT-Avg-P) to a linear classifier; in both cases, the outputs of the last hidden layer are passed to the classifier, as sketched below. Given that our class distribution is imbalanced, we report macro-averaged F1, precision, and recall, as well as accuracy, to quantify the models' prediction performance.
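
For clarity, a sketch of the two pooling strategies behind BERT-CLS and BERT-Avg-P (variable names are illustrative):

```python
import torch

# hidden: (batch, N, |k|) last-layer BERT states; mask: (batch, N) attention mask
def cls_pooling(hidden: torch.Tensor) -> torch.Tensor:
    return hidden[:, 0]  # BERT-CLS: the [CLS] token's embedding

def avg_pooling(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    m = mask.unsqueeze(-1).float()
    return (hidden * m).sum(dim=1) / m.sum(dim=1)  # BERT-Avg-P: masked mean
```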

3.5 Empirical Results and Discussion

Table 2 shows the performance of different variations of our model, CRAB-n, alongside the baselines; RI denotes the relative improvement of the best overall score (CRAB-4) over the best baseline score. The letter n in CRAB-n indicates the number of heads in the token-wise class representation layer; in our experiments we tried $n\in\{1,2,4,8,16\}$. To evaluate the effectiveness of the sentence-wise class representation layer, we also report the performance of CRAB-4 with this layer removed (CRAB-4 w/o SA in Table 2). As shown in Table 2, CRAB-4 consistently outperformed all baselines and the other CRAB variations. CRAB-4 achieved a 1.89% relative improvement in macro-averaged F1 over BERT-Avg-P and a 0.9% relative improvement in accuracy over BERT-CLS. In terms of macro-averaged precision and recall, it achieved relative improvements of 1.81% and 1.57% over BERT-Avg-P, respectively. To further analyze the performance of CRAB-4 and understand how it handles imbalanced classes, we conducted a per-class error analysis. Compared to the baseline BERT models, CRAB-4 obtained a 1% macro-F1 boost on the two major classes and a 2% gain on the two minor classes; we hypothesize that the minor classes benefit the most from this architecture. It is worth mentioning that our model can be extended to multi-label classification by simply replacing the softmax with a sigmoid layer. We also emphasize that the input representation layer is not limited to BERT; any other transformer-based encoder can be used instead.

4 Conclusion and Future Work

In this paper, we introduced CRAB, a neural model for identifying hate speech in Twitter data. CRAB incorporates both token and class information from tweets into the hate speech identification process. CRAB significantly outperformed the state-of-the-art BERT-based baseline by 1.89% in relative macro-F1. Our future work includes evaluating the effectiveness of this model on extreme multi-class and multi-label problems and adapting CRAB to other online abusive behavior detection tasks.

References

  • Croft et al. (2010) W Bruce Croft, Donald Metzler, and Trevor Strohman. 2010. Search engines: Information retrieval in practice, volume 520. Addison-Wesley Reading.
  • Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Du et al. (2019) Cunxiao Du, Zhaozheng Chen, Fuli Feng, Lei Zhu, Tian Gan, and Liqiang Nie. 2019. Explicit interaction model towards text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6359–6366.
  • Founta et al. (2018) Antigoni Maria Founta, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, and Nicolas Kourtellis. 2018. Large scale crowdsourcing and characterization of twitter abusive behavior. In Twelfth International AAAI Conference on Web and Social Media.
  • Gambäck and Sikdar (2017) Björn Gambäck and Utpal Kumar Sikdar. 2017. Using convolutional neural networks to classify hate-speech. In Proceedings of the first workshop on abusive language online, pages 85–90.
  • Gaydhani et al. (2018) Aditya Gaydhani, Vikrant Doma, Shrikant Kendre, and Laxmi Bhagwat. 2018. Detecting hate speech and offensive language on twitter using machine learning: An n-gram and tfidf based approach. arXiv preprint arXiv:1809.08651.
  • Kim (2014) Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.
  • Lai et al. (2015) Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent convolutional neural networks for text classification. In Twenty-ninth AAAI conference on artificial intelligence.
  • Paszke et al. (2017) Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in pytorch.
  • Pereira-Kohatsu et al. (2019) Juan Carlos Pereira-Kohatsu, Lara Quijano-Sánchez, Federico Liberatore, and Miguel Camacho-Collados. 2019. Detecting and monitoring hate speech in twitter. Sensors, 19(21):4654.
  • Qiao et al. (2018) Chao Qiao, Bo Huang, Guocheng Niu, Daren Li, Daxiang Dong, Wei He, Dianhai Yu, and Hua Wu. 2018. A new method of region embedding for text classification. In ICLR.
  • Ravuri and Stolcke (2015) Suman Ravuri and Andreas Stolcke. 2015. Recurrent neural network and lstm models for lexical utterance classification. In Sixteenth Annual Conference of the International Speech Communication Association.
  • Sun et al. (2019) Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune bert for text classification? In China National Conference on Chinese Computational Linguistics, pages 194–206. Springer.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.
  • Wang and Manning (2012) Sida Wang and Christopher D Manning. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th annual meeting of the association for computational linguistics: Short papers-volume 2, pages 90–94. Association for Computational Linguistics.
  • Xu et al. (2015) Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li. 2015. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853.
  • Zahiri and Choi (2017) Sayyed M Zahiri and Jinho D Choi. 2017. Emotion detection on tv show transcripts with sequence-based convolutional neural networks. arXiv preprint arXiv:1708.04299.
  • Zhang et al. (2011) Wen Zhang, Taketoshi Yoshida, and Xijin Tang. 2011. A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications, 38(3):2758–2765.
  • Zhang et al. (2015) Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in neural information processing systems, pages 649–657.