Organized Event Participant Prediction Enhanced by Social Media Retweeting Data
Abstract
Nowadays, many platforms on the Web offer organized events, allowing users to be organizers or participants. For such platforms, it is beneficial to predict potential event participants. Existing work on this problem tends to borrow recommendation techniques. However, compared to e-commerce items and purchases, events and participation are usually of a much smaller frequency, and the data may be insufficient to learn an accurate model. In this paper, we propose to utilize social media retweeting activity data to enhance the learning of event participant prediction models. We create a joint knowledge graph to bridge the social media and the target domain, assuming that event descriptions and tweets are written in the same language. Furthermore, we propose a learning model that utilizes retweeting information for the target domain prediction more effectively. We conduct comprehensive experiments in two scenarios with real-world data. In each scenario, we set up training data of different sizes, as well as warm and cold test cases. The evaluation results show that our approach consistently outperforms several baseline models, especially with the warm test cases, and when target domain data is limited.
Index Terms:
event-based system, social media, cross-domain system, graph embedding, neural recommendation
I Introduction
Many digital platforms now offer organized events through the Internet, where users can be organizers or participants. For example, the platform Meetup (https://www.meetup.com/) allows people to organize offline gatherings through online registration, and flash sale platforms such as Gilt (https://www.gilt.com/) offer limited-time product discounts. Moreover, retweeting viral messages of the moment on social media platforms such as Twitter (https://www.twitter.com) can also be considered a type of event. Effectively predicting event participants can provide many benefits to event organizers and participants. For example, organizers can send out invitations more effectively [1], while potential participants can receive better recommendations [2]. Previous research has found that the problem of event participant prediction can be solved with recommendation techniques, such as matrix factorization [3]. Indeed, if one considers events as items and participants as users, then recommending events to users can be performed much like recommending products to users in an e-commerce recommender system [4]. Unlike a product-based e-commerce platform, however, which has thousands of items, each purchased by thousands of users, events are organized and participated in at a much lower frequency. Therefore, a common problem for event-based platforms is that they have not collected enough data to effectively learn a model of user preferences.
On the other hand, social media platforms such as Twitter generate huge amounts of publicly accessible data [5]. One particular activity, retweeting, in which social media users repeat a popular tweet, can be seen as a type of event participation [6]. We argue that event-based platforms can use data of such activity to support their own prediction models, even under some restrictions. For example, due to privacy concerns, we assume that users in the target domain will not provide their social media account information. This condition invalidates many cross-domain recommendation solutions that rely on linked accounts [7, 8, 9]. Nevertheless, even if users are not linked to social media accounts, we can still obtain useful information from social media: the interaction data consisting of user retweeting records of past tweets, and the tweet texts written in the same natural language. Retweeting data are useful for event participant prediction because the act of retweeting generally reveals a user's preference towards what is described in the tweet text [10, 11].
In this paper, we propose a method to utilize social media retweeting data when learning an event participant prediction model for a target domain that has limited training data. As mentioned, we do not assume there are linkable users across social media and the target domain. Instead, we only assume that the event descriptions in the target domain are written in the same language as the social media tweets. This becomes our basis for linking the two domains. We build a joint graph using data from both domains and learn cross-domain user embeddings in the same embedding space. In this way, we can enlarge the training data by adding social media retweeting data, and train more accurate models. To the best of our knowledge, this is the first work to use social media retweeting to enhance event participant prediction.
II Related Work
We follow the recent research trend of event participant prediction, which has been identified as an important problem in event-based social networks (EBSNs). Liu et al. studied the participant prediction problem in the context of EBSNs [12]. Their technique relied on the topological structure of the EBSN and early responding users. Similarly, Zhang et al. [13] proposed to engineer user features and then apply machine learning methods such as logistic regression, decision trees, and support vector machines. Du et al. additionally considered the event descriptions, which were overlooked in previous works [14]. As matrix factorization became a standard method in recommender systems [15, 16], later works also applied it to participant prediction. For example, Jiang and Li proposed to solve the problem by engineering user features and applying feature-based matrix factorization [3]. In this paper, we propose a prediction framework built on top of a deep neural network model of matrix factorization [17]. In contrast to existing works, our framework is designed to use social media retweeting data to enhance prediction performance in the target domain.
Our inspiration comes from various works that use a support domain to help solve computational problems in a target domain. In particular, social media has been used as the support domain in various works. For example, Wei et al. found that Twitter volume spikes can be used to predict stock options pricing [18]. Asur and Huberman studied whether social media chatter can be used to predict movie sales [19]. Pai and Liu proposed to use tweets and stock market values to predict vehicle sales [20]. Broniatowski et al. tracked influenza with tweets, combining Google Flu Trends with tweets to track municipal-level influenza [21]. These works, however, only used high-level features of social media, such as message counts or aggregated sentiment scores. In this work, we consider a more general setting in which retweeting serves as a supporting source for participation prediction in the target domain, and users and events are transformed into embeddings for wider applicability.
III Problem Formulation
We formulate the problem of event participant prediction leveraging social media retweeting data as follows. In the target domain, we have a set of events $E$, and for each event $e \in E$ there is a set of participants $U_e$, with $U_e \subseteq U$, where $U$ is the set of target domain users. In the social media retweeting data, we have a set of tweets $T$, and for each tweet $t \in T$ we have its retweeters $V_t$, with $V_t \subseteq V$, where $V$ is the set of social media users. Normally we have fewer events in the target domain than tweets in the retweeting data, so $|E| \ll |T|$. We assume no identifiable common users across the two domains, so $U \cap V = \emptyset$. An event in the target domain is described using the same language as the tweets. Let $W_e$ be the words in the description of event $e$. If $W^s$ and $W^t$ are the description vocabularies of the tweets and the target domain, respectively, then $W^s \cap W^t \neq \emptyset$.
We can represent event descriptions and users as vector-form embeddings. Since the event descriptions in the target domain and the tweet texts are written in the same language, their embeddings can be obtained from the same embedding space. We denote by $\phi(\cdot)$ the function that obtains these text embeddings, applied to both target domain event descriptions and tweets. In the target domain, we also have base user embeddings $b_u$ available through the information provided by the platform users.
IV Entity-connected Graph for Learning Joint User Embedding
There exist a number of established techniques for learning embeddings from graphs [22]. Our method is to learn a joint embedding function for both target domain and social media users by deploying such techniques after creating a graph that connects the two domains. Based on the participation data, we can create four kinds of relations in the graph, namely, the participation relation, the co-occurrence relation, the same-word relation, and the word-topic relation.
The participation relation comes from the interaction data and is set between users and the words of an event. Suppose user $u$ participates in event $e$. Then we create a triple $(u, \mathrm{participate}, w)$ for each word $w$ in $W_e$.
The co-occurrence relation comes from the co-occurrence of words in the event descriptions. We use pointwise mutual information [23] to represent the co-occurrence behavior. Specifically, we have

$$\mathrm{PMI}(w_i, w_j) = \log \frac{N \cdot C(w_i, w_j)}{C(w_i)\, C(w_j)},$$

where $C(w_i, w_j)$ is the frequency of co-occurrence of words $w_i$ and $w_j$, $N$ is the total number of events, and $C(w_i)$ is the frequency of occurrence of the single word $w_i$. We use a threshold $\theta$ to determine the co-occurrence relation, such that if $\mathrm{PMI}(w_i, w_j) > \theta$, we create the triple $(w_i, \mathrm{co\text{-}occur}, w_j)$.
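To make the construction concrete, the following is a minimal sketch of how such thresholded co-occurrence edges could be computed. It is an illustration only: the function name `cooccurrence_edges`, the event-level counting of word frequencies, and the triple format are our assumptions, not the paper's implementation.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_edges(event_word_sets, threshold=0.0):
    """Create thresholded co-occurrence relations from event descriptions.

    event_word_sets: one set of words per event description.
    threshold: the PMI threshold above which an edge is created.
    """
    n_events = len(event_word_sets)
    word_freq = Counter()   # C(w_i): number of events containing the word
    pair_freq = Counter()   # C(w_i, w_j): number of events containing both words
    for words in event_word_sets:
        word_freq.update(words)
        pair_freq.update(combinations(sorted(words), 2))

    edges = []
    for (wi, wj), c_ij in pair_freq.items():
        pmi = math.log(n_events * c_ij / (word_freq[wi] * word_freq[wj]))
        if pmi > threshold:
            edges.append((wi, "co-occur", wj))
    return edges
```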
The two kinds of relations mentioned above are created within a single domain. We now connect the graphs of the two domains using the same-word relation: we create a triple $(w^s, \mathrm{same\text{-}word}, w^t)$ if a word $w^t$ in the target domain and a word $w^s$ in the retweeting data are the same word. In this way, two separate graphs for the two domains are connected through entities in the event descriptions. Once we have the joint graph, we can use established graph embedding learning techniques to learn user embeddings. In our experiment, we use TransE [22] as the embedding learning technique.
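The following sketch shows one way the triples of the joint graph could be assembled before feeding them to a TransE implementation; it reuses the `cooccurrence_edges` helper sketched above. The entity naming, domain prefixes, and data layout are assumptions made for illustration.

```python
def build_joint_graph(target_events, tweets, threshold=0.0):
    """Assemble (head, relation, tail) triples for the entity-connected graph.

    target_events / tweets: lists of (user_ids, word_set) pairs, i.e. the
    participants (or retweeters) and description words of each event (or tweet).
    """
    triples = []

    def add_domain(items, prefix):
        word_sets = []
        for users, words in items:
            prefixed = {f"{prefix}:word:{w}" for w in words}
            word_sets.append(prefixed)
            for u in users:                          # participation relation
                for w in prefixed:
                    triples.append((f"{prefix}:user:{u}", "participate", w))
        triples.extend(cooccurrence_edges(word_sets, threshold))  # co-occurrence relation
        return set().union(*word_sets) if word_sets else set()

    target_words = add_domain(target_events, "t")
    tweet_words = add_domain(tweets, "s")

    # Same-word relation: bridge the two domains through shared vocabulary.
    for wt in target_words:
        ws = "s:word:" + wt.split(":", 2)[2]
        if ws in tweet_words:
            triples.append((ws, "same-word", wt))
    return triples
```

Any off-the-shelf TransE implementation can then be trained on these triples to obtain the joint user embeddings.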
V Event Participant Prediction Leveraging Joint User Embeddings
We have shown how to obtain joint user embeddings for two event domains. Now we need a method to use them for the problem we aim to solve, that is event participant prediction. In this section, we will discuss first how event participant prediction can be solved in a single domain. Then we will present our framework that leverages joint user embeddings to solve the problem.
V-A Single Domain Prediction
We find that event participant prediction can be solved by recommendation techniques. Similar to user-item interactions in a recommendation problem, event participation can also be treated as interaction between users and events. After considering several options, we choose the state-of-the-art cold-start recommendation model proposed by Wang et al. [24]. It is a generalization of the neural matrix factorization (NeuMF) model [17], which originally used one-hot representations for users and items.
We aim to use the model to learn the following function:
$$\hat{y}_{ue} = f(p_u, q_e), \tag{1}$$

where $p_u$ and $q_e$ are the learned embeddings for user $u$ and event $e$. NeuMF ensembles two recommendation models, called generalized matrix factorization (GMF) and multi-layer perceptron (MLP). Specifically, it makes the prediction

$$\hat{y}_{ue} = h\big(\phi^{\mathrm{GMF}}(p_u, q_e) \oplus \phi^{\mathrm{MLP}}(p_u, q_e)\big), \tag{2}$$

where $h$ is a linear mapping function and $\oplus$ is a concatenation operation.
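As a reference, the sketch below shows the structure implied by Equation (2) in Keras, with user and event embeddings fed in as dense vectors rather than one-hot indices. The layer sizes and the sigmoid output are assumptions made for illustration.

```python
import tensorflow as tf

def build_neumf(embed_dim=200, mlp_units=(128, 64, 32)):
    """Minimal NeuMF over pre-computed user/event embeddings (Eq. 2 sketch)."""
    user_in = tf.keras.Input(shape=(embed_dim,), name="user_embedding")
    event_in = tf.keras.Input(shape=(embed_dim,), name="event_embedding")

    # GMF branch: element-wise product of the two embeddings
    gmf = tf.keras.layers.Multiply()([user_in, event_in])

    # MLP branch: concatenation followed by fully connected layers
    mlp = tf.keras.layers.Concatenate()([user_in, event_in])
    for units in mlp_units:
        mlp = tf.keras.layers.Dense(units, activation="relu")(mlp)

    # Concatenation of both branches; this layer is reused later by the BGF fusion
    fused = tf.keras.layers.Concatenate(name="neumf_concat")([gmf, mlp])
    score = tf.keras.layers.Dense(1, activation="sigmoid", name="prediction")(fused)
    return tf.keras.Model([user_in, event_in], score)
```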
Since the dataset usually contains only observed interactions, i.e., records of users participating in events (analogous to purchase records of items), it is necessary to generate negative samples when training the model, for example by randomly choosing user-event pairs that have no interaction. The loss function for participant prediction is defined as follows:
$$\mathcal{L} = -\sum_{(u,e) \in \mathcal{Y}^{+} \cup \mathcal{Y}^{-}} \Big( y_{ue} \log \hat{y}_{ue} + (1 - y_{ue}) \log (1 - \hat{y}_{ue}) \Big), \tag{3}$$

where $y_{ue} = 1$ if user $u$ participated in event $e$, and $0$ otherwise; $\mathcal{Y}^{+}$ denotes the observed interactions and $\mathcal{Y}^{-}$ denotes the negative samples.
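Below is a small sketch of the random negative sampling described above; the 1:4 positive-to-negative ratio mirrors the setup used later in the experiments, while the function name and data layout are our own.

```python
import numpy as np

def sample_negatives(positive_pairs, all_users, ratio=4, seed=0):
    """Pair each positive (user, event) entry with `ratio` non-participants."""
    rng = np.random.default_rng(seed)
    participants = {}
    for user, event in positive_pairs:
        participants.setdefault(event, set()).add(user)

    negatives = []
    users = list(all_users)
    for _, event in positive_pairs:
        for _ in range(ratio):
            candidate = users[rng.integers(len(users))]
            while candidate in participants[event]:
                candidate = users[rng.integers(len(users))]
            negatives.append((candidate, event))
    return negatives
```

The positive and sampled negative pairs can then be fed to the NeuMF model compiled with a binary cross-entropy loss, matching Equation (3).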
V-B Leveraging Joint User Embeddings
In the previous section we acquired joint user embeddings, denoted $g_u$, from the entity-connected graph. Note that we can apply the same graph technique to learn embeddings in a single domain as well, denoted $g^s_u$ and $g^t_u$ for the retweeting data and the target domain, respectively. From the problem formulation, we also have the base user embeddings $b_u$ for the target domain. A problem is that the graph embeddings $g_u$ and $g^t_u$ are only available for a small number of target domain users, because they are learned from limited participation data. When we predict participants of future events, we need to consider the majority of users who have not participated in past events. These users have base embeddings $b_u$ but no graph embeddings $g_u$ or $g^t_u$.
We need to map the base embeddings $b_u$ to the embedding space of $g_u$ when making the prediction. As proposed in some previous works, this can be done through linear latent space mapping [25]. Essentially, it is to find a transfer matrix $M$ so that $M b_u$ approximates $g_u$; $M$ can be found by solving the following optimization problem:

$$\min_{M} \sum_{u} \ell\big(M b_u, g_u\big) + \lambda \,\Omega(M), \tag{4}$$

where $\ell$ is the loss function and $\Omega$ is the regularization term. After obtaining $M$ from the users who have both base embeddings and graph embeddings, we can map the base user embedding $b_u$ to the graph user embedding $M b_u$ for those users who have no graph embedding.
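A closed-form ridge regression is one way to solve Equation (4); the sketch below assumes a squared loss and Frobenius-norm regularization, which the paper does not specify.

```python
import numpy as np

def learn_transfer_matrix(base_embs, graph_embs, reg=1e-2):
    """Find M such that M @ b_u approximates g_u (ridge-regression sketch of Eq. 4).

    base_embs:  (n, d_base)  base embeddings of users having both embeddings
    graph_embs: (n, d_graph) graph embeddings of the same users
    """
    d = base_embs.shape[1]
    # Closed form: M^T = (B^T B + reg * I)^(-1) B^T G
    gram = base_embs.T @ base_embs + reg * np.eye(d)
    m_t = np.linalg.solve(gram, base_embs.T @ graph_embs)
    return m_t.T    # shape (d_graph, d_base)

# For a user with only a base embedding b_u, the mapped graph embedding is M @ b_u.
```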
An alternative solution is to use the base user embeddings as the input for training the model. This then requires us to map graph user embeddings to target domain base user embeddings. Unlike mapping base embeddings to graph embeddings, where some target domain users have both embeddings, we have no social media users with base embeddings, so this mapping requires a different technique. We solve it by finding the most similar target domain users for a social media user and using their embeddings as the social media user's base embedding. More specifically, we pick the $k$ most similar target domain users according to the graph embeddings and take the average of their base embeddings:

$$\hat{b}_v = \frac{1}{k} \sum_{u \in N_k(v)} b_u, \tag{5}$$

where $N_k(v)$ is the set of top-$k$ target domain users most similar to the social media user $v$ according to their graph embeddings.
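Equation (5) can be realized with a nearest-neighbour lookup in the graph embedding space; the cosine similarity and the value of $k$ below are assumptions, as the paper does not fix them.

```python
import numpy as np

def map_to_base_embedding(sm_graph_emb, target_graph_embs, target_base_embs, k=5):
    """Average the base embeddings of the k target users closest in graph space."""
    sims = target_graph_embs @ sm_graph_emb
    sims = sims / (np.linalg.norm(target_graph_embs, axis=1)
                   * np.linalg.norm(sm_graph_emb) + 1e-9)
    top_k = np.argsort(-sims)[:k]          # indices of the k most similar users
    return target_base_embs[top_k].mean(axis=0)
```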
V-C Base and Graph Fusion
We have shown two ways to create joint training data by mapping graph embeddings to base embeddings, and by mapping base embeddings to graph embeddings. Both embedding spaces have their advantages. The graph embeddings are taken from the interaction data, thus contain information useful for predicting participation. The base embedding contains user context obtained from the target domain, which can supply extra information. While it is possible to use the two types of embeddings separately, we would like to propose a fusion unit that leverages the advantages of both embedding spaces. We call the method base and graph embedding fusion (BGF).
After obtaining training data for the two types of embeddings, we train two prediction models separately using the NeuMF model. The input event embeddings are the same for both models. The input user embeddings are selected depending on whether the user has a graph embedding available. More specifically, for the graph embedding space, the input is set to $g_u$ if user $u$ has a graph embedding, and otherwise to the mapped embedding $M b_u$ from Equation (4). We do the same for the base embedding space, selecting either $b_u$ or the mapped embedding from Equation (5), depending on availability. Then, instead of outputting predictions, we take the concatenation layers of the two NeuMF models, produced by the concatenation in Equation (2), and concatenate them together. The prediction is made on the output of this large concatenation layer.
Following a recent trend in deep learning research, we use an attention module [26] to further refine the output of the model. An attention module is generally effective when we need to select the more important information from the inputs. Since running the two prediction models leaves us with a large number of information units, it is suitable to apply the attention module here.
The idea of attention is to use a query vector to assign weights to a matrix so that the more important factors can be emphasized. The query is compared with keys, a reference source, to produce a set of weights, which is then used to combine the candidate embeddings. For the current scenario, we use the concatenated output of NeuMF as the key and the event embedding as the query. The output of the attention module is a context vector $c_e$ for event $e$:

$$c_e = \sum_{i} \alpha_i k_i, \tag{6}$$

where $\alpha_i$ are the attention weights and $k_i$ are the keys. We transform the concatenated output of NeuMF into a matrix with the same number of columns as the query dimension and use it as the keys $k_i$. The attention weights can be obtained using the general attention score [27].
We insert the attention module after the outputs of the two prediction models and use the event embedding as the query to select the more important information. Empirically, we find that adding the attention module improves overall prediction accuracy.
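To illustrate this fusion step, the layer below applies a Luong-style "general" attention score over the concatenated NeuMF outputs, reshaped into keys whose width matches the event-embedding query. It is a sketch under our own assumptions about dimensions, not the exact module used in the paper.

```python
import tensorflow as tf

class FusionAttention(tf.keras.layers.Layer):
    """Attention over the fused NeuMF outputs with the event embedding as query."""

    def __init__(self, key_dim, **kwargs):
        super().__init__(**kwargs)
        self.key_dim = key_dim      # assumed to divide the fused vector length

    def build(self, input_shape):
        query_dim = int(input_shape[1][-1])
        # W_a for the "general" score: score(q, k_i) = q^T W_a k_i
        self.w_a = self.add_weight(shape=(query_dim, self.key_dim),
                                   initializer="glorot_uniform", name="w_a")

    def call(self, inputs):
        fused, query = inputs                                  # (batch, units), (batch, q_dim)
        keys = tf.reshape(fused, (tf.shape(fused)[0], -1, self.key_dim))
        scores = tf.einsum("bq,qd,bnd->bn", query, self.w_a, keys)
        alpha = tf.nn.softmax(scores, axis=-1)                 # attention weights
        return tf.einsum("bn,bnd->bd", alpha, keys)            # context vector c_e
```

The resulting context vector can then be passed to a final dense layer to produce the participation score.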
We note that BGF can also be used within a single domain. We can construct the graph for a single domain without the bridging relations, i.e., keeping only the word co-occurrence and user participation relations. Using the procedure described above, we can generate two sets of embeddings, one from the base embeddings and one from the graph, and apply the BGF unit on them. In the empirical study presented later, the single-domain BGF is shown to achieve relatively high prediction accuracy.
V-D Leveraging Cross-domain Learning
We have integrated social media retweeting into the event participation data of a target domain using the method described above. Now we could simply combine the retweeting data with the event participation data, treating them as a single dataset. However, there are better ways to train the model across domains, as proposed by recent studies in transfer learning. Here we introduce a transfer learning technique that can be used to further improve our method.
The technique is called knowledge distillation (KD) [28]. It has been shown that, when model learning is shifted from one task to another task, this technique can be used to distill knowledge learned in the previous task. The distilled knowledge becomes accessible through the KL-divergence, a measure of the difference between prediction results using the new model and the old model. Specifically, we set up a loss through KL-Divergence:
$$\mathcal{L}_{KD} = \sum_{(u,e)} D_{KL}\big(\hat{y}^{old}_{ue} \,\|\, \hat{y}^{new}_{ue}\big), \tag{7}$$

where $\hat{y}^{new}_{ue}$ and $\hat{y}^{old}_{ue}$ are predictions made with the model learned in the new domain and in the old domain, respectively, and $D_{KL}$ is the point-wise KL-divergence.
We first train the model using the retweeting data, and then shift to the target domain participation data. The single domain loss and the KD loss can be counted together in the cross-domain model learning, as
$$\mathcal{L}_{total} = \mathcal{L} + \lambda_{KD}\, \mathcal{L}_{KD}, \tag{8}$$

where $\lambda_{KD}$ is a weighting factor.
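The sketch below shows one way the single-domain loss and the KD term of Equations (7)-(8) could be combined; the Bernoulli form of the point-wise KL divergence and the weighting factor are our assumptions.

```python
import tensorflow as tf

def combined_loss(y_true, y_pred_new, y_pred_old, kd_weight=0.5):
    """Binary cross-entropy on the target domain plus a KD term (Eqs. 7-8 sketch)."""
    bce = tf.keras.losses.binary_crossentropy(y_true, y_pred_new)

    # Point-wise KL divergence between the Bernoulli predictions of the model
    # trained on retweeting data (old) and the model being trained now (new).
    eps = 1e-7
    p = tf.clip_by_value(y_pred_old, eps, 1.0 - eps)
    q = tf.clip_by_value(y_pred_new, eps, 1.0 - eps)
    kd = p * tf.math.log(p / q) + (1.0 - p) * tf.math.log((1.0 - p) / (1.0 - q))

    return tf.reduce_mean(bce) + kd_weight * tf.reduce_mean(kd)
```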
VI Experimental Evaluation
To verify the effectiveness of our approach, we perform experiments with a public event dataset taken from the event platform Meetup. On Meetup, events are explicitly defined by organizers, and users register for participation. We use Twitter as the supporting social media source. In this section, we discuss the dataset preparation and the experiment setup before presenting the evaluation results.
VI-A Dataset Collection
We use a publicly available dataset (https://ieee-dataport.org/documents/meetup-dataset), which was collected for the purpose of analyzing information flow in event-based social networks [12]. On Meetup, users can participate in social events, which are active only for a limited period, or they can join groups, which have no time restriction. Events and users are also associated with tags, which correspond to descriptive English keywords. Popular event examples are language study gatherings, jogging and hiking sessions, and wine tasting workshops. The dataset contains relations between several thousand users, events, and groups. Our interest is mostly in the user-event relation.
We prepare a corresponding Twitter retweet dataset. We monitor Twitter for tweets authored by users with the keywords "she/her" and "he/him" in their profile descriptions, which results in more than two million tweets. While these tweets cover many topics, they are more or less gender-aware given the author profiles. We construct retweet clusters from these retweets and obtain several thousand clusters, each retweeted at least ten times by users in the dataset.
Since our objective is to investigate the effect of adding retweets when the target domain has limited data, we generate datasets of different sizes. Specifically, we select three dataset sizes, containing 100, 200, and 500 events. To balance the retweets with the event data, we use the same number of tweets as events. The events are randomly selected, and the tweets are also randomly selected with the restriction that their texts share common words with the event descriptions. The numbers of events, users, participation records, tweets, Twitter users, and retweets are shown in Table I.
Table I: Statistics of the prepared datasets.

| Events | Users | Participation | Tweets | SM Users | Retweets |
|---|---|---|---|---|---|
| 100 | 448 | 1,792 | 100 | 1,042 | 5,960 |
| 200 | 898 | 3,592 | 200 | 2,255 | 12,612 |
| 500 | 2,460 | 9,840 | 500 | 6,599 | 36,236 |
We use pre-trained embeddings to represent event descriptions and tweets in the same language. Specifically, we use the Spacy (https://spacy.io/) package, which provides word embeddings trained on Web data and a pipeline to transform sentences into embeddings. For our approach, we also need to provide base user embeddings. For the Meetup dataset, the users are associated with tags, which correspond to text keywords. We again use Spacy to transform the user tags into embeddings and use them as the base user embeddings.
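For reference, a pre-trained spaCy pipeline with word vectors can turn descriptions, tweets, and tag lists into embeddings as sketched below; the specific model name `en_core_web_md` and the example texts are assumptions, since the paper only states that Spacy's Web-trained embeddings are used.

```python
import spacy

# A spaCy model that ships with word vectors; doc.vector averages token vectors.
nlp = spacy.load("en_core_web_md")

def text_embedding(text):
    """Embed an event description, a tweet text, or a user's tag keywords."""
    return nlp(text).vector          # fixed-size numpy array

event_emb = text_embedding("Weekend hiking and wine tasting near the lake")
user_emb = text_embedding("hiking wine language exchange photography")   # from user tags
```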
VI-B Experiment Setup
We set up two test cases based on whether the test data contain events that appear in the training data. In the first case, called the warm test, the test data contain events from the training data: we randomly pick one user from each event, add it to the test data, and remove it from the training data. In the second case, called the cold test, the test data contain no events from the training data: we use all the data shown in Table I as the training data and use an additional 1,000 events as the test data.
We create the training dataset by random negative sampling. For every interaction entry in the training dataset, which is labeled as positive, we randomly pick four users who have not participated in the event and label those pairs as negative. The testing is done per event. For each event in the test dataset, we label all users who participated in the event as positive. Then, for the purpose of consistent measurement, we pick additional users, labeled as negative, so that the total number of candidate users is $N$, which is set to 100. For the warm test, there is one positive user per event, while for the cold test, the number of positive users varies from event to event.
We predict the user preference score for all candidate users, rank them by the score, and measure the prediction accuracy based on the top-ranked users. We measure Recall@10 and Precision@5.
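The evaluation protocol above can be computed per test event as in the sketch below; the metric definitions are the standard ones, and the function layout is our own.

```python
import numpy as np

def rank_metrics(scores, labels, k_recall=10, k_precision=5):
    """Recall@10 and Precision@5 for the candidate users of one test event.

    scores: predicted preference scores for the N (~100) candidate users
    labels: 1 for true participants, 0 for sampled negatives
    """
    ranked = np.asarray(labels)[np.argsort(-np.asarray(scores))]
    recall = ranked[:k_recall].sum() / max(ranked.sum(), 1)
    precision = ranked[:k_precision].sum() / k_precision
    return recall, precision
```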
We compare our method with three baselines from the existing literature, in addition to variations of our own approach. The compared methods include:
- base, which runs the recommendation model on target domain base embeddings.
- BGF, the base and graph fusion model we introduced. In this variation, it is used with only the target domain data.
- MIX, a variation of our approach without the knowledge distillation component. Instead, it mixes target domain participation data and the retweets as a single training dataset.
- BPRMF [29], a single domain matrix factorization-based recommendation model, known for its effectiveness in implicit recommendation.
- CKE [30], a knowledge graph-based recommendation model. It can be used for cross-domain prediction if the supporting domain is transformed into a knowledge graph.
- KGAT [31], a state-of-the-art knowledge graph-based recommendation model. It can be used for cross-domain prediction like CKE. However, it does not deal with cold items, so we skip it for the cold test.
We implement our approach and all baselines in Python and TensorFlow. We set the latent factor embedding size to 200 where it is needed.
VI-C Evaluation Results and Discussions
The experimental results are shown in Table II. Single-domain methods are indicated by (SD) and cross-domain methods by (CD). The best results in each test are highlighted in bold font.
Table II: Evaluation results (R@10 and P@5) on the Meetup 100, 200, and 500 datasets.

Warm test:

| Method | Meetup 100 R@10 | Meetup 100 P@5 | Meetup 200 R@10 | Meetup 200 P@5 | Meetup 500 R@10 | Meetup 500 P@5 |
|---|---|---|---|---|---|---|
| (SD) base | 0.192 | 0.027 | 0.196 | 0.030 | 0.190 | 0.016 |
| (SD) BPRMF | 0.135 | 0.019 | 0.098 | 0.007 | 0.061 | 0.006 |
| (SD) BGF | 0.385 | 0.050 | 0.589 | 0.080 | 0.887 | 0.154 |
| (CD) CKE | 0.135 | 0.012 | 0.125 | 0.011 | 0.103 | 0.010 |
| (CD) KGAT | 0.250 | 0.042 | 0.116 | 0.020 | 0.090 | 0.007 |
| (CD) MIX | 0.154 | 0.023 | 0.196 | 0.023 | 0.190 | 0.021 |
| (CD) proposed | **0.404** | **0.073** | **0.688** | **0.105** | **0.955** | **0.172** |

Cold test:

| Method | Meetup 100 R@10 | Meetup 100 P@5 | Meetup 200 R@10 | Meetup 200 P@5 | Meetup 500 R@10 | Meetup 500 P@5 |
|---|---|---|---|---|---|---|
| (SD) base | 0.106 | 0.054 | 0.115 | 0.055 | **0.135** | **0.067** |
| (SD) BPRMF | 0.097 | 0.049 | 0.094 | 0.050 | 0.094 | 0.044 |
| (SD) BGF | 0.121 | 0.055 | 0.098 | 0.042 | 0.065 | 0.023 |
| (CD) CKE | 0.098 | 0.048 | 0.100 | 0.051 | 0.089 | 0.043 |
| (CD) MIX | 0.106 | 0.049 | 0.120 | **0.059** | 0.065 | 0.023 |
| (CD) proposed | **0.124** | **0.059** | **0.124** | 0.057 | 0.122 | 0.050 |
First we look at the warm test. We can see that the proposed method has a clear advantage over the other methods, achieving the best accuracy in both metrics and for all training data sizes. In particular, it steadily outperforms the MIX method, validating the effectiveness of knowledge distillation. The second best cross-domain method is KGAT, especially for smaller training data sizes, but its performance deteriorates as the training data size increases. The best single-domain method, BGF, outperformed cross-domain methods such as KGAT and CKE when the training data size is large, showing the strength of fusing graph embeddings and base embeddings. The proposed method, which utilizes BGF and knowledge distillation, outperformed single-domain BGF by up to 66%.
Next we look at the cold test, where the results are more mixed. When the training data size is smaller, the proposed method generally shows some advantage; for example, when the training data size is 100, it achieves a 2.4% higher Recall@10 than BGF. When the training data size is 500, the base model achieves the best accuracy.
Comparing the warm and cold tests, we see that cross-domain methods have an advantage in the former but a disadvantage in the latter. The reason is that when we already have some participant data for an event, it is easier to use external knowledge to enhance this information. However, when there is no data for a new event, the useful information comes mostly from the target domain itself, and retweeting data can add only limited useful information to the model, if not noise, especially when the target domain already has sufficient training data.
VII Conclusion
In this paper, we propose to use social media retweeting data as a general enhancement for event participant prediction in a target domain. Our solution involves a cross-domain knowledge graph, which assumes that event descriptions are written in the same language as social media tweets. We also present a learning method that utilizes joint user embeddings from the knowledge graph and makes use of knowledge distillation. We test the method with real-world event participation data, comparing it with several baselines, and show that our proposed method has a clear advantage in prediction accuracy, especially in the warm tests, where some participants of events are already known. For the cold test, we obtain mixed results, with our method superior only for some training data sizes.
Acknowledgement
This research is partially supported by JST CREST Grant Number JPMJCR21F2.
References
- [1] Z. Yu, R. Du, B. Guo, H. Xu, T. Gu, Z. Wang, and D. Zhang, “Who should i invite for my party? combining user preference and influence maximization for social events,” in Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2015, pp. 879–883.
- [2] Z. Qiao, P. Zhang, C. Zhou, Y. Cao, L. Guo, and Y. Zhang, “Event recommendation in event-based social networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28, no. 1, 2014.
- [3] J.-Y. Jiang and C.-T. Li, “Who should i invite: predicting event participants for a host user,” Knowledge and Information Systems, vol. 59, no. 3, pp. 629–650, 2019.
- [4] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th International Conference on World Wide Web. ACM, 2001, pp. 285–295.
- [5] S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts, “Who says what to whom on Twitter,” in Proceedings of the 20th International World Wide Web Conference, 2011, pp. 705–714.
- [6] S. Gao, J. Ma, and Z. Chen, “Modeling and predicting retweeting dynamics on microblogging platforms,” in Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. ACM, 2015, pp. 107–116.
- [7] Z. Deng, M. Yan, J. Sang, and C. Xu, “Twitter is faster: Personalized time-aware video recommendation from twitter to youtube,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 11, no. 2, pp. 1–23, 2015.
- [8] W. X. Zhao, S. Li, Y. He, E. Y. Chang, J.-R. Wen, and X. Li, “Connecting social media to e-commerce: Cold-start product recommendation using microblogging information,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 5, pp. 1147–1159, 2015.
- [9] G. Hu, Y. Zhang, and Q. Yang, “Conet: Collaborative cross networks for cross-domain recommendation,” in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 667–676.
- [10] D. Gruhl, R. Guha, R. Kumar, J. Novak, and A. Tomkins, “The predictive power of online chatter,” in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, 2005, pp. 78–87.
- [11] S. Atouati, X. Lu, and M. Sozio, “Negative purchase intent identification in twitter,” in Proceedings of The Web Conference 2020, 2020, pp. 2796–2802.
- [12] X. Liu, Q. He, Y. Tian, W.-C. Lee, J. McPherson, and J. Han, “Event-based social networks: linking the online and offline social worlds,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 1032–1040.
- [13] X. Zhang, J. Zhao, and G. Cao, “Who will attend?–predicting event attendance in event-based social network,” in 2015 16th IEEE International Conference on Mobile Data Management, vol. 1. IEEE, 2015, pp. 74–83.
- [14] R. Du, Z. Yu, T. Mei, Z. Wang, Z. Wang, and B. Guo, “Predicting activity attendance in event-based social networks: Content, context and social influence,” in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2014, pp. 425–434.
- [15] W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin, “A learning-rate schedule for stochastic gradient methods to matrix factorization,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2015, pp. 442–455.
- [16] Y. Xiao, G. Wang, C.-H. Hsu, and H. Wang, “A time-sensitive personalized recommendation method based on probabilistic matrix factorization technique,” Soft Computing, vol. 22, no. 20, pp. 6785–6796, 2018.
- [17] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web. Perth, Australia: ACM, 2017, pp. 173–182.
- [18] W. Wei, Y. Mao, and B. Wang, “Twitter volume spikes and stock options pricing,” Computer Communications, vol. 73, pp. 271–281, 2016.
- [19] S. Asur and B. A. Huberman, “Predicting the future with social media,” in Web Intelligence and Intelligent Agent Technology (WI-IAT), vol. 1. IEEE, 2010, pp. 492–499.
- [20] P.-F. Pai and C.-H. Liu, “Predicting vehicle sales by sentiment analysis of twitter data and stock market values,” IEEE Access, vol. 6, pp. 57655–57662, 2018.
- [21] D. A. Broniatowski, M. Dredze, M. J. Paul, and A. Dugas, “Using social media to perform local influenza surveillance in an inner-city hospital: a retrospective observational study,” JMIR Public Health and Surveillance, vol. 1, no. 1, p. e5, 2015.
- [22] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” Advances in Neural Information Processing Systems, vol. 26, 2013.
- [23] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
- [24] H. Wang, D. Amagata, T. Maekawa, T. Hara, N. Hao, K. Yonekawa, and M. Kurokawa, “A dnn-based cross-domain recommender system for alleviating cold-start problem in e-commerce,” IEEE Open Journal of the Industrial Electronics Society, vol. 1, pp. 194–206, 2020.
- [25] T. Man, H. Shen, X. Jin, and X. Cheng, “Cross-domain recommendation: An embedding and mapping approach.” in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence IJCAI, vol. 17, 2017, pp. 2464–2470.
- [26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
- [27] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1412–1421.
- [28] G. Aguilar, Y. Ling, Y. Zhang, B. Yao, X. Fan, and C. Guo, “Knowledge distillation from internal representations,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, 2020, pp. 7350–7357.
- [29] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “Bpr: Bayesian personalized ranking from implicit feedback,” in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 2009, pp. 452–461.
- [30] F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W.-Y. Ma, “Collaborative knowledge base embedding for recommender systems,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 353–362.
- [31] X. Wang, X. He, Y. Cao, M. Liu, and T.-S. Chua, “KGAT: Knowledge graph attention network for recommendation,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, pp. 950–958.