This document presents research on analyzing sentiment and affect in dark web forums related to radical groups. It aims to determine how effective automated methods are at measuring opinion polarity and emotion intensity in these forums. The researchers collected data from two forums - a more radical Al-Firdaws forum and a more moderate Montada forum. They used machine learning techniques like SVR ensembles and feature selection to analyze 500 sentences from each forum for sentiment and intensities of emotions like violence and hate. The results found the Al-Firdaws forum expressed more negative sentiment and intense negative emotions, confirming domain expert assessments. A time series analysis also examined how forum affects changed over time.
1. Sentiment and Affect Analysis of Dark Web Forums: Measuring Radicalization on the Internet. Hsinchun Chen, Fellow, IEEE
2. Introduction Web forums offer participants a medium to express their opinions and emotions freely in discussion. Extremist and terrorist groups also use web forums for community and for the expression and dissemination of their ideologies and propaganda. Such forums are often referred to as being part of the Dark Web.
3. Introduction Information contained within Dark Web forums represents a significant source of knowledge for security and intelligence organizations. The opinions and emotions expressed within these forums provide valuable insights into the nature and position of the online community and help characterize individual participants. Manual analysis of the vast quantities of messages to measure the opinions and emotions expressed is often infeasible.
4. Introduction This paper presents an automated approach to sentiment and affect analysis of two Dark Web forums related to the Iraqi insurgency and Al-Qaeda. The automated approach utilizes a rich set of textual features and machine learning techniques.
5. Related Work Sentiment and affect analysis are related tasks in text mining that focus on directional text, containing opinions, emotions, and biases. [5] M. A. Hearst, “Direction-based text interpretation as an information access refinement,” In Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Lawrence Erlbaum Associates, 1992. [6] J. Wiebe, “Tracking point of view in narrative,” Computational Linguistics, vol. 20 (2), pg. 233-287, 1994.
6. Related Work Sentiment analysis attempts to identify, analyze, and measure opinions expressed in text. Affect analysis focuses on the emotional content of the communication. R. Agrawal, S. Rajagopalan, R. Srikant, and Y. Xu, “Mining newsgroups using networks arising from social behavior,” Proc. of the 12th Int’l WWW Conf., 2003. P. Subasic and A. Huettner, “Affect analysis of text using fuzzy semantic typing,” IEEE Trans. Fuzzy Systems, vol. 9 (4), pg. 483-496.
7. Related Work There are some important distinctions between the two. Affect analysis evaluates the intensity of a number of potential emotions, including happiness, sadness, anger, and fear. Sentiment analysis considers the polarity of opinions along a positive-neutral-negative continuum. The words and phrases associated with sentiments are mutually exclusive, whereas segments of text can convey multiple affects.
8. Related Work Researchers have utilized various machine learning approaches to perform automated sentiment and affect analysis. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? sentiment classification using machine learning techniques,” Proc. Empirical Methods in Natural Language Processing, pg. 79-86, 2002. R. W. Picard, E. Vyzas, and J. Healey, “Toward machine emotional intelligence: analysis of affective physiological state,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23 (10), pg. 1179-1191, 2001.
9. Related Work The SVM learning approach has been shown to be particularly effective in determining whether a text segment expresses a particular affect class. However, it applies only to discrete labels. Y. H. Cho and K. J. Lee, “Automatic affect recognition using natural language processing techniques and manually built affect lexicon,” IEICE Trans. Information Systems, vol. E89 (12), pg. 2964-2971, 2006.
10. Related Work SVR is an alternative approach that is capable of predicting continuous sentiment and affect intensities while benefitting from the robustness of SVM. A. Webb, Statistical Pattern Recognition. John Wiley & Sons, 2002.
11. Research Questions In a recent book, Ryan highlights the critical role that Web forums play in militant Islamic radicalization on the Internet. Marc Sageman, an internationally renowned terrorism study consultant, also emphasizes the importance of the Internet, especially forums. This paper presents our web mining research on sentiment and affect analysis of two large-scale, internal Jihadist forums.
12. Research Questions This study seeks to answer the following research questions: (1) How effective are automated methods of sentiment and affect analysis in measuring the polarities of opinions and intensities of emotions in Dark Web forums? (2) What insights into the Dark Web forums are gained by performing sentiment and affect analysis?
13. Data Two Dark Web forums were selected for sentiment and affect analysis: Al-Firdaws (www.alfirdaws.org/vb) and Montada (www.montada.com). Al-Firdaws is a more radical forum with considerable content dedicated to support of the Iraqi insurgency and Al-Qaeda. Montada is a general discussion forum with content pertaining to a variety of social and religious issues. Domain experts consider Montada to be more moderate compared to Al-Firdaws, with less radical content.
14. Data Spidering programs were used to collect the content from the two web forums. A summary of the collection statistics is presented in Table I. Data set is larger. An older forum Al-Firdaws is too radical
15. Data Both Al-Firdaws and Montada are major forums for their respective purposes and communities, with relatively high membership levels and numerous authors.
16. Data In both cases postings are more evenly distributed across web forum threads. Although the Montada forum has a larger average number of posts per thread compared to Al-Firdaws, the median number of posts per thread is nearly equal.
17. Data 500 sentences were selected from each web forum and scored for the intensities of the sentiments and affects expressed. The affects studied were those of most interest to security and intelligence organizations: violence, anger, hate, and racism. These affects were measured on a continuous scale ranging from 0 to 1. Sentiment was measured on a continuous scale from -1 to 1.
20. Methods Annotation step. Feature types: character, word, root, and collocation n-grams. Character and word n-grams are commonly used in text mining applications. To derive root-level n-grams, Arabic words were converted to their roots using a clustering algorithm. Collocation n-grams included the Hapax and Dis collocations. Features with fewer than four occurrences in the test bed were excluded.
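The character and word n-gram extraction and the four-occurrence cutoff described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, whitespace tokenization, and the maximum n-gram length are assumptions, and root and collocation n-grams (which need an Arabic root-clustering step) are not covered.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n):
    """Return all overlapping word n-grams, joined with a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(sentences, max_n=2, min_count=4):
    """Count n-grams over all sentences; keep those seen at least min_count times.

    max_n is a hypothetical choice; min_count=4 mirrors the stated cutoff
    (features with fewer than four occurrences are excluded).
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()  # assumption: whitespace tokenization
        for n in range(1, max_n + 1):
            counts.update(("char", g) for g in char_ngrams(sentence, n))
            counts.update(("word", g) for g in word_ngrams(tokens, n))
    return {feature: c for feature, c in counts.items() if c >= min_count}
```

Tagging each feature with its type ("char" or "word") keeps identical strings from different n-gram families apart, so that each family can later be selected or weighted separately.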
22. Methods The machine learning approach for identifying the presence and intensities of sentiments and affects in Dark Web forum sentences utilized an SVR ensemble. SVR was utilized to leverage the robustness of SVM while accommodating the continuous intensities of sentiments and affects. Ensemble classifiers aggregate multiple independent classifiers built using different techniques or feature subsets, improving performance over a single classifier.
23. Methods For the analysis of the Al-Firdaws and Montada web forums, a separate classifier was developed for each of the five sentiment and affect classes (sentiment plus the four affects: violence, anger, hate, and racism).
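To make the ensemble idea concrete, here is a rough sketch assuming a linear SVR trained by subgradient descent on the epsilon-insensitive loss, with one member per feature subset and predictions averaged and clipped to the intensity scale. The slides do not specify the kernel, the solver, the aggregation rule, or any hyperparameter, so all of those (and every function name) are illustrative assumptions.

```python
def predict(model, x):
    """Linear prediction w.x + b for a model given as (weights, bias)."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, epsilon=0.05, lam=0.01, lr=0.01, epochs=500):
    """Fit a linear SVR by subgradient descent (hypothetical hyperparameters)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict((w, b), xi) - yi
            # Errors inside the epsilon tube contribute no loss gradient.
            g = 0.0 if abs(err) <= epsilon else (1.0 if err > 0 else -1.0)
            # lam * wj is the gradient of the L2 regularization term.
            w = [wj - lr * (lam * wj + g * xj) for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def project(x, subset):
    """Keep only the feature columns listed in subset."""
    return [x[j] for j in subset]


def train_ensemble(X, y, subsets, **svr_kwargs):
    """Train one SVR per feature subset; returns a list of (subset, model)."""
    return [(s, train_linear_svr([project(x, s) for x in X], y, **svr_kwargs))
            for s in subsets]


def ensemble_predict(ensemble, x, low=0.0, high=1.0):
    """Average member predictions (an assumed rule) and clip to [low, high]."""
    raw = sum(predict(m, project(x, s)) for s, m in ensemble) / len(ensemble)
    return max(low, min(high, raw))
```

For an affect class one would call `ensemble_predict` with the default range of 0 to 1; for sentiment, `low=-1.0` matches the scale given in the Data slides. One such ensemble would be trained per class.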
24. Methods Feature selection used the information gain (IG) heuristic. The intensities had to be discretized before IG could be applied and the relevant features selected. To compensate for the discretization, multiple iterations were performed, varying the number of class bins for intensity between 2 and 10. The IG heuristic was used recursively to select relevant features in these iterations using recursive feature elimination (RFE).
26. Methods The feature selection phase resulted in a subset of the features identified in the test bed being selected for each of the 5 classifiers in the ensemble. Of the original 7,556 features, only 22% were selected.
28. Results A sample of messages and their sentiment and affect intensities determined through automated analysis is presented in Table VII.
29. Results The results confirm the assessment of the forums by domain experts. The Al-Firdaws forum contained higher intensities of violence and hate affects, with a more negative sentiment polarity.
30. Results The percentages of postings containing intense levels of the four affects are greater in the Al-Firdaws forum than in the Montada forum, as shown in Figs. 8 and 9.
31. Results The violence and hate affects were used by a relatively large percentage of Al-Firdaws authors.
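The discretize-then-score step can be sketched as below. It assumes equal-width bins and binary presence/absence features, and it averages the IG obtained for each bin count from 2 to 10; none of these details are stated in the slides, and the recursive elimination (RFE) loop is deliberately left out. All names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def discretize(values, k, low=0.0, high=1.0):
    """Map continuous intensities in [low, high] to k equal-width bins."""
    return [min(int((v - low) / (high - low) * k), k - 1) for v in values]


def information_gain(present, labels):
    """IG of a binary feature: H(labels) - H(labels | feature present/absent)."""
    gain = entropy(labels)
    for flag in (True, False):
        subset = [lab for p, lab in zip(present, labels) if p == flag]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def mean_ig_over_bins(feature_presence, intensities, bin_counts=range(2, 11)):
    """Average each feature's IG over several discretizations (2..10 bins)."""
    scores = {}
    for name, present in feature_presence.items():
        gains = [information_gain(present, discretize(intensities, k))
                 for k in bin_counts]
        scores[name] = sum(gains) / len(gains)
    return scores
```

Averaging over several bin counts is one simple way to keep a single arbitrary discretization from deciding which features survive; features whose averaged score falls below a chosen cutoff would then be dropped.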
32. Results A time series analysis was performed to understand how forum affect intensities progressed over time.
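One simple way to build such a time series is to average the predicted intensities of all postings in each calendar month. The slides do not state the aggregation window or method, so the monthly mean below is only an assumed example, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_mean_intensity(postings):
    """Average intensity per (year, month), returned in chronological order.

    postings: iterable of (datetime.date, intensity) pairs, where intensity is
    the predicted score for one affect (e.g. violence) on a single posting.
    """
    buckets = defaultdict(list)
    for posted_on, intensity in postings:
        buckets[(posted_on.year, posted_on.month)].append(intensity)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Made-up example data, for illustration only.
example = [(date(2007, 1, 5), 0.2), (date(2007, 1, 20), 0.4), (date(2007, 2, 1), 0.9)]
series = monthly_mean_intensity(example)
```

Running the same aggregation per forum and per affect would give one curve for each combination, which can then be compared across forums.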