Spatial Text Mining: an Enhanced Text Mining Framework for Extracting Disaster Relevant Social Media Data
In the past decade, the rise of social media has led to the development of a vast number of social media services and applications. Disaster management is one such application, leveraging the massive volume of data generated for event detection, response, and recovery. To find disaster relevant social media data and automatically categorize them into different classes (e.g., damage or donation), current approaches utilize natural language processing (NLP) methods based on keywords, or machine learning algorithms relying on text only. However, these classification approaches remain imperfect due to the variability and uncertainty of the language used on social media. Therefore, more clues or signals are necessary to improve purely text-based approaches. The relevance of a social media post to a disaster depends strongly on the location and time of the post. Thus, additional features related to space and time could be useful for differentiating relevant posts. However, there has been no systematic study of the extent to which spatial features can aid text classification. To fill this research gap, this study proposed a spatial text mining framework that incorporates spatial information derived from social media and authoritative meteorological datasets, along with the text information, for classifying disaster relevant social media posts. This approach assesses both the textual content, using common text mining methods, and the spatiotemporal relationship of the post to the disaster event. The framework was assessed using geo-tagged social media posts and meteorological data for the 2012 Hurricane Sandy disaster event. The study designed and demonstrated how diverse types of spatial features, including wind, flooding, and proximity, can be derived from the data and then used to enhance text mining. Additionally, different temporal features were derived and integrated into text classification.
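As an illustration of how a proximity feature of the kind mentioned above might be derived, the following minimal Python sketch computes the great-circle distance from a geo-tagged post to the nearest point on a storm track. This is an assumption-laden sketch, not the study's implementation: the function names `haversine_km` and `proximity_feature` are hypothetical, the track coordinates are made up for illustration and are not real Hurricane Sandy data, and the abstract does not state how the study actually measured proximity.

```python
import math

# Mean Earth radius in kilometres (a common approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_feature(post_latlon, track_points):
    """Distance in km from a post to the closest point of a storm track.

    Hypothetical helper: `track_points` is a non-empty list of (lat, lon)
    tuples, e.g. taken from an authoritative meteorological dataset.
    """
    lat, lon = post_latlon
    return min(haversine_km(lat, lon, t_lat, t_lon)
               for t_lat, t_lon in track_points)


# Illustrative, made-up coordinates (NOT real track data).
example_track = [(35.0, -73.0), (38.0, -73.5), (39.5, -74.5)]
example_post = (40.7, -74.0)
distance_km = proximity_feature(example_post, example_track)
```

The resulting distance could then be appended to a post's text feature vector, so that a classifier sees both what the post says and how close it was to the event.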
This study used a common classification scheme to classify disaster relevant social media posts into different categories. Commonly used machine learning algorithms, including Naive Bayes and Support Vector Machine classifiers, were used to assess classification accuracy within the enhanced text mining framework. Finally, classification models built from different combinations of textual, spatial, and temporal features were compared to identify the features with the greatest influence on classification. The experimental results indicate that proximity (spatial) and disaster status (i.e., the spatiotemporal relationship between the hurricane and the social media post) features help improve the overall accuracy of the classification. The results of this study address the need to incorporate spatial data when using social media in disaster management applications.
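To make the idea of combining textual, spatial, and temporal features concrete, here is a minimal Python sketch that derives a simple disaster status label and appends it, with a proximity value, to a text feature vector. Everything here is an assumption for illustration: the three-way before/during/after definition, the names `disaster_status`, `one_hot`, and `combine_features`, and the example impact window are hypothetical, and the study's actual definition of disaster status is not given in this abstract.

```python
from datetime import datetime

# Assumed status categories; the study's real categories may differ.
STATUSES = ("before", "during", "after")


def disaster_status(post_time, impact_start, impact_end):
    """Label a post relative to an assumed local impact window."""
    if post_time < impact_start:
        return "before"
    if post_time <= impact_end:
        return "during"
    return "after"


def one_hot(status):
    """Encode a status label as a 0/1 vector in STATUSES order."""
    return [1.0 if status == s else 0.0 for s in STATUSES]


def combine_features(text_vector, proximity_km, status):
    """Concatenate text, spatial, and spatiotemporal features into one vector."""
    return list(text_vector) + [proximity_km] + one_hot(status)


# Illustrative impact window for one location (assumed, not from the study).
window_start = datetime(2012, 10, 29, 0, 0)
window_end = datetime(2012, 10, 30, 0, 0)

status = disaster_status(datetime(2012, 10, 29, 18, 30), window_start, window_end)
# Hypothetical bag-of-words counts for a three-word vocabulary.
features = combine_features([0, 2, 1], 42.5, status)
```

A combined vector of this shape could be passed to any standard classifier, such as the Naive Bayes or Support Vector Machine classifiers named above. Training models with and without the extra columns is one way to gauge how much the spatial and temporal features contribute.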