| dc.contributor.advisor | McRoy, Susan | |
| dc.creator | Auh, Yong | |
| dc.date.accessioned | 2025-10-08T18:06:46Z | |
| dc.date.issued | 2025-08 | |
| dc.identifier.uri | http://digital.library.wisc.edu/1793/95992 | |
| dc.description.abstract | To identify withdrawal symptoms in patient-generated online texts, the contents of the texts need to be understood in terms of the symptoms and medication discontinuation events that patients experience. Since withdrawal symptoms occur after a drug is discontinued or its dose is reduced, the temporal sequencing of drug discontinuation event descriptions and medical symptom descriptions in the texts needs to be recognized. The applicability of conventional statistical classification algorithms is analyzed, and the limitations of statistical methods for natural language texts are discussed. Misuse of statistical techniques, by treating natural language tokens or n-grams as features, is shown to be one of the reasons for their lack of generalizability and understanding. Fine-tuning BERT and its variants (SciBERT and PubMedBERT) for classification is evaluated, and two limitations are identified: lack of generalizability across different types of text, and prediction based on the distribution of token sequences rather than on structure. Paradoxically, the ability of these models to process scrambled and alphabetized texts as if they were valid text sequences reveals the fundamental problem of this method. Although imperfect and incomplete, statistical techniques remain valuable for dependency parsing and part-of-speech assignment. Using spaCy’s dependency parser combined with part-of-speech assignments, spaCy’s output is improved to give more meaningful structures for input sentences. An application that displays sentence structures in a flexible fashion was created to support their evaluation. A structural pattern specification language (SPSL) for complex patterns is introduced, which allows hierarchical, non-contiguous patterns to be specified, and a matching algorithm for such patterns is developed.
Databases of medical symptoms and medication-related events relevant to the domain of antidepressant drugs are set up, and withdrawal symptom identification is performed on the revised PsyTAR dataset. The theoretical background of the approach is also discussed, and the importance of multi-dialect and multilingual processing is emphasized. Properly modeling human linguistic competence requires modular, hierarchical modeling of linguistic knowledge together with databases of community-specific terminology and beliefs; an outline of such a system is presented. | |
| dc.subject | Computer science | |
| dc.subject | Mathematics | |
| dc.subject | hybrid structural representation | |
| dc.subject | social media text processing | |
| dc.subject | withdrawal symptoms | |
| dc.title | COMPUTATIONAL ANALYSIS OF SOCIAL MEDIA TEXTS WITH A CASE STUDY OF WITHDRAWAL SYMPTOMS IDENTIFICATION | |
| dc.type | dissertation | |
| thesis.degree.discipline | Computer Science | |
| thesis.degree.name | Doctor of Philosophy | |
| thesis.degree.grantor | University of Wisconsin-Milwaukee | |
| dc.contributor.committeemember | Zhao, Tian | |
| dc.contributor.committeemember | Kate, Rohit | |
| dc.contributor.committeemember | Gervini, Daniel | |
| dc.contributor.committeemember | He, Lu | |
| dc.description.embargo | 2027-06-23 | |
| dc.embargo.liftdate | 2027-06-23 | |