Background
For an emerging pandemic such as COVID-19, symptom statistics based on hospital data may be biased or delayed, because a high proportion of asymptomatic or mildly symptomatic infections are never recorded in hospitals. Meanwhile, the difficulty of accessing large-scale clinical data also prevents many researchers from conducting timely research.
Objective
Given the wide coverage and promptness of social media, this study aimed to present an efficient workflow for tracking and visualizing the dynamic characteristics and co-occurrence of COVID-19 symptoms using large-scale, long-term social media data.
Methods
This retrospective study included 471,553,966 COVID-19–related tweets posted from February 1, 2020, to April 30, 2022. We curated a hierarchical symptom lexicon for social media containing 10 affected organs/systems, 257 symptoms, and 1808 synonyms. The dynamic characteristics of COVID-19 symptoms over time were analyzed from the perspectives of weekly new cases, overall distribution, and temporal prevalence of reported symptoms. Changes in symptoms between virus strains (Delta and Omicron) were investigated by comparing symptom prevalence during the period in which each strain was dominant. A symptom co-occurrence network was developed and visualized to investigate the relationships among symptoms and affected body systems.
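A minimal sketch of how a hierarchical lexicon (system → symptom → synonyms) could be used to tag tweets and count symptom co-occurrences for a network follows. It is an illustration under stated assumptions, not the authors' pipeline: the toy `LEXICON`, the functions `build_patterns`, `extract_symptoms`, and `cooccurrence`, and the choice of case-insensitive whole-word regex matching are all hypothetical. Negation ("no cough"), misspellings, and curly apostrophes are not handled.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: organ/system -> symptom -> synonyms.
# The study's actual lexicon (10 systems, 257 symptoms, 1808 synonyms) is not reproduced here.
LEXICON = {
    "respiratory": {
        "cough": ["cough", "coughing", "dry cough"],
        "shortness of breath": ["shortness of breath", "short of breath"],
    },
    "nervous": {
        "loss of smell": ["loss of smell", "cannot smell", "anosmia"],
        "headache": ["headache"],
    },
    "systemic": {
        "fever": ["fever", "feverish"],
        "fatigue": ["fatigue", "exhausted"],
    },
}


def build_patterns(lexicon):
    """Compile one case-insensitive whole-word regex per synonym, mapped to (system, symptom)."""
    patterns = []
    for system, symptoms in lexicon.items():
        for symptom, synonyms in symptoms.items():
            for syn in synonyms:
                pat = re.compile(r"\b" + re.escape(syn) + r"\b", re.IGNORECASE)
                patterns.append((pat, system, symptom))
    return patterns


def extract_symptoms(text, patterns):
    """Return the set of (system, symptom) pairs mentioned at least once in one tweet."""
    return {(system, symptom) for pat, system, symptom in patterns if pat.search(text)}


def cooccurrence(tweets, patterns):
    """Count per-symptom tweet frequencies (node weights) and
    per-pair co-occurrence counts (edge weights) across tweets."""
    freq = Counter()
    pairs = Counter()
    for text in tweets:
        # Deduplicate within a tweet and sort so each pair has one canonical order.
        symptoms = sorted({symptom for _, symptom in extract_symptoms(text, patterns)})
        freq.update(symptoms)
        pairs.update(combinations(symptoms, 2))
    return freq, pairs


if __name__ == "__main__":
    tweets = [
        "Day 3 of covid: fever, dry cough and a pounding headache",
        "Lost my sense of smell, anosmia is real. Also coughing nonstop",
        "Feverish and exhausted after a long week",
    ]
    freq, pairs = cooccurrence(tweets, build_patterns(LEXICON))
    print("nodes:", dict(freq))
    for (a, b), n in pairs.most_common():
        print(f"edge {a} -- {b}: {n}")
```

Mapping every synonym to its canonical symptom before counting means that "dry cough" and "coughing" both contribute to a single `cough` node. Keeping the system level in each match would let the same counts be rolled up into a system-level network.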
Results
This study identified 201 COVID-19 symptoms and grouped them into 10 affected body systems. There was a significant correlation between the weekly quantity of self-reported …