
About the Project
Immunotherapy has revolutionized cancer treatment. It was hailed as Science magazine’s “Breakthrough of the Year” in 2013. Yet behind the clinical milestones and scientific promises lies a more complex reality. How do patients actually experience these treatments in their daily lives?
The VOICE (Views On Immuno-Cancer Engagement) project applies Natural Language Processing (NLP) to analyze millions of real-world patient posts from online communities and social media platforms, including Facebook, Twitter, and Reddit. The dataset spans 2002 to 2021 and captures the full evolution of cancer immunotherapy, from early immune checkpoint inhibitors to CAR T cell therapies and cancer vaccines. It also captures the authentic, unfiltered voices of patients navigating diagnosis, treatment, side effects, and survivorship.
By leveraging advanced NLP techniques, including BERTopic modeling and RoBERTa-based sentiment analysis, we move beyond traditional clinical scales. We uncover what structured questionnaires often miss: the long-term burdens, emotional struggles, financial toxicities, and existential reflections that truly shape a patient’s quality of life.
Motivation
Clinical trials and academic literature focus primarily on acute, life-threatening side effects such as cytokine storms or severe nausea. However, patients in online communities consistently voice a different set of concerns.
The first is persistent, low-grade toxicities: insomnia, hot flashes, joint pain, and oral sores erode daily well-being over months or years. Financial toxicity is another major concern. Treatment costs often exceed $100,000 per year, leading to nonadherence, wage loss, and deep anxiety. Systemic distress also emerges frequently, as patients express fears over insurance coverage, healthcare policies, and access to care. Finally, existential burdens weigh heavily on patients. Family guilt over genetic risks, reevaluation of life’s meaning, and the psychological weight of survivorship are often left unspoken in clinical settings.
Why does this gap matter? When healthcare providers and researchers prioritize only the clinically severe issues, they risk overlooking the very struggles that most affect patients’ quality of life. Our project aims to bridge this divide. We use data-driven insights to support more patient-centric communication, better survivorship care, and a deeper understanding of what recovery truly means.
Research Questions
What can millions of real-world patient posts reveal about the gap between clinical expectations and lived experiences of cancer immunotherapy?
Specifically, we investigate three questions.
- What topics dominate patient discourse, and which critical issues remain unspoken in traditional clinical settings?
- How does patient sentiment evolve over time, and what external events, such as policy changes, drug approvals, or political debates, trigger emotional volatility?
- Where do patient concerns diverge most sharply from academic literature, and what actionable insights can we offer to healthcare professionals?
Acknowledgement
Special thanks to Professor Yin Ting Cheung from the School of Pharmacy, Faculty of Medicine, The Chinese University of Hong Kong, for providing the dataset from “Perception of Online Communities towards the Use of Cancer Immunotherapy: A Data Mining Study of 3.6 Million Web-based Posts from Social Media Platforms Using BERTopic” and for her valuable guidance throughout this project.
Project Team
This project is conducted by a group of students in the Data Analytics Practice Opportunity 2025/26.
- Shuming JIANG (DSPS/3)
- Lewen DONG (MBTE/3)
- Hanze LIU (RMSC/2)