Statement of Purpose
30 Mar 2025

I am currently a Ph.D. student in Computer Science at the University of Hawaii at Manoa, advised by Prof. Haopeng Zhang. My research interests lie at the intersection of natural language processing, information extraction, summarization, and generative AI, particularly in how large language models (LLMs) can be adapted to real-world, domain-specific tasks.
My journey in computing began with a solid mathematical foundation: a B.Sc. in Mathematics from Hefei University of Technology. That training cultivated the analytical thinking that later proved vital during my M.Eng. studies at Nanyang Technological University (NTU), where I worked under Prof. Siu Cheung Hui. There, I explored representation learning models for biomedical event detection, work that contributed to a journal publication in BMC Bioinformatics.
Beyond academic training, I have been fortunate to apply research in practice. At the SPIRIT Lab at NTU and with the Singapore Food Agency, I led projects building prototype frameworks for pathogen extraction using LLMs and developed a full-stack system for food safety news monitoring. These experiences deepened my commitment to real-world NLP applications that move beyond benchmark datasets toward impactful, applied science.
Research Vision
My current research focuses on building robust, generalizable summarization and information extraction models that can handle domain shift, noisy input, and low-resource scenarios. The increasing complexity of textual data in scientific, medical, and government contexts demands systems that go beyond benchmark fine-tuning. I aim to design adaptive summarization frameworks that can dynamically identify, filter, and retain salient information across diverse domains.
In information extraction, I am particularly interested in structured prediction from unstructured text, especially in biomedical and scientific literature. My recent work on span-level generative models for biomedical event extraction highlights the importance of modeling domain-specific structures, and I plan to extend these efforts by incorporating more controllable generation and better alignment with downstream applications.
Robustness, adaptability, and factual accuracy are core challenges that I plan to address through better data truncation strategies, domain-aware architectures, and evaluation metrics grounded in both semantic fidelity and practical relevance.
Long-Term Goals
My long-term goal is to advance the field of language understanding through the design of systems that can extract, summarize, and reason over large-scale text corpora reliably. I envision myself leading research in academia or industry, focusing on interpretable and deployable NLP solutions that help users interact meaningfully with complex information sources — especially in high-stakes domains like biomedicine, law, and public safety.
I also aspire to bridge the gap between foundation models and domain-specific applications by creating methodologies that make LLMs more aligned, efficient, and controllable. Whether in academic mentorship or applied research leadership, I hope to contribute tools and insights that shape the next generation of robust NLP systems.