Toward Automating the Summarization of Cancer Pathology Reports Using Large Language Models to Improve Clinical Usability.
Liu Y, John J, Sarkar S, Zakkar A, Kinkopf P, Teo PT, Abazeed ME
Abstract
Purpose: Reviewing pathology reports requires physicians to integrate complex histopathologic, immunohistochemical, and molecular findings from multiple reports and institutions, often under time constraints that increase the risk of error and fatigue. Large language models (LLMs) offer a potential solution by generating concise, coherent summaries from complex pathology data.

Methods: Patients who underwent initial consultation in a thoracic clinic between January 2019 and July 2023 were included. Original pathology reports and corresponding physician pathology summaries from consultation notes were extracted and anonymized. Six open-source LLMs (Llama 3.0, Llama 3.1, Llama 3.2, Mistral, Gemma, and DeepSeek-R1) generated pathology summaries directly from the original reports. Objective and subjective evaluations were performed using the original reports as the ground truth. LLM-generated summaries were compared with physician summaries for correctness, completeness, and conciseness. Additional subjective assessments with multiple evaluators were conducted for Llama 3.1.

Results: Ninety-four cases met the eligibility criteria. Using the original pathology reports as the ground truth, the LLM-generated summaries achieved higher scores across all objective evaluation metrics compared with physician pathology summaries (P …, P = .017, P …, P …, P …, P = 1.000). The results remained consistent in additional subjective analyses involving multiple evaluators for Llama 3.1.

Conclusion: LLM-generated summaries demonstrated better performance in objective metrics and greater completeness in subjective evaluations compared with physician summaries. These results highlight the potential of LLMs as valuable tools for enhancing clinical documentation and workflow efficiency in oncology practice.
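
The Methods describe scoring each summary objectively against the original report as ground truth, but the abstract does not name the metrics used. As a purely illustrative sketch, the Python below computes a unigram-overlap precision, recall, and F1 (in the style of ROUGE-1) between a report and a summary. The choice of metric, the function names `tokenize` and `unigram_overlap`, and the example texts are all assumptions made for illustration; none of them come from the study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_overlap(reference, summary):
    """Unigram precision, recall, and F1 of `summary` against `reference`.

    Assumed illustrative metric (ROUGE-1 style); the study's actual
    objective metrics are not stated in the abstract.
    """
    ref_counts = Counter(tokenize(reference))
    sum_counts = Counter(tokenize(summary))
    # Multiset intersection: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((ref_counts & sum_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(sum_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical texts, invented for this example only.
    report = ("Left upper lobe biopsy: invasive adenocarcinoma. "
              "TTF-1 positive. PD-L1 expression 50%.")
    llm_summary = "Left upper lobe adenocarcinoma, TTF-1 positive, PD-L1 50%."
    physician_summary = "Lung adenocarcinoma."
    print("LLM:", unigram_overlap(report, llm_summary))
    print("Physician:", unigram_overlap(report, physician_summary))
```

Under a metric like this, recall rewards completeness relative to the report and precision penalizes content not found in it. A longer summary that stays close to the report's wording will tend to score higher on recall than a terse one, which is worth keeping in mind when reading overlap-based comparisons.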