Practical use case of natural language processing for observational clinical research data retrieval from electronic health records: AssistMED project

Cezary Maciejewski, Krzysztof Ozierański, Mikołaj Basza, Adam Barwiołek, Michalina Ciurla, Aleksandra Bożym, Maciej J. Krajsman, Piotr Lodziński, Grzegorz Opolski, Marcin Grabowski, Andrzej Cacko, Paweł Balsam
Published online: March 19, 2024

Abstract

Introduction: Electronic health records (EHRs) contain data valuable for clinical research, but largely in textual format. A human must encode these data into databases, which is a lengthy and costly process. Natural language processing (NLP) is a computational technique that allows automated text analysis.

Objectives: To demonstrate a practical use case of NLP for characterizing a large retrospective study cohort and to compare it with human retrieval.

Patients and methods: Anonymized discharge documentation of 10314 patients from a tertiary care cardiology department was analyzed for inclusion in the CRAFT registry (NCT02987062) of patients with atrial fibrillation (AF). Extensive clinical characteristics covering concomitant diseases, medications, daily dosages, and echocardiography were collected both manually and through NLP.

Results: Human and NLP-based approaches identified 3030 and 3029 patients, respectively, reflecting 99.93% accuracy of NLP in detecting AF. Comprehensive baseline patient characterization was faster with NLP than with human analysis (3 hours and 15 minutes vs 71 hours and 12 minutes). The calculated CHA2DS2-VASc and HAS-BLED scores did not differ between the methods (human vs NLP; median [IQR]): 3 (2–5) vs 3 (2–5), P = 0.74, and 1 (1–2) vs 1 (1–2), P = 0.63, respectively. For most data, agreement between NLP-retrieved and human-retrieved characteristics was almost perfect; daily dosage identification was the least accurate NLP feature. Both methods would lead to similar conclusions on cohort characteristics; however, daily dosage detection for some drug groups would require additional human validation in the NLP-based cohort.
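
The abstract does not describe how the risk scores were derived from the extracted fields. As a minimal illustrative sketch, assuming the standard published CHA2DS2-VASc definition and a hypothetical `PatientRecord` structure (the field names are not taken from the AssistMED project), the score could be computed from human- or NLP-retrieved characteristics as follows:

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical container for characteristics retrieved from a discharge summary."""
    age: int
    female: bool
    heart_failure: bool
    hypertension: bool
    diabetes: bool
    stroke_or_tia: bool      # prior stroke, TIA, or thromboembolism
    vascular_disease: bool   # e.g. prior myocardial infarction or peripheral artery disease


def cha2ds2_vasc(p: PatientRecord) -> int:
    """Return the CHA2DS2-VASc score (0-9) using the standard point weights."""
    score = 0
    score += 1 if p.heart_failure else 0     # C: congestive heart failure
    score += 1 if p.hypertension else 0      # H: hypertension
    score += 1 if p.diabetes else 0          # D: diabetes mellitus
    score += 2 if p.stroke_or_tia else 0     # S2: stroke/TIA/thromboembolism
    score += 1 if p.vascular_disease else 0  # V: vascular disease
    score += 1 if p.female else 0            # Sc: sex category (female)
    # A2 / A: age >= 75 scores 2 points, age 65-74 scores 1 point
    if p.age >= 75:
        score += 2
    elif p.age >= 65:
        score += 1
    return score


example = PatientRecord(
    age=70, female=True, heart_failure=False, hypertension=True,
    diabetes=False, stroke_or_tia=False, vascular_disease=False,
)
print(cha2ds2_vasc(example))
```

Applying the same function to both the human-encoded and the NLP-encoded record of each patient would give paired scores that can then be compared, in the spirit of the comparison reported above.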

Conclusions: Applying NLP to EHRs may accelerate data acquisition and provide accurate data for a retrospective study.
