Left ventricular ejection fraction assessment: artificial intelligence compared with echocardiography expert and cardiac magnetic resonance measurements

Original articles

Patrycja Mołek-Dziadosz¹,², Aleksandra Woźniak¹,², Anna Furman-Niedziejko¹,², Konrad Pieszko³, Joanna Szachowicz-Jaworska⁴, Tomasz Miszalski-Jamka⁴, Maciej Krupiński⁴, Marc R. Dweck⁵, Jadwiga Nessler¹,², Andrzej Gackowski¹,²
¹ Department of Coronary Artery Disease and Heart Failure, Jagiellonian University Medical College, Kraków, Poland

² Department of Coronary Artery Disease and Heart Failure, St. John Paul II Hospital, Kraków, Poland

³ Department of Interventional Cardiology and Cardiac Surgery, Collegium Medicum, University of Zielona Gora, Zielona Góra, Poland

⁴ Department of Radiology, St. John Paul II Hospital, Kraków, Poland

⁵ British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Edinburgh, United Kingdom

DOI: 10.20452/pamw.17104

Published online: September 1, 2025.

Key words: artificial intelligence, cardiac magnetic resonance, echocardiography, left ventricular ejection fraction
CC BY-NC-SA 4.0

Abstract

Introduction: Cardiac magnetic resonance (CMR) is the gold standard for assessing left ventricular ejection fraction (LVEF). Artificial intelligence (AI)-based echocardiographic analysis is increasingly utilized in clinical practice.

Objectives: This study aimed to compare echocardiographic (ECHO) LVEF assessment by experts and by automated AI, with CMR as the reference standard.

Patients and methods: We retrospectively analyzed 118 patients who underwent both CMR and ECHO within 7 days. LVEF measured on CMR was compared with the results obtained from AI‑based software, which automatically analyzed all stored digital imaging and communications in medicine loops (multiloop AI analysis) on ECHO. Additionally, the AI analysis was repeated using only 1 best‑quality loop for the 2‑chamber view and 1 for the 4‑chamber view (single‑loop AI analysis) on ECHO. These results were further compared with standard ECHO analysis performed by 2 independent experts. Agreement was investigated using the Pearson correlation and Bland–Altman analysis as well as the Cohen κ and concordance for categorization of LVEF into subgroups (≤30%, 31%–40%, 41%–50%, 51%–70%, and >70%).

Results: Both experts demonstrated strong inter‑reader agreement (R = 0.88; κ = 0.77), and their assessment correlated well with CMR‑assessed LVEF (expert 1, R = 0.86; κ = 0.74; expert 2, R = 0.85; κ = 0.68). The results of the multiloop AI analysis correlated strongly with those of CMR (R = 0.87; κ = 0.68) and the experts (R = 0.88–0.9; κ = 0.77). The single‑loop AI analysis demonstrated numerically higher concordance with CMR‑assessed LVEF (R = 0.89; κ = 0.75) than the multiloop AI analysis and expert analysis.

Conclusions: AI‑based analysis yielded LVEF results similar to those of human expert analysis when both were compared with CMR. AI‑based ECHO analysis is a promising approach, but its results should be interpreted with caution.

What's new?

This study is the first to compare 2 artificial intelligence (AI)-based strategies for echocardiographic assessment of left ventricular ejection fraction (LVEF), that is, fully automated multiloop analysis and expert‑guided single‑loop analysis, against both expert echocardiographers and cardiac magnetic resonance (CMR) as a reference standard, in a real‑life clinical population. Both AI approaches demonstrated strong agreement with CMR and expert readers; notably, the expert‑guided single‑loop method achieved the highest concordance. These results highlight the potential for AI to serve as a reliable copilot for LVEF assessment in real‑world settings, offering robust performance and greatest accuracy when expert input is available, while still supporting consistent assessment in scenarios where expert interpretation may be limited.

Introduction

Left ventricular ejection fraction (LVEF) remains a critical parameter for decision‑making in cardiovascular clinical practice.¹ Cardiac magnetic resonance (CMR) imaging is the gold standard for LVEF quantification due to its superior LV myocardial border definition and 3‑dimensional volumetric capabilities.²⁻⁵ However, echocardiography (ECHO) is the most accessible and widely utilized modality in clinical practice, despite known limitations in reproducibility and interobserver variability.³ Artificial intelligence (AI)-driven algorithms may improve echocardiographic analysis by enhancing measurement consistency, reducing variability, and increasing diagnostic accuracy.⁴⁻⁶

A few previous studies indicated that AI assessment of LVEF may be helpful in clinical practice.⁶⁻¹⁰ He et al⁷ showed in a single‑blind, randomized clinical trial that AI‑guided initial evaluation of LVEF was superior to sonographer‑guided initial evaluation, with less variability when compared with assessments by cardiologists. The proportion of ECHO examinations with a substantial discrepancy in LVEF (more than 5 percentage points) between the initial AI or sonographer assessment and the final cardiologist assessment was 16.8% in the AI group and 27.2% in the sonographer group.⁶ Moreover, it was shown that AI‑assisted assessment of LVEF improved the reliability and accuracy of LVEF assessment by level 1 readers (beginners).⁶ Limited data exist on the validation of AI‑based LVEF assessment against CMR as the gold standard. Sveric et al¹¹ compared a novel fully automated AI system for echocardiographic LVEF assessment using a modified biplane Simpson (MBS) method with CMR. Their findings showed a strong correlation between AI‑ECHO, MBS‑ECHO, and CMR (R = 0.89 for both AI and MBS vs CMR). Notably, AI‑ECHO exhibited excellent interobserver correlation (1), as compared with MBS‑ECHO (<0.91), and demonstrated lower test‑retest variability (2.5% vs 7.9% for MBS‑ECHO).⁷

The aim of our study was to compare LVEF assessment in a real‑life population using AI and AI with human supervision against ECHO expert evaluation, using CMR as the gold standard reference. Additionally, we explored the potential prognostic value of different LVEF assessment methods for all‑cause mortality.

Patients and methods

Study population

We retrospectively analyzed scans from 134 patients admitted to the Department of Coronary Artery Disease and Heart Failure who underwent both echocardiographic and CMR examinations between 2015 and 2024. All included patients were adults and had a CMR scan performed within 7 days of their echocardiographic examination. Patients were included if they had clinical indications for CMR, which mainly comprised suspected myocarditis, evaluation of heart failure (HF), and assessment of cardiomyopathies. Hemodynamically unstable patients, as well as those with a history of heart valve replacement or repair procedures, were excluded. We collected patient data on sex, age, height, and weight as key demographic parameters. Additionally, data on the prevalence and type of atrial fibrillation (AF; paroxysmal, long‑standing, or persistent) and atrial flutter (AFl) were recorded, along with the cardiac rhythm at the time of ECHO, categorized as sinus rhythm, AF, or AFl. The presence of previous myocardial infarction (MI), hypertension, hypercholesterolemia, type 2 diabetes mellitus (T2DM), chronic kidney disease, obesity, and prior thromboembolic events was also documented. Furthermore, we classified pathologies of the LV into the following types: none, ischemic, postinflammatory, dilated, tachyarrhythmic, peripartum, hypertrophic, mixed, LV noncompaction, and idiopathic. Significant valvular diseases were assessed, including mitral regurgitation, tricuspid regurgitation, aortic stenosis, and aortic regurgitation, each graded as moderate or severe. Finally, we documented the presence of pacing leads or other implants. Mortality data for long‑term follow‑up were obtained from the Centre for Information Technology at the Polish Ministry of Digital Affairs, following an official data request.

The study was conducted at St. John Paul II Hospital, Kraków, Poland, in accordance with the Declaration of Helsinki, and was approved by the local Ethics Committee (118.6120.203.2023).

Echocardiography

Echocardiographic imaging and measurements were analyzed from retrospective, digital imaging and communications in medicine (DICOM) recordings acquired by cardiologists and cardiologists in training with at least 1 year of experience. All echocardiographic examinations were performed following standard clinical protocols using the EPIQ CVx or Affinity 75 devices (both Philips Healthcare, Andover, Massachusetts, United States), with the electrocardiogram tracing, and each loop comprised 3 cardiac cycles. All recordings were stored in the picture archiving and communication system (PACS) and downloaded retrospectively for blinded assessment. Among other standard views, the dataset included 1 or more image loops assessing LV contractility in both apical 4‑chamber (4C) and 2‑chamber (2C) views.

Artificial intelligence–based echocardiographic image analysis

All echocardiographic recordings acquired by the echocardiographers were analyzed using the Ligence Heart software, version 3.42.0 (Ligence, Vilnius, Lithuania). The analysis was performed in 2 consecutive steps to assess LV end‑diastolic volume (LVEDV), LV end‑systolic volume (LVESV), and LVEF, using the biplane Simpson formula from apical 4C and 2C views. In the first step, the software performed a fully automated analysis, in which all registered DICOM loops for a given patient were analyzed (multiloop AI analysis). This approach provided averaged values of biplane LVEF, based on all recorded cardiac cycles, ensuring a comprehensive assessment of LV function.
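The biplane Simpson calculation named above can be illustrated with the textbook method of disks, in which the ventricle is sliced into equal-height elliptical disks whose two diameters come from the 4C and 2C tracings. The sketch below is an illustration only and not the Ligence Heart implementation; the function name, the use of 20 disks in the example, and the centimeter units are assumptions made for the example.

```python
import math
from typing import Sequence


def biplane_simpson_volume(
    diameters_4c_cm: Sequence[float],
    diameters_2c_cm: Sequence[float],
    long_axis_cm: float,
) -> float:
    """Volume in ml (cm^3) from paired disk diameters (method of disks).

    Each disk is an ellipse with diameters a_i (4C view) and b_i (2C view)
    and height L / n, so V = (pi / 4) * sum(a_i * b_i) * (L / n).
    """
    n = len(diameters_4c_cm)
    if n == 0 or n != len(diameters_2c_cm):
        raise ValueError("both views need the same, nonzero number of disks")
    if long_axis_cm <= 0:
        raise ValueError("long-axis length must be positive")
    disk_height = long_axis_cm / n
    area_sum = sum(a * b for a, b in zip(diameters_4c_cm, diameters_2c_cm))
    return math.pi / 4.0 * area_sum * disk_height


# Hypothetical geometry (not study data): a cylinder 8 cm long and 4 cm wide,
# sliced into 20 disks, should reproduce pi * r^2 * L.
volume_ml = biplane_simpson_volume([4.0] * 20, [4.0] * 20, 8.0)
print(f"{volume_ml:.1f} ml")
```

Applying the same function to end-diastolic and end-systolic tracings would give the LVEDV and LVESV from which LVEF is derived.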

In the second step, an experienced echocardiographer manually selected the single optimal DICOM loop from the original PACS recording, ensuring the best image quality and minimal foreshortening. The selected loop was documented in the PACS system for subsequent manual analysis. The Ligence Heart software then reanalyzed this specific loop to obtain LVEDV, LVESV, and LVEF values, using the same biplane method from both apical 4C and 2C views (single‑loop AI analysis). For the single‑loop AI analysis, we documented the total number of excluded cardiac cycles and the reasons for their rejection, that is, poor quality (reduced visibility of a single ventricular wall, incorrectly acquired LV view, foreshortening of the LV, misleading view), taking 2C and 3C as a 4C view or taking 4C or 3C as a 2C view, presence of an appropriate view loop in PACS that was not utilized by the AI without known reason, higher‑quality loops available but not selected by the AI, and measurements performed on a ventricular ectopic beat.

Manual assessment of left ventricular ejection fraction

The manual evaluation of LVEF using the biplane Simpson formula was performed by 2 independent expert echocardiographers certified in ECHO by the European Association of Cardiovascular Imaging (EACVI). Both experts were blinded to previous echocardiographic results as well as to CMR findings to ensure unbiased assessment. The measurements were performed in both the apical 4C and 2C views. Additionally, biplane measurement values were recorded for each case.¹²

To assess the image quality, each loop was graded based on endocardial visibility using a 4‑point scale: very poor (LV endocardial border barely visible or invisible), poor (endocardial border partially visible with significant uncertainty in tracing), moderate (endocardial border mostly visible with minor uncertainties), good (clear and well‑defined endocardial border throughout the cycle).

Cardiac magnetic resonance assessment of left ventricular ejection fraction

CMR images were acquired using a 1.5 T scanner (Siemens Magnetom Avanto, Erlangen, Germany) with a dedicated array coil. The obtained sequences included cine gradient echo (steady‑state free precession gradient echo technique, 8‑mm slice thickness) from the level of atrioventricular valves to the apex. Image analysis was performed with Syngo.VIA software version VB40 (Siemens, Erlangen, Germany) in accordance with the EACVI guidelines.¹³ The previously described CMR protocol¹² followed a standardized approach for anatomical and functional assessment, incorporating cine imaging in a short‑axis stack along with 2C, 3C, and 4C long‑axis views. LV volumes and masses were measured by precisely delineating the endocardial and epicardial contours of the myocardium in the short‑axis stack from the base to the apex of the heart at both end‑diastolic and end‑systolic phases. Papillary muscles and trabeculations were included in the LV volume contour. LVEF was calculated from the LV volumes using the following formula: (EDv – ESv) / EDv × 100, where EDv denotes end‑diastolic volume and ESv end‑systolic volume.
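The LVEF formula above translates directly into a few lines of code. The following is a minimal sketch; the function name and the input checks are assumptions added for illustration, and the volumes in the example are hypothetical rather than study data.

```python
def lvef_percent(edv_ml: float, esv_ml: float) -> float:
    """Return LVEF in percent: (EDv - ESv) / EDv * 100."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDv")
    return (edv_ml - esv_ml) / edv_ml * 100.0


# Hypothetical volumes: an EDv of 166 ml and an ESv of 83 ml give an LVEF of 50%.
print(lvef_percent(166.0, 83.0))  # 50.0
```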

Statistical analysis

Continuous variables were expressed as median and interquartile range (IQR), and the medians between the groups were compared using the Wilcoxon rank sum test. The Shapiro–Wilk test was used to assess the normality of data distributions. Categorical variables were expressed as number and percentage, and the differences in frequencies were studied using the Fisher exact test. Correlation coefficients were calculated using the Pearson correlation. Bland–Altman plots were created for inter‑reader and interscan comparisons. The Bland–Altman statistics (bias with 95% CI and limits of agreement [LOA]) were calculated on a per‑patient basis. The mean absolute inter‑reader differences in LVEF (bias) between the ECHO scans analyzed by the experts and AI and CMR results were compared with the paired t test. P values below 0.05 were considered significant. A formal power analysis was conducted on 118 paired measurements, yielding 80% power to detect effect sizes of the Cohen d equal to or above 0.26, with an observed effect size of d = 0.074 between the compared methods. RStudio version 2025 (Posit Software) with R version 4.4.3 (R Foundation for Statistical Computing, Vienna, Austria) was used for the analysis.
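For readers less familiar with Bland–Altman statistics, the sketch below shows how a per‑patient bias and LOA are conventionally obtained from paired measurements. It assumes the usual definition of the LOA as bias ± 1.96 standard deviations of the paired differences, which the text does not state explicitly; the function name and the example values are hypothetical, and the study itself used R.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(
    method_a: Sequence[float], method_b: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower LOA, upper LOA) for paired measurements.

    bias = mean of the per-patient differences (A - B);
    LOA  = bias -/+ 1.96 * sample SD of the differences (assumed convention).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least 2 paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical LVEF values in percent (not study data)
echo_lvef = [52, 40, 61, 30]
cmr_lvef = [50, 42, 60, 28]
bias, lower, upper = bland_altman(echo_lvef, cmr_lvef)
print(f"bias {bias:.2f}, LOA {lower:.2f} to {upper:.2f}")
```

A positive bias in this convention means that the first method reads higher than the second on average.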

Results

Baseline characteristics of the patients

Clinical characteristics of the studied cohort, stratified by CMR‑assessed LVEF (below 50% vs equal to or above 50%), are summarized in Table 1. The median (IQR) age of the included patients was 54 (37–67) years, and 38 patients (32%) were women. The patients with LVEF below 50% on CMR more frequently had hypercholesterolemia and T2DM. At the time of ECHO and CMR examinations, 11 patients (9.3%) were in AF. No patients had pacing leads or other implants. Reduced LVEF (<50%) had various etiologies, that is, idiopathic in 23 patients (40%), dilated cardiomyopathy in 9 (15.2%), ischemic in 8 (13.7%), mixed in 7 (12%), postinflammatory in 3 (5.2%), hypertrophic cardiomyopathy in 3 (5.2%), tachyarrhythmic in 2 (3.5%), peripartum in 2 (3.5%), and LV noncompaction in 2 (3.4%).

Table 1. Baseline characteristics of the analyzed patients

Characteristic	Overall (n = 118)	CMR LVEF <50% (n = 59)	CMR LVEF ≥50% (n = 59)	P value
Age, y	54 (37–67)	56 (40–66)	45 (34–66)	0.35
Women	38 (32)	16 (27)	22 (37)	0.21
LVEDV on CMR, ml	166 (129–238)	218 (174–303)	133 (106–156)	<0.001
LVEF on CMR, %	50 (31–61)	31 (17–40)	61 (57–66)	<0.001
BMI, kg/m²	26 (23.4–29.7)	25.8 (23.1–29.7)	26.7 (23.8–29.7)	0.73
AF	11 (9.3)	5 (8.5)	6 (10)	0.81
Previous MI	28 (24)	17 (29)	11 (19)	0.25
Heart failure	63 (53.4)	59 (100)	4 (6.8)	<0.001
Hypertension	77 (57)	37 (55.2)	40 (60.6)	0.65
Hypercholesterolemia	94 (69.1)	53 (79.1)	41 (62.1)	0.05
Type 2 diabetes mellitus	25 (18.5)	15 (22.4)	10 (15.2)	0.01
Chronic kidney disease	19 (14.1)	10 (14.9)	9 (13.6)	>0.99
Obesity	83 (61.1)	41 (61.2)	42 (63.6)	0.91
Prior thromboembolic event	21 (15.6)	12 (17.9)	9 (13.6)	0.66
Valvular disease
Moderate MR	4 (3.4)	4 (6.8)	0	0.12
Severe MR	5 (4.2)	5 (8.5)	0	0.05
Moderate TR	4 (3.4)	3 (5.1)	1 (1.7)	0.61
Severe TR	4 (3.4)	3 (5.1)	1 (1.7)	0.26
Severe AS	2 (1.6)	1 (1.7)	1 (1.7)	>0.94
Moderate AR	1 (0.8)	0	1 (1.7)	0.28
Arrhythmia
Paroxysmal AF	18 (15)	11 (19)	7 (12)	0.33
Persistent AF	3 (2.5)	3 (5.1)	0	0.29
Permanent AF	4 (3.4)	2 (3.4)	2 (3.4)	0.95
Persistent AFl	1 (0.8)	1 (1.7)	0	0.93

Data are presented as number (percentage) or median (interquartile range). Abbreviations: AF, atrial fibrillation; AFl, atrial flutter; AR, aortic regurgitation; AS, aortic stenosis; BMI, body mass index; CMR, cardiac magnetic resonance; LVEDV, left ventricular end‑diastolic volume; LVEF, left ventricular ejection fraction; MI, myocardial infarction; MR, mitral regurgitation; SR, sinus rhythm; TR, tricuspid regurgitation

Inclusion feasibility of the images in artificial intelligence–based analysis

Out of 134 patients, 118 were included in the multiloop AI analysis. In 16 cases (12%), LVEF measurements were not incorporated into the automatic analysis by AI. When selecting the optimal loop for single‑loop AI analysis, the most common reasons for excluding cine loops were poor image quality, LV apical foreshortening, or misleading anatomical views (Figure 1). Notably, in some cases, multiple exclusion criteria coexisted for a single loop; detailed information is provided in Supplementary material, Table S1.

Figure 1 Mistakes made by artificial intelligence in the assessment of ejection fraction (EF); A, B – overestimation of EF due to foreshortening of the left ventricular apex and significant shortening of the longitudinal axis of the left ventricle during systole; C – end‑systolic volume measured during an extrasystole (circle and arrow); D – a 3‑chamber view misinterpreted as a 4‑chamber view due to insufficient scan depth; E – foreshortening of the left ventricular apex due to suboptimal imaging; F – incorrect contouring of the end‑diastolic volume due to poor endocardial visibility

Inter‑reader expert echocardiographic assessment

Among the 118 echocardiograms undergoing quality assessment analysis, expert 1 rated 16 scans (13.6%) as very poor and 30 scans (25.4%) as poor, whereas expert 2 graded 13 scans (11%) as very poor and 28 scans (23.7%) as poor. For higher‑quality images, expert 1 classified 34 scans (28.8%) as good and 38 scans (32.2%) as very good, while expert 2 assessed 54 scans (45.8%) as good and 23 (19.5%) as very good. The correlation between the experts' assessments of LVEF was strong (R = 0.88). The interobserver concordance index and the Cohen κ coefficient between the experts were similar, at 0.73 and 0.77, respectively (95% CI, 0.69–0.84; Supplementary material, Table S2 and Figure S1). The results for both experts demonstrated a minimal bias of 0.18, with LOA ranging from –16.2 to 16.56 (Supplementary material, Table S2).
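The category‑level agreement reported in this and the following sections rests on binning LVEF into the 5 subgroups (≤30%, 31%–40%, 41%–50%, 51%–70%, and >70%) and computing the Cohen κ on the paired categories. The sketch below illustrates that calculation under 2 assumptions not stated in the text: the κ is unweighted, and noninteger values falling between bins (eg, 30.5%) are assigned to the higher bin. Function names and example values are hypothetical.

```python
from collections import Counter
from typing import Sequence


def lvef_category(lvef: float) -> str:
    """Map an LVEF value (percent) to one of the 5 subgroups."""
    if lvef <= 30:
        return "<=30%"
    if lvef <= 40:
        return "31-40%"
    if lvef <= 50:
        return "41-50%"
    if lvef <= 70:
        return "51-70%"
    return ">70%"


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Unweighted Cohen kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two equally long, nonempty label lists")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the marginal category frequencies of each rater
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings in percent (not study data)
reader_1 = [lvef_category(v) for v in [25, 35, 45, 60]]
reader_2 = [lvef_category(v) for v in [28, 38, 55, 65]]
print(f"kappa = {cohen_kappa(reader_1, reader_2):.2f}")
```

Because κ corrects for chance agreement, it can differ noticeably from the raw proportion of concordant categories when most patients fall into 1 or 2 subgroups.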

Intermodality assessment (echocardiography expert vs cardiac magnetic resonance)

The expert 1 and expert 2 assessments demonstrated a strong correlation with CMR‑derived LVEF results (R = 0.86 and R = 0.85, respectively). Intermodality agreement metrics against CMR favored expert 1 (Figure 2A) over expert 2 (Figure 2B; Supplementary material, Table S2). The Bland–Altman analysis showed a mild systematic overestimation of LVEF by both experts relative to CMR: expert 1 showed a mean bias of 0.65% (LOA, –18.61 to 19.92), and expert 2 of 0.48% (LOA, –19.61 to 20.56) in comparison with CMR (Supplementary material, Table S2). No differences in mean absolute LVEF differences were observed between the experts (P = 0.63) or between expert 1 and CMR (P = 0.63).

Figure 2 Intermodality concordance matrices and Bland–Altman plots for expert 1 and CMR (A) and expert 2 and CMR (B) LVEF assessment. The Bland–Altman plots are composed using the percentage of difference vs mean to better illustrate the agreement in LVEF range. LVEF categories: ≤30%, 31%–40%, 41%–50%, 51%–70%, and >70%
Abbreviations: see Table 1

Artificial intelligence vs echocardiography expert and artificial intelligence vs cardiac magnetic resonance variability

Multiloop artificial intelligence analysis

Multiloop AI analysis demonstrated robust correlation with reference standards for LVEF quantification, showing strong agreement with CMR (R = 0.87), expert 1 (R = 0.88), and expert 2 (R = 0.9). Inter‑reader agreement analysis revealed moderate concordance with human experts: the concordance index and Cohen κ between the multiloop AI analysis and expert 1 were 0.73 and 0.74, respectively (95% CI, 0.65–0.82), and the values for expert 2 were similar (concordance index = 0.73; κ = 0.76; 95% CI, 0.67–0.84; Supplementary material, Table S2). The Bland–Altman analysis indicated minimal systematic bias, with multiloop AI analysis underestimating LVEF relative to expert 1 (by 0.62%; LOA, –17.1 to 15.9) and expert 2 (by 0.44%; LOA, –14.7 to 13.8; Supplementary material, Table S2, Figure S2A and S2B). Intermodality agreement with CMR yielded a concordance index of 0.62 and Cohen κ of 0.68 (95% CI, 0.59–0.75; Figure 3A, Supplementary material, Table S2). No differences emerged in mean absolute LVEF differences between the multiloop AI analysis vs CMR and expert 1 vs CMR (P = 0.42), or between the multiloop AI analysis vs CMR and expert 2 vs CMR (P = 0.51). A subgroup analysis in the patients with CMR‑derived LVEF below 30% confirmed strong agreement between the multiloop AI analysis and CMR, with a mean bias of +5.5% and narrow LOA (–6.1 to 17), indicating a minimal systematic difference and robust precision in this high‑risk cohort.

Figure 3 Intermodality concordance matrices and Bland–Altman plots for single‑loop AI analysis and CMR (A) and multiloop AI analysis and CMR (B) LVEF assessment. Bland–Altman plots are composed using the percentage of difference vs mean to better illustrate the agreement in LVEF range. LVEF categories: ≤30%, 31%–40%, 41%–50%, 51%–70%, and >70%
Abbreviations: AI, artificial intelligence; others, see Table 1

Single‑loop artificial intelligence analysis

The single‑loop AI analysis demonstrated a strong agreement with the reference methods, showing high correlation coefficients for LVEF quantification compared with CMR (R = 0.89), expert 1 (R = 0.89), and expert 2 (R = 0.92). Inter‑reader agreement between the single‑loop AI analysis and human experts yielded a concordance index of 0.75 with both experts (Supplementary material, Table S2). The Bland–Altman analysis showed a minimal systematic bias and narrow LOA, with single‑loop AI analysis slightly underestimating LVEF relative to expert 1 (mean bias, –0.58%; LOA, –16.07 to 14.91) and expert 2 (mean bias, –0.33%; LOA, –13.18 to 12.51; Supplementary material, Figure S3A and S3B). The intermodality agreement between single‑loop AI analysis and CMR surpassed other comparisons, achieving a concordance index of 0.7 and Cohen κ = 0.75 (95% CI, 0.66–0.82; Figure 3B). The κ values obtained from the single‑loop AI analysis were numerically higher than those observed for the multiloop AI analysis vs CMR (κ = 0.68) and expert vs CMR pairings (expert 1, κ = 0.71; expert 2, κ = 0.73; Supplementary material, Table S2). Mean absolute LVEF differences were similar in the single‑loop AI analysis and either human expert analysis (vs expert 1, P = 0.45; vs expert 2, P = 0.95) or the multiloop AI analysis (P = 0.59; Figure 4).

Figure 4 Comparison of mean absolute difference of LVEF comparing single‑loop AI analysis and CMR vs multiloop AI analysis and CMR
Abbreviations: see Table 1 and Figure 3

Source of measurement bias

When compared with CMR, both AI algorithms and the ECHO experts tended to overestimate LVEF in the low range of values, whereas in the group of patients with LVEF above 50%, they tended to underestimate it (P for bias <⁠0.001). Specifically, the mean absolute differences (low LVEF vs preserved LVEF) were: expert 1 vs CMR (4.53 vs –3.2), expert 2 vs CMR (4.87 vs –3.92), multiloop AI analysis vs CMR (4.22 vs –4.11), and single‑loop AI analysis vs CMR (4.4 vs –3.82). Interestingly, the mean absolute LVEF differences were similar when comparing the patients with very poor to poor visibility and those with good to very good visibility for any of the assessment methods (expert 1 vs CMR, P = 0.35 and expert 2 vs CMR, P = 0.44; multiloop AI analysis vs CMR, P = 0.56; and single‑loop AI analysis vs CMR, P = 0.71).

Predictive value of left ventricular ejection fraction assessment by cardiac magnetic resonance, experts, and artificial intelligence

During a median (IQR) follow‑up of 543 (220–2001) days, death occurred in 20 patients (16.9%; 5.1% per year). In a survival analysis, LVEF equal to or above 50% assessed by CMR was associated with lower overall mortality during follow‑up (hazard ratio [HR], 0.37; 95% CI, 0.14–0.95; P = 0.04). The same was found for LVEF equal to or above 50% in the multiloop AI analysis (HR, 0.33; 95% CI, 0.12–0.96; P = 0.042; Figure 5A and 5B). LVEF below 50% assessed by the experts and single‑loop AI analysis did not predict mortality in the survival analysis.

Figure 5 Kaplan–Meier curves for mortality during follow‑up with regard to LVEF assessed by CMR (A) and multiloop AI analysis (B)
Abbreviations: see Table 1 and Figure 3

Discussion

This study is the first to compare LVEF measurements obtained from AI alone and AI‑assisted human interpretation with CMR‑derived LVEF. We showed that both multiloop and single‑loop AI analysis exhibited strong correlations with CMR‑derived LVEF and expert assessments, with the latter achieving slightly higher agreement with CMR and lower proportional bias than the former. Moreover, we demonstrated that reduced LVEF detected on both multiloop AI analysis and CMR was potentially associated with an increased risk of mortality during follow‑up. These results highlight the potential of AI‑driven echocardiographic analysis as a copilot to enhance the reproducibility and efficiency of LVEF assessment, particularly in settings where expert interpretation may be limited.

The findings of our study align with prior research demonstrating the effectiveness of AI in echocardiographic LVEF assessment, while providing new insights into its application in clinical practice. Yamaguchi et al¹⁴ demonstrated that AI‑assisted LVEF estimates displayed on echocardiographic screens significantly improved reliability and accuracy of assessments by level 1 readers, reducing interinstitutional variability to levels comparable with expert readers—findings that correspond with our results showing robust reproducibility across different AI approaches. The WASE (World Alliance Societies of Echocardiography) COVID‑19 study¹⁵ showed that AI‑based analysis of LVEF had similar feasibility as manual analysis but minimized variability, and consequently increased statistical power to predict mortality, paralleling our findings of consistent performance across varying image qualities. Additionally, AI‑based measures have consistently shown lower inter‑operator variability of echocardiographic measurements, thereby improving consistency and reliability,¹⁵ which aligns with the high concordance indices observed for the AI methods and expert assessments in our analysis. Furthermore, studies have shown that AI‑assisted LVEF assessments provide highly reproducible estimations, as compared with standard ECHO measurements, regardless of the user experience level,¹⁶ supporting our findings regarding minimal bias in AI‑derived measurements. Our study uniquely extends these findings by comparing 2 distinct AI methodologies within the same patient cohort, and revealing that single‑loop AI analysis achieves slightly superior agreement metrics with CMR despite theoretical expectations favoring the multiloop AI analysis. This comparison has not been addressed in prior studies, making our findings particularly novel.

When comparing our findings with previous CMR validation studies, particularly by Sveric et al,¹¹ several important distinctions emerge. Sveric et al¹¹ compared a novel fully automated AI system for echocardiographic LVEF assessment with the MBS method and CMR, reporting high correlations for both methods with CMR (R = 0.89). Our correlation coefficients were similarly robust (R = 0.87–0.89), and our study evaluated a more diverse real‑world clinical population including patients with various cardiac pathologies, such as cardiomyopathies and valvular disease. Sveric et al¹¹ reported that their AI system slightly underestimated LVEF (bias, 2.2%) while MBS overestimated it (bias, –2.2%), which is consistent with our findings, where both AI methods showed a slight underestimation. However, our study pointed out that this bias pattern varies according to LVEF ranges: for the multiloop AI analysis, LVEF was overestimated in the patients with reduced LVEF below 50% (4.22%) and underestimated in those with preserved LVEF (–4.11%; P <0.001), which is an important clinical distinction not highlighted in the discussed work. Furthermore, while Sveric et al¹¹ demonstrated excellent interobserver correlation for AI‑ECHO (1) vs MBS‑ECHO (<0.91) and lower test‑retest variability (2.5% vs 7.9%), our study is the first to compare 2 different AI methodologies within the same patient cohort, finding that single‑loop AI analysis achieves higher agreement with CMR (concordance index, 0.7 vs 0.62), despite theoretical advantages of the multiloop AI analysis. Our research represents the largest inter‑reader analysis conducted by 2 expert readers alongside AI methods and CMR, providing a more nuanced understanding of AI’s potential as a copilot in echocardiographic LVEF assessment. We also showed that LVEF below 50% assessed by both CMR and multiloop AI is potentially associated with higher overall mortality during follow‑up.

In our study, the single‑loop AI analysis demonstrated slightly better concordance with CMR than the multiloop AI analysis, although there was no difference in mean absolute LVEF values between the AI methods. We believe that selecting a single, best‑quality loop for analysis is more appropriate for routine clinical practice than including all available loops to assess LVEF. In the multiloop AI analysis, there were isolated cases in which LVEF values deviated markedly from the CMR‑derived ones (absolute difference >10%). These outliers were typically due to the inclusion of incorrect projections, such as a 3C view misclassified as a 2C view, or to the analysis of very poor‑quality images. This underscores the limitations of automated averaging in the multiloop AI analysis when suboptimal loops are included. On the other hand, it was the multiloop AI analysis that demonstrated a tendency to predict mortality during long‑term follow‑up, suggesting that averaging across multiple loops may have prognostic value, especially in patients with arrhythmias or variable hemodynamics. Therefore, we recommend a hybrid strategy in which AI algorithms systematically evaluate all available loops and human experts exclude those with technical inadequacies (eg, apical foreshortening, poor endocardial border delineation) to retain only the best 2–3 high‑quality loops for final analysis. This approach could optimize accuracy by combining the efficiency of automation with selective quality control, minimizing errors from isolated poor‑quality acquisitions.
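The hybrid strategy described above can be illustrated with a minimal sketch. It is not the workflow of the software used in this study: the `Loop` record, the `quality` score, and the `rejected` flag are hypothetical names introduced only for illustration, and it is an assumption that some per-loop quality score is available.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Loop:
    view: str               # apical view label, eg, "4C" or "2C"
    lvef: float             # AI-derived LVEF for this loop, in %
    quality: float          # hypothetical image-quality score in [0, 1]
    rejected: bool = False  # set by the human reader (eg, foreshortening)


def hybrid_lvef(loops, max_loops=3):
    """Average LVEF over the best-quality loops the reader did not reject."""
    usable = [loop for loop in loops if not loop.rejected]
    if not usable:
        raise ValueError("no usable loops left after expert review")
    # Keep only the highest-quality loops (at most max_loops of them).
    best = sorted(usable, key=lambda loop: loop.quality, reverse=True)[:max_loops]
    return mean(loop.lvef for loop in best)


# Illustrative, made-up values: one misclassified loop is rejected by the reader.
example_loops = [
    Loop("4C", 55.0, 0.9),
    Loop("2C", 53.0, 0.8),
    Loop("2C", 30.0, 0.95, rejected=True),
    Loop("4C", 60.0, 0.4),
    Loop("4C", 57.0, 0.7),
]
example_result = hybrid_lvef(example_loops)
```

In this sketch, the expert decision always takes priority over the automated score, so a loop flagged by the reader is excluded even if its quality score is high.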

Limitations

Several limitations of our study must be acknowledged. First, technical limitations of AI‑based LVEF assessment included misinterpretation of extrasystoles and suboptimal view selection, indicating a need for improved rhythm detection and image recognition.¹⁴,¹⁸ Second, the sample size was moderate, and further studies are needed to confirm our results. Third, the study population was too small for a conclusive mortality analysis; therefore, the mortality findings should be considered exploratory. Fourth, the AI software we tested (Ligence Heart) has not been validated in specific populations, such as patients with noncompaction cardiomyopathy.¹⁹ Fifth, our 118‑patient study had limited power (12.5% for Cohen d = 0.074) to detect subtle LVEF differences, such as that in the nonsignificant comparison between the multiloop AI vs CMR and expert 1 vs CMR (P = 0.42); however, it was well powered (80%) for effects of d equal to or above 0.26 (equivalent to 2–3 LVEF points). The lack of significant findings suggests that any true systematic difference is likely below this clinically meaningful threshold, supporting acceptable agreement for practical use. Finally, successful adoption of AI tools will require standardization, validation across populations, and thoughtful integration into clinical workflows to avoid misinterpretation.¹⁹
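The power figures quoted above can be approximated with a short calculation. The test and software used for the original power analysis are not stated here, so the sketch below rests on assumptions: a 2‑sided paired comparison at α = 0.05 and a normal approximation to the test statistic. The function name `paired_power` is introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def paired_power(d, n, alpha=0.05):
    """Approximate power of a 2-sided paired test for effect size d (Cohen d).

    Uses the normal approximation: the test statistic is treated as
    N(d * sqrt(n), 1) under the alternative hypothesis.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = d * sqrt(n)                # expected value of the test statistic
    upper_tail = 1 - z.cdf(z_crit - shift)
    lower_tail = z.cdf(-z_crit - shift)
    return upper_tail + lower_tail


power_small_effect = paired_power(0.074, 118)  # subtle difference
power_target_effect = paired_power(0.26, 118)  # 2-3 LVEF points
```

Under these assumptions, both results come out close to the 12.5% and 80% reported above; small deviations are expected because an exact calculation would use the noncentral t distribution.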

Conclusions

We evaluated the performance of single‑loop and multiloop AI analysis for the assessment of LVEF in comparison with expert echocardiographic analysis and CMR. Both AI models demonstrated a strong correlation with CMR‑derived LVEF, as well as with expert assessments. The single‑loop AI analysis exhibited a slightly stronger agreement with CMR and expert interpretations than the multiloop AI analysis, with higher inter‑reader concordance indices and Cohen κ coefficients. Although both AI models tended to underestimate LVEF, as compared with expert assessments, the bias was minimal and within clinically acceptable limits. No significant differences were observed in the mean LVEF bias between expert assessments and AI‑derived measurements. These findings support the feasibility of AI‑driven echocardiographic analysis for LVEF assessment, highlighting the single‑loop AI analysis as a reliable tool for automated LVEF estimation. Further studies are warranted to assess its clinical utility in diverse patient populations and across varying image quality conditions.

SUPPLEMENTARY MATERIAL

Supplementary material.pdf

ARTICLE INFORMATION

Acknowledgments: None.

Funding: This work was supported by the Jagiellonian University Medical College (grant no. N41/DBS/001453; to AG) and by the science fund of the St. John Paul II Hospital, Kraków, Poland (no. FN/11/2025; to AG).

Contribution statement: PM‑D and AG conceived the concept of the study. PM‑D, AG, and KP contributed to the design of the research. All authors were involved in data collection. PM‑D and KP analyzed the data. AG coordinated funding for the project. All authors edited and approved the final version of the manuscript.

Conflict of interest: None declared.

AI statement: An artificial intelligence model (ChatGPT) was used to improve and check English spelling and grammar in the manuscript.

References

McDonagh TA, Metra M, Adamo M, et al. 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure: developed by the Task Force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC) with the special contribution of the Heart Failure Association (HFA) of the ESC. Rev Esp Cardiol. 2022; 75: 523‑523.
Parlati ALM, Nardi E, Marzano F, et al. Advancing cardiovascular diagnostics: the expanding role of CMR in heart failure and cardiomyopathies. J Clin Med. 2025; 14: 865‑865.
Niedziela JT, Gąsior M, Budnik M, et al. Clinical pathways of patients with heart failure with preserved ejection fraction hospitalized for acute heart failure: insights from the National Multi‑Centre HF‑POL Registry. Kardiol Pol. 2024; 82: 1131‑1134.
Orłowska‑Baranowska E, Nieznańska M, Marczak M, et al. Late gadolinium enhancement in aortic stenosis: is it an indication for surgical treatment in asymptomatic patients? Kardiol Pol. 2024; 82: 1211‑1219.
Kozieł-Siolkowska M, Boidol J, Miszalski‑Jamka K, et al. Echocardiographic predictors of positive left ventricular remodeling in patients with a history of active myocarditis. Pol Arch Intern Med. 2024; 134: 16640.