Artificial intelligence applications in neurology: an umbrella review

Abstract

Artificial intelligence (AI) is being widely applied in the medical field, and neurology is no exception. In this review, we aimed to systematically analyze the applications of AI in neurological subspecialties, including stroke, dementia, movement disorders, neuro-oncology, epilepsy, multiple sclerosis, neuromuscular disorders, headache, and neurocritical care. This umbrella review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Both authors independently searched the PubMed and Scopus databases up to May 31, 2025. English-language systematic reviews and meta-analyses related to adult neurology and its subspecialties were included. A Measurement Tool to Assess Systematic Reviews, version 2 (AMSTAR 2) was used to assess the quality of the included reviews. A total of 58 studies were included, most frequently related to stroke. The majority were of low or critically low quality. The main AI applications in neurological subspecialties encompassed brain imaging analysis, disease diagnosis and classification, and outcome and prognosis prediction. The reported accuracies of model predictions ranged from 70% to over 90%, but with substantial concerns about methodological flaws and reproducibility. AI is being widely applied in neurological research, although the lack of external model validation and the scarcity of high-quality, diverse datasets hinder broader implementation of AI in neurology.

Introduction

Applications of artificial intelligence (AI) in medicine have been gaining increasing attention. Bibliometric analyses have demonstrated explosive growth in AI-related publications since 2019.¹ Dedicated medical journals^2,3 and journal sections⁴ on AI in medicine have emerged, and hospitals are adopting large language models (LLMs) for a variety of tasks at an unprecedented pace.⁵ Neurology is no exception. Historically, neuroscience is considered to have inspired the invention of artificial neural networks,⁶ and the neurological community recognizes the need to adopt AI in daily practice while minding the ethical, safety, and equity risks.⁷ To help readers navigate the complex terminology and techniques of the AI field, several neurological journals have published “primers” on AI.^8,9 Others have published narrative^7,10 or scoping¹¹ reviews. To systematically analyze the current impact of AI on neurology, we performed an umbrella review on this topic.

Methods

The review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines¹² (Figure 1) and the methodology established specifically for umbrella reviews,¹³ which, unlike other review types (scoping or narrative), aim to identify and synthesize systematic reviews.

Search strategy and eligibility criteria

The authors independently searched the PubMed database for relevant studies up to May 31, 2025, using the following search terms: (“artificial intelligence”[MeSH Terms] OR (“artificial”[All Fields] AND “intelligence”[All Fields]) OR “artificial intelligence”[All Fields]) AND (“neurology”[MeSH Terms] OR “neurology”[All Fields] OR “neurology s”[All Fields]) AND “review”[Publication Type]; and the Scopus database using the search term: “TITLE-ABS-KEY (artificial AND intelligence AND neurology) AND ALL (review).” Forward and backward citation searching of the included studies was performed to identify additional reviews not captured by the database searches. Only English-language articles were included. Titles and abstracts were screened against the following inclusion criteria: systematic reviews or meta-analyses related to adult neurology and its subspecialties, including ischemic and hemorrhagic stroke, epilepsy, multiple sclerosis (MS), dementia, movement disorders (including Parkinson disease [PD]), neuromuscular diseases, headache, neurocritical care, and neuro-oncology. Systematic reviews were defined as reviews in which specific search criteria were applied. Narrative and scoping reviews (without specified systematic search criteria), original papers, bibliometric analyses, book chapters, conference papers, and articles in languages other than English were excluded.

Assessment of study quality

Study quality was assessed using A Measurement Tool to Assess Systematic Reviews, version 2 (AMSTAR 2).¹⁴ This tool was selected because it is specifically designed for the critical appraisal of systematic reviews that include nonrandomized studies.¹⁴ The methodological quality of the reviews was rated as high, moderate, low, or critically low, based on the identification of critical and noncritical flaws.
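The overall confidence rating of AMSTAR 2 (A Measurement Tool to Assess Systematic Reviews, version 2) follows a simple decision rule on the number of critical flaws and noncritical weaknesses. The sketch below encodes that rule as described by the tool's developers; the function name is hypothetical, and it deliberately ignores the appraiser's option to downgrade a review with many noncritical weaknesses from moderate to low.

```python
def amstar2_overall_confidence(critical_flaws: int, noncritical_weaknesses: int) -> str:
    """Return the AMSTAR 2 overall confidence rating for one systematic review.

    Hypothetical helper: it applies the published decision rule mechanically and
    does not model the appraiser's discretion to downgrade further.
    """
    if critical_flaws < 0 or noncritical_weaknesses < 0:
        raise ValueError("counts must be non-negative")
    if critical_flaws > 1:
        return "critically low"  # more than 1 critical flaw
    if critical_flaws == 1:
        return "low"  # exactly 1 critical flaw, regardless of noncritical weaknesses
    if noncritical_weaknesses > 1:
        return "moderate"  # no critical flaw, more than 1 noncritical weakness
    return "high"  # no critical flaw, at most 1 noncritical weakness


print(amstar2_overall_confidence(critical_flaws=2, noncritical_weaknesses=0))  # critically low
```

Because a single critical flaw already caps the rating at "low", the predominance of low and critically low ratings reported below indicates that most included reviews had at least one critical flaw.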

Data extraction

The manuscripts were manually reviewed, and pertinent information was extracted and summarized. We focused on reporting: 1) sensitivity: the ratio of true positive predictions to all positive instances; 2) specificity: the ratio of true negative predictions to all negative instances; 3) accuracy: the ratio of correct machine learning (ML) model predictions to all predictions; 4) area under the curve (AUC): the area under the receiver operating characteristic curve, which plots the true positive rate against the false positive rate across different decision thresholds; and 5) F-score: the harmonic mean of precision (positive predictive value) and recall (sensitivity). Where appropriate, other indicators, such as the intraclass correlation coefficient (ICC) or the Dice coefficient, were reported. Metrics such as accuracy can be misleading in the presence of class imbalance, which is why we prioritized reporting imbalance-robust metrics, such as AUC, sensitivity, and specificity, where available. We did not plan a quantitative synthesis of the data due to the expected high heterogeneity. We also extracted the reference lists from all manuscripts and cross-checked them to determine the citation overlap of primary studies, using the corrected covered area (CCA) metric.¹⁵ Methodological papers (reporting guidelines, quality assessment tools) were excluded from the overlap calculations.
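The metric definitions above can be made concrete with a short sketch that computes them from confusion-matrix counts. The class and function names are hypothetical and not taken from any reviewed study; AUC is computed with the rank-based (Mann-Whitney) formulation, which gives the same value as trapezoidal integration of the empirical ROC curve. The final lines illustrate why accuracy misleads under class imbalance.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionMatrix:
    """Counts of a binary classifier's predictions against the reference standard."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def sensitivity(self) -> float:
        # True positive rate (recall): TP / all actual positives
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: TN / all actual negatives
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Positive predictive value: TP / all positive predictions
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        # Correct predictions / all predictions
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def balanced_accuracy(self) -> float:
        # Mean of sensitivity and specificity; unaffected by class prevalence
        return (self.sensitivity() + self.specificity()) / 2

    def f_score(self) -> float:
        # F1: harmonic mean of precision and recall
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the empirical ROC curve, computed as the probability that a
    randomly chosen positive case scores higher than a randomly chosen negative
    case (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# A classifier that labels every case negative in a cohort with 5% prevalence
# looks accurate but has no discriminative value.
trivial = ConfusionMatrix(tp=0, fp=0, tn=950, fn=50)
print(trivial.accuracy())           # 0.95
print(trivial.balanced_accuracy())  # 0.5
```

The trivial classifier reaches 95% accuracy with 0% sensitivity, which is why sensitivity, specificity, and AUC were preferred where reviews reported them.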

Results

A total of 58 studies were eligible for further analysis. The most relevant findings are summarized in Table 1. There was a slight citation overlap (CCA, 1.86%). Of the 3432 primary studies (unique citations), 258 (7.5%) appeared in at least 2 reviews.
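The CCA reported above can be computed from a citation matrix of reviews against primary studies. The following sketch assumes each review's reference list has already been reduced to a set of study identifiers (the identifiers, toy data, and function names are hypothetical). The interpretation bands are those commonly used with this metric (0%–5% slight, 6%–10% moderate, 11%–15% high, above 15% very high); assigning values that fall between the integer bands to the lower category is an assumption of this sketch.

```python
from collections import Counter


def corrected_covered_area(citations: dict[str, set[str]]) -> float:
    """CCA = (N - r) / (r * c - r), where N is the total number of inclusions
    (counting duplicates), r the number of unique primary studies, and c the
    number of reviews."""
    c = len(citations)
    r = len(set().union(*citations.values()))
    n = sum(len(studies) for studies in citations.values())
    if c < 2 or r == 0:
        raise ValueError("need at least 2 reviews and 1 primary study")
    return (n - r) / (r * c - r)


def studies_in_multiple_reviews(citations: dict[str, set[str]]) -> int:
    """Number of unique primary studies cited by at least 2 reviews."""
    counts = Counter(study for studies in citations.values() for study in studies)
    return sum(1 for k in counts.values() if k >= 2)


def overlap_category(cca: float) -> str:
    """Interpretation band for a CCA given as a fraction (0.0186 means 1.86%)."""
    pct = cca * 100
    if pct <= 5:
        return "slight"
    if pct <= 10:
        return "moderate"
    if pct <= 15:
        return "high"
    return "very high"


# Hypothetical toy data: 3 reviews citing 5 unique primary studies
toy = {
    "review_A": {"s1", "s2", "s3"},
    "review_B": {"s3", "s4"},
    "review_C": {"s1", "s4", "s5"},
}
cca = corrected_covered_area(toy)
print(cca, studies_in_multiple_reviews(toy), overlap_category(cca))  # 0.3 3 very high
```

Because the denominator grows with both the number of unique studies and the number of reviews, a large umbrella review can report a slight CCA even when a few hundred primary studies appear in more than one review.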

**Table 1**. Summary of the most relevant findings
Neurological subspecialty	Relevant artificial intelligence applications
Stroke	Detection of early ischemic lesions on CT¹⁶; Detection of LVO¹⁹; CT perfusion assessment¹⁷; Detection of stroke¹⁸ and DWI/FLAIR mismatch¹⁹ on MRI; ICH detection²⁶; Prediction of stroke hemorrhagic transformation,²⁴ cerebral edema,²⁷ and clinical outcomes³⁰
Dementia	Prediction of conversion of MCI to AD³⁴; Distinguishing AD from healthy controls or MCI patients⁴⁰; Differentiating dementias, particularly FTLD⁴³; Aiding the caregivers in their daily tasks⁴¹
Movement disorders	Differentiating PD patients from healthy controls⁴⁴ and atypical parkinsonian syndromes⁴⁵; Video-based assessment of movement disorders⁴⁶; Cognitive impairment prediction in PD⁴⁷; STN detection before DBS⁴⁸; Classification of hyperkinetic movement disorders⁴⁹
Neuro-oncology	Interpretation of histopathological slides⁵⁰; Differentiation of glioma, lymphoma, and metastasis⁵¹; Prediction of cognitive decline after radiation⁵⁵
Epilepsy	Epileptiform discharge detection⁵⁶; NLP-based extraction of EHR data⁵⁷; Prediction of antiseizure medication response⁵⁸ and surgical outcome⁵⁹
MS	Diagnosis of MS⁶³; MR lesion segmentation⁶³; MS classification⁶¹; Prediction of conversion of CIS to MS, cognitive outcome, and disability⁶⁴
Neuromuscular disorders	ALS diagnosis, classification,⁶⁶ and prognosis⁶⁷; EMG signal classification⁶⁸; Muscle segmentation and classification of myopathies based on muscle ultrasound and MRI⁶⁹
Headache	Extraction of data from EHRs⁷⁰; Headache diagnosis and classification⁷¹; Incident headache prediction⁷⁰
Neurocritical care	Prediction of neurological outcome following cardiac arrest⁷³
Abbreviations: AD, Alzheimer disease; ALS, amyotrophic lateral sclerosis; CIS, clinically isolated syndrome; CT, computed tomography; DBS, deep brain stimulation; DWI, diffusion-weighted imaging; EHR, electronic health record; EMG, electromyography; FLAIR, fluid-attenuated inversion recovery; FTLD, frontotemporal lobar degeneration; ICH, intracranial hemorrhage; LVO, large-vessel occlusion; MCI, mild cognitive impairment; MRI, magnetic resonance imaging; MS, multiple sclerosis; NLP, natural language processing; PD, Parkinson disease; STN, subthalamic nucleus

Stroke

As many as 17 systematic reviews were related to stroke. Of those, 6 discussed ischemic stroke detection on imaging,^16-21 4 hemorrhagic transformation prediction,^22-25 1 intracranial hemorrhage (ICH) detection,²⁶ 2 cerebral edema prediction,^27,28 3 stroke outcome prediction,^29-31 and 1 identification of time from symptom onset.³²

In terms of article quality, 4 works were considered of high, 1 moderate, 6 low, and 6 critically low quality (Supplementary material, Table S1).

The classical approach to detecting early ischemic changes on computed tomography (CT) is the Alberta Stroke Program Early CT Score (ASPECTS),³³ which can now be scored automatically. Automated ASPECTS showed moderate reliability (ICC, 0.54) against expert readings and good reliability (ICC, 0.72) against the reference standard.¹⁶ Its reported mean sensitivity was 68% (range, 45%–98%) and mean specificity was 81% (range, 57%–95%).¹⁷ Newer approaches include AI-based analysis of CT perfusion scans, with accuracy above 80%,¹⁷ and analysis of magnetic resonance imaging (MRI), which demonstrated a pooled sensitivity and specificity of 93% each, with half of the studies showing a low risk of bias.¹⁸ Interestingly, the time from stroke symptom onset could be inferred from imaging with 79% accuracy.³² In turn, the diffusion-weighted imaging/fluid-attenuated inversion recovery (DWI/FLAIR) mismatch, which is used to select patients with stroke of unknown onset time for treatment, was detected by AI with a sensitivity of 85% and a specificity of 84%.¹⁹ In automated detection of large vessel occlusion, a task relevant to mechanical thrombectomy, ML models demonstrated sensitivity of up to 85%.¹⁷ Notably, the Viz.ai model (Viz.ai Inc., San Francisco, California, United States) showed 96% specificity in this task; its use significantly improved all workflow metrics but did not affect patient outcomes.²⁰ For the more difficult problem of occlusion of the M2 segment of the middle cerebral artery, AI platforms were similarly specific (97%) but considerably less sensitive (64%) across 8 heterogeneous studies.²¹

Concerning hemorrhagic transformation of stroke, ML outperformed traditional models, demonstrating median AUCs of up to 0.91 overall^22,23 and 0.95 in patients undergoing thrombolysis.²⁴ For automated ICH detection, accuracy ranged from 81% to a near-perfect 99%.²⁶ In turn, ML models predicted cerebral edema with an AUC of 0.84²⁷ and malignant edema with an AUC of 0.94.²⁸

With respect to clinical outcome prediction, AI models performed well, with AUCs reaching 0.92 for algorithms using radiomics-based features.³⁰ Outcome after mechanical thrombectomy specifically was predicted with a slightly lower pooled AUC of 0.85.³¹

Dementia

We identified 11 systematic reviews related to dementia. Of those, 1 was of general scope,³⁴ 3 focused on progression from mild cognitive impairment (MCI) to Alzheimer disease (AD),^35-37 4 dealt with neuroimaging,^38-41 1 covered assistance for caregivers,⁴² 1 discussed neuropsychiatric symptoms,⁴³ and 1 was related to frontotemporal lobar degeneration (FTLD).⁴⁴

In terms of article quality, none were deemed high quality. One was deemed moderate, 4 low, and 6 critically low (Supplementary material, Table S1).

The majority of the studies were based on the Alzheimer’s Disease Neuroimaging Initiative dataset⁴⁵; only about half of them used a hold-out test set, and only 17 of 92 articles performed external validation.³⁴ Almost all studies used imaging, either alone (about 67%) or in conjunction with other parameters, such as demographics, comorbidities, laboratory, genetic, neurophysiological, neuropsychological, and ophthalmological examinations, as well as acoustic and semantic speech parameters. Imaging was performed mostly with MRI, and models either received the whole image as input (in this case, 3-dimensional rather than 2-dimensional data) or used features extracted from voxel-based (volume) or vertex-wise (surface morphometry) analysis. Interpretability for clinicians was achieved by ranking the features or by visualizing the brain regions contributing to the output (class activation mapping). Unsurprisingly, the hippocampus was consistently reported as the most informative region for classifying AD vs healthy individuals.³⁴

Regarding progression from MCI to AD, the use of ¹⁸F-fluorodeoxyglucose positron emission tomography (¹⁸F-FDG-PET) or cognitive measures were the factors that most improved model performance. Interestingly, neither the type of algorithm (most frequently a support vector machine [SVM]) nor the dataset size influenced model performance.³⁵ The accuracy of published models ranged from 66.1% to 96.3%; however, most studies had a high risk of bias.³⁷

With regard to discriminating AD from healthy controls and AD from MCI based on neuroimaging, accuracies of up to 91% and 83% (balanced accuracy), respectively, have been reported.⁴¹ A more recent analysis of the same task using vision transformers demonstrated similar performance (pooled AUC, 0.92), although with substantial heterogeneity across studies.³⁹ Distinguishing AD from other dementias was achieved with an accuracy of up to 97%. Specifically, Wu et al⁴⁴ found that FTLD could be distinguished from healthy controls and from AD with pooled sensitivities of 86% and 84%, and pooled specificities of 89% and 81%, respectively.

Of equal importance is the use of AI technology to aid the caregivers of patients with dementia. Such tools include social or assistive robots that facilitate social interaction and help with daily tasks, smart home environments that ensure safety, and educational programs that provide cognitive stimulation. There are also models that predict falls or detect incorrect dressing events, with accuracies ranging from 23% to 98%.⁴² Unfortunately, the majority of the studies included in this review were qualitative, and the major gap identified was the lack of systematic design and evaluation of new technologies in the everyday lives of patients with AD.

Movement disorders

In this subspecialty, we identified 6 systematic reviews, of which 2 discussed PD diagnosis and parkinsonian syndrome differentiation,^46,47 1 video analysis of movement disorders,⁴⁸ 1 cognitive impairment prediction in PD,⁴⁹ 1 subthalamic nucleus (STN) localization for deep brain stimulation (DBS) procedures,⁵⁰ and 1 focused specifically on hyperkinetic movement disorders.⁵¹

In terms of article quality, 2 were considered of moderate, 2 low, and 2 critically low quality (Supplementary material, Table S1).

With respect to PD diagnosis, most studies focused on differentiating between PD and healthy controls. In this task, AI models achieved up to 100% accuracy; however, an alarming 80% of the studies failed to meet minimum quality standards for AI reporting. The main reasons were circular reasoning (including in the model a modality that had been used to stratify patients), data leakage, data imbalance, and a lack of feature importance reporting or external validation.⁴⁶ Concerning the differentiation between PD and atypical parkinsonian syndromes, ¹⁸F-FDG-PET seemed the most promising, with an AUC of up to 0.98.⁴⁷ Moreover, video-based assessment of parkinsonian symptoms, including tremor, gait (including freezing of gait), dyskinesia, and hypomimia, achieved moderate or good results.⁴⁸ With regard to cognitive impairment, ML models commonly used both clinical and neuroimaging features, attaining an AUC of 0.83.⁴⁹ Interestingly, various ML models have been used to detect the STN before DBS, with a hidden Markov model achieving the best diagnostic odds ratio of 838.⁵⁰ Finally, the classification of hyperkinetic movement disorders, including ataxia, dystonia, or chorea, using ML (with features including accelerometer, imaging, video, and electrophysiology data) has also been the subject of a systematic review.⁵¹ Detection accuracies ranged from 54% to 100%; however, no study performed external validation, and only 5 of 55 had a low risk of bias.
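The diagnostic odds ratio (DOR) cited for STN detection condenses sensitivity and specificity into one number: the odds of a positive result in those with the target condition divided by the odds of a positive result in those without it. The sketch below computes it from 2×2 counts or from rates; the function names are hypothetical. The last helper answers an illustrative question only, under the assumption that sensitivity and specificity are equal, which the source does not report.

```python
import math


def diagnostic_odds_ratio(tp: int, fp: int, tn: int, fn: int) -> float:
    """DOR = (TP / FN) / (FP / TN) = (TP * TN) / (FP * FN)."""
    if fp == 0 or fn == 0:
        # Undefined with an empty cell; a continuity correction (often 0.5
        # added to each cell) is commonly applied in that case.
        raise ValueError("DOR undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


def dor_from_rates(sensitivity: float, specificity: float) -> float:
    """Same quantity expressed through sensitivity and specificity."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)


def equal_rate_for_dor(dor: float) -> float:
    """If sensitivity == specificity == s, then DOR = (s / (1 - s))**2; solve for s."""
    root = math.sqrt(dor)
    return root / (1 + root)


print(diagnostic_odds_ratio(tp=90, fp=10, tn=90, fn=10))  # 81.0
print(round(equal_rate_for_dor(838), 3))                   # 0.967
```

Read this way, a DOR of 838 would correspond to a sensitivity and specificity of roughly 97% each, if the two were equal; a test with 90% for both yields a DOR of only 81.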

Neuro-oncology

We identified 6 systematic reviews devoted to neuro-oncology, mainly concerning the classification and grading of brain tumors. One was related exclusively to histopathological diagnosis,⁵² 2 examined models combining imaging and histopathology,^53,54 2 discussed imaging only,^55,56 and 1 was related to prediction of cognitive functioning after brain radiation.⁵⁷

In terms of article quality, 5 were of low and 1 of critically low quality (Supplementary material, Table S1).

In the assessment of histopathological slides, ML models were trained to detect specific pathological features, such as microvascular proliferation (AUC, 0.99), to quantify immunohistochemical staining (accuracy of up to 97%), or to classify tumors (accuracy, 85%–100%). However, all examined studies had a high risk of bias.⁵² Interestingly, after pooling the results of histopathological and imaging studies, ML models achieved excellent metrics in differentiating glioma from lymphoma (AUC, 0.99), low- from high-grade glioma (AUC, 0.89), and primary from metastatic tumors (sensitivity, 0.89; specificity, 0.87).⁵³ Regarding radiation-induced cognitive decline, ML models had a high risk of bias and only modest performance (AUC, 0.78).⁵⁵

Epilepsy

In the field of epilepsy, we identified 5 systematic reviews. Surprisingly, only 1 article focused directly on epileptiform discharge detection,⁵⁸ whereas the others discussed natural language processing (NLP)-based data extraction for epilepsy research,⁵⁹ prediction of antiseizure medication (ASM) response,⁶⁰ and general patient management.⁶¹ One article reviewed unsupervised ML, in which the model learns patterns from the data without external guidance in the form of labels.⁶²

In terms of article quality, 1 was of moderate, 1 low, and 3 critically low quality (Supplementary material, Table S1).

Epileptiform discharge detection accuracy ranged from 74% to 97% at the level of a single electroencephalography (EEG) window, depending on the architecture used. At the patient level, accuracies were lower (85%–90%).⁵⁸ Regarding other AI applications in epilepsy, an NLP-based approach has been shown to be feasible for extracting various information from electronic health records (EHRs), including the type of epilepsy (F-score of up to 0.86), the presence of psychogenic nonepileptic seizures (0.67–0.96), sudden unexpected death in epilepsy risk (0.86), and identification of surgical candidates (0.94).⁵⁹ In ASM response prediction, the models used clinical, imaging, and EEG-extracted features and achieved AUCs of 0.45–0.97.⁶⁰ Drug-resistant epilepsy was predicted with AUCs of 0.76–0.83, whereas surgical outcome was predicted with up to 96% accuracy.⁶¹ Unsupervised models also performed very well in seizure detection, seizure prediction, analysis of signal propagation, and seizure localization and classification, with accuracies of over 90% for all tasks except classification (accuracy, 80%–90%).⁶²

Multiple sclerosis

We identified 5 studies related to multiple sclerosis. Two articles discussed general ML applications in MS,^63,64 1 the diagnosis,⁶⁵ 1 prognosis prediction,⁶⁴ and 1 biomarkers other than neuroimaging.⁶⁷

In terms of article quality, 1 was of low and 4 critically low quality (Supplementary material, Table S1).

The main applications of ML in the field of MS involve establishing the diagnosis, classifying disease subtypes, and predicting outcomes.⁶³ The ML models used are mainly decision trees and SVMs, relying chiefly on MRI-based features, followed by optical coherence tomography, blood and cerebrospinal fluid biomarkers, and neurophysiological studies.⁶⁵ With this multimodal approach, the accuracy of ML-based MS diagnosis ranged from 81% to 100%.⁶³ MS lesion segmentation on MRI scans has been a topic of extensive research; however, as of the most recent systematic review, it remains a challenging task in which ML models perform worse than human experts.⁶⁵ Using non-MRI–based biomarkers, the accuracy of MS diagnosis was slightly lower, but still above 90%.⁶⁷ The accuracy of subtype classification ranged from 71% to 96%.⁶³ Conversion from clinically isolated syndrome (CIS) to MS could be predicted with an accuracy of 65%–92%, cognitive outcome with 72%–82%, and disability with 42%–79%.⁶⁴

Neuromuscular disorders

In this subspecialty, we identified 4 articles. Two examined amyotrophic lateral sclerosis (ALS),^68,69 1 classification of electromyographic (EMG) signals,⁷⁰ and 1 was of general scope.⁷¹

In terms of article quality, 3 were of low and 1 of critically low quality (Supplementary material, Table S1).

ML models used in ALS detection and classification were based on gait, EMG, and MRI data. In these tasks, the pooled sensitivities were as high as 94.3% and 90%, respectively, and the pooled specificities, 98.9% and 92.3%, respectively.⁶⁸ However, there were concerns about the methodological quality of the studies. In terms of ALS prognosis, only 1 of 16 ML models reported an AUC (0.78); this model used clinical characteristics to predict survival without tracheostomy or mechanical ventilation.⁶⁹ Regarding EMG signals, most studies relied on conventional ML, with only 8% incorporating deep learning (DL) algorithms, and only 2 of 51 studies classified signals at rest. Regrettably, although the reported accuracies reached up to 100%, methodological limitations precluded the incorporation of the existing models into clinical practice.⁷⁰ Other applications of ML in neuromuscular disease included muscle ultrasound segmentation with accuracies of up to 88%⁷¹ and myopathy classification based on muscle ultrasound texture parameters (accuracy, 76%).⁷¹ In muscle MRI, ML models were used to estimate water and fat fractions from conventional MRI sequences, segment muscle tissue (Dice coefficient of up to 0.88), and classify dystrophinopathies (accuracy, 91%–96%).⁶⁹
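The Dice coefficient used for muscle segmentation measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect agreement). A minimal sketch on flattened binary masks follows; the function name is hypothetical, and treating two empty masks as perfect agreement is a convention rather than a universal rule.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks flattened to 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: conventionally treated as full agreement
    return 2 * intersection / total


# Predicted vs reference muscle mask on a toy 6-voxel strip
print(dice_coefficient([1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0]))  # 0.6666666666666666
```

At the voxel level, Dice is identical to the F-score, 2TP / (2TP + FP + FN), so a Dice of 0.88 can be read with the same intuition as an F-score of 0.88.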

Headache

In this field, 3 systematic reviews were identified. One was of general scope,⁷² 1 was related to diagnostic tools,⁷³ and 1 focused on headache classification.⁷⁴

In terms of article quality, 1 was of low and 2 critically low quality (Supplementary material, Table S1).

As in epilepsy, one application of AI (specifically NLP) in headache is the extraction of data from EHRs, including headache frequency, with further potential to differentiate migraine from cluster headache based on patients’ self-reported narratives.⁷² The most extensively studied applications of AI in the headache field are classification and diagnosis. Many digital tools are currently available, and their performance reaches 87%–90% in terms of concordance, sensitivity, and specificity.⁷³ Most classify different primary and secondary headaches, relying on questionnaires and diagnostic criteria (which might be problematic due to circular reasoning); however, some use MRI, magnetoencephalography, or EEG data.⁷² Interestingly, AI has also been used to predict incident headaches, with a modest AUC of 0.62, and to forecast treatment response, with AUCs ranging from 0.62 to 0.98.⁷²

Neurocritical care

In this field, only 1 critically low-quality systematic review was identified,⁷⁴ in which EEG and EHR data were used to predict neurological outcome following cardiac arrest. The most commonly used ML technique was random forest, with AUCs of 0.80–0.97, whereas the most commonly used DL technique was a convolutional neural network, with AUCs of 0.70–0.92.⁷⁵

Discussion

In this study, we identified systematic reviews on the applications of AI in neurological subspecialties, such as stroke, dementia, movement disorders, neuro-oncology, epilepsy, MS, neuromuscular disorders, headache, and neurocritical care. These applications included not only diagnosis, prognosis, and imaging and signal interpretation, but also data extraction from EHRs and caregiver support. Finally, we identified the main obstacles to AI implementation: the lack of external validation and of diverse datasets, which together compromise the generalizability of the models.

Given the rapid development and increasing use of AI, there have already been attempts to summarize AI applications in neurology narratively. Some authors described the most popular tools,⁷ while others delved into more technical⁷⁶ or regulatory¹¹ details. In this manuscript, we adopted a different approach: a methodological appraisal of all systematic reviews on AI applications in neurology. By doing so, we might have omitted emerging tools or solutions described only in original studies, which is a limitation of this study. However, this approach probably better captures well-established trends that have already made their way into systematic reviews.

Stroke is the subspecialty with the greatest number of systematic reviews and the only one with reviews rated as high quality. Most of them focused on imaging, and some described tools that clinicians have used for many years and that have been incorporated into stroke guidelines (such as ASPECTS). Importantly, several tools have already received regulatory clearance (eg, Food and Drug Administration clearance of RapidAI [San Mateo, California, United States] or Viz.ai). Together with widespread hospital adoption, this reflects a maturity unmatched in other neurological subspecialties, which have neither as many systematic reviews nor as many AI applications. EEG signal analysis may be getting close to incorporation into clinical practice; however, our umbrella review did not capture this because of the novelty of this information.⁷⁷ Some of the described applications vital to neurology, such as predicting conversion from MCI to AD or from CIS to MS, ALS prognosis, headache diagnosis, or PD differentiation, still remain in the research realm. This is most likely because, although the reported accuracies in some tasks reached 90%–100%, most studies had a high risk of bias and inherent flaws, such as the lack of external validation or diverse datasets. Interestingly, even some systematic reviews that identified limitations in the reviewed studies were of low or critically low quality. This highlights that the strength of AI models lies not merely in achieving the highest metrics, but in comprehensive validation across diverse datasets using rigorous methodology. This cannot be achieved without collaboration between clinicians, AI engineers, and policymakers, grounded in patient needs and guided by ethical frameworks that balance innovation with safety.

An interesting finding is the application of NLP techniques in neurology, which has been greatly facilitated by the advent of LLMs. NLP has been applied to data extraction from EHRs in a variety of fields,⁷⁸ and in neurology, its use seems highly justified for several reasons. Firstly, in some areas, such as headache or epilepsy, the diagnosis relies heavily on anamnesis; therefore, extraction and analysis of the patient’s history would often contribute most to the diagnosis (eg, whether the patient fulfills migraine criteria). In subspecialties relying more on neurological examination, such as stroke, NLP can help extract information about stroke severity from clinical notes, achieving near-registry-level agreement.⁷⁹ Secondly, NLP models could assess the patient’s speech (after speech processing techniques have been applied), looking for grammatical errors, anomia, paraphasias, or other errors that might suggest cognitive dysfunction. Such emergent applications include AD prediction using speech⁸⁰ and patient monitoring using multimodal data integration.⁸¹ Finally, neurological patients often require caregiver support as well as social, occupational, and rehabilitation arrangements. NLP models might better “understand” their functional status and needs, possibly facilitating further care or transitions in care.

Our umbrella review comprehensively identified and analyzed the AI applications in neurology that have been systematically reviewed to date. As AI gains traction, novel applications are emerging, such as smartphone- and wearable-based telemonitoring systems for movement disorders,⁸² wearable seizure detection devices,⁸³ AI-driven radiogenomics for noninvasive molecular characterization and intraoperative guidance systems,⁸⁴ as well as uncertainty quantification, synthetic imaging,⁸⁵ and federated learning⁸⁶ in the field of MS. Another novel, though not yet systematically reviewed, technique in neuromuscular disorders is computer vision for muscle strength quantification.⁸⁷ In neurocritical care, externally validated tools for intracranial pressure prediction have emerged, although their widespread implementation is hindered by integration challenges, data fragmentation, and limited clinician trust.⁸⁸ The absence of systematic reviews on these applications likely reflects their recent emergence rather than a lack of clinical promise.

One limitation of this review is that, although the citation overlap was low, we were unable to detect overlap in the populations included in the primary studies. This might have led to biased synthesis and overstatement of AI generalizability in neurology. The exclusion of non–English-language articles and narrative reviews might have caused publication and topic bias; however, we attempted to mitigate this by stratifying the findings by application type (diagnostic, prognostic, or interventional), neurological subspecialty, and data modality (imaging, neurophysiology, and laboratory data). Of note, some reviews reported only raw accuracy, which can be misleading in imbalanced datasets and can lead to erroneous conclusions when comparing the performance of ML models. Also, our reliance on systematic reviews, while methodologically rigorous, might underestimate the real-world impact of recently deployed AI technologies, given the rapidly evolving nature of the field.

Conclusions

To conclude, AI applications span the entire range of neurological subspecialties but are most comprehensively used in stroke. In other subspecialties, AI is used to establish diagnoses, classify disease subtypes, predict prognosis, and analyze imaging and signals. Although the reported model metrics seem promising, most of the studies have methodological limitations and a risk of bias. To facilitate real-world AI adoption in neurology, we propose the following framework: 1) standardized reporting, including patient-level data partitioning, preregistration, and public availability of protocols; 2) mandatory external validation on at least 1 independent dataset; and 3) clinical outcome reporting, including real-world metrics, time- and cost-effectiveness, and patient-centered outcomes.
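Patient-level data partitioning, the first element of this framework, prevents one form of the data leakage noted among PD studies: when several scans or EEG windows come from the same patient, a random split at the sample level can place that patient in both the training and the test set and inflate the reported metrics. A minimal sketch follows, assuming each sample is tagged with a patient identifier; the function and the cohort are hypothetical. In practice, scikit-learn's `GroupShuffleSplit` and `GroupKFold` serve the same purpose.

```python
import random


def patient_level_split(samples, test_fraction=0.2, seed=0):
    """Split (patient_id, sample) pairs so that all samples of a given patient
    land entirely in either the training or the test set."""
    patient_ids = sorted({pid for pid, _ in samples})  # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [(pid, x) for pid, x in samples if pid not in test_ids]
    test = [(pid, x) for pid, x in samples if pid in test_ids]
    return train, test


# Hypothetical cohort: 10 patients, 3 EEG windows each
samples = [(f"patient_{i}", f"window_{j}") for i in range(10) for j in range(3)]
train, test = patient_level_split(samples)
shared = {pid for pid, _ in train} & {pid for pid, _ in test}
print(len(train), len(test), shared)  # 24 6 set()
```

Splitting by patient rather than by sample makes the test set a closer proxy for unseen patients, although it is still not a substitute for validation on an independent external dataset.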

Supplementary material.pdf

Correspondence to

Michał Błaż, MD, PhD, Department of Neurology, St. John Paul II Hospital, ul. Prądnicka 80, 31-202 Kraków, Poland, phone: +48 12 614 27 32, email: m.blaz@szpitaljp2.krakow.pl

Received

September 2, 2025.

Revision accepted

January 15, 2026.

Published online

January 22, 2026.

Acknowledgments

None.

Funding

None.

Contribution statement

MM: data acquisition and analysis, writing the original draft. MB: data acquisition and analysis, conceptualization, study design, writing the original draft. Both authors edited and approved the final version of the manuscript.

Conflict of interest

None declared.

AI statement

Artificial intelligence was not used in the preparation of this manuscript.

How to cite

Michalska M, Błaż M. Artificial intelligence applications in neurology: an umbrella review. Prz Lek Jagiellonian Med Rev. 2026; 78: 20028. doi:10.20452/jmr.2026.20028

1. Xie Y, Zhai Y, Lu G. Evolution of artificial intelligence in healthcare: a 30-year bibliometric study. Front Med. 2025; 11: 1505692.
2. Kahn CE. Artificial intelligence, real radiology. Radiol Artif Intell. 2019; 1: e184001.
3. Paton C. Welcome to BMJ Digital Health & AI. BMJ Digit Health. 2025; 1: e000004.
4. Kohane IS. Injecting artificial intelligence into medicine. NEJM AI. 2024; 1.
5. Zeng D, Qin Y, Sheng B, Wong TY. DeepSeek’s “low-cost” adoption across China’s hospital systems: too fast, too soon? JAMA. 2025; 333: 1866-1869.
6. Zador A, Escola S, Richards B, et al. Catalyzing next-generation artificial intelligence through NeuroAI. Nat Commun. 2023; 14: 1597.
7. Bösel J, Mathur R, Cheng L, et al. AI and neurology. Neurol Res Pract. 2025; 7: 11.
8. Dumkrieger GM, Chiang C, Zhang P, et al. Artificial intelligence terminology, methodology, and critical appraisal: a primer for headache clinicians and researchers. Headache. 2025; 65: 180-190.
9. Jing Yeo CJ, Ramasamy S, Joel Leong F, et al. A neuromuscular clinician’s primer on machine learning. J Neuromuscul Dis. 2025; 1: 22143602251329240.
10. Au Yeung J, Wang YY, Kraljevic Z, Teo JTH. Artificial intelligence (AI) for neurologists: do digital neurons dream of electric sheep? Pract Neurol. 2023; 23: 476-488.
11. Voigtlaender S, Pawelczyk J, Geiger M, et al. Artificial intelligence in neurology: opportunities, challenges, and policy implications. J Neurol. 2024; 271: 2258-2273.
12. Sarkis-Onofre R, Catalá-López F, Aromataris E, Lockwood C. How to properly use the PRISMA Statement. Syst Rev. 2021; 10: 117.
13. Lunny C, Pieper D, Thabet P, Kanji S. Managing overlap of primary study results across systematic reviews: practical considerations for authors of overviews of reviews. BMC Med Res Methodol. 2021; 21: 140.
14. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017; 358: j4008.
15. Pieper D, Antoine SL, Mathes T, et al. Systematic review finds overlapping reviews were not mentioned in every other overview. J Clin Epidemiol. 2014; 67: 368-375.
16. Adamou A, Beltsios ET, Bania A, et al. Artificial intelligence-driven ASPECTS for the detection of early stroke changes in non-contrast CT: a systematic review and meta-analysis. J Neurointerv Surg. 2023; 15: e298-e304.
17. Murray NM, Unberath M, Hager GD, Hui FK. Artificial intelligence to diagnose ischemic stroke and identify large vessel occlusions: a systematic review. J Neurointerv Surg. 2020; 12: 156-164.
18. Bojsen JA, Elhakim MT, Graumann O, et al. Artificial intelligence for MRI stroke detection: a systematic review and meta-analysis. Insights Imaging. 2024; 15: 160.
19. Offersen CM, Sørensen J, Sheng K, et al. Artificial intelligence for automated DWI/FLAIR mismatch assessment on magnetic resonance imaging in stroke: a systematic review. Diagnostics. 2023; 13: 2111.
20. Sarhan K, Azzam AY, Moawad MHED, et al. Automated emergent large vessel occlusion detection using viz.ai software and its impact on stroke workflow metrics and patient outcomes in stroke centers: a systematic review and meta-analysis. Transl Stroke Res. 2025; 16: 2258-2271.
21. Ghozy S, Azzam AY, Kallmes KM, et al. The diagnostic performance of artificial intelligence algorithms for identifying M2 segment middle cerebral artery occlusions: a systematic review and meta-analysis. J Neuroradiol. 2023; 50: 449-454.
22. Issaiy M, Zarei D, Kolahi S, Liebeskind DS. Machine learning and deep learning algorithms in stroke medicine: a systematic review of hemorrhagic transformation prediction models. J Neurol. 2025; 272: 37.
23. Wang Y, Zhang Z, Zhang Z, et al. Traditional and machine learning models for predicting haemorrhagic transformation in ischaemic stroke: a systematic review and meta-analysis. Syst Rev. 2025; 14: 46.
24. Jiang YL, Zhao QS, Li A, et al. Advanced machine learning models for predicting post-thrombolysis hemorrhagic transformation in acute ischemic stroke patients: a systematic review and meta-analysis. Clin Appl Thromb Hemost. 2024; 30: 10760296241279800.
25. Wang B, Jiang B, Liu D, Zhu R. Early predictive accuracy of machine learning for hemorrhagic transformation in acute ischemic stroke: systematic review and meta-analysis. J Med Internet Res. 2025; 27: e71654.
26. Yeo M, Tahayori B, Kok HK, et al. Review of deep learning algorithms for the automatic detection of intracranial hemorrhages on computed tomography head imaging. J Neurointerv Surg. 2021; 13: 369-378.
27. Deng Q, Yang Y, Bai H, et al. Predictive value of machine learning models for cerebral edema risk in stroke patients: a meta-analysis. Brain Behav. 2025; 15: e70198.
28. Shafieioun A, Ghaffari H, Baradaran M, et al. Predictive power of artificial intelligence for malignant cerebral edema in stroke patients: a CT-based systematic review and meta-analysis of prevalence and diagnostic performance. Neurosurg Rev. 2025; 48: 318.
29. Yang Y, Tang L, Deng Y, et al. The predictive performance of artificial intelligence on the outcome of stroke: a systematic review and meta-analysis. Front Neurosci. 2023; 17: 1256592.
30. Dragoș HM, Stan A, Pintican R, et al. MRI radiomics and predictive models in assessing ischemic stroke outcome—a systematic review. Diagnostics. 2023; 13: 857.
31. Teo YH, Lim ICZY, Tseng FS, et al. Predicting clinical outcomes in acute ischemic stroke patients undergoing endovascular thrombectomy with machine learning: a systematic review and meta-analysis. Clin Neuroradiol. 2021; 31: 1121-1130.
32. Feng J, Zhang Q, Wu F, et al. The value of applying machine learning in predicting the time of symptom onset in stroke patients: systematic review and meta-analysis. J Med Internet Res. 2023; 25: e44895.
33. Barber PA, Demchuk AM, Zhang J, Buchan AM. Validity and reliability of a quantitative computed tomography score in predicting outcome of hyperacute stroke before thrombolytic therapy. ASPECTS Study Group. Alberta Stroke Programme Early CT Score. Lancet. 2000; 355: 1670-1674.
34. Martin SA, Townend FJ, Barkhof F, Cole JH. Interpretable machine learning for dementia: a systematic review. Alzheimers Dement. 2023; 19: 2135-2149.
35. Ansart M, Epelbaum S, Bassignana G, et al. Predicting the progression of mild cognitive impairment using machine learning: a systematic, quantitative and critical review. Med Image Anal. 2021; 67: 101848.
36. Ahmadzadeh M, Christie GJ, Cosco TD, et al. Neuroimaging and machine learning for studying the pathways from mild cognitive impairment to Alzheimer’s disease: a systematic review. BMC Neurol. 2023; 23: 309.
37. Vermeulen RJ, Andersson V, Banken J, et al. Limited generalizability and high risk of bias in multivariable models predicting conversion risk from mild cognitive impairment to dementia: a systematic review. Alzheimers Dement. 2025; 21: e70069.
38. Bevilacqua R, Barbarossa F, Fantechi L, et al. Radiomics and artificial intelligence for the diagnosis and monitoring of Alzheimer’s disease: a systematic review of studies in the field. J Clin Med. 2023; 12: 5432.
39. Mubonanyikuzo V, Yan H, Komolafe TE, et al. Detection of Alzheimer disease in neuroimages using vision transformers: systematic review and meta-analysis. J Med Internet Res. 2025; 27: e62647.
40. Borchert RJ, Azevedo T, Badhwar A, et al. Artificial intelligence for diagnostic and prognostic neuroimaging in dementia: a systematic review. Alzheimers Dement. 2023; 19: 5885-5904.
41. Wen J, Thibeau-Sutre E, Diaz-Melo M, et al. Convolutional neural networks for classification of Alzheimer’s disease: overview and reproducible evaluation. Med Image Anal. 2020; 63: 101694.
42. Xie B, Tao C, Li J, et al. Artificial intelligence for caregivers of persons with Alzheimer’s disease and related dementias: systematic literature review. JMIR Med Inform. 2020; 8: e18189.
43. Shah J, Rahman Siddiquee MM, et al. Neuropsychiatric symptoms and commonly used biomarkers of Alzheimer’s disease: a literature review from a machine learning perspective. J Alzheimers Dis. 2023; 92: 1131-1146.
44. Wu Q, Kiakou D, Mueller K, et al. Boostering diagnosis of frontotemporal lobar degeneration with AI-driven neuroimaging - a systematic review and meta-analysis. Neuroimage Clin. 2025; 45: 103757.
45. Petersen RC, Aisen PS, Beckett LA, et al. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology. 2010; 74: 201-219.
46. Dzialas V, Doering E, Eich H, et al. Houston, we have AI problem! Quality issues with neuroimaging-based artificial intelligence in Parkinson’s disease: a systematic review. Mov Disord. 2024; 39: 2130-2143.
47. Zhao T, Wang B, Liang W, et al. Accuracy of ¹⁸F-FDG PET imaging in differentiating Parkinson’s disease from atypical parkinsonian syndromes: a systematic review and meta-analysis. Acad Radiol. 2024; 31: 4575-4594.
48. Tang W, Van Ooijen PMA, Sival DA, Maurits NM. Automatic two-dimensional & three-dimensional video analysis with deep learning for movement disorders: a systematic review. Artif Intell Med. 2024; 156: 102952.
49. Wu Y, Cheng Y, Xiao Y, et al. The role of machine learning in cognitive impairment in Parkinson disease: systematic review and meta-analysis. J Med Internet Res. 2025; 27: e59649.
50. Inggas MAM, Coyne T, Taira T, et al. Machine learning for the localization of subthalamic nucleus during deep brain stimulation surgery: a systematic review and meta-analysis. Neurosurg Rev. 2024; 47: 774.
51. Vizcarra JA, Yarlagadda S, Xie K, et al. Artificial intelligence in the diagnosis and quantitative phenotyping of hyperkinetic movement disorders: a systematic review. J Clin Med. 2024; 13: 7009.
52. Jensen MP, Qiang Z, Khan DZ, et al. Artificial intelligence in histopathological image analysis of central nervous system tumours: a systematic review. Neuropathol Appl Neurobiol. 2024; 50: e12981.
53. Silva Santana L, Borges Camargo Diniz J, Mothé Glioche Gasparri L, et al. Application of machine learning for classification of brain tumors: a systematic review and meta-analysis. World Neurosurg. 2024; 186: 204-218.e2.
54. Cassinelli Petersen GI, Shatalov J, Verma T, et al. Machine learning in differentiating gliomas from primary CNS lymphomas: a systematic review, reporting quality, and risk of bias assessment. AJNR Am J Neuroradiol. 2022; 43: 526-533.
55. Jekel L, Brim WR, Von Reppert M, et al. Machine learning applications for differentiation of glioma from brain metastasis—a systematic review. Cancers. 2022; 14: 1369.
56. Sohn CK, Bisdas S. Diagnostic accuracy of machine learning-based radiomics in grading gliomas: systematic review and meta-analysis. Contrast Media Mol Imaging. 2020; 2020: 2127062.
57. Tohidinezhad F, Di Perri D, Zegers CML, et al. Prediction models for radiation-induced neurocognitive decline in adult patients with primary or secondary brain tumors: a systematic review. Front Psychol. 2022; 13: 853472.
58. Nhu D, Janmohamed M, Antonic-Baker A, et al. Deep learning for automated epileptiform discharge detection from scalp EEG: a systematic review. J Neural Eng. 2022; 19: 051002.
59. Yew ANJ, Schraagen M, Otte WM, Van Diessen E. Transforming epilepsy research: a systematic review on natural language processing applications. Epilepsia. 2023; 64: 292-305.
60. Abdaltawab A, Chang LC, Mansour M, Koubeissi M. How accurate are machine learning models in predicting anti-seizure medication responses: a systematic review. Epilepsy Behav. 2025; 163: 110212.
61. Smolyansky ED, Hakeem H, Ge Z, et al. Machine learning models for decision support in epilepsy management: a critical review. Epilepsy Behav. 2021; 123: 108273.
62. Tautan AM, Andrei AG, Smeralda CL, et al. Unsupervised learning from EEG data for epilepsy: a systematic literature review. Artif Intell Med. 2025; 162: 103095.
63. Vázquez-Marrufo M, Sarrias-Arrabal E, García-Torres M, et al. A systematic review of the application of machine-learning algorithms in multiple sclerosis. Neurología (Engl Ed). 2023; 38: 577-590.
64. Hartmann M, Fenton N, Dobson R. Current review and next steps for artificial intelligence in multiple sclerosis risk research. Comput Biol Med. 2021; 132: 104337.
65. Nabizadeh F, Masrouri S, Ramezannezhad E, et al. Artificial intelligence in the diagnosis of multiple sclerosis: a systematic review. Mult Scler Relat Disord. 2022; 59: 103673.
66. Yousef H, Malagurski Tortei B, Castiglione F. Predicting multiple sclerosis disease progression and outcomes with machine learning and MRI-based biomarkers: a review. J Neurol. 2024; 271: 6543-6572.
67. Hossain MZ, Daskalaki E, Brüstle A, et al. The role of machine learning in developing non-magnetic resonance imaging-based biomarkers for multiple sclerosis: a systematic review. BMC Med Inform Decis Mak. 2022; 22: 242.
68. Umar TP, Jain N, Papageorgakopoulou M, et al. Artificial intelligence for screening and diagnosis of amyotrophic lateral sclerosis: a systematic review and meta-analysis. Amyotroph Lateral Scler Frontotemporal Degener. 2024; 25: 425-436.
69. Xu L, He B, Zhang Y, et al. Prognostic models for amyotrophic lateral sclerosis: a systematic review. J Neurol. 2021; 268: 3361-3370.
70. De Jonge S, Potters WV, Verhamme C. Artificial intelligence for automatic classification of needle EMG signals: a scoping review. Clin Neurophysiol. 2024; 159: 41-55.
71. Piñeros-Fernández MC. Artificial intelligence applications in the diagnosis of neuromuscular diseases: a narrative review. Cureus. 2023; 15: e48458.
72. Stubberud A, Langseth H, Nachev P, et al. Artificial intelligence and headache. Cephalalgia. 2024; 44: 03331024241268290.
73. Woldeamanuel YW, Cowan RP. Computerized migraine diagnostic tools: a systematic review. Ther Adv Chronic Dis. 2022; 13: 20406223211065235.
74. Daripa B, Lucchese S. Artificial intelligence-aided headache classification based on a set of questionnaires: a short review. Cureus. 2022; 14: e29514.
75. Chen CC, Massey SL, Kirschen MP, et al. Electroencephalogram-based machine learning models to predict neurologic outcome after cardiac arrest: a systematic review. Resuscitation. 2024; 194: 110049.
76. Rizzo M. AI in neurology: everything, everywhere, all at once part 2: speech, sentience, scruples, and service. Ann Neurol. 2025; 98: 431-447.
77. Tveit J, Aurlien H, Plis S, et al. Automated interpretation of clinical electroencephalograms using artificial intelligence. JAMA Neurol. 2023; 80: 805-812.
78. Hossain E, Rana R, Higgins N, et al. Natural language processing in electronic health records in relation to healthcare decision-making: a systematic review. Comput Biol Med. 2023; 155: 106649.
79. Fernandes M, Westover MB, Singhal AB, Zafar SF. Automated extraction of stroke severity from unstructured electronic health records using natural language processing. J Am Heart Assoc. 2024; 13: e036386.
80. Amini S, Hao B, Yang J, et al. Prediction of Alzheimer’s disease progression within 6 years using speech: a novel approach leveraging language models. Alzheimers Dement. 2024; 20: 5262-5270.
81. Lee B, Song HJ, Park YJ, Kang BO. Multimodal Alzheimer’s disease recognition from image, text and audio. Sci Rep. 2025; 15: 29038.
82. Adams JL, Kangarloo T, Tracey B, et al. Using a smartwatch and smartphone to assess early Parkinson’s disease in the WATCH-PD study. NPJ Parkinsons Dis. 2023; 9: 64.
83. Newton TJ, Frankel MA, Tosi Z, et al. Validation of a discrete electrographic seizure detection algorithm for extended-duration, reduced-channel wearable EEG. Epilepsia. 2025; 66: 2433-2443.
84. Evangelou K, Kotsantis I, Kalyvas A, et al. Artificial intelligence in the diagnosis and treatment of brain gliomas. Biomedicines. 2025; 13: 2285.
85. Werthen-Brabants L, Dhaene T, Deschrijver D. The role of trustworthy and reliable AI for multiple sclerosis. Front Digit Health. 2025; 7: 1507159.
86. Pirmani A, De Brouwer E, Arany Á, et al. Personalized federated learning for predicting disability progression in multiple sclerosis using real-world routine clinical data. NPJ Digit Med. 2025; 8: 478.
87. Noteboom L, Hoozemans MJM, Veeger HEJ, Van Der Helm FCT. Feasibility and validity of a single camera CNN driven musculoskeletal model for muscle force estimation during upper extremity strength exercises: proof-of-concept. Front Sports Act Living. 2022; 4: 994221.
88. Fong N, Feng J, Hubbard A, Dang LE, et al. Intracranial pressure prediction algorithm using machine learning (I-CARE): training and validation study. Crit Care Explor. 2023; 6: e1024.